AirPal to Apache? Re: [VOTE] Accept Airflow into the Incubator
Any plans from AirBnB to bring AirPal to Apache?

On 3/24/16, 8:00 PM, "Siddharth Anand" wrote:

>Following the discussion earlier:
>https://s.apache.org/AirflowDiscussion
>
>I would like to call a VOTE for accepting Airflow as a new incubator project.
>
>The proposal is available at:
>https://wiki.apache.org/incubator/AirflowProposal
>
>The proposal is also included at the bottom of this email.
>
>Vote is open until at least Tues, 29 March 2016, 23:59:00 PDT
>[ ] +1 accept Airflow into the Apache Incubator
>[ ] ±0
>[ ] -1 because...
>
>+1 (non-binding)
>
>Thanks,
>-s (Sid)
>
>
>== Abstract ==
>
>Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines.
>
>== Proposal ==
>
>Airflow provides a system for authoring and managing workflows, a.k.a. data pipelines, a.k.a. DAGs (Directed Acyclic Graphs). The developer authors DAGs in Python using an Airflow-provided framework. He/She then executes the DAG using Airflow's scheduler or registers the DAG for event-based execution. A web-based UI provides the developer with a range of options for managing and viewing his/her data pipelines.
>
>== Background ==
>
>Airflow was developed at Airbnb to enable easier authorship and management of DAGs than was possible with existing solutions such as Oozie and Azkaban. For starters, both Oozie and Azkaban rely on one or more XML or property files being bundled together to define a workflow. This separation of code and config can make the DAG hard to understand: in Azkaban, a DAG's structure is reflected by its file system tree, and one can find himself/herself traversing the file system when inspecting or changing the structure of the DAG. Airflow workflows, on the other hand, are simply and elegantly defined in Python code, often in a single file. Airflow merges the powerful Web-based management aspects of projects like Azkaban and Oozie with the simplicity and elegance of defining workflows in Python. Airflow, less than a year old in terms of its Open Source launch, is currently used in production environments in more than 30 companies and boasts an active contributor list of more than 100 developers, the vast majority of whom (>95%) are outside of Airbnb.
>
>We would like to share it with the ASF and begin developing a community of developers and users within Apache.
>
>== Rationale ==
>
>Many organizations (>30) already benefit from running Airflow to manage data pipelines. Our 100+ contributors continue to provide integrations with 3rd party systems through the implementation of new hooks and operators, both of which are used in defining the tasks that compose workflows.
>
>== Current Status ==
>
>=== Meritocracy ===
>
>Our intent with this incubator proposal is to start building a diverse developer community around Airflow following the Apache meritocracy model. Since Airflow was open-sourced in mid-2015, we have had fast adoption and contributions by multiple organizations the world over. We plan to continue to support new contributors, and we will work to actively promote those who contribute significantly to the project to committers.
>
>=== Community ===
>
>Airflow is currently being used in over 30 companies. We hope to extend our contributor base significantly and invite all those who are interested in building large-scale distributed systems to participate.
>
>=== Core Developers ===
>
>Airflow is currently being developed by four engineers: Maxime Beauchemin, Siddharth Anand, Bolke de Bruin, and Chris Riccomini. Chris is a member of the Apache Samza PMC and a contributor to various Apache projects, including Apache Kafka and Apache YARN. Maxime, Siddharth, and Bolke have contributed to Airflow.
>
>=== Alignment ===
>
>The ASF is the natural choice to host the Airflow project, as its goal of encouraging community-driven open-source projects fits with our vision for Airflow.
>
>== Known Risks ==
>
>=== Orphaned Products ===
>
>The core developers plan to work part time on the project. There is very little risk of Airflow being abandoned, as all of our companies rely on it.
>
>=== Inexperience with Open Source ===
>
>All of the core developers have experience with open source development. Chris is a member of the Apache Samza PMC and a contributor to various Apache projects, including Apache Kafka and Apache YARN. Bolke is a contributor to multiple open source projects, and to a few Apache projects as well, including Apache Hive, Apache Hadoop, and Apache Ranger.
>
>=== Homogeneous Developers ===
>
>The current core developers are all from different companies. Our community of 100 contributors hails from over 30 different companies across the world.
>
>=== Reliance on Salaried Developers ===
>
>Currently, the only developer paid to work on this project is Maxime.
>
>=== Relationships with Other Apache Products ===
>
>Airflow is deeply integrated with Apache products. It currently provides hooks and operators to
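The proposal describes workflows as DAGs of tasks that a scheduler runs in dependency order. The sketch below illustrates only that underlying idea and assumes nothing about Airflow's actual API. The `run_dag` helper and the task names are hypothetical. It orders the tasks with a topological sort from Python's standard `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(tasks: Dict[str, Callable[[], None]],
            upstream: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks and return the run order.

    Hypothetical helper for illustration; not Airflow's scheduler or API.
    Every name mentioned in `upstream` is assumed to be a key of `tasks`.
    Raises graphlib.CycleError if the dependencies are not acyclic.
    """
    sorter = TopologicalSorter()
    for name in tasks:
        # Register each task together with the tasks it depends on.
        sorter.add(name, *upstream.get(name, set()))
    # static_order() checks for cycles before yielding, so no task runs on a cyclic graph.
    order = list(sorter.static_order())
    for name in order:
        tasks[name]()
    return order


# Example: a four-step pipeline whose dependencies admit exactly one valid order.
log: List[str] = []
pipeline = {name: (lambda n=name: log.append(n))
            for name in ["extract", "transform", "load", "report"]}
deps = {"transform": {"extract"}, "load": {"transform"}, "report": {"load", "transform"}}
order = run_dag(pipeline, deps)
print(order)  # ['extract', 'transform', 'load', 'report']

try:
    run_dag({"a": lambda: None, "b": lambda: None}, {"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")
```

`static_order()` validates the whole graph before yielding any node. As a result, a cyclic definition is rejected before any task executes, which matches the "acyclic" requirement in the term DAG.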
Re: AirPal to Apache? Re: [VOTE] Accept Airflow into the Incubator
I am hoping there are a few AirBnB guys on this DL, though its dependency on Presto could prohibit it becoming an Apache project…

On 3/25/16, 7:38 AM, "John D. Ament" wrote:

>You would need to ask AirBnB.
>
>John
>
>On Fri, Mar 25, 2016 at 9:20 AM Adunuthula, Seshu
>wrote:
>
>> Any plans from AirBnB to bring AirPal to Apache?
>>
>> On 3/24/16, 8:00 PM, "Siddharth Anand" wrote:
Re: [2nd DRAFT] Board Report for December 2014 - Please review
Kylin is a new podling that was approved, but I do not see a mention of it in this report.

Thanks
Seshu Adunuthula

On 12/9/14, 6:13 PM, "John D. Ament" wrote:

>Below is the "final" copy of the report, ready to be signed off on. I removed non-reporting podlings.
>
>Can we get someone from MRQL and Brooklyn to sign off on those reports? Brooklyn specifically has a *lot* of mentors, yet no one has signed off on it.
>
>BTW, in my last email I incorrectly listed log4cxx as non-reporting. They did report.
>
>= Incubator PMC report for December 2014 =
>=== Timeline ===
>||Wed December 03 ||Podling reports due by end of day ||
>||Sun December 07 ||Shepherd reviews due by end of day ||
>||Sun December 07 ||Summary due by end of day ||
>||Tue December 09 ||Mentor signoff due by end of day ||
>||Wed December 10 ||Report submitted to Board ||
>||Wed December 17 ||Board meeting ||
>
>
>=== Shepherd Assignments ===
>||Alan D. Cabrera ||Ignite ||
>||Andrei Savu ||Drill ||
>||Andrei Savu ||Johnzon ||
>||Dave Fisher ||NPanday ||
>||John Ament ||MRQL ||
>||John Ament ||log4cxx2 ||
>||Justin Mclean ||Tamaya ||
>||Konstantin Boudnik ||Argus ||
>||Matthew Franklin ||Brooklyn ||
>||Raphael Bircher ||Kalumet ||
>||Raphael Bircher ||Streams ||
>||Roman Shaposhnik ||Sentry ||
>||Ross Gardler ||Wave ||
>||Suresh Marru ||Falcon ||
>||Suresh Marru ||Lens ||
>||Timothy Chen ||Ripple ||
>||Timothy Chen ||Taverna ||
>
>
>=== Report content ===
>{{{
>Incubator PMC report for December 2014
>
>The Apache Incubator is the entry path into the ASF for projects and codebases wishing to become part of the Foundation's efforts.
>
>There are currently 36 podlings undergoing incubation. Two podlings joined us this month, NiFi and Tamaya. Three new IPMC members and two new Shepherds joined our ranks as well.
>
>* Community
>
>  New IPMC members:
>
>  Andrew L. Farris
>  Thejas Nair
>  Brock Noland
>
>  New Incubator Shepherds:
>
>  Timothy Chen
>  Andrew L. Farris
>
>  People who left the IPMC:
>
>  None
>
>* New Podlings
>
>  NiFi
>  Tamaya
>
>* Graduations
>
>  The board has motions for the following:
>
>  The IPMC is currently voting on graduations for:
>
>  Flink
>
>* Releases
>
>  The following releases were made since the last Incubator report:
>
>  apache-calcite-0.9.2-incubating
>  apache-twill-0.4.0-incubating
>  apache-parquet-format-2.2.0-incubating
>  apache-johnzon-0.2-incubating
>  apache-slider-0.60.0-incubating
>  metamodel-4.3.0-incubating
>  apache-aurora-0.6.0-incubating
>
>* IP Clearance
>
>  Sling Sightly and XSS modules
>
>* Legal / Trademarks
>
>  There are many ongoing Podling Name Search requests, with few being closed.
>
>  Droids still has an open name search.
>  Falcon is currently processing a name search.
>  Tamaya successfully cleared Podling Name Search.
>
>  Two podlings (BatchEE and Johnzon) are currently waiting on the result of a new licensing agreement with Oracle to gain access to the TCKs for new EE-related JSRs. Until this is done, they cannot be considered compliant implementations.
>
>* Infrastructure
>
>  An SVN outage caused minor inconvenience to some podlings.
>  Argus/Ranger is facing some struggles with their rename.
>
>* Miscellaneous
>
>  The Kalumet podling is currently considering a retirement vote.
>
>Summary of podling reports
>
>* Still getting started at the Incubator
>
>  Ignite
>  Lens
>  NiFi
>  Tamaya
>  Taverna
>
>* Not yet ready to graduate
>
>  No release:
>
>  Brooklyn
>  Wave
>
>  Community growth:
>
>  Falcon
>  Johnzon
>  Sentry
>  Streams
>
>
>* Ready to graduate
>
>  The Board has motions for the following:
>
>
>
>* Did not report, expected next month
>
>  Ranger (formerly Argus)
>  Kalumet
>  NPanday
>
>* Not signed off by mentors
>
>  Brooklyn
>  MRQL
>
>
>--
>  Table of Contents
>Brooklyn
>Falcon
>Ignite
>Johnzon
>Lens
>log4cxx2
>MRQL
>NiFi
>Ripple
>Sentry
>Streams
>Tamaya
>Taverna
>Wave
>
>--
>
>Brooklyn
>
>Brooklyn is a framework for modelling, monitoring, and managing applications through autonomic blueprints.
>
>Brooklyn has been incubating since 2014-05-01.
>
>Three most important issues to address in the move towards graduation:
>
>  1. Completing our first release under Apache
>  2. Growing the community
>  3. More diversity of the committers/PPMC (currently biased towards employees of a single organization)
>
>Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
>
>  None.
>
>How has the community developed since the last report?
>
>  Our community continues to grow slowly but surely, and has received interest and contributions from new community members.
>
>  We had the opportunity to talk about our project at ApacheCon Europe last week, which has introduced Brooklyn to
Re: [2nd DRAFT] Board Report for December 2014 - Please review
Thanks!

On 12/10/14, 8:48 AM, "Roman Shaposhnik" wrote:

>Indeed! I'll fix it.
>
>Thanks,
>Roman.
>
>P.S. I could swear I saw in the report though, but you're right.
>
>
>On Wed, Dec 10, 2014 at 8:44 AM, Adunuthula, Seshu
>wrote:
>> Kylin is a new podling which was approved, but do not see a mention here
>> in this report.
>>
>> Thanks
>> Seshu Adunuthula
>>
>>
>> On 12/9/14, 6:13 PM, "John D. Ament" wrote:
Re: [DISCUSS] Solicitation for IPMC Chair nomination
+1 for Ted Dunning and Henry Saputra.

Both are mentors of Apache Kylin, and such fantastic mentors they are…

Regards
Seshu Adunuthula

On 1/26/15, 11:16 AM, "Marvin Humphrey" wrote:

>+1 for Ted Dunning.
>
>Ted has passion for the Incubator's mission. He is an excellent consensus builder, with the right mix of patience and advocacy. He can get the job done while sending judicious amounts of email, which is important in keeping traffic on general@incubator under control. He is politically adept and tough enough to handle the challenges of interfacing with the Board and with outside organizations. If he will take the job, the Incubator would be lucky to have him.
>
>FWIW, I don't have the expectation that Ted or any other Chair will *lead* reform -- Apache Chairs are not executives. But if Ted chooses to be an active moderator, I have faith that he will do as well as anyone could in guiding consensus for whatever bottom-up proposals emerge.
>
>Marvin Humphrey
>
>On Mon, Jan 26, 2015 at 10:45 AM, Roman Shaposhnik wrote:
>> Hi!
>>
>> After making sure that there's still an Incubator to be managed for the next 6-12 months, I'd like to open up a discussion thread on soliciting nominations for the next IPMC Chair.
>>
>> Feel free to self-nominate or nominate folks whom you know. Provide a summary of your 'program' or not. At this point, we want as much feedback and discussion as possible. The VOTE thread will come in a few weeks.
>>
>> Things to keep in mind while thinking about nominating yourself or others:
>>
>> 1. This is a 6-12 month commitment that, based on my personal experience, would require you to allocate 7-10 hours per week.
>>
>> 2. This is a rotating Chair, and you would be expected to start a similar thread in 12 months.
>>
>> 3. From where I sit, the most important job for the new Chair for the next few months would be to help shape the incremental, actionable plan for improving the mentoring situation in the Incubator.
>>
>> 4. The situation around 'professional student' podlings is not improving nearly quickly enough (4 years without a single release? really?). Anybody who has actionable ideas on how to improve it would get my support.
>>
>> Now, to get the ball rolling, here are the two folks I'd like to suggest as future IPMC Chairs:
>> * Ted Dunning
>> * Henry Saputra
>> In my view, both have demonstrated an exceptional understanding of the 'Apache Way', dedication to mentoring the podlings they are responsible for, and enthusiasm around bringing new communities into the ASF family. On top of that, both have shown a remarkable skill in conducting public discussions and driving towards consensus.
>>
>> Thanks,
>> Roman.
>>
>> -
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
Re: [DISCUSS] Graduate Kylin from the Apache Incubator
+1

On 10/7/15, 8:32 AM, "John D. Ament" wrote:

>I would be happy to see Kylin graduate.
>On Oct 7, 2015 11:28, "Luke Han" wrote:
>
>> The Kylin community and project made significant advances during incubation (since Nov 2014), and we believe it is ready to graduate as a top-level project.
>>
>> Apache Kylin is very active. The PPMC doubled in size (adding 6 committers and 2 mentors) and increased in diversity in the past year. We released 3 versions in the past 6 months. There were presentations about Kylin at most of the big conferences around the world (including Strata+Hadoop World London, Hadoop Summit San Jose, ApacheCon EU, Big Data Technology China, and Database Technology Conference China), at several meetups (Bay Area, Beijing, and one coming this weekend in Shanghai), and in many other talks around the world.
>> The dev mailing list is growing every month, with about 500+ topics per month now.
>> The community created 1000+ JIRA tickets, and many patches from contributors/committers have been merged into the code base.
>>
>> A vote passed unanimously on the dev@ list (27 +1 votes). Please find below references to the graduation preparation artifacts:
>> * discussion on dev list [1]
>> * vote thread [2]
>> * podling name search (still in progress) [3]
>> * incubation status [4]
>> * proposed resolution below
>>
>> We believe Apache Kylin is ready to become a top-level project, and if the IPMC agrees we will move to a formal vote.
>> There are a few more items to be updated on the project status page and elsewhere during the next couple of days.
>>
>>
>> Many thanks to the mentors and the IPMC for the support,
>> Luke Han (on behalf of the Apache Kylin PPMC)
>>
>> [1] http://s.apache.org/KylinDisGraduate
>> [2] http://s.apache.org/KylinGraduateVote
>> [3] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-86
>> [4] http://incubator.apache.org/projects/kylin.html
>>
>>
>>
>> Apache Kylin top-level project resolution:
>> ===
>>
>> WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, relative to a distributed and scalable OLAP engine.
>>
>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache Kylin Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further
>>
>> RESOLVED, that the Apache Kylin Project be and hereby is responsible for the creation and maintenance of open-source software related to a distributed and scalable OLAP engine; and be it further
>>
>> RESOLVED, that the office of "Vice President, Kylin" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Kylin Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Kylin Project; and be it further
>>
>> RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Kylin Project:
>>
>> * Dayue Gao
>> * Jason Zhong
>> * Julian Hyde
>> * Luke Han
>> * Henry Saputra
>> * Hongbin Ma
>> * Hua Huang
>> * Owen O'Malley
>> * P. Taylor Goetz
>> * Qianhao Zhou
>> * Shaofeng Shi
>> * Song Yi
>> * Ted Dunning
>> * Xu Jiang
>> * Yang Li
>> * Yerui Sun <sunyerui at apache dot org>
>>
>>
>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Luke Han be appointed to the office of Vice President, Kylin, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further
>>
>> RESOLVED, that the initial Apache Kylin Project be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Kylin Project; and be it further
>>
>> RESOLVED, that the initial Apache Kylin Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Kylin podling; and be it further
>>
>> RESOLVED, that all responsibilities pertaining to the Apache Incubator Kylin podling encumbered upon the Apache Incubator PMC are hereafter discharged.
Re: [VOTE] Accept Eagle into Apache Incubation
+1 (non-binding)

On 10/23/15, 9:52 AM, "Hitesh Shah" wrote:

>+1 (binding)
>
>-- Hitesh
>
>On Oct 23, 2015, at 7:11 AM, Manoharan, Arun wrote:
>
>> Hello Everyone,
>>
>> Thanks for all the feedback on the Eagle Proposal.
>>
>> I would like to call for a [VOTE] on Eagle joining the ASF as an incubation project.
>>
>> The vote is open for 72 hours:
>>
>> [ ] +1 accept Eagle in the Incubator
>> [ ] ±0
>> [ ] -1 (please give reason)
>>
>> Eagle is a monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities, and takes actions in real time. Eagle supports a wide variety of policies on HDFS data and Hive. Eagle also provides machine learning models for detecting anomalous user behavior in Hadoop.
>>
>> The proposal is available on the wiki here:
>> https://wiki.apache.org/incubator/EagleProposal
>>
>> The text of the proposal is also available at the end of this email.
>>
>> Thanks for your time and help.
>>
>> Thanks,
>> Arun
>>
>>
>>
>> Eagle
>>
>> Abstract
>> Eagle is an open source monitoring solution for Hadoop that instantly identifies access to sensitive data, recognizes attacks and malicious activities in Hadoop, and takes actions.
>>
>> Proposal
>> Eagle audits access to HDFS files and Hive and HBase tables in real time, enforces policies defined on sensitive data access, and alerts on or blocks a user's access to that sensitive data in real time. Eagle also creates user profiles based on the typical access behaviour for HDFS and Hive and sends alerts when anomalous behaviour is detected. Eagle can also import sensitive data information classified by external classification engines to help define its policies.
>>
>> Overview of Eagle
>> Eagle has 3 main parts.
>> 1. Data collection and storage - Eagle collects data from various Hadoop logs in real time using Kafka/YARN API and uses HDFS and HBase for storage.
>> 2. Data processing and policy engine - Eagle allows users to create policies based on various metadata properties on HDFS, Hive and HBase data.
>> 3. Eagle services - Eagle services include the policy manager, query service and the visualization component. Eagle provides an intuitive user interface to administer Eagle and an alert dashboard to respond to real-time alerts.
>>
>> Data Collection and Storage:
>> Eagle provides a programming API for extending Eagle to integrate any data source into the Eagle policy evaluation framework. For example, Eagle HDFS audit monitoring collects data from Kafka, which is populated from the NameNode log4j appender or from a logstash agent. Eagle Hive monitoring collects Hive query logs from running jobs through the YARN API, which is designed to be scalable and fault-tolerant. Eagle uses HBase as storage for metadata and metrics data, and also supports relational databases through a configuration change.
>>
>> Data Processing and Policy Engine:
>> Processing Engine: Eagle provides a stream processing API which is an abstraction of Apache Storm. It can also be extended to other streaming engines. This abstraction allows developers to assemble data transformation, filtering, external data joins, etc. without being physically bound to a specific streaming platform. The Eagle streaming API allows developers to easily integrate business logic with the Eagle policy engine, and internally the Eagle framework compiles the business logic execution DAG into program primitives of the underlying stream infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring transforms the audit log from the NameNode into objects and joins sensitivity metadata and security zone metadata, which are generated by external programs or configured by the user. Eagle Hive monitoring filters running jobs to get the Hive query string, parses the query string into an object, and then joins sensitivity metadata.
>> Alerting Framework: The Eagle Alert Framework includes a stream metadata API, a scalable policy engine framework, and an extensible policy engine framework. The stream metadata API allows developers to declare an event schema, including which attributes constitute an event, the type of each attribute, and how to dynamically resolve attribute values at runtime when a user configures a policy. The scalable policy engine framework allows policies to be executed on different physical nodes in parallel; it also lets developers define their own policy partitioner class. The policy engine framework, together with the stream partitioning capability provided by all streaming platforms, ensures that policies and events can be evaluated in a fully distributed way. The extensible policy engine framework allows developers to plug in a new policy engine with a few lines of code. The WSO2 Siddhi CEP engine is the policy engine that Eagle supports as a first-class citizen.
>> Machine Learning module: Eagle provides capabilities to define user activity patterns or user profiles for Hadoop users based on the user behaviour in the platform. These user p
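The Alerting Framework paragraph says that policies can be spread across physical nodes and that developers can supply their own policy partitioner. The code below is a minimal sketch of one plausible scheme, not Eagle's API. The names `Policy`, `partition_of`, `assign`, and `evaluate` are hypothetical. Each policy is hashed to one of N executors with `zlib.crc32`. Every event is offered to every executor, and each executor evaluates only the policies it owns, so each policy is checked exactly once per event.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: an id plus a predicate that returns True to raise an alert."""
    policy_id: str
    predicate: Callable[[Event], bool]


def partition_of(policy_id: str, num_executors: int) -> int:
    # crc32 gives the same result in every process; Python's built-in hash() of a str
    # is salted per process, so it could place a policy differently on each node.
    return zlib.crc32(policy_id.encode("utf-8")) % num_executors


def assign(policies: List[Policy], num_executors: int) -> Dict[int, List[Policy]]:
    """Group policies by the executor that owns them."""
    buckets: Dict[int, List[Policy]] = {i: [] for i in range(num_executors)}
    for policy in policies:
        buckets[partition_of(policy.policy_id, num_executors)].append(policy)
    return buckets


def evaluate(executor_id: int, buckets: Dict[int, List[Policy]], event: Event) -> List[str]:
    """Return the ids of the policies owned by this executor that match the event."""
    return [p.policy_id for p in buckets[executor_id] if p.predicate(event)]


# Example: two illustrative policies spread over three executors.
policies = [
    Policy("sensitive-path", lambda e: e.get("path", "").startswith("/secure/")),
    Policy("delete-op", lambda e: e.get("cmd") == "delete"),
]
buckets = assign(policies, num_executors=3)
event = {"user": "alice", "cmd": "delete", "path": "/secure/ssn.csv"}
alerts = sorted(a for i in buckets for a in evaluate(i, buckets, event))
print(alerts)  # ['delete-op', 'sensitive-path']
```

Hashing by policy id keeps placement deterministic without any coordination between nodes. A real deployment would also have to rebalance policies when executors are added or removed, which this sketch ignores.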
Re: [DISCUSS] SystemML Incubator Proposal
Hello Luciano, Recently heard the presentation on SystemML at Apache BigData conference and it sounds exciting. Looking forward to Apache Incubation. Regards Seshu Adunuthula On 10/23/15, 5:34 PM, "Luciano Resende" wrote: >On Fri, Oct 23, 2015 at 5:30 PM, Henry Saputra >wrote: > >> Hi Luciano, >> >> Good proposal, but looks like >> https://wiki.apache.org/incubator/SystemM does not exist? >> > >Good catch, it's a typo on the original link and it's missing the L at the >end, here is the correct link > >https://wiki.apache.org/incubator/SystemML > > > >> >> Also, Reynold Xin and Patrick Wendell are not member of IPMCs so I >> don't they could be mentors of this project, yet. >> >> They can ask to be member of IPMCs since both are already member of >> ASF. But for now need to remove it from proposal. >> >> >> >Yes, they are aware of the requirement, and this will be fixed before we >call a vote on the proposal. > > > >> - Henry >> >> On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende >> wrote: >> > We would like to start a discussion on accepting SystemML as an Apache >> > Incubator project. >> > >> > The proposal is available at : >> > https://wiki.apache.org/incubator/SystemM >> > >> > And it's contents is also copied below. >> > >> > Thanks in Advance for you time reviewing and providing feedback. >> > >> > == >> > >> > = SystemML = >> > >> > == Abstract == >> > >> > SystemML provides declarative large-scale machine learning (ML) that >>aims >> > at flexible specification of ML algorithms and automatic generation of >> > hybrid runtime plans ranging from single node, in-memory >>computations, to >> > distributed computations on Apache Hadoop and Apache Spark. ML >> algorithms >> > are expressed in an R-like syntax, that includes linear algebra >> primitives, >> > statistical functions, and ML-specific constructs. 
This high-level >> language >> > significantly increases the productivity of data scientists as it >> provides >> > (1) full flexibility in expressing custom analytics, and (2) data >> > independence from the underlying input formats and physical data >> > representations. Automatic optimization according to data >>characteristics >> > such as distribution on the disk file system, and sparsity as well as >> > processing characteristics in the distributed environment like number >>of >> > nodes, CPU, memory per node, ensures both efficiency and scalability. >> > >> > == Proposal == >> > >> > The goal of SystemML is to create a commercial friendly, scalable and >> > extensible machine learning framework for data scientists to create or >> > extend machine learning algorithms using a declarative syntax. The >> machine >> > learning framework enables data scientists to develop algorithms >>locally >> > without the need of a distributed cluster, and scale up and scale out >>the >> > execution of these algorithms to distributed Hadoop or Spark clusters. >> > >> > == Background == >> > >> > SystemML started as a research project in the IBM Almaden Research >>Center >> > around 2010 aiming to enable data scientists to develop machine >>learning >> > algorithms independent of data and cluster characteristics. >> > >> > == Rationale == >> > >> > SystemML enables the specification of machine learning algorithms >>using a >> > declarative machine learning (DML) language. DML includes linear >>algebra >> > primitives, statistical functions, and additional constructs. This >> > high-level language significantly increases the productivity of data >> > scientists as it provides (1) full flexibility in expressing custom >> > analytics and (2) data independence from the underlying input formats >>and >> > physical data representations. >> > >> > SystemML computations can be executed in a variety of different >>modes. 
It >> > supports single-node in-memory computations and large-scale >>distributed >> > cluster computations. This allows the user to quickly prototype new >> > algorithms in local environments and then automatically scale to large data >> > sizes without changing the algorithm implementation. >> > >> > Algorithms specified in DML are dynamically compiled and optimized >>based >> on >> > data and cluster characteristics using rule-based and cost-based >> > optimization techniques. The optimizer automatically generates hybrid >> > runtime execution plans ranging from in-memory single-node execution >>to >> > distributed computations on Spark or Hadoop. This ensures both >>efficiency >> > and scalability. Automatic optimization reduces or eliminates the >>need to >> > hand-tune distributed runtime execution plans and system >>configurations. >> > >> > == Initial Goals == >> > >> > The initial goal of moving SystemML to the Apache Incubator is to >>broaden >> > the community and foster contributions from data scientists to develop >> new >> > machine learning algorithms and enhance the existing ones. Ultimately, >> this >> > may lead to the creation of an industry standard in specifyin
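The Rationale section says the optimizer picks between in-memory single-node execution and distributed execution based on data and cluster characteristics. As a toy illustration only (not SystemML's actual optimizer), the sketch below applies one cost rule to an R-style `%*%` matrix product: estimate memory from dimensions and sparsity, then run locally if it fits a budget. `MatrixStats`, `choose_matmul_plan`, the byte-size constants, the sparsity threshold, and the 1 GiB budget are all hypothetical assumptions.

```python
from dataclasses import dataclass

BYTES_PER_DENSE_CELL = 8   # one 64-bit float per cell
BYTES_PER_SPARSE_NNZ = 12  # assumption: 8-byte value + 4-byte column index
SPARSE_THRESHOLD = 0.4     # assumption: below this fill ratio, estimate as sparse


@dataclass
class MatrixStats:
    """Data characteristics an optimizer might know before running a script."""
    rows: int
    cols: int
    nnz: int  # number of non-zero cells

    def estimated_bytes(self) -> int:
        sparsity = self.nnz / (self.rows * self.cols)
        if sparsity < SPARSE_THRESHOLD:
            return self.nnz * BYTES_PER_SPARSE_NNZ
        return self.rows * self.cols * BYTES_PER_DENSE_CELL


def choose_matmul_plan(a: MatrixStats, b: MatrixStats, memory_budget_bytes: int) -> str:
    """Toy cost rule for A %*% B: run it in memory on one node if both inputs
    plus a dense output fit in the budget, otherwise pick a distributed plan."""
    if a.cols != b.rows:
        raise ValueError("incompatible dimensions for matrix multiply")
    output_bytes = a.rows * b.cols * BYTES_PER_DENSE_CELL
    total = a.estimated_bytes() + b.estimated_bytes() + output_bytes
    return "single-node" if total <= memory_budget_bytes else "distributed"


budget = 1 << 30  # 1 GiB of driver memory (hypothetical)
small = choose_matmul_plan(MatrixStats(1_000, 100, 100_000), MatrixStats(100, 10, 1_000), budget)
large = choose_matmul_plan(MatrixStats(10_000_000, 1_000, 10_000_000_000), MatrixStats(1_000, 10, 10_000), budget)
print(small, large)  # prints: single-node distributed
```

The point of the sketch is that the script (the product itself) never changes; only the plan chosen from the data's size and sparsity does.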
Re: [DISCUSS] Impala incubator proposal
Awesome! Glad to see this becoming part of ASF… On 11/17/15, 10:49 AM, "Henry Robinson" wrote: >Hi all - > >We'd like to start a discussion regarding a proposal to submit Impala to >the Apache Incubator. > >The proposal text is available on the Wiki here: >https://wiki.apache.org/incubator/ImpalaProposal > >and pasted below for convenience. > >I'm excited to make this proposal, and look forward to the community's >input! > >Best, >Henry > > >= Abstract = >Impala is a high-performance C++ and Java SQL query engine for data stored >in Apache Hadoop-based clusters. > >= Proposal = > >We propose to contribute the Impala codebase and associated artifacts >(e.g. >documentation, web-site content etc.) to the Apache Software Foundation >with the intent of forming a productive, meritocratic and open community >around Impala's continued development, according to the 'Apache Way'. > >Cloudera owns several trademarks regarding Impala, and proposes to >transfer >ownership of those trademarks in full to the ASF. > >= Background = >Engineers at Cloudera developed Impala and released it as an >Apache-licensed open-source project in Fall 2012. Impala was written as a >brand-new, modern C++ SQL engine targeted from the start for data stored >in >Apache Hadoop clusters. > >Impala's most important benefit to users is high-performance, making it >extremely appropriate for common enterprise analytic and business >intelligence workloads. This is achieved by a number of software >techniques, including: native support for data stored in HDFS and related >filesystems, just-in-time compilation and optimization of individual query >plans, high-performance C++ codebase and massively-parallel distributed >architecture. In benchmarks, Impala is routinely amongst the very highest >performing SQL query engines.
> >= Rationale = > >Despite the exciting innovation in the so-called 'big-data' space, SQL >remains by far the most common interface for interacting with data in both >traditional warehouses and modern 'big-data' clusters. There is clearly a >need, as evidenced by the eager adoption of Impala and other SQL engines >in >enterprise contexts, for a query engine that offers the familiar SQL >interface, but that has been specifically designed to operate in massive, >distributed clusters rather than in traditional, fixed-hardware, >warehouse-specific deployments. Impala is one such query engine. > >We believe that the ASF is the right venue to foster an open-source >community around Impala's development. We expect that Impala will benefit >from more productive collaboration with related Apache projects, and under >the auspices of the ASF will attract talented contributors who will push >Impala's development forward at pace. > >We believe that the timing is right for Impala's development to move >wholesale to the ASF: Impala is well-established, has been Apache-licensed >open-source for more than three years, and the core project is relatively >stable. We are excited to see where an ASF-based community can take Impala >from this strong starting point. > >= Initial Goals = >Our initial goals are as follows: > >* Establish ASF-compatible engineering practices and workflows >* Refactor and publish existing internal build scripts and test >infrastructure, in order to make them usable by any community member. >* Transfer source code, documentation and associated artifacts to the ASF. >* Grow the user and developer communities > >= Current Status = > >Impala is developed as an Apache-licensed open-source project. The source >code is available at http://github.com/cloudera/Impala, and developer >documentation is at https://github.com/cloudera/Impala/wiki.
The majority >of commits to the project have come from Cloudera-employed developers, but >we have accepted some contributions from individuals from other >organizations. > >All code reviews are done via a public instance of the Gerrit review tool >at http://gerrit.cloudera.org:8080/, and discussed on a public mailing >list. All patches must be reviewed before they are accepted into the >codebase, via a voting mechanism that is similar to that used on Apache >projects such as Hadoop and HBase. > >Before a patch is committed, it must pass a suite of pre-commit tests. >These tests are currently run on Cloudera's internal infrastructure. One >of >our initial goals will be to work with the ASF Infrastructure team to find >a way to run these tests in an acceptable way on publicly accessible >machines. > >Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA, >in a way that is extremely similar to existing practices at other ASF >projects. > >= Meritocracy = > >We understand the central importance of meritocracy to the Apache Way. We >will work to establish a welcoming, fair and meritocratic community, in >part by expanding the set of committers on the project. Although Impala's >committer list will initially be dominated by members of the Impala >engineering team at Cloudera,
Re: [DISCUSS] Apache Dataflow Incubator Proposal
Awesome to see CloudDataFlow coming to Apache. The Stream Processing area has been in general fragmented with a variety of solutions, hoping the community galvanizes around Apache Data Flow. We are still in the "Apache Storm" world. Any chance of folks building a "Storm Runner"? On 1/20/16, 9:39 AM, "James Malone" wrote: >> Great proposal. I like that your proposal includes a well presented >> roadmap, but I don't see any goals that directly address building a >>larger >> community. Y'all have any ideas around outreach that will help with >> adoption? >> > >Thank you and fair point. We have a few additional ideas which we can put >into the Community section. > > >> >> As a start, I recommend y'all add a section to the proposal on the wiki >> page for "Additional Interested Contributors" so that folks who want to >> sign up to participate in the project can do so without requesting >> additions to the initial committer list. >> >> >This is a great idea and I think it makes a lot of sense to add an >"Additional >Interested Contributors" section to the proposal. > > >> On Wed, Jan 20, 2016 at 10:32 AM, James Malone < >> jamesmal...@google.com.invalid> wrote: >> >> > Hello everyone, >> > >> > Attached to this message is a proposed new project - Apache Dataflow, >>a >> > unified programming model for data processing and integration. >> > >> > The text of the proposal is included below. Additionally, the >>proposal is >> > in draft form on the wiki where we will make any required changes: >> > >> > https://wiki.apache.org/incubator/DataflowProposal >> > >> > We look forward to your feedback and input.
>> > >> > Best, >> > >> > James >> > >> > >> > >> > = Apache Dataflow = >> > >> > == Abstract == >> > >> > Dataflow is an open source, unified model and set of language-specific >> SDKs >> > for defining and executing data processing workflows, and also data >> > ingestion and integration flows, supporting Enterprise Integration >> Patterns >> > (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines >>simplify >> > the mechanics of large-scale batch and streaming data processing and >>can >> > run on a number of runtimes like Apache Flink, Apache Spark, and >>Google >> > Cloud Dataflow (a cloud service). Dataflow also brings DSL in >>different >> > languages, allowing users to easily implement their data integration >> > processes. >> > >> > == Proposal == >> > >> > Dataflow is a simple, flexible, and powerful system for distributed >>data >> > processing at any scale. Dataflow provides a unified programming >>model, a >> > software development kit to define and construct data processing >> pipelines, >> > and runners to execute Dataflow pipelines in several runtime engines, >> like >> > Apache Spark, Apache Flink, or Google Cloud Dataflow. Dataflow can be >> used >> > for a variety of streaming or batch data processing goals including >>ETL, >> > stream analysis, and aggregate computation. The underlying programming >> > model for Dataflow provides MapReduce-like parallelism, combined with >> > support for powerful data windowing, and fine-grained correctness >> control. >> > >> > == Background == >> > >> > Dataflow started as a set of Google projects focused on making data >> > processing easier, faster, and less costly. The Dataflow model is a >> > successor to MapReduce, FlumeJava, and Millwheel inside Google and is >> > focused on providing a unified solution for batch and stream >>processing. 
>> > >> > These projects on which Dataflow is based have been published in >>several >> > papers made available to the public: >> > >> > * MapReduce - http://research.google.com/archive/mapreduce.html >> > >> > * Dataflow model - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf >> > >> > * FlumeJava - http://notes.stephenholiday.com/FlumeJava.pdf >> > >> > * MillWheel - http://research.google.com/pubs/pub41378.html >> > >> > Dataflow was designed from the start to provide a portable programming >> > layer. When you define a data processing pipeline with the Dataflow >> model, >> > you are creating a job which is capable of being processed by any >>number >> of >> > Dataflow processing engines. Several engines have been developed to >>run >> > Dataflow pipelines in other open source runtimes, including a Dataflow >> > runner for Apache Flink and Apache Spark. There is also a "direct >> runner", >> > for execution on the developer machine (mainly for dev/debug >>purposes). >> > Another runner allows a Dataflow program to run on a managed service, >> > Google Cloud Dataflow, in Google Cloud Platform. The Dataflow Java >>SDK is >> > already available on GitHub, and independent from the Google Cloud >> Dataflow >> > service. Another Python SDK is currently in active development. >> > >> > In this proposal, the Dataflow SDKs, model, and a set of runners will >>be >> > submitted as an OSS project under the ASF. The runners which are a >>part >> of >> > this proposal include those for Spark (from Cloudera), Flink
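The proposal's portability claim (define a pipeline once, then execute it on any of several engines via runners) can be sketched as a pipeline object that only describes work plus interchangeable runner classes that execute it. Everything below is hypothetical and is not the Dataflow SDK API: `Pipeline`, `DirectRunner`, and `ChunkedRunner` are illustrative names, and `ChunkedRunner` merely simulates a distributed engine by splitting the input into contiguous chunks.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterable, List, Tuple


class Pipeline:
    """Hypothetical engine-agnostic pipeline: it records element-wise steps
    but never executes them itself."""

    def __init__(self, source: Iterable[Any]):
        self.source: List[Any] = list(source)
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(("map", fn))
        return self

    def filter(self, pred: Callable[[Any], bool]) -> "Pipeline":
        self.steps.append(("filter", pred))
        return self


def _apply_steps(steps, data: List[Any]) -> List[Any]:
    # Shared step interpreter used by both runners below.
    for kind, fn in steps:
        if kind == "map":
            data = [fn(x) for x in data]
        else:  # "filter"
            data = [x for x in data if fn(x)]
    return data


class Runner(ABC):
    @abstractmethod
    def run(self, pipeline: Pipeline) -> List[Any]: ...


class DirectRunner(Runner):
    """Runs everything in-process, in the spirit of a local dev/debug runner."""

    def run(self, pipeline: Pipeline) -> List[Any]:
        return _apply_steps(pipeline.steps, pipeline.source)


class ChunkedRunner(Runner):
    """Stand-in for a distributed engine: splits the input into contiguous
    chunks (one per simulated worker), processes each, then concatenates."""

    def __init__(self, workers: int):
        self.workers = max(1, workers)

    def run(self, pipeline: Pipeline) -> List[Any]:
        src = pipeline.source
        size = max(1, -(-len(src) // self.workers))  # ceiling division
        out: List[Any] = []
        for start in range(0, len(src), size):
            out.extend(_apply_steps(pipeline.steps, src[start:start + size]))
        return out


p = Pipeline(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(DirectRunner().run(p))    # [0, 4, 16, 36, 64]
print(ChunkedRunner(3).run(p))  # [0, 4, 16, 36, 64]
```

Because both runners interpret the same step list, the pipeline definition never changes when the engine does. Chunking only works for element-wise steps such as map and filter; grouping or aggregation would additionally need a shuffle between workers, which this sketch does not model.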
Re: [DISCUSS] Apache Dataflow Incubator Proposal
I did not get a chance to play with it yet. Within Google, is it used more as an MR replacement or as a stream processing engine? Or does it do both fantastically well? On 1/22/16, 10:58 AM, "Frances Perry" wrote: >Crunch started as a clone of FlumeJava, which was Google internal. In the >meantime inside Google, FlumeJava evolved into Dataflow. So all three >share >a number of concepts like PCollections, ParDo, DoFn, etc. However, >Dataflow >adds a number of new things -- the biggest being unified batch/streaming >semantics using concepts like Windowing and Triggers. Tyler Akidau's >O'Reilly post has a really nice explanation: >https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 > >On Fri, Jan 22, 2016 at 10:42 AM, Ashish wrote: > >> Crunch has Spark pipelines, but I'm not sure about the runner abstraction. >> >> Maybe Josh Wills or Tom White can provide more insight on this topic. >> They are core devs for both projects :) >> >> On Fri, Jan 22, 2016 at 9:47 AM, Jean-Baptiste Onofré >> wrote: >> > Hi, >> > >> > I don't know Crunch deeply, but AFAIK, Crunch creates MapReduce >> pipelines; it >> > doesn't provide a runner abstraction. It's based on FlumeJava. >> > >> > The logic is very similar (with DoFns, pipelines, ...). Correct me if >>I'm >> > wrong, but Crunch started after Google Dataflow, especially because >> Dataflow >> > was not open-sourced at that time. >> > >> > So, I agree it's very similar/close. >> > >> > Regards >> > JB >> > >> > >> > On 01/22/2016 05:51 PM, Ashish wrote: >> >> >> >> Hi JB, >> >> >> >> Curious to know how it compares to Apache Crunch? Constructs >> >> look very familiar (had used Crunch long ago) >> >> >> >> Thoughts? >> >> >> >> - Ashish >> >> >> >> On Fri, Jan 22, 2016 at 6:33 AM, Jean-Baptiste Onofré >> >> >> wrote: >> >>> >> >>> Hi Seshu, >> >>> >> >>> I blogged about the Apache Dataflow proposal: >> >>> http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/ >> >>> >> >>> You can see in the "what's next ?"
section that new runners, skins >>and >> >>> sources are on our roadmap. Definitely, a storm runner could be >>part of >> >>> this. >> >>> >> >>> Regards >> >>> JB >> >>> >> >>> >> >>> On 01/22/2016 03:31 PM, Adunuthula, Seshu wrote: >> >>>> >> >>>> >> >>>> Awesome to see CloudDataFlow coming to Apache. The Stream >>Processing >> >>>> area >> >>>> has been in general fragmented with a variety of solutions, hoping >>the >> >>>> community galvanizes around Apache Data Flow. >> >>>> >> >>>> We are still in the "Apache Storm" world. Any chance of folks >> building >> >>>> a >> >>>> "Storm Runner"? >> >>>> >> >>>> >> >>>> On 1/20/16, 9:39 AM, "James Malone" >> >> >>>> wrote: >> >>>> >> >>>>>> Great proposal. I like that your proposal includes a well >>presented >> >>>>>> roadmap, but I don't see any goals that directly address >>building a >> >>>>>> larger >> >>>>>> community. Y'all have any ideas around outreach that will help >>with >> >>>>>> adoption? >> >>>>>> >> >>>>> >> >>>>> Thank you and fair point. We have a few additional ideas which we >>can >> >>>>> put >> >>>>> into the Community section. >> >>>>> >> >>>>> >> >>>>>> >> >>>>>> As a start, I recommend y'all add a section to the proposal on >>the >> >>>>>> wiki >> >>>>>> page for "Additional Interested Contributors" so that folks who >>want >> >>>>>> to >> >>>>>> sign up to participate in the project can do so without >>requesting >> >>>>>> additions to the initial committer list. >> >>>
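Frances Perry's reply in this thread credits Dataflow with unified batch/streaming semantics built on Windowing and Triggers. As a minimal sketch of the windowing half only, the Python below assigns each event to a fixed (tumbling) window by its own event time and sums values per key and window. The function names and the `(key, event_time, value)` tuple layout are assumptions for illustration, not the Dataflow API, and triggers (deciding when a window's result is emitted) are not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int, int]  # (key, event_time_seconds, value)


def window_start(event_time: int, size: int) -> int:
    """Start of the fixed (tumbling) window [start, start + size) holding event_time."""
    return event_time - (event_time % size)


def sum_per_key_and_window(events: Iterable[Event], size: int) -> Dict[Tuple[str, int], int]:
    """Sum values per (key, window), bucketing by each event's own timestamp,
    so events that arrive out of order still land in the right window."""
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, event_time, value in events:
        totals[(key, window_start(event_time, size))] += value
    return dict(totals)


# The event at t=30 arrives after the one at t=65 but still counts toward window 0.
events = [("clicks", 5, 1), ("clicks", 65, 1), ("clicks", 30, 2), ("views", 61, 4)]
print(sum_per_key_and_window(events, size=60))
# {('clicks', 0): 3, ('clicks', 60): 1, ('views', 60): 4}
```

The same function gives the same answer whether `events` is a finite batch or has been read incrementally from a stream, which is the intuition behind treating batch and streaming uniformly; a real streaming engine would additionally need triggers to decide when each window's result is complete.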
Re: [VOTE] Accept Beam into the Apache Incubator
+1 (non-binding) On 1/28/16, 12:05 PM, "Julian Hyde" wrote: >+1 (binding) > >> On Jan 28, 2016, at 10:42 AM, Mayank Bansal wrote: >> >> +1 (non-binding) >> >> Thanks, >> Mayank >> >> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh < >> venkat...@innerzeal.com> wrote: >> >>> +1 (binding). >>> >>> Thanks! >>> >>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning >>> wrote: >>> +1 On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament wrote: > +1 > > On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré > > wrote: > >> Hi, >> >> The Beam proposal (initially Dataflow) was proposed last week. >> >> The complete discussion thread is available here: >> >> >> > >>> >>>http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.com%3E >> >> As a reminder, the BeamProposal is here: >> >> https://wiki.apache.org/incubator/BeamProposal >> >> Given all the great feedback we received on the mailing list, >>we >> think it's time to call a vote to accept Beam into the Incubator. >> >> Please cast your vote to: >> [] +1 - accept Apache Beam as a new incubating project >> [] 0 - not sure >> [] -1 - do not accept the Apache Beam project (because: ...) >> >> Thanks, >> Regards >> JB >> >> ## page was renamed from DataflowProposal >> = Apache Beam = >> >> == Abstract == >> >> Apache Beam is an open source, unified model and set of >> language-specific SDKs for defining and executing data processing >> workflows, and also data ingestion and integration flows, supporting >> Enterprise Integration Patterns (EIPs) and Domain Specific Languages >> (DSLs). Dataflow pipelines simplify the mechanics of large-scale >>> batch >> and streaming data processing and can run on a number of runtimes >>> like >> Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service). >> Beam also brings DSL in different languages, allowing users to >>easily >> implement their data integration processes.
>> >> == Proposal == >> >> Beam is a simple, flexible, and powerful system for distributed data >> processing at any scale. Beam provides a unified programming model, >>a >> software development kit to define and construct data processing >> pipelines, and runners to execute Beam pipelines in several runtime >> engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow. Beam >> can be used for a variety of streaming or batch data processing >>goals >> including ETL, stream analysis, and aggregate computation. The >> underlying programming model for Beam provides MapReduce-like >> parallelism, combined with support for powerful data windowing, and >> fine-grained correctness control. >> >> == Background == >> >> Beam started as a set of Google projects (Google Cloud Dataflow) focused >> on making data processing easier, faster, and less costly. The Beam >> model is a successor to MapReduce, FlumeJava, and Millwheel inside >> Google and is focused on providing a unified solution for batch and >> stream processing. These projects on which Beam is based have been >> published in several papers made available to the public: >> >> * MapReduce - http://research.google.com/archive/mapreduce.html >> * Dataflow model - >>> http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf >> * FlumeJava - http://research.google.com/pubs/pub35650.html >> * MillWheel - http://research.google.com/pubs/pub41378.html >> >> Beam was designed from the start to provide a portable programming >> layer. When you define a data processing pipeline with the Beam >>> model, >> you are creating a job which is capable of being processed by any number >> of Beam processing engines. Several engines have been developed to >>> run >> Beam pipelines in other open source runtimes, including a Beam >>runner >> for Apache Flink and Apache Spark. There is also a "direct runner", >>> for >> execution on the developer machine (mainly for dev/debug purposes).
>> Another runner allows a Beam program to run on a managed service, Google >> Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is >> already available on GitHub, and independent from the Google Cloud >> Dataflow service. Another Python SDK is currently in active development. >> >> In this proposal, the Beam SDKs, model, and a set of runners will be >> submitted as an OSS project under the ASF. The runners which are a >>> part >> of this proposal include those for Spark (from Cloudera), Flink >>(from >> data Artisans), and local developme
Re: [VOTE] Accept Joshua as an Apache Incubator Podling
Is there a fail grade? ;) On 2/12/16, 11:57 AM, "Tom Barber" wrote: >You're making the presumption it's passed its vote! ;) > >On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) < >chris.a.mattm...@jpl.nasa.gov> wrote: > >> Yep, will send a result shortly. >> >> Lewis, after that, can you help me get the podling bootstrap tasks >> started? >> >> ++ >> Chris Mattmann, Ph.D. >> Chief Architect >> Instrument Software and Science Data Systems Section (398) >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA >> Office: 168-519, Mailstop: 168-527 >> Email: chris.a.mattm...@nasa.gov >> WWW: http://sunset.usc.edu/~mattmann/ >> ++ >> Adjunct Associate Professor, Computer Science Department >> University of Southern California, Los Angeles, CA 90089 USA >> ++ >> >> >> >> >> >> -Original Message- >> From: Lewis John Mcgibbney >> Reply-To: "general@incubator.apache.org" >> Date: Friday, February 12, 2016 at 11:31 AM >> To: "general@incubator.apache.org" >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling >> >> >Hi Chris, >> >Is it time to close out this VOTE and bring Joshua on board? >> >Lewis >> > >> >On Wed, Feb 3, 2016 at 4:01 PM, >>> > >> >wrote: >> > >> >> >> >> From: Danese Cooper >> >> To: "general@incubator.apache.org" >> >> Cc: "p...@cs.jhu.edu" >> >> Date: Wed, 3 Feb 2016 07:43:11 -0800 >> >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling >> >> +1 (binding) Accept Joshua as an Apache Incubator podling. >> >> >> >> D >> >> >> >> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) < >> >> chris.a.mattm...@jpl.nasa.gov> wrote: >> >> > >> >> > Hi Everyone, >> >> > >> >> > OK, the discussion is now complete. Please VOTE to accept Joshua >> >> > into the Apache Incubator. I’ll leave the VOTE open for at least >> >> > the next 72 hours, with hopes to close it next Friday the 5th of >> >> > February, 2016. >> >> > >> >> > [ ] +1 Accept Joshua as an Apache Incubator podling. >> >> > [ ] +0 Abstain.
>> >> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling >>because... >> >> > >> >> > Of course, I am +1 on this. Please note that VOTEs from Incubator PMC >> >> > members are binding, but all are welcome to VOTE! >> >> > >> >> > Cheers, >> >> > Chris >> >> > >> >> > ++ >> >> > Chris Mattmann, Ph.D. >> >> > Chief Architect >> >> > Instrument Software and Science Data Systems Section (398) >> >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA >> >> > Office: 168-519, Mailstop: 168-527 >> >> > Email: chris.a.mattm...@nasa.gov >> >> > WWW: http://sunset.usc.edu/~mattmann/ >> >> > ++ >> >> > Adjunct Associate Professor, Computer Science Department >> >> > University of Southern California, Los Angeles, CA 90089 USA >> >> > ++ >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > -Original Message- >> >> > From: jpluser >> >> > Date: Tuesday, January 12, 2016 at 10:56 PM >> >> > To: "general@incubator.apache.org" >> >> > Cc: "p...@cs.jhu.edu" >> >> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine >> >>Translation >> >> > Toolkit >> >> > >> >> >> Hi Everyone, >> >> >> >> >> >> Please find attached for your viewing pleasure a proposed new >> >>project, >> >> >> Apache Joshua, a statistical machine translation toolkit. The >> >>proposal >> >> >> is in wiki draft form at: >> >> https://wiki.apache.org/incubator/JoshuaProposal >> >> >> >> >> >> Proposal text is copied below. I’ll leave the discussion open >>for a >> >> week >> >> >> and we are interested in folks who would like to be initial >> >>committers >> >> >> and mentors. Please discuss here on the thread. >> >> >> >> >> >> Thanks! >> >> >> >> >> >> Cheers, >> >> >> Chris (Champion) >> >> >> >> >> >> ——— >> >> >> >> >> >> = Joshua Proposal = >> >> >> >> >> >> == Abstract == >> >> >> [[joshua-decoder.org|Joshua]] is an open-source statistical >>machine >> >> >> translation toolkit.
It includes a Java-based decoder for >>translating >> >> with >> >> >> phrase-based, hierarchical, and syntax-based translation models, a >> >> >> Hadoop-based grammar extractor (Thrax), and an extensive set of >>tools >> >> and >> >> >> scripts for training and evaluating new models from parallel text. >> >> >> >> >> >> == Proposal == >> >> >> Joshua is a state of the art statistical machine translation >>system >> >>that >> >> >> provides a number of features: >> >> >> >> >> >> * Support for the two main paradigms in statistical machine >> >>translation: >> >> >> phrase-based and hierarchical / syntactic. >> >> >> * A sparse feature API that makes it easy to add new feature >> >>templates >> >> >> supporting m