AirPal to Apache? Re: [VOTE] Accept Airflow into the Incubator

2016-03-25 Thread Adunuthula, Seshu
Any plans from AirBnB to bring AirPal to Apache?



On 3/24/16, 8:00 PM, "Siddharth Anand"  wrote:

>Following the discussion earlier:
>https://s.apache.org/AirflowDiscussion
>
>I would like to call a VOTE for accepting Airflow as a new incubator
>project.
>
>The proposal is available at:
>https://wiki.apache.org/incubator/AirflowProposal
>
>The proposal is also included at the bottom of this email.
>
>Vote is open until at least Tues, 29 March 2016, 23:59:00 PDT
>[ ] +1 accept Airflow into the Apache Incubator
>[ ] ±0
>[ ] -1 because...
>
>+1 (non-binding)
>
>Thanks,
>-s (Sid)
>
>
>== Abstract ==
>
>Airflow is a workflow automation and scheduling system that can be
>used to author and manage data pipelines.
>
>== Proposal ==
>
>Airflow provides a system for authoring and managing workflows a.k.a.
>data pipelines a.k.a. DAGs (Directed Acyclic Graphs). The developer
>authors DAGs in Python using an Airflow-provided framework. He/She
>then executes the DAG using Airflow's scheduler or registers the DAG
>for event-based execution. A web-based UI provides the developer with
>a range of options for managing and viewing his/her data pipelines.
>
>== Background ==
>
>Airflow was developed at Airbnb to enable easier authorship and
>management of DAGs than were possible with existing solutions such as
>Oozie and Azkaban. For starters, both Oozie and Azkaban rely on one or
>more XML or property files to be bundled together to define a
>workflow. This separation of code and config can present a challenge
>to understanding the DAG - in Azkaban, a DAG's structure is reflected
>by its file system tree and one can find himself/herself traversing
>the file system when inspecting or changing the structure of the DAG.
>Airflow workflows, on the other hand, are simply and elegantly defined
>in Python code, often a single file. Airflow merges the powerful
>Web-based management aspects of projects like Azkaban and Oozie with
>the simplicity and elegance of defining workflows in Python. Airflow,
>less than a year old in terms of its Open Source launch, is currently
>used in production environments in more than 30 companies and boasts
>an active contributor list of more than 100 developers, the vast
>majority of whom (>95%) are outside of Airbnb.
>
>We would like to share it with the ASF and begin developing a
>community of developers and users within Apache.
>
>== Rationale ==
>
>Many organizations (>30) already benefit from running Airflow to
>manage data pipelines. Our 100+ contributors continue to provide
>integrations with 3rd party systems through the implementation of new
>hooks and operators, both of which are used in defining the tasks that
>compose workflows.
>
>== Current Status ==
>
>=== Meritocracy ===
>
>Our intent with this incubator proposal is to start building a diverse
>developer community around Airflow following the Apache meritocracy
>model. Since Airflow was open-sourced in mid-2015, we have had fast
>adoption and contributions by multiple organizations the world over.
>We plan to continue to support new contributors and we will work to
>actively promote those who contribute significantly to the project to
>committers.
>
>=== Community ===
>
>Airflow is currently being used in over 30 companies. We hope to
>extend our contributor base significantly and invite all those who are
>interested in building large-scale distributed systems to participate.
>
>=== Core Developers ===
>
>Airflow is currently being developed by four engineers: Maxime
>Beauchemin, Siddharth Anand, Bolke de Bruin, and Chris Riccomini.
>Chris is a member of the Apache Samza PMC and a contributor to various
>Apache projects, including Apache Kafka and Apache YARN. Maxime,
>Siddharth, and Bolke have contributed to Airflow.
>
>=== Alignment ===
>The ASF is the natural choice to host the Airflow project as its goal
>of encouraging community-driven open-source projects fits with our
>vision for Airflow.
>
>== Known Risks ==
>
>=== Orphaned Products ===
>
>The core developers plan to work part time on the project. There is
>very little risk of Airflow being abandoned as all of our companies
>rely on it.
>
>=== Inexperience with Open Source ===
>
>All of the core developers have experience with open source
>development. Chris is a member of the Apache Samza PMC and a
>contributor to various Apache projects, including Apache Kafka and
>Apache YARN. Bolke is a contributor to multiple open source projects and
>a few Apache projects as well, including Apache Hive, Apache Hadoop,
>and Apache Ranger.
>
>=== Homogeneous Developers ===
>
>The current core developers are all from different companies. Our
>community of 100 contributors hail from over 30 different companies
>from across the world.
>
>=== Reliance on Salaried Developers ===
>
>Currently, the only developer paid to work on this project is Maxime.
>
>=== Relationships with Other Apache Products ===
>
>Airflow is deeply integrated with Apache products. It currently
>provides hooks and operators to

Re: AirPal to Apache? Re: [VOTE] Accept Airflow into the Incubator

2016-03-25 Thread Adunuthula, Seshu
I am hoping there are a few AirBnB guys on this DL, though its dependency
on Presto could prohibit it becoming an Apache project…


On 3/25/16, 7:38 AM, "John D. Ament"  wrote:

>You would need to ask AirBnB.
>
>John
>
>On Fri, Mar 25, 2016 at 9:20 AM Adunuthula, Seshu 
>wrote:
>
>> Any plans from AirBnB to bring AirPal to Apache?

Re: [2nd DRAFT] Board Report for December 2014 - Please review

2014-12-10 Thread Adunuthula, Seshu
Kylin is a new podling which was approved, but I do not see it mentioned
in this report.

Thanks
Seshu Adunuthula


On 12/9/14, 6:13 PM, "John D. Ament"  wrote:

>Below is the "final" copy of the report, ready to be signed off on.  I
>removed non-reporting podlings.
>
>Can we get someone from MRQL and Brooklyn to sign off on those reports?
>Brooklyn specifically has a *lot* of mentors for no one to have signed off
>on it.
>
>BTW, in my last email I incorrectly listed log4cxx as non-reporting.  They
>did report.
>
>= Incubator PMC report for December 2014 =
>=== Timeline ===
>||Wed December 03 ||Podling reports due by end of day ||
>||Sun December 07 ||Shepherd reviews due by end of day ||
>||Sun December 07 ||Summary due by end of day ||
>||Tue December 09 ||Mentor signoff due by end of day ||
>||Wed December 10 ||Report submitted to Board ||
>||Wed December 17 ||Board meeting ||
>
>
>=== Shepherd Assignments ===
>||Alan D. Cabrera ||Ignite ||
>||Andrei Savu ||Drill ||
>||Andrei Savu ||Johnzon ||
>||Dave Fisher ||NPanday ||
>||John Ament ||MRQL ||
>||John Ament ||log4cxx2 ||
>||Justin Mclean ||Tamaya ||
>||Konstantin Boudnik ||Argus ||
>||Matthew Franklin ||Brooklyn ||
>||Raphael Bircher ||Kalumet ||
>||Raphael Bircher ||Streams ||
>||Roman Shaposhnik ||Sentry ||
>||Ross Gardler ||Wave ||
>||Suresh Marru ||Falcon ||
>||Suresh Marru ||Lens ||
>||Timothy Chen ||Ripple ||
>||Timothy Chen ||Taverna ||
>
>
>=== Report content ===
>{{{
>Incubator PMC report for December 2014
>
>The Apache Incubator is the entry path into the ASF for projects and
>codebases wishing to become part of the Foundation's efforts.
>
>There are currently 36 podlings undergoing incubation.  Two podlings
>joined us this month, NiFi and Tamaya.  Three new IPMC members and two
>new Shepherds joined our ranks as well.
>
>* Community
>
>  New IPMC members:
>
>  Andrew L. Farris
>  Thejas Nair
>  Brock Noland
>
>  New Incubator Shepherds:
>
>  Timothy Chen
>  Andrew L. Farris
>
>  People who left the IPMC:
>
>  None
>
>* New Podlings
>
>  NiFi
>  Tamaya
>
>* Graduations
>
>  The board has motions for the following:
>
>  The IPMC is currently voting on graduations for:
>
>  Flink
>
>* Releases
>
>  The following releases were made since the last Incubator report:
>
>  apache-calcite-0.9.2-incubating
>  apache-twill-0.4.0-incubating
>  apache-parquet-format-2.2.0-incubating
>  apache-johnzon-0.2-incubating
>  apache-slider-0.60.0-incubating
>  metamodel-4.3.0-incubating
>  apache-aurora-0.6.0-incubating
>
>* IP Clearance
>
>  Sling Sightly and XSS modules
>
>* Legal / Trademarks
>
>  There are many ongoing Podling Name Search requests,
>  with few being closed.
>
>  Droids still has an open name search.
>  Falcon is processing a name search currently.
>  Tamaya successfully cleared Podling Name Search.
>
>  Two podlings (BatchEE and Johnzon) are currently waiting
>  on the result of a new licensing agreement w/ Oracle to
>  gain access to the TCKs for new EE related JSRs.  Until this
>  is done, they cannot be considered compliant implementations.
>
>* Infrastructure
>
>  SVN outage caused minor inconvenience to some podlings.
>  Argus/Ranger is facing some struggles with their rename.
>
>* Miscellaneous
>
>  The Kalumet podling is currently thinking about throwing around a
>  retirement vote.
>
> Summary of podling reports 
>
>* Still getting started at the Incubator
>
>  Ignite
>  Lens
>  NiFi
>  Tamaya
>  Taverna
>
>* Not yet ready to graduate
>
>  No release:
>
>  Brooklyn
>  Wave
>
>  Community growth:
>
>  Falcon
>  Johnzon
>  Sentry
>  Streams
>
>
>* Ready to graduate
>
>  The Board has motions for the following:
>
>
>
>* Did not report, expected next month
>
>  Ranger (formerly Argus)
>  Kalumet
>  NPanday
>
>* Not signed off by mentors
>
>  Brooklyn
>  MRQL
>
>
>--
>   Table of Contents
>Brooklyn
>Falcon
>Ignite
>Johnzon
>Lens
>log4cxx2
>MRQL
>NiFi
>Ripple
>Sentry
>Streams
>Tamaya
>Taverna
>Wave
>
>--
>
>Brooklyn
>
>Brooklyn is a framework for modelling, monitoring, and managing
>applications through autonomic blueprints.
>
>Brooklyn has been incubating since 2014-05-01.
>
>Three most important issues to address in the move towards graduation:
>
>  1. Completing our first release under Apache
>  2. Grow the community
>  3. More diversity of the committers/PPMC (currently biased towards
> employees of a single organization)
>
>Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
>aware of?
>
>  None.
>
>How has the community developed since the last report?
>
>  Our community continues to grow slowly but surely, and has received
>  interest and contributions from new community members.
>
>  We had the opportunity to talk about our project at ApacheCon Europe
>  last week, which has introduced Brooklyn to 

Re: [2nd DRAFT] Board Report for December 2014 - Please review

2014-12-10 Thread Adunuthula, Seshu
Thanks!

On 12/10/14, 8:48 AM, "Roman Shaposhnik"  wrote:

>Indeed! I'll fix it.
>
>Thanks,
>Roman.
>
>P.S. I could swear I saw in the report though, but you're right.
>
>
>On Wed, Dec 10, 2014 at 8:44 AM, Adunuthula, Seshu 
>wrote:
>> Kylin is a new podling which was approved, but I do not see it mentioned
>> in this report.
>>
>> Thanks
>> Seshu Adunuthula

Re: [DISCUSS] Solicitation for IPMC Chair nomination

2015-01-27 Thread Adunuthula, Seshu
+1 for Ted Dunning and Henry Saputra

Both are mentors of Apache Kylin and such fantastic mentors they are…

Regards
Seshu Adunuthula


On 1/26/15, 11:16 AM, "Marvin Humphrey"  wrote:

>+1 for Ted Dunning.
>
>Ted has passion for the Incubator's mission.  He is an excellent consensus
>builder, with the right mix of patience and advocacy.  He can get the job
>done while sending judicious amounts of email, which is important in
>keeping traffic on general@incubator under control.  He is politically
>adept and tough enough to handle the challenges of interfacing with the
>Board and with outside organizations.  If he will take the job, the
>Incubator would be lucky to have him.
>
>FWIW, I don't have the expectation that Ted or any other Chair will *lead*
>reform -- Apache Chairs are not executives.   But if Ted chooses to be an
>active moderator, I have faith that he will do as well as anyone could in
>guiding consensus for whatever bottom-up proposals emerge.
>
>Marvin Humphrey
>
>On Mon, Jan 26, 2015 at 10:45 AM, Roman Shaposhnik  wrote:
>> Hi!
>>
>> after making sure that there's still an Incubator
>> to be managed for the next 6-12 months, I'd like
>> to open up a discussion thread on soliciting
>> nominations for the next IPMC Chair.
>>
>> Feel free to self-nominate or nominate folks who
>> you know. Provide a summary of your 'program'
>> or not. At this point, we want as much feedback
>> and discussion as possible. The VOTE thread will
>> come in a few weeks.
>>
>> Things to keep in mind while thinking about nominating
>> yourself or others:
>> 1. This is a 6-12 month commitment that, based on my
>> personal experience, would require you to allocate 7-10
>> hours per week.
>>
>> 2. This is a rotating Chair and you would be expected to
>> start a similar thread in 12 months.
>>
>> 3. From where I sit, the most important job for the new
>> Chair for the next few months would be to help shape
>> the incremental, actionable plan for improving the
>> mentoring situation in the Incubator.
>>
>> 4. The situation around 'professional student' podlings
>> is not improving nearly quickly enough (4 years without
>> a single release? really?). Anybody who has actionable
>> ideas on how to improve it would get my support.
>>
>> Now, to get the ball rolling, here are the two folks I'd
>> like to suggest as future IPMC Chairs:
>> * Ted Dunning
>> * Henry Saputra
>> In my view, both have demonstrated an exceptional
>> understanding of the 'Apache Way', dedication to
>> mentoring podlings they are responsible for and enthusiasm
>> around bringing new communities into the ASF family.
>> On top of that, both have exercised a remarkable skill
>> in conducting public discussions and driving towards
>> consensus.
>>
>> Thanks,
>> Roman.
>>
>> -
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>
>





Re: [DISCUSS] Graduate Kylin from the Apache Incubator

2015-10-09 Thread Adunuthula, Seshu
+1 

On 10/7/15, 8:32 AM, "John D. Ament"  wrote:

>I would be happy to see kylin graduate.
>On Oct 7, 2015 11:28, "Luke Han"  wrote:
>
>> The Kylin community and project have made significant advances during
>> incubation (since Nov 2014), and the community believes it is ready to
>> graduate as a top-level project.
>>
>> The Apache Kylin community is very active. The PPMC doubled in size
>> (added 6 committers and 2 mentors) and increased in diversity in the
>> past year. Three versions were released in the past 6 months. There
>> were presentations about Kylin at most of the big conferences of the
>> world (including Strata+Hadoop World London, Hadoop Summit San Jose,
>> ApacheCon EU, Big Data Technology China, Database Technology Conference
>> China) and at some meetups (Bay Area, Beijing, and one coming this
>> weekend in Shanghai), along with many other talks around the world.
>> The dev mailing list is growing every month, with 500+ topics per month
>> now. The community has created 1000+ JIRA tickets, and many patches from
>> contributors/committers have been merged into the code base.
>>
>> A vote passed unanimously on the dev@ list (27 +1 votes). Please find
>> below references to the graduation preparation artifacts:
>> * discussion on dev list [1]
>> * vote thread [2]
>> * podling name search (still in progress) [3]
>> * incubation status [4]
>> * proposed resolution below
>>
>> We believe Apache Kylin is ready to become a top-level project, and if
>> the IPMC agrees we will move to a formal vote.
>> There are a few more items to be updated on the project status page and
>> elsewhere during the next couple of days.
>>
>>
>> Many thanks to the mentors and the IPMC for the support,
>> Luke Han (on behalf of the Apache Kylin PPMC)
>>
>> [1] http://s.apache.org/KylinDisGraduate
>> [2] http://s.apache.org/KylinGraduateVote
>> [3] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-86
>> [4] http://incubator.apache.org/projects/kylin.html
>>
>>
>>
>> Apache Kylin top-level project resolution:
>> ===
>>
>>WHEREAS, the Board of Directors deems it to be in the best
>>interests of the Foundation and consistent with the
>>Foundation's purpose to establish a Project Management
>>Committee charged with the creation and maintenance of
>>open-source software, for distribution at no charge to the
>>public, relative to distributed and scalable OLAP engine
>>
>>NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>Committee (PMC), to be known as the "Apache Kylin Project",
>>be and hereby is established pursuant to Bylaws of the
>>Foundation; and be it further
>>
>>RESOLVED, that the Apache Kylin Project be and hereby is
>>responsible for the creation and maintenance of open-source
>>software related to distributed and scalable OLAP engine;
>>and be it further
>>
>>RESOLVED, that the office of "Vice President, Kylin" be and
>>hereby is created, the person holding such office to serve at
>>the direction of the Board of Directors as the chair of the
>>Apache Kylin Project, and to have primary responsibility for
>>management of the projects within the scope of responsibility
>>of the Apache Kylin Project; and be it further
>>
>>RESOLVED, that the persons listed immediately below be and
>>hereby are appointed to serve as the initial members of the
>>Apache Kylin Project:
>>
>> * Dayue Gao 
>> * Jason Zhong 
>> * Julian Hyde 
>> * Luke Han 
>> * Henry Saputra 
>> * Hongbin Ma 
>> * Hua Huang 
>> * Owen O'Malley 
>> * P. Taylor Goetz 
>> * Qianhao Zhou 
>> * Shaofeng Shi 
>> * Song Yi 
>> * Ted Dunning 
>> * Xu Jiang 
>> * Yang Li 
>> * Yerui Sun < sunyerui at apache dot org>
>>
>>
>>NOW, THEREFORE, BE IT FURTHER RESOLVED, that Luke Han
>>be appointed to the office of Vice President, Kylin, to serve
>>in accordance with and subject to the direction of the Board of
>>Directors and the Bylaws of the Foundation until death,
>>resignation, retirement, removal or disqualification, or until
>>a successor is appointed; and be it further
>>
>>RESOLVED, that the initial Apache Kylin Project be and hereby
>>is tasked with the creation of a set of bylaws intended to
>>encourage open development and increased participation in the
>>Kylin Project; and be it further
>>
>>RESOLVED, that the initial Apache Kylin Project be and hereby
>>is tasked with the migration and rationalization of the Apache
>>Incubator Kylin podling; and be it further
>>
>>RESOLVED, that all responsibility pertaining to the Apache
>>Incubator Kylin podling encumbered upon the Apache Incubator
>>PMC are hereafter discharged.
>>



Re: [VOTE] Accept Eagle into Apache Incubation

2015-10-23 Thread Adunuthula, Seshu
+1 (non-binding)

On 10/23/15, 9:52 AM, "Hitesh Shah"  wrote:

>+1 (binding)
>
>-- Hitesh
>
>On Oct 23, 2015, at 7:11 AM, Manoharan, Arun  wrote:
>
>> Hello Everyone,
>> 
>> Thanks for all the feedback on the Eagle Proposal.
>> 
>> I would like to call for a [VOTE] on Eagle joining the ASF as an
>> incubation project.
>> 
>> The vote is open for 72 hours:
>> 
>> [ ] +1 accept Eagle in the Incubator
>> [ ] ±0
>> [ ] -1 (please give reason)
>> 
>> Eagle is a Monitoring solution for Hadoop to instantly identify access
>> to sensitive data, recognize attacks, malicious activities and take
>> actions in real time. Eagle supports a wide variety of policies on HDFS
>> data and Hive. Eagle also provides machine learning models for detecting
>> anomalous user behavior in Hadoop.
>> 
>> The proposal is available on the wiki here:
>> https://wiki.apache.org/incubator/EagleProposal
>> 
>> The text of the proposal is also available at the end of this email.
>> 
>> Thanks for your time and help.
>> 
>> Thanks,
>> Arun
>> 
>> 
>> 
>> Eagle
>> 
>> Abstract
>> Eagle is an Open Source Monitoring solution for Hadoop to instantly
>> identify access to sensitive data, recognize attacks and malicious
>> activities in Hadoop, and take actions.
>> 
>> Proposal
>> Eagle audits access to HDFS files, Hive and HBase tables in real time,
>> enforces policies defined on sensitive data access and alerts or blocks
>> a user's access to that sensitive data in real time. Eagle also creates
>> user profiles based on the typical access behaviour for HDFS and Hive
>> and sends alerts when anomalous behaviour is detected. Eagle can also
>> import sensitive data information classified by external classification
>> engines to help define its policies.
>> 
>> Overview of Eagle
>> Eagle has 3 main parts.
>> 1. Data collection and storage - Eagle collects data from various Hadoop
>>    logs in real time using Kafka/YARN API and uses HDFS and HBase for
>>    storage.
>> 2. Data processing and policy engine - Eagle allows users to create
>>    policies based on various metadata properties on HDFS, Hive and HBase
>>    data.
>> 3. Eagle services - Eagle services include policy manager, query service
>>    and the visualization component. Eagle provides an intuitive user
>>    interface to administer Eagle and an alert dashboard to respond to
>>    real-time alerts.
>> 
>> Data Collection and Storage:
>> Eagle provides a programming API for extending Eagle to integrate any
>> data source into the Eagle policy evaluation framework. For example,
>> Eagle HDFS audit monitoring collects data from Kafka, which is populated
>> from a namenode log4j appender or from a logstash agent. Eagle Hive
>> monitoring collects Hive query logs from running jobs through the YARN
>> API, which is designed to be scalable and fault-tolerant. Eagle uses
>> HBase as storage for metadata and metrics data, and also supports
>> relational databases through a configuration change.
>> 
>> Data Processing and Policy Engine:
>> Processing Engine: Eagle provides a stream processing API which is an
>> abstraction of Apache Storm. It can also be extended to other streaming
>> engines. This abstraction allows developers to assemble data
>> transformation, filtering, external data join etc. without being
>> physically bound to a specific streaming platform. The Eagle streaming
>> API allows developers to easily integrate business logic with the Eagle
>> policy engine, and internally the Eagle framework compiles the business
>> logic execution DAG into program primitives of the underlying stream
>> infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring
>> transforms audit logs from the Namenode into objects and joins
>> sensitivity metadata and security zone metadata, which are generated by
>> external programs or configured by the user. Eagle Hive monitoring
>> filters running jobs to get the Hive query string, parses the query
>> string into an object, and then joins sensitivity metadata.
>> Alerting Framework: The Eagle Alert Framework includes a stream metadata
>> API, a scalable policy engine framework, and an extensible policy engine
>> framework. The stream metadata API allows developers to declare an event
>> schema, including what attributes constitute an event, what the type of
>> each attribute is, and how to dynamically resolve attribute values at
>> runtime when a user configures a policy. The scalable policy engine
>> framework allows policies to be executed on different physical nodes in
>> parallel. It can also be used to define your own policy partitioner
>> class. The policy engine framework, together with the streaming
>> partitioning capability provided by all streaming platforms, makes sure
>> policies and events can be evaluated in a fully distributed way. The
>> extensible policy engine framework allows developers to plug in a new
>> policy engine with a few lines of code. The WSO2 Siddhi CEP engine is the
>> policy engine which Eagle supports as a first-class citizen.
>> Machine Learning module: Eagle provides capabilities to define user
>> activity patterns or user profiles for Hadoop users based on the user
>> behaviour in the platform. These user p

Re: [DISCUSS] SystemML Incubator Proposal

2015-10-24 Thread Adunuthula, Seshu
Hello Luciano,

I recently heard the presentation on SystemML at the Apache BigData
conference and it sounds exciting. Looking forward to Apache Incubation.

Regards
Seshu Adunuthula


On 10/23/15, 5:34 PM, "Luciano Resende"  wrote:

>On Fri, Oct 23, 2015 at 5:30 PM, Henry Saputra 
>wrote:
>
>> Hi Luciano,
>>
>> Good proposal, but looks like
>> https://wiki.apache.org/incubator/SystemM does not exist?
>>
>
>Good catch, it's a typo on the original link and it's missing the L at the
>end, here is the correct link
>
>https://wiki.apache.org/incubator/SystemML
>
>
>
>>
>> Also, Reynold Xin and Patrick Wendell are not members of the IPMC, so I
>> don't think they can be mentors of this project yet.
>>
>> They can ask to become IPMC members since both are already ASF members.
>> But for now they need to be removed from the proposal.
>>
>>
>>
>Yes, they are aware of the requirement, and this will be fixed before we
>call a vote on the proposal.
>
>
>
>> - Henry
>>
>> On Fri, Oct 23, 2015 at 4:34 PM, Luciano Resende 
>> wrote:
>> > We would like to start a discussion on accepting SystemML as an Apache
>> > Incubator project.
>> >
>> > The proposal is available at :
>> > https://wiki.apache.org/incubator/SystemM
>> >
>> > And its contents are also copied below.
>> >
>> > Thanks in advance for your time reviewing and providing feedback.
>> >
>> > ==
>> >
>> > = SystemML =
>> >
>> > == Abstract ==
>> >
>> > SystemML provides declarative large-scale machine learning (ML) that
>>aims
>> > at flexible specification of ML algorithms and automatic generation of
>> > hybrid runtime plans ranging from single node, in-memory
>>computations, to
>> > distributed computations on Apache Hadoop and  Apache Spark. ML
>> algorithms
>> > are expressed in an R-like syntax, that includes linear algebra
>> primitives,
>> > statistical functions, and ML-specific constructs. This high-level
>> language
>> > significantly increases the productivity of data scientists as it
>> provides
>> > (1) full flexibility in expressing custom analytics, and (2) data
>> > independence from the underlying input formats and physical data
>> > representations. Automatic optimization according to data
>>characteristics
>> > such as distribution on the disk file system, and sparsity as well as
>> > processing characteristics in the distributed environment like number
>>of
>> > nodes, CPU, memory per node, ensures both efficiency and scalability.
>> >
>> > == Proposal ==
>> >
>> > The goal of SystemML is to create a commercial friendly, scalable and
>> > extensible machine learning framework for data scientists to create or
>> > extend machine learning algorithms using a declarative syntax. The
>> machine
>> > learning framework enables data scientists to develop algorithms
>>locally
>> > without the need of a distributed cluster, and scale up and scale out
>>the
>> > execution of these algorithms to distributed Hadoop or Spark clusters.
>> >
>> > == Background ==
>> >
>> > SystemML started as a research project in the IBM Almaden Research
>>Center
>> > around 2010 aiming to enable data scientists to develop machine
>>learning
>> > algorithms independent of data and cluster characteristics.
>> >
>> > == Rationale ==
>> >
>> > SystemML enables the specification of machine learning algorithms
>>using a
>> > declarative machine learning (DML) language. DML includes linear
>>algebra
>> > primitives, statistical functions, and additional constructs. This
>> > high-level language significantly increases the productivity of data
>> > scientists as it provides (1) full flexibility in expressing custom
>> > analytics and (2) data independence from the underlying input formats
>>and
>> > physical data representations.
>> >
>> > SystemML computations can be executed in a variety of different
>>modes. It
>> > supports single node in-memory computations and large-scale
>>distributed
>> > cluster computations. This allows the user to quickly prototype new
>> > algorithms in local environments but automatically scale to large data
>> > sizes as well without changing the algorithm implementation.
>> >
>> > Algorithms specified in DML are dynamically compiled and optimized
>>based
>> on
>> > data and cluster characteristics using rule-based and cost-based
>> > optimization techniques. The optimizer automatically generates hybrid
>> > runtime execution plans ranging from in-memory single-node execution
>>to
>> > distributed computations on Spark or Hadoop. This ensures both
>>efficiency
>> > and scalability. Automatic optimization reduces or eliminates the
>>need to
>> > hand-tune distributed runtime execution plans and system
>>configurations.
>> >
>> > == Initial Goals ==
>> >
>> > The initial goal of moving SystemML to the Apache Incubator is to
>>broaden
>> > the community and foster contributions from data scientists to develop
>> new
>> > machine learning algorithms and enhance the existing ones. Ultimately,
>> this
>> > may lead to the creation of an industry standard in specifyin

Re: [DISCUSS] Impala incubator proposal

2015-11-17 Thread Adunuthula, Seshu
Awesome! Glad to see this becoming part of ASF...


On 11/17/15, 10:49 AM, "Henry Robinson"  wrote:

>Hi all -
>
>We'd like to start a discussion regarding a proposal to submit Impala to
>the Apache Incubator.
>
>The proposal text is available on the Wiki here:
>https://wiki.apache.org/incubator/ImpalaProposal
>
>and pasted below for convenience.
>
>I'm excited to make this proposal, and look forward to the community's
>input!
>
>Best,
>Henry
>
>
>= Abstract =
>Impala is a high-performance C++ and Java SQL query engine for data stored
>in Apache Hadoop-based clusters.
>
>= Proposal =
>
>We propose to contribute the Impala codebase and associated artifacts
>(e.g.
>documentation, web-site content etc.) to the Apache Software Foundation
>with the intent of forming a productive, meritocratic and open community
>around Impala's continued development, according to the 'Apache Way'.
>
>Cloudera owns several trademarks regarding Impala, and proposes to
>transfer
>ownership of those trademarks in full to the ASF.
>
>= Background =
>Engineers at Cloudera developed Impala and released it as an
>Apache-licensed open-source project in Fall 2012. Impala was written as a
>brand-new, modern C++ SQL engine targeted from the start for data stored
>in
>Apache Hadoop clusters.
>
>Impala's most important benefit to users is high performance, making it
>extremely appropriate for common enterprise analytic and business
>intelligence workloads. This is achieved by a number of software
>techniques, including: native support for data stored in HDFS and related
>filesystems, just-in-time compilation and optimization of individual query
>plans, high-performance C++ codebase and massively-parallel distributed
>architecture. In benchmarks, Impala is routinely amongst the very highest
>performing SQL query engines.
>
>= Rationale =
>
>Despite the exciting innovation in the so-called 'big-data' space, SQL
>remains by far the most common interface for interacting with data in both
>traditional warehouses and modern 'big-data' clusters. There is clearly a
>need, as evidenced by the eager adoption of Impala and other SQL engines
>in
>enterprise contexts, for a query engine that offers the familiar SQL
>interface, but that has been specifically designed to operate in massive,
>distributed clusters rather than in traditional, fixed-hardware,
>warehouse-specific deployments. Impala is one such query engine.
>
>We believe that the ASF is the right venue to foster an open-source
>community around Impala's development. We expect that Impala will benefit
>from more productive collaboration with related Apache projects, and under
>the auspices of the ASF will attract talented contributors who will push
>Impala's development forward at pace.
>
>We believe that the timing is right for Impala's development to move
>wholesale to the ASF: Impala is well-established, has been Apache-licensed
>open-source for more than three years, and the core project is relatively
>stable. We are excited to see where an ASF-based community can take Impala
>from this strong starting point.
>
>= Initial Goals =
>Our initial goals are as follows:
>
>* Establish ASF-compatible engineering practices and workflows
>* Refactor and publish existing internal build scripts and test
>infrastructure, in order to make them usable by any community member.
>* Transfer source code, documentation and associated artifacts to the ASF.
>* Grow the user and developer communities
>
>= Current Status =
>
>Impala is developed as an Apache-licensed open-source project. The source
>code is available at http://github.com/cloudera/Impala, and developer
>documentation is at https://github.com/cloudera/Impala/wiki. The majority
>of commits to the project have come from Cloudera-employed developers, but
>we have accepted some contributions from individuals from other
>organizations.
>
>All code reviews are done via a public instance of the Gerrit review tool
>at http://gerrit.cloudera.org:8080/, and discussed on a public mailing
>list. All patches must be reviewed before they are accepted into the
>codebase, via a voting mechanism that is similar to that used on Apache
>projects such as Hadoop and HBase.
>
>Before a patch is committed, it must pass a suite of pre-commit tests.
>These tests are currently run on Cloudera's internal infrastructure. One
>of
>our initial goals will be to work with the ASF Infrastructure team to find
>a way to run these tests in an acceptable way on publicly accessible
>machines.
>
>Issues are tracked in JIRA at https://issues.cloudera.org/projects/IMPALA,
>in a way that is extremely similar to existing practices at other ASF
>projects.
>
>= Meritocracy =
>
>We understand the central importance of meritocracy to the Apache Way. We
>will work to establish a welcoming, fair and meritocratic community, in
>part by expanding the set of committers on the project. Although Impala's
>committer list will initially be dominated by members of the Impala
>engineering team at Cloudera, 

Re: [DISCUSS] Apache Dataflow Incubator Proposal

2016-01-22 Thread Adunuthula, Seshu
Awesome to see CloudDataFlow coming to Apache. The Stream Processing area
has been in general fragmented with a variety of solutions, hoping the
community galvanizes around Apache Data Flow.

We are still in the "Apache Storm" world. Any chance of folks building a
"Storm Runner"?
 

On 1/20/16, 9:39 AM, "James Malone"  wrote:

>> Great proposal. I like that your proposal includes a well presented
>> roadmap, but I don't see any goals that directly address building a
>>larger
>> community. Y'all have any ideas around outreach that will help with
>> adoption?
>>
>
>Thank you and fair point. We have a few additional ideas which we can put
>into the Community section.
>
>
>>
>> As a start, I recommend y'all add a section to the proposal on the wiki
>> page for "Additional Interested Contributors" so that folks who want to
>> sign up to participate in the project can do so without requesting
>> additions to the initial committer list.
>>
>>
>This is a great idea and I think it makes a lot of sense to add an
>"Additional
>Interested Contributors" section to the proposal.
>
>
>> On Wed, Jan 20, 2016 at 10:32 AM, James Malone <
>> jamesmal...@google.com.invalid> wrote:
>>
>> > Hello everyone,
>> >
>> > Attached to this message is a proposed new project - Apache Dataflow,
>>a
>> > unified programming model for data processing and integration.
>> >
>> > The text of the proposal is included below. Additionally, the
>>proposal is
>> > in draft form on the wiki where we will make any required changes:
>> >
>> > https://wiki.apache.org/incubator/DataflowProposal
>> >
>> > We look forward to your feedback and input.
>> >
>> > Best,
>> >
>> > James
>> >
>> > 
>> >
>> > = Apache Dataflow =
>> >
>> > == Abstract ==
>> >
>> > Dataflow is an open source, unified model and set of language-specific
>> SDKs
>> > for defining and executing data processing workflows, and also data
>> > ingestion and integration flows, supporting Enterprise Integration
>> Patterns
>> > (EIPs) and Domain Specific Languages (DSLs). Dataflow pipelines
>>simplify
>> > the mechanics of large-scale batch and streaming data processing and
>>can
>> > run on a number of runtimes like Apache Flink, Apache Spark, and
>>Google
>> > Cloud Dataflow (a cloud service). Dataflow also brings DSL in
>>different
>> > languages, allowing users to easily implement their data integration
>> > processes.
>> >
>> > == Proposal ==
>> >
>> > Dataflow is a simple, flexible, and powerful system for distributed
>>data
>> > processing at any scale. Dataflow provides a unified programming
>>model, a
>> > software development kit to define and construct data processing
>> pipelines,
>> > and runners to execute Dataflow pipelines in several runtime engines,
>> like
>> > Apache Spark, Apache Flink, or Google Cloud Dataflow. Dataflow can be
>> used
>> > for a variety of streaming or batch data processing goals including
>>ETL,
>> > stream analysis, and aggregate computation. The underlying programming
>> > model for Dataflow provides MapReduce-like parallelism, combined with
>> > support for powerful data windowing, and fine-grained correctness
>> control.
>> >
>> > == Background ==
>> >
>> > Dataflow started as a set of Google projects focused on making data
>> > processing easier, faster, and less costly. The Dataflow model is a
>> > successor to MapReduce, FlumeJava, and MillWheel inside Google and is
>> > focused on providing a unified solution for batch and stream
>>processing.
>> > These projects on which Dataflow is based have been published in
>>several
>> > papers made available to the public:
>> >
>> > * MapReduce - http://research.google.com/archive/mapreduce.html
>> >
>> > * Dataflow model  - http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>> >
>> > * FlumeJava - http://notes.stephenholiday.com/FlumeJava.pdf
>> >
>> > * MillWheel - http://research.google.com/pubs/pub41378.html
>> >
>> > Dataflow was designed from the start to provide a portable programming
>> > layer. When you define a data processing pipeline with the Dataflow
>> model,
>> > you are creating a job which is capable of being processed by any
>>number
>> of
>> > Dataflow processing engines. Several engines have been developed to
>>run
>> > Dataflow pipelines in other open source runtimes, including a Dataflow
>> > runner for Apache Flink and Apache Spark. There is also a "direct
>> runner",
>> > for execution on the developer machine (mainly for dev/debug
>>purposes).
>> > Another runner allows a Dataflow program to run on a managed service,
>> > Google Cloud Dataflow, in Google Cloud Platform. The Dataflow Java
>>SDK is
>> > already available on GitHub, and independent from the Google Cloud
>> Dataflow
>> > service. A Python SDK is also currently in active development.
>> >
>> > In this proposal, the Dataflow SDKs, model, and a set of runners will
>>be
>> > submitted as an OSS project under the ASF. The runners which are a
>>part
>> of
>> > this proposal include those for Spark (from Cloudera), Flink

Re: [DISCUSS] Apache Dataflow Incubator Proposal

2016-01-23 Thread Adunuthula, Seshu
I have not had a chance to play with it yet. Within Google, is it used more
as an MR replacement or as a stream processing engine? Or does it do both of
them fantastically well?


On 1/22/16, 10:58 AM, "Frances Perry"  wrote:

>Crunch started as a clone of FlumeJava, which was Google internal. In the
>meantime inside Google, FlumeJava evolved into Dataflow. So all three
>share
>a number of concepts like PCollections, ParDo, DoFn, etc. However,
>Dataflow
>adds a number of new things -- the biggest being a unified batch/streaming
>semantics using concepts like Windowing and Triggers. Tyler Akidau's
>O'Reilly post has a really nice explanation:
>https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
>
>On Fri, Jan 22, 2016 at 10:42 AM, Ashish  wrote:
>
>> Crunch has Spark pipelines, but not sure about the runner abstraction.
>>
>> Maybe Josh Wills or Tom White can provide more insight on this topic.
>> They are core devs for both projects :)
>>
>> On Fri, Jan 22, 2016 at 9:47 AM, Jean-Baptiste Onofré 
>> wrote:
>> > Hi,
>> >
>> > I don't know Crunch deeply, but AFAIK Crunch creates MapReduce
>> pipelines; it
>> > doesn't provide a runner abstraction. It's based on FlumeJava.
>> >
>> > The logic is very similar (with DoFns, pipelines, ...). Correct me if
>>I'm
>> > wrong, but Crunch started after Google Dataflow, especially because
>> Dataflow
>> > was not opensourced at that time.
>> >
>> > So, I agree it's very similar/close.
>> >
>> > Regards
>> > JB
>> >
>> >
>> > On 01/22/2016 05:51 PM, Ashish wrote:
>> >>
>> >> Hi JB,
>> >>
>> >> Curious to know about how it compares to Apache Crunch? Constructs
>> >> look very familiar (had used Crunch long ago)
>> >>
>> >> Thoughts?
>> >>
>> >> - Ashish
>> >>
>> >> On Fri, Jan 22, 2016 at 6:33 AM, Jean-Baptiste Onofré
>>
>> >> wrote:
>> >>>
>> >>> Hi Seshu,
>> >>>
>> >>> I blogged about Apache Dataflow proposal:
>> >>> http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/
>> >>>
>> >>> You can see in the "what's next ?" section that new runners, skins
>>and
>> >>> sources are on our roadmap. Definitely, a storm runner could be
>>part of
>> >>> this.
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>>
>> >>> On 01/22/2016 03:31 PM, Adunuthula, Seshu wrote:
>> >>>>
>> >>>>
>> >>>> Awesome to see CloudDataFlow coming to Apache. The Stream
>>Processing
>> >>>> area
>> >>>> has been in general fragmented with a variety of solutions, hoping
>>the
>> >>>> community galvanizes around Apache Data Flow.
>> >>>>
>> >>>> We are still in the "Apache Storm" world, Any chance for folks
>> building
>> >>>> a
>> >>>> "Storm Runner"?
>> >>>>
>> >>>>
>> >>>> On 1/20/16, 9:39 AM, "James Malone"
>>
>> >>>> wrote:
>> >>>>
>> >>>>>> Great proposal. I like that your proposal includes a well
>>presented
>> >>>>>> roadmap, but I don't see any goals that directly address
>>building a
>> >>>>>> larger
>> >>>>>> community. Y'all have any ideas around outreach that will help
>>with
>> >>>>>> adoption?
>> >>>>>>
>> >>>>>
>> >>>>> Thank you and fair point. We have a few additional ideas which we
>>can
>> >>>>> put
>> >>>>> into the Community section.
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>> As a start, I recommend y'all add a section to the proposal on
>>the
>> >>>>>> wiki
>> >>>>>> page for "Additional Interested Contributors" so that folks who
>>want
>> >>>>>> to
>> >>>>>> sign up to participate in the project can do so without
>>requesting
>> >>>>>> additions to the initial committer list.
>> >>>

Re: [VOTE] Accept Beam into the Apache Incubator

2016-01-28 Thread Adunuthula, Seshu
+1 (non-binding)

On 1/28/16, 12:05 PM, "Julian Hyde"  wrote:

>+1 (binding)
>
>> On Jan 28, 2016, at 10:42 AM, Mayank Bansal  wrote:
>> 
>> +1 (non-binding)
>> 
>> Thanks,
>> Mayank
>> 
>> On Thu, Jan 28, 2016 at 10:23 AM, Seetharam Venkatesh <
>> venkat...@innerzeal.com> wrote:
>> 
>>> +1 (binding).
>>> 
>>> Thanks!
>>> 
>>> On Thu, Jan 28, 2016 at 10:19 AM Ted Dunning 
>>> wrote:
>>> 
 +1
 
 
 
 On Thu, Jan 28, 2016 at 10:02 AM, John D. Ament

 wrote:
 
> +1
> 
> On Thu, Jan 28, 2016 at 9:28 AM Jean-Baptiste Onofré
>
> wrote:
> 
>> Hi,
>> 
>> the Beam proposal (initially Dataflow) was proposed last week.
>> 
>> The complete discussion thread is available here:
>> 
>> 
>> 
> 
 
>>> 
>>>http://mail-archives.apache.org/mod_mbox/incubator-general/201601.mbox/%
>>>3CCA%2B%3DKJmvj4wyosNTXVpnsH8PhS7jEyzkZngc682rGgZ3p28L42Q%40mail.gmail.c
>>>om%3E
>> 
>> As a reminder, the Beam proposal is here:
>> 
>> https://wiki.apache.org/incubator/BeamProposal
>> 
>> Given all the great feedback we received on the mailing list,
>>we
>> think it's time to call a vote to accept Beam into the Incubator.
>> 
>> Please cast your vote to:
>> [] +1 - accept Apache Beam as a new incubating project
>> []  0 - not sure
>> [] -1 - do not accept the Apache Beam project (because: ...)
>> 
>> Thanks,
>> Regards
>> JB
>> 
>> ## page was renamed from DataflowProposal
>> = Apache Beam =
>> 
>> == Abstract ==
>> 
>> Apache Beam is an open source, unified model and set of
>> language-specific SDKs for defining and executing data processing
>> workflows, and also data ingestion and integration flows, supporting
>> Enterprise Integration Patterns (EIPs) and Domain Specific Languages
>> (DSLs). Dataflow pipelines simplify the mechanics of large-scale
>>> batch
>> and streaming data processing and can run on a number of runtimes
>>> like
>> Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud
 service).
>> Beam also brings DSL in different languages, allowing users to
>>easily
>> implement their data integration processes.
>> 
>> == Proposal ==
>> 
>> Beam is a simple, flexible, and powerful system for distributed data
>> processing at any scale. Beam provides a unified programming model,
>>a
>> software development kit to define and construct data processing
>> pipelines, and runners to execute Beam pipelines in several runtime
>> engines, like Apache Spark, Apache Flink, or Google Cloud Dataflow.
 Beam
>> can be used for a variety of streaming or batch data processing
>>goals
>> including ETL, stream analysis, and aggregate computation. The
>> underlying programming model for Beam provides MapReduce-like
>> parallelism, combined with support for powerful data windowing, and
>> fine-grained correctness control.
>> 
>> == Background ==
>> 
>> Beam started as a set of Google projects (Google Cloud Dataflow)
 focused
>> on making data processing easier, faster, and less costly. The Beam
>> model is a successor to MapReduce, FlumeJava, and MillWheel inside
>> Google and is focused on providing a unified solution for batch and
>> stream processing. These projects on which Beam is based have been
>> published in several papers made available to the public:
>> 
>>  * MapReduce - http://research.google.com/archive/mapreduce.html
>>  * Dataflow model  -
>>> http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>>  * FlumeJava - http://research.google.com/pubs/pub35650.html
>>  * MillWheel - http://research.google.com/pubs/pub41378.html
>> 
>> Beam was designed from the start to provide a portable programming
>> layer. When you define a data processing pipeline with the Beam
>>> model,
>> you are creating a job which is capable of being processed by any
 number
>> of Beam processing engines. Several engines have been developed to
>>> run
>> Beam pipelines in other open source runtimes, including a Beam
>>runner
>> for Apache Flink and Apache Spark. There is also a "direct runner",
>>> for
>> execution on the developer machine (mainly for dev/debug purposes).
>> Another runner allows a Beam program to run on a managed service,
 Google
>> Cloud Dataflow, in Google Cloud Platform. The Dataflow Java SDK is
>> already available on GitHub, and independent from the Google Cloud
>> Dataflow service. A Python SDK is also currently in active
 development.
>> 
>> In this proposal, the Beam SDKs, model, and a set of runners will be
>> submitted as an OSS project under the ASF. The runners which are a
>>> part
>> of this proposal include those for Spark (from Cloudera), Flink
>>(from
>> data Artisans), and local developme

Re: [VOTE] Accept Joshua as an Apache Incubator Podling

2016-02-12 Thread Adunuthula, Seshu
Is there a fail grade? ;)


On 2/12/16, 11:57 AM, "Tom Barber"  wrote:

>You're making the presumption it's passed its vote! ;)
>
>On Fri, Feb 12, 2016 at 7:33 PM, Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Yep, will send a result shortly.
>>
>> Lewis, after that, can you help me get the podling bootstrap tasks
>> started?
>>
>> ++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Lewis John Mcgibbney 
>> Reply-To: "general@incubator.apache.org" 
>> Date: Friday, February 12, 2016 at 11:31 AM
>> To: "general@incubator.apache.org" 
>> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>>
>> >Hi Chris,
>> >Is it time to close out this VOTE and bring Joshua on board?
>> >Lewis
>> >
>> >On Wed, Feb 3, 2016 at 4:01 PM,
>>> >
>> >wrote:
>> >
>> >>
>> >> From: Danese Cooper 
>> >> To: "general@incubator.apache.org" 
>> >> Cc: "p...@cs.jhu.edu" 
>> >> Date: Wed, 3 Feb 2016 07:43:11 -0800
>> >> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>> >> +1 (binding) Accept Joshua as an Apache Incubator podling.
>> >>
>> >> D
>> >>
>> >> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980) <
>> >> chris.a.mattm...@jpl.nasa.gov> wrote:
>> >> >
>> >> > Hi Everyone,
>> >> >
>> >> > OK the discussion is now completed. Please VOTE to accept Joshua
>> >> > into the Apache Incubator. I’ll leave the VOTE open for at least
>> >> > the next 72 hours, with hopes to close it next Friday the 5th of
>> >> > February, 2016.
>> >> >
>> >> > [ ] +1 Accept Joshua as an Apache Incubator podling.
>> >> > [ ] +0 Abstain.
>> >> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling
>>because..
>> >> >
>> >> > Of course, I am +1 on this. Please note VOTEs from Incubator PMC
>> >> > members are binding but all are welcome to VOTE!
>> >> >
>> >> > Cheers,
>> >> > Chris
>> >> >
>> >> > ++
>> >> > Chris Mattmann, Ph.D.
>> >> > Chief Architect
>> >> > Instrument Software and Science Data Systems Section (398)
>> >> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >> > Office: 168-519, Mailstop: 168-527
>> >> > Email: chris.a.mattm...@nasa.gov
>> >> > WWW:  http://sunset.usc.edu/~mattmann/
>> >> > ++
>> >> > Adjunct Associate Professor, Computer Science Department
>> >> > University of Southern California, Los Angeles, CA 90089 USA
>> >> > ++
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > -Original Message-
>> >> > From: jpluser 
>> >> > Date: Tuesday, January 12, 2016 at 10:56 PM
>> >> > To: "general@incubator.apache.org" 
>> >> > Cc: "p...@cs.jhu.edu" 
>> >> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine
>> >>Translation
>> >> > Toolkit
>> >> >
>> >> >> Hi Everyone,
>> >> >>
>> >> >> Please find attached for your viewing pleasure a proposed new
>> >>project,
>> >> >> Apache Joshua, a statistical machine translation toolkit. The
>> >>proposal
>> >> >> is in wiki draft form at:
>> >> https://wiki.apache.org/incubator/JoshuaProposal
>> >> >>
>> >> >> Proposal text is copied below. I’ll leave the discussion open
>>for a
>> >> week
>> >> >> and we are interested in folks who would like to be initial
>> >>committers
>> >> >> and mentors. Please discuss here on the thread.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> Cheers,
>> >> >> Chris (Champion)
>> >> >>
>> >> >> ———
>> >> >>
>> >> >> = Joshua Proposal =
>> >> >>
>> >> >> == Abstract ==
>> >> >> [[joshua-decoder.org|Joshua]] is an open-source statistical
>>machine
>> >> >> translation toolkit. It includes a Java-based decoder for
>>translating
>> >> with
>> >> >> phrase-based, hierarchical, and syntax-based translation models, a
>> >> >> Hadoop-based grammar extractor (Thrax), and an extensive set of
>>tools
>> >> and
>> >> >> scripts for training and evaluating new models from parallel text.
>> >> >>
>> >> >> == Proposal ==
>> >> >> Joshua is a state of the art statistical machine translation
>>system
>> >>that
>> >> >> provides a number of features:
>> >> >>
>> >> >> * Support for the two main paradigms in statistical machine
>> >>translation:
>> >> >> phrase-based and hierarchical / syntactic.
>> >> >> * A sparse feature API that makes it easy to add new feature
>> >>templates
>> >> >> supporting m