Re: [VOTE] Accept Provisionr into the Apache Incubator

2013-03-06 Thread Andrei Savu
Thanks to all who voted! With 18 +1s (10 binding) the vote passes.

I'll start the work to get the podling started.

Thanks,
Andrei

On Mon, Mar 4, 2013 at 9:31 PM, Henry Saputra wrote:

> +1 non-binding
>
> Good luck
>
>
> On Sat, Mar 2, 2013 at 3:35 PM, Andrei Savu  wrote:
>
> > Hi Guys,
> >
> > I'd like to call a VOTE for acceptance of Provisionr into the Apache
> > Incubator.
> >
> > The vote will close on March 8.
> >
> > [] +1 Accept Provisionr into the Apache incubator
> > [] +0 Don't care.
> > [] -1 Don't accept Provisionr into the incubator because...
> >
> > Full proposal is pasted at the bottom of this email, and the corresponding
> > wiki is http://wiki.apache.org/incubator/ProvisionrProposal
> >
> > Only VOTEs from Incubator PMC members are binding, but all are welcome to
> > express their thoughts.
> >
> > Thanks,
> > Andrei Savu
> >
> > --
> > Provisionr Proposal
> >
> > == Abstract ==
> >
> > Provisionr is an effort to develop a service that can be used to create and
> > manage pools of virtual machines on multiple clouds. Our focus is on
> > semi-automated workflows and cloud portability.
> >
> > == Proposal ==
> >
> > Provisionr solves the problem of cloud portability by completely hiding
> > the APIs and focusing only on building a cluster that matches the same
> > set of assumptions on all clouds, such as: running a specific operating
> > system (e.g. Ubuntu 12.04 LTS), having the same set of pre-installed
> > packages and binaries, sane DNS settings (forward and reverse IP
> > resolution, as needed for Hadoop), NTP settings, networking settings,
> > firewall, SSH admin access, VPN access, etc.
> >
> > As a secondary goal, Provisionr should also provide primitives for building
> > automatic or semi-automatic workflows for configuring services, workflows
> > that assume that all the machines share a common set of characteristics as
> > described above.
> >
> > == Background ==
> >
> > Creating clusters on cloud infrastructure is non-trivial because careful
> > orchestration is required. To make it easy to deploy services we need to
> > start from a foundation that matches a common set of assumptions on
> > multiple providers.
> >
> > == Rationale ==
> >
> > This project started as a rewrite of the core of Apache Whirr, but it has a
> > different target: it is more focused on semi-automated workflows and cloud
> > portability.
> >
> > == Initial Goals ==
> >
> >  * Build a community
> >  * Provide an excellent user experience for semi-automatic workflows (e.g.
> >    using Rundeck)
> >  * Implement a REST service and a Web Console
> >  * Add support for more providers
> >
> > == Current Status ==
> >
> > Provisionr has had four releases on
> > [[https://github.com/axemblr/axemblr-provisionr/wiki|GitHub]], and it is
> > used at Axemblr to deploy on-demand Hadoop clusters and infrastructure for
> > testing / QA.
> >
> > === Meritocracy ===
> >
> > We plan to invest in supporting a meritocracy. We will discuss the
> > requirements in an open forum. Several companies have already expressed
> > interest in this project, and we intend to invite additional developers to
> > participate. We will encourage and monitor community participation so that
> > privileges can be extended to those that contribute.
> >
> > === Community ===
> >
> > The community interested in cloud service infrastructure is currently
> > spread across many smaller projects, and one of the main goals of this
> > project is to build a vibrant community to share best practices and build
> > common infrastructure.
> >
> > === Core developers ===
> >
> > Core developers are very experienced in the Apache ecosystem. To achieve
> > more diversity of developers, we will be eager to recruit developers from
> > diverse companies.
> >
> >  * Andrei Savu - asavu at apache dot org  (Apache Whirr PMC)
> >  * Ioan Eugen Stan - ieugen at apache dot org (Apache James PMC)
> >  * Alex Ciminian -  alex.ciminian at gmail dot org
> >
> > === Alignment ===
> >
> > Provisionr complements Apache Whirr, and later on it should provide a
> > robust foundation for more advanced functionality.
> >
> > == Known Risks ==
> >
> > === Orphaned products ===
> >
> > The contributors have significant open source experience and the project is
> > being used as part of a commercial product, so the risk of being orphaned
> > is relatively low. We plan to mitigate this risk by recruiting additional
> > committers.
> >
> > === Inexperience with Open Source ===
> >
> > Most of the initial committers have experience working on open source
> > projects. Andrei Savu and Ioan Eugen Stan have experience as committers and
> > PMC members on other Apache projects.
> >
> > === Homogenous Developers ===
> >
> > We are committed to recruiting additional committers from other companies
> > based on their contributions to the project.
> >
> > === Reliance on Salaried Developers ===
> >
> > It is expected that Provisionr development will occur on bo

Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Mahadev Konar
+1 (binding)


thanks
mahadev

On Wed, Mar 6, 2013 at 7:14 PM, Enis Söztutar  wrote:

> +1 (binding)
>
> Disclosure: I am one of the mentors.
>
>
> On Wed, Mar 6, 2013 at 4:27 AM, Ioan Eugen Stan wrote:
>
> > +1 non binding
> >
> > --
> > Ioan Eugen Stan
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Enis Söztutar
+1 (binding)

Disclosure: I am one of the mentors.


On Wed, Mar 6, 2013 at 4:27 AM, Ioan Eugen Stan wrote:

> +1 non binding
>
> --
> Ioan Eugen Stan
>
>
>


Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Edward J. Yoon
+1

On Thu, Mar 7, 2013 at 2:11 AM, Tommaso Teofili wrote:
> +1
>
> Tommaso
>
>
> 2013/3/6 Alex Karasulu 
>
>> +1 (binding)
>>
>>
>> On Wed, Mar 6, 2013 at 7:04 PM, Leonidas Fegaras wrote:
>>
>> > Dear ASF members,
>> > I would like to call for a VOTE for acceptance of MRQL into the Incubator.
>> > The vote will close on Monday March 11, 2013.
>> >
>> > [ ] +1 Accept MRQL into the Apache incubator
>> > [ ] +0 Don't care.
>> > [ ] -1 Don't accept MRQL into the incubator because...
>> >
>> > Full proposal is pasted below and the corresponding wiki is
>> >
>> > http://wiki.apache.org/incubator/MRQLProposal
>> >
>> > Only VOTEs from Incubator PMC members are binding,
>> > but all are welcome to express their thoughts.
>> > Sincerely,
>> > Leonidas Fegaras

Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Tommaso Teofili
+1

Tommaso


2013/3/6 Alex Karasulu 

> +1 (binding)
>
>
> On Wed, Mar 6, 2013 at 7:04 PM, Leonidas Fegaras wrote:
>
> > Dear ASF members,
> > I would like to call for a VOTE for acceptance of MRQL into the Incubator.
> > The vote will close on Monday March 11, 2013.
> >
> > [ ] +1 Accept MRQL into the Apache incubator
> > [ ] +0 Don't care.
> > [ ] -1 Don't accept MRQL into the incubator because...
> >
> > Full proposal is pasted below and the corresponding wiki is
> >
> > http://wiki.apache.org/incubator/MRQLProposal
> >
> > Only VOTEs from Incubator PMC members are binding,
> > but all are welcome to express their thoughts.
> > Sincerely,
> > Leonidas Fegaras
> >
> >
> > = Abstract =
> >
> > MRQL is a query processing and optimization system for large-scale,
> > distributed data analysis, built on top of Apache Hadoop and Hama.
> >
> > = Proposal =
> >
> > MRQL (pronounced ''miracle'') is a query processing and optimization
> > system for large-scale, distributed data analysis. MRQL (the MapReduce
> > Query Language) is an SQL-like query language for large-scale data
> > analysis on a cluster of computers. The MRQL query processing system
> > can evaluate MRQL queries in two modes: in MapReduce mode on top of
> > Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of
> > Apache Hama. The MRQL query language is powerful enough to express
> > most common data analysis tasks over many forms of raw ''in-situ''
> > data, such as XML and JSON documents, binary files, and CSV
> > documents. MRQL is more powerful than other current high-level
> > MapReduce languages, such as Hive and PigLatin, since it can operate
> > on more complex data and supports more powerful query constructs, thus
> > eliminating the need for using explicit MapReduce code. With MRQL,
> > users will be able to express complex data analysis tasks, such as
> > PageRank, k-means clustering, matrix factorization, etc, using
> > SQL-like queries exclusively, while the MRQL query processing system
> > will be able to compile these queries to efficient Java code.
> >
> > = Background =
> >
> > The initial code was developed at the University of Texas of Arlington
> > (UTA) by a research team, led by Leonidas Fegaras. The software was
> > first released in May 2011. The original goal of this project was to
> > build a query processing system that translates SQL-like data analysis
> > queries to efficient workflows of MapReduce jobs. A design goal was to
> > use HDFS as the physical storage layer, without any indexing, data
> > partitioning, or data normalization, and to use Hadoop (without
> > extensions) as the run-time engine. The motivation behind this work
> > was to build a platform to test new ideas on query processing and
> > optimization techniques applicable to the MapReduce framework.
> >
> > A year ago, MRQL was extended to run on Hama. The motivation for this
> > extension was that Hadoop MapReduce jobs were required to read their
> > input and write their output on HDFS. This simplifies reliability and
> > fault tolerance but it imposes a high overhead to complex MapReduce
> > workflows and graph algorithms, such as PageRank, which require
> > repetitive jobs. In addition, Hadoop does not preserve data in memory
> > across consecutive MapReduce jobs. This restriction requires to read
> > data at every step, even when the data is constant. BSP, on the other
> > hand, does not suffer from this restriction, and, under certain
> > circumstances, allows complex repetitive algorithms to run entirely in
> > the collective memory of a cluster. Thus, the goal was to be able to
> > run the same MRQL queries in both modes, MapReduce and BSP, without
> > modifying the queries: If there are enough resources available, and
> > low latency and speed are more important than resilience, queries may
> > run in BSP mode; otherwise, the same queries may run in MapReduce
> > mode. BSP evaluation was found to be a good choice when fault
> > tolerance is not critical, data (both input and intermediate) can fit
> > in the cluster memory, and data processing requires complex/repetitive
> > steps.
> >
> > The research results of this ongoing work have already been published
> > in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors
> > have already received positive feedback from researchers in academia
> > and industry who were attending these conferences.
> >
> > = Rationale =
> >
> > * MRQL will be the first general-purpose, SQL-like query language for
> > data analysis based on BSP.
> > Currently, many programmers prefer to code their MapReduce
> > applications in a higher-level query language, rather than an
> > algorithmic language. For instance, Pig is used for 60% of Yahoo
> > MapReduce jobs, while Hive is used for 90% of Facebook MapReduce
> > jobs. This, we believe, will also be the trend for BSP applications,
> > because, even though, in principle, the BSP model is v

Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-06 Thread Mohammad Nour El-Din
I added myself as a mentor. Welcome aboard.


On Wed, Mar 6, 2013 at 9:02 AM, Edward J. Yoon wrote:

> I think it's time to call for vote.
>
> On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili
>  wrote:
> > Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
> > release headaches.
> > Regards,
> > Tommaso
> >
> >
> > 2013/3/4 Edward J. Yoon 
> >
> >> Sure I can. :)
> >>
> >> Of course, we'll welcome more mentors from incubator IPMC if there're
> >> volunteers.
> >>
> >> On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu 
> >> wrote:
> >> > On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz <
> >> bdelacre...@apache.org
> >> >> wrote:
> >> >
> >> >> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras <
> fega...@cse.uta.edu>
> >> >> wrote:
> >> >> > == Champion ==
> >> >> > * Edward J. Yoon 
> >> >> > == Nominated Mentors ==
> >> >> > * Alex Karasulu 
> >> >> >...
> >> >>
> >> >> Is Edward going to stay on as a mentor as well?
> >> >>
> >> >> Two (active) mentors is the bare minimum IMO.
> >> >>
> >> >>
> >> > I suspect so but let's hear from Edward himself.
> >> >
> >> > Best Regards,
> >> > -- Alex
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
> >>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
>
>


-- 
Thanks
- Mohammad Nour

"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein


Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Alex Karasulu
+1 (binding)


On Wed, Mar 6, 2013 at 7:04 PM, Leonidas Fegaras wrote:

> Dear ASF members,
> I would like to call for a VOTE for acceptance of MRQL into the Incubator.
> The vote will close on Monday March 11, 2013.
>
> [ ] +1 Accept MRQL into the Apache incubator
> [ ] +0 Don't care.
> [ ] -1 Don't accept MRQL into the incubator because...
>
> Full proposal is pasted below and the corresponding wiki is
>
> http://wiki.apache.org/incubator/MRQLProposal
>
> Only VOTEs from Incubator PMC members are binding,
> but all are welcome to express their thoughts.
> Sincerely,
> Leonidas Fegaras

Re: [VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Mohammad Nour El-Din
+1


On Wed, Mar 6, 2013 at 6:04 PM, Leonidas Fegaras wrote:

> Dear ASF members,
> I would like to call for a VOTE for acceptance of MRQL into the Incubator.
> The vote will close on Monday March 11, 2013.
>
> [ ] +1 Accept MRQL into the Apache incubator
> [ ] +0 Don't care.
> [ ] -1 Don't accept MRQL into the incubator because...
>
> Full proposal is pasted below and the corresponding wiki is
>
> http://wiki.apache.org/incubator/MRQLProposal
>
> Only VOTEs from Incubator PMC members are binding,
> but all are welcome to express their thoughts.
> Sincerely,
> Leonidas Fegaras

[VOTE] Accept MRQL into the Incubator

2013-03-06 Thread Leonidas Fegaras

Dear ASF members,
I would like to call for a VOTE for acceptance of MRQL into the Incubator.

The vote will close on Monday March 11, 2013.

[ ] +1 Accept MRQL into the Apache incubator
[ ] +0 Don't care.
[ ] -1 Don't accept MRQL into the incubator because...

Full proposal is pasted below and the corresponding wiki is

http://wiki.apache.org/incubator/MRQLProposal

Only VOTEs from Incubator PMC members are binding,
but all are welcome to express their thoughts.
Sincerely,
Leonidas Fegaras


= Abstract =

MRQL is a query processing and optimization system for large-scale,
distributed data analysis, built on top of Apache Hadoop and Hama.

= Proposal =

MRQL (pronounced ''miracle'') is a query processing and optimization
system for large-scale, distributed data analysis. MRQL (the MapReduce
Query Language) is an SQL-like query language for large-scale data
analysis on a cluster of computers. The MRQL query processing system
can evaluate MRQL queries in two modes: in MapReduce mode on top of
Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of
Apache Hama. The MRQL query language is powerful enough to express
most common data analysis tasks over many forms of raw ''in-situ''
data, such as XML and JSON documents, binary files, and CSV
documents. MRQL is more powerful than other current high-level
MapReduce languages, such as Hive and PigLatin, since it can operate
on more complex data and supports more powerful query constructs, thus
eliminating the need for using explicit MapReduce code. With MRQL,
users will be able to express complex data analysis tasks, such as
PageRank, k-means clustering, matrix factorization, etc., using
SQL-like queries exclusively, while the MRQL query processing system
will be able to compile these queries to efficient Java code.

= Background =

The initial code was developed at the University of Texas at Arlington
(UTA) by a research team, led by Leonidas Fegaras. The software was
first released in May 2011. The original goal of this project was to
build a query processing system that translates SQL-like data analysis
queries to efficient workflows of MapReduce jobs. A design goal was to
use HDFS as the physical storage layer, without any indexing, data
partitioning, or data normalization, and to use Hadoop (without
extensions) as the run-time engine. The motivation behind this work
was to build a platform to test new ideas on query processing and
optimization techniques applicable to the MapReduce framework.

A year ago, MRQL was extended to run on Hama. The motivation for this
extension was that Hadoop MapReduce jobs were required to read their
input and write their output on HDFS. This simplifies reliability and
fault tolerance, but it imposes a high overhead on complex MapReduce
workflows and graph algorithms, such as PageRank, which require
repetitive jobs. In addition, Hadoop does not preserve data in memory
across consecutive MapReduce jobs. This restriction requires reading the
data at every step, even when the data is constant. BSP, on the other
hand, does not suffer from this restriction, and, under certain
circumstances, allows complex repetitive algorithms to run entirely in
the collective memory of a cluster. Thus, the goal was to be able to
run the same MRQL queries in both modes, MapReduce and BSP, without
modifying the queries: If there are enough resources available, and
low latency and speed are more important than resilience, queries may
run in BSP mode; otherwise, the same queries may run in MapReduce
mode. BSP evaluation was found to be a good choice when fault
tolerance is not critical, data (both input and intermediate) can fit
in the cluster memory, and data processing requires complex/repetitive
steps.
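
The rule of thumb in the paragraph above can be sketched as a small
decision function. This is only an illustration under stated assumptions:
the Workload class, its fields, and choose_mode are hypothetical names
invented for this sketch and are not part of MRQL, Hadoop, or Hama.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a query workload (not an MRQL type)."""
    data_bytes: int              # input plus intermediate data
    cluster_memory_bytes: int    # collective memory of the cluster
    needs_fault_tolerance: bool  # True when resilience is critical
    iterative: bool              # complex/repetitive steps, e.g. PageRank


def choose_mode(w: Workload) -> str:
    """Return "BSP" only when all three conditions from the text hold."""
    fits_in_memory = w.data_bytes <= w.cluster_memory_bytes
    if not w.needs_fault_tolerance and fits_in_memory and w.iterative:
        return "BSP"
    # Otherwise fall back to the more resilient MapReduce mode.
    return "MapReduce"
```

Because the same query is meant to run unmodified in either mode, a
choice like this would only affect how a query is evaluated, not how it
is written.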

The research results of this ongoing work have already been published
in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors
have already received positive feedback from researchers in academia
and industry who were attending these conferences.

= Rationale =

* MRQL will be the first general-purpose, SQL-like query language for
data analysis based on BSP.
Currently, many programmers prefer to code their MapReduce
applications in a higher-level query language, rather than an
algorithmic language. For instance, Pig is used for 60% of Yahoo
MapReduce jobs, while Hive is used for 90% of Facebook MapReduce
jobs. This, we believe, will also be the trend for BSP applications,
because, even though, in principle, the BSP model is very simple to
understand, it is hard to develop, optimize, and maintain non-trivial
BSP applications coded in a general-purpose programming
language. Currently, there is no widely accepted declarative BSP
query language, although there are a few special-purpose BSP systems
for graph analysis, such as Google Pregel and Apache Giraph, for
machine learning, such as BSML, and for scientific data analysis.

* MRQL can capture many complex data analysis algorithms in
declarative form.
Existing MapReduce query language

Re: Suggested change to the ppmc guide

2013-03-06 Thread Chip Childers
On Wed, Mar 06, 2013 at 08:00:43AM -0800, Craig L Russell wrote:
> Hi Daniel,
> 
> On Mar 6, 2013, at 7:09 AM, Daniel Shahaf wrote:
> 
> >I believe the change represents current IPMC consensus but it'd be nice
> >if the change documented the rationale for the policy as well (at least
> >in the log message).
> 
> There is no change to the process, policy, or consensus. The only
> thing that is different is the emphasis on forwarding instead of cc
> or bcc.
> 
> The phrase "should be forwarded to the IPMC" is not ambiguous, but
> it's apparently easy to overlook.

Correct - it wasn't clear enough for my (apparently) thick skull, although it
was specific and accurate. Hopefully my patch will help others in the future.



-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: Suggested change to the ppmc guide

2013-03-06 Thread Craig L Russell

Hi Daniel,

On Mar 6, 2013, at 7:09 AM, Daniel Shahaf wrote:

> I believe the change represents current IPMC consensus but it'd be nice
> if the change documented the rationale for the policy as well (at least
> in the log message).

There is no change to the process, policy, or consensus. The only
thing that is different is the emphasis on forwarding instead of cc or
bcc.

The phrase "should be forwarded to the IPMC" is not ambiguous, but
it's apparently easy to overlook.


Craig



> Daniel
> (last week I ran into a 10 years old change that didn't have any
> justification anywhere)
>
> Chip Childers wrote on Wed, Mar 06, 2013 at 10:00:10 -0500:
> > Hi all,
> >
> > After spamming the private@i.a.o list, and being asked to stop it, I'd
> > like to suggest the following changes to the PPMC guide:
> >
> >
> > Index: content/guides/ppmc.xml
> > ===
> > --- content/guides/ppmc.xml (revision 1453351)
> > +++ content/guides/ppmc.xml (working copy)
> > @@ -168,7 +168,8 @@
> >    [VOTE] Joe Bob as committer. The [VOTE] message should be forwarded
> >    to the IPMC (mailto:priv...@incubator.apache.org";>
> >    priv...@incubator.apache.org) to notify them that the
> > -  vote is underway
> > +  vote is underway. Do not BCC or CC the IPMC on the VOTE thread.
> > +  Instead, forward the initial VOTE email.
> >
> >    To be successful the vote requires at least three +1 votes
> >    from PPMC members, including at least one +1
> > @@ -179,7 +180,8 @@
> >    a message to the PPMC private alias, and forward it to the IPMC,
> >    with the subject line of [VOTE][RESULT] Joe Bob as committer.
> >    The message should include the usual vote tally, indicating which
> > -  mentor or IPMC member votes cause it to be valid.
> > +  mentor or IPMC member votes cause it to be valid. Do not
> > +  BCC or CC the IPMC on the results email.  Instead, forward it.
> >
> >
> >
> > @@ -229,8 +231,9 @@
> >    [VOTE] Joe Bob PPMC membership. The [VOTE] message should be forwarded
> >    to the IPMC (mailto:priv...@incubator.apache.org";>
> >    priv...@incubator.apache.org) to notify them that the
> > -  vote is underway. If the vote is successful, the proposer should send
> > -  a message to the PPMC private alias, with
> > +  vote is underway. Do not CC or BCC the IPMC on this thread.  Instead,
> > +  forward the initial VOTE email.  If the vote is successful, the proposer
> > +  should send a message to the PPMC private alias, with
> >    the subject line of [VOTE][RESULT] Joe Bob PPMC membership. The
> >    message id of the [VOTE][RESULT] message should be preserved for
> >    the message to the Incubator PMC after Joe Bob accepts. Now, Joe Bob





Craig L Russell
Architect, Oracle
http://db.apache.org/jdo
408 276-5638 mailto:craig.russ...@oracle.com
P.S. A good JDO? O, Gasp!





Re: Suggested change to the ppmc guide

2013-03-06 Thread Chip Childers
On Wed, Mar 06, 2013 at 04:24:20PM +0100, Bertrand Delacretaz wrote:
> On Wed, Mar 6, 2013 at 4:00 PM, Chip Childers wrote:
> > ...After spamming the private@i.a.o list, and being asked to stop it, I'd
> > like to suggest the following changes to the PPMC guide...
> 
> +1, and +1 to Daniel's comment, you could point to the private@ thread
> where this was discussed, by Message-Id

Thanks.

Committed.  Please let me know if you believe I didn't provide enough
information in the commit message.




Re: Suggested change to the ppmc guide

2013-03-06 Thread Bertrand Delacretaz
On Wed, Mar 6, 2013 at 4:00 PM, Chip Childers wrote:
> ...After spamming the private@i.a.o list, and being asked to stop it, I'd
> like to suggest the following changes to the PPMC guide...

+1, and +1 to Daniel's comment, you could point to the private@ thread
where this was discussed, by Message-Id

-Bertrand




Re: Suggested change to the ppmc guide

2013-03-06 Thread Daniel Shahaf
I believe the change represents current IPMC consensus but it'd be nice
if the change documented the rationale for the policy as well (at least
in the log message).

Daniel
(last week I ran into a 10-year-old change that didn't have any
justification anywhere)

Chip Childers wrote on Wed, Mar 06, 2013 at 10:00:10 -0500:
> Hi all,
> 
> After spamming the private@i.a.o list, and being asked to stop it, I'd
> like to suggest the following changes to the PPMC guide:
> 
> 
> Index: content/guides/ppmc.xml
> ===
> --- content/guides/ppmc.xml (revision 1453351)
> +++ content/guides/ppmc.xml (working copy)
> @@ -168,7 +168,8 @@
>[VOTE] Joe Bob as committer. The [VOTE] message should be forwarded
>to the IPMC (mailto:priv...@incubator.apache.org";>
>priv...@incubator.apache.org) to notify them that the
> -  vote is underway
> +  vote is underway. Do not BCC or CC the IPMC on the VOTE thread.
> +  Instead, forward the initial VOTE email.
>  
>To be successful the vote requires at least three +1 votes
>from PPMC members, including at least one +1
> @@ -179,7 +180,8 @@
>a message to the PPMC private alias, and forward it to the IPMC,
>with the subject line of [VOTE][RESULT] Joe Bob as committer.
>The message should include the usual vote tally, indicating which
> -  mentor or IPMC member votes cause it to be valid.
> +  mentor or IPMC member votes cause it to be valid. Do not
> +  BCC or CC the IPMC on the results email.  Instead, forward it.
> 
>  
>
> @@ -229,8 +231,9 @@
>[VOTE] Joe Bob PPMC membership. The [VOTE] message should be forwarded
>to the IPMC (mailto:priv...@incubator.apache.org";>
>priv...@incubator.apache.org) to notify them that the
> -  vote is underway. If the vote is successful, the proposer should send 
> -  a message to the PPMC private alias, with
> +  vote is underway. Do not CC or BCC the IPMC on this thread.  Instead,
> +  forward the initial VOTE email.  If the vote is successful, the proposer
> +  should send a message to the PPMC private alias, with
>the subject line of [VOTE][RESULT] Joe Bob PPMC membership. The
>message id of the [VOTE][RESULT] message should be preserved for
>the message to the Incubator PMC after Joe Bob accepts. Now, Joe Bob




Suggested change to the ppmc guide

2013-03-06 Thread Chip Childers
Hi all,

After spamming the private@i.a.o list, and being asked to stop it, I'd
like to suggest the following changes to the PPMC guide:


Index: content/guides/ppmc.xml
===
--- content/guides/ppmc.xml (revision 1453351)
+++ content/guides/ppmc.xml (working copy)
@@ -168,7 +168,8 @@
   [VOTE] Joe Bob as committer. The [VOTE] message should be forwarded
   to the IPMC (mailto:priv...@incubator.apache.org";>
   priv...@incubator.apache.org) to notify them that the
-  vote is underway
+  vote is underway. Do not BCC or CC the IPMC on the VOTE thread.
+  Instead, forward the initial VOTE email.
 
   To be successful the vote requires at least three +1 votes
   from PPMC members, including at least one +1
@@ -179,7 +180,8 @@
   a message to the PPMC private alias, and forward it to the IPMC,
   with the subject line of [VOTE][RESULT] Joe Bob as committer.
   The message should include the usual vote tally, indicating which
-  mentor or IPMC member votes cause it to be valid.
+  mentor or IPMC member votes cause it to be valid. Do not
+  BCC or CC the IPMC on the results email.  Instead, forward it.

 
   
@@ -229,8 +231,9 @@
   [VOTE] Joe Bob PPMC membership. The [VOTE] message should be forwarded
   to the IPMC (mailto:priv...@incubator.apache.org";>
   priv...@incubator.apache.org) to notify them that the
-  vote is underway. If the vote is successful, the proposer should send 
-  a message to the PPMC private alias, with
+  vote is underway. Do not CC or BCC the IPMC on this thread.  Instead,
+  forward the initial VOTE email.  If the vote is successful, the proposer 
+  should send a message to the PPMC private alias, with
   the subject line of [VOTE][RESULT] Joe Bob PPMC membership. The
   message id of the [VOTE][RESULT] message should be preserved for
   the message to the Incubator PMC after Joe Bob accepts. Now, Joe Bob




Permission to edit incubator wiki

2013-03-06 Thread Joachim Dreimann
Can someone grant me permissions to edit the incubator wiki?

My wiki id is jdreimann

Thanks!

Joe


Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Ioan Eugen Stan
+1 non binding

-- 
Ioan Eugen Stan




Re: [VOTE] Accept Curator into the Incubator

2013-03-06 Thread Ioannis Canellos
+1 non-binding


Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-06 Thread Edward J. Yoon
I think it's time to call for a vote.

On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili wrote:
> Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
> release headaches.
> Regards,
> Tommaso
>
>
> 2013/3/4 Edward J. Yoon 
>
>> Sure I can. :)
>>
>> Of course, we'll welcome more mentors from incubator IPMC if there're
>> volunteers.
>>
>> On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu wrote:
>> > On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz <bdelacre...@apache.org> wrote:
>> >
>> >> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras wrote:
>> >> > == Champion ==
>> >> > * Edward J. Yoon 
>> >> > == Nominated Mentors ==
>> >> > * Alex Karasulu 
>> >> >...
>> >>
>> >> Is Edward going to stay on as a mentor as well?
>> >>
>> >> Two (active) mentors is the bare minimum IMO.
>> >>
>> >>
>> > I suspect so but let's hear from Edward himself.
>> >
>> > Best Regards,
>> > -- Alex
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
