Re: [PROPOSAL] MRQL for the Apache Incubator
I added myself as a mentor. Welcome aboard.

On Wed, Mar 6, 2013 at 9:02 AM, Edward J. Yoon wrote:
> I think it's time to call for vote.

--
Thanks
- Mohammad Nour
Re: [PROPOSAL] MRQL for the Apache Incubator
I think it's time to call for vote.

On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili wrote:
> Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
> release headaches.
> Regards,
> Tommaso

--
Best Regards, Edward J. Yoon
@eddieyoon
Re: [PROPOSAL] MRQL for the Apache Incubator
Nice proposal indeed, I'd say having 3 mentors is usually better to avoid release headaches.
Regards,
Tommaso

2013/3/4 Edward J. Yoon
> Sure I can. :)
>
> Of course, we'll welcome more mentors from incubator IPMC if there're
> volunteers.
Re: [PROPOSAL] MRQL for the Apache Incubator
Sure I can. :)

Of course, we'll welcome more mentors from incubator IPMC if there're volunteers.

On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu wrote:
> On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz wrote:
>> Is Edward going to stay on as a mentor as well?
>>
>> Two (active) mentors is the bare minimum IMO.
>
> I suspect so but let's hear from Edward himself.
>
> Best Regards,
> -- Alex

--
Best Regards, Edward J. Yoon
@eddieyoon
Re: [PROPOSAL] MRQL for the Apache Incubator
On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz wrote:
> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras wrote:
> > == Champion ==
> > * Edward J. Yoon
> > == Nominated Mentors ==
> > * Alex Karasulu
> >...
>
> Is Edward going to stay on as a mentor as well?
>
> Two (active) mentors is the bare minimum IMO.

I suspect so but let's hear from Edward himself.

Best Regards,
-- Alex
Re: [PROPOSAL] MRQL for the Apache Incubator
On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras wrote:
> == Champion ==
> * Edward J. Yoon
> == Nominated Mentors ==
> * Alex Karasulu
>...

Is Edward going to stay on as a mentor as well?

Two (active) mentors is the bare minimum IMO.

-Bertrand
Re: [PROPOSAL] MRQL for the Apache Incubator
Sounds awesome guys look forward to the VOTE.

Cheers,
Chris

On 3/2/13 7:12 AM, "Leonidas Fegaras" wrote:
>Dear ASF members,
>
>We would like to propose a new project to the incubator, called MRQL.
>Edward J. Yoon has volunteered to be the champion for this project.
>The proposal draft is available at:
>
>http://wiki.apache.org/incubator/MRQLProposal
>[...]
[PROPOSAL] MRQL for the Apache Incubator
Dear ASF members,

We would like to propose a new project to the incubator, called MRQL. Edward J. Yoon has volunteered to be the champion for this project. The proposal draft is available at:

http://wiki.apache.org/incubator/MRQLProposal

We are very excited about having this opportunity to work with ASF to create an incubator project. We are looking forward to your feedback and suggestions.

Best regards
Leonidas Fegaras

= Abstract =

MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop and Hama.

= Proposal =

MRQL (pronounced ''miracle'') is a query processing and optimization system for large-scale, distributed data analysis. MRQL (the MapReduce Query Language) is an SQL-like query language for large-scale data analysis on a cluster of computers. The MRQL query processing system can evaluate MRQL queries in two modes: in MapReduce mode on top of Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of Apache Hama. The MRQL query language is powerful enough to express most common data analysis tasks over many forms of raw ''in-situ'' data, such as XML and JSON documents, binary files, and CSV documents. MRQL is more powerful than other current high-level MapReduce languages, such as Hive and PigLatin, since it can operate on more complex data and supports more powerful query constructs, thus eliminating the need for explicit MapReduce code. With MRQL, users will be able to express complex data analysis tasks, such as PageRank, k-means clustering, matrix factorization, etc., using SQL-like queries exclusively, while the MRQL query processing system will be able to compile these queries to efficient Java code.

= Background =

The initial code was developed at the University of Texas at Arlington (UTA) by a research team led by Leonidas Fegaras. The software was first released in May 2011.
The original goal of this project was to build a query processing system that translates SQL-like data analysis queries to efficient workflows of MapReduce jobs. A design goal was to use HDFS as the physical storage layer, without any indexing, data partitioning, or data normalization, and to use Hadoop (without extensions) as the run-time engine. The motivation behind this work was to build a platform to test new ideas on query processing and optimization techniques applicable to the MapReduce framework.

A year ago, MRQL was extended to run on Hama. The motivation for this extension was that Hadoop MapReduce jobs were required to read their input from and write their output to HDFS. This simplifies reliability and fault tolerance, but it imposes a high overhead on complex MapReduce workflows and graph algorithms, such as PageRank, which require repetitive jobs. In addition, Hadoop does not preserve data in memory across consecutive MapReduce jobs. This restriction means the data must be read again at every step, even when the data is constant. BSP, on the other hand, does not suffer from this restriction and, under certain circumstances, allows complex repetitive algorithms to run entirely in the collective memory of a cluster. Thus, the goal was to be able to run the same MRQL queries in both modes, MapReduce and BSP, without modifying the queries: if there are enough resources available, and low latency and speed are more important than resilience, queries may run in BSP mode; otherwise, the same queries may run in MapReduce mode. BSP evaluation was found to be a good choice when fault tolerance is not critical, data (both input and intermediate) can fit in the cluster memory, and data processing requires complex/repetitive steps.
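To make the re-reading cost concrete, here is a minimal sketch in plain Python. It is not MRQL, Hadoop, or Hama code, and every name in it (load_graph, pagerank_step, run_mapreduce_style, run_bsp_style) is a hypothetical stand-in chosen for illustration. It only assumes what the paragraph above states: a chain of MapReduce jobs reloads its constant input at every step, while a BSP-style run can keep that input in memory across steps.

```python
# Illustrative sketch only: plain Python, not MRQL, Hadoop, or Hama code.
# All names below are hypothetical and exist only for this example.

GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # constant link structure
reads = {"count": 0}  # how many times the constant input has been loaded


def load_graph():
    """Stand-in for reading the constant input from distributed storage."""
    reads["count"] += 1
    return GRAPH


def pagerank_step(graph, ranks, damping=0.85):
    """One PageRank iteration over an in-memory graph."""
    n = len(graph)
    new_ranks = {node: (1.0 - damping) / n for node in graph}
    for node, targets in graph.items():
        share = ranks[node] / len(targets)
        for target in targets:
            new_ranks[target] += damping * share
    return new_ranks


def run_mapreduce_style(steps):
    """Each step is a separate job, so the graph is reloaded every time."""
    ranks = {node: 1.0 / len(GRAPH) for node in GRAPH}
    for _ in range(steps):
        graph = load_graph()
        ranks = pagerank_step(graph, ranks)
    return ranks


def run_bsp_style(steps):
    """The graph is loaded once and stays in memory across all steps."""
    graph = load_graph()
    ranks = {node: 1.0 / len(graph) for node in graph}
    for _ in range(steps):
        ranks = pagerank_step(graph, ranks)
    return ranks
```

Both functions compute the same ranks; the only difference is how often load_graph is called, which grows with the number of steps in the first case and stays at one in the second. Real systems also differ in fault tolerance and memory limits, which this sketch does not model.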
The research results of this ongoing work have already been published at conferences (WebDB'11, EDBT'12, and DataCloud'12), and the authors have received positive feedback from researchers in academia and industry who attended these conferences.

= Rationale =

* MRQL will be the first general-purpose, SQL-like query language for data analysis based on BSP.
Currently, many programmers prefer to code their MapReduce applications in a higher-level query language rather than an algorithmic language. For instance, Pig is used for 60% of Yahoo MapReduce jobs, while Hive is used for 90% of Facebook MapReduce jobs. This, we believe, will also be the trend for BSP applications because, even though the BSP model is in principle very simple to understand, it is hard to develop, optimize, and maintain non-trivial BSP applications coded in a general-purpose programming language. Currently, there is no widely accepted declarative BSP query language, although there are a few special-purpose BSP systems for graph analysis, such as Google Pregel and Apache Giraph, for machine learning, such as BSML, and for scientific data analysis.

* MRQL can capture many complex data analysis algorithms in declarative form.
Existing MapReduce query languages, such as HiveQL and PigLatin, provide a limited syntax for operating