Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-06 Thread Mohammad Nour El-Din

I added myself as a mentor. Welcome aboard.


On Wed, Mar 6, 2013 at 9:02 AM, Edward J. Yoon wrote:

> I think it's time to call for vote.
>
> On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili wrote:
> > Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
> > release headaches.
> > Regards,
> > Tommaso
> >
> >
> > 2013/3/4 Edward J. Yoon 
> >
> >> Sure I can. :)
> >>
> >> Of course, we'll welcome more mentors from incubator IPMC if there're
> >> volunteers.
> >>
> >> On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu 
> >> wrote:
> >> > On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz <
> >> bdelacre...@apache.org
> >> >> wrote:
> >> >
> >> >> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras <
> fega...@cse.uta.edu>
> >> >> wrote:
> >> >> > == Champion ==
> >> >> > * Edward J. Yoon 
> >> >> > == Nominated Mentors ==
> >> >> > * Alex Karasulu 
> >> >> >...
> >> >>
> >> >> Is Edward going to stay on as a mentor as well?
> >> >>
> >> >> Two (active) mentors is the bare minimum IMO.
> >> >>
> >> >>
> >> > I suspect so but let's hear from Edward himself.
> >> >
> >> > Best Regards,
> >> > -- Alex
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
> >> -
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org
> >>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
>
>


-- 
Thanks
- Mohammad Nour

"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-06 Thread Edward J. Yoon

I think it's time to call for vote.

On Mon, Mar 4, 2013 at 9:25 PM, Tommaso Teofili wrote:
> Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
> release headaches.
> Regards,
> Tommaso
>
>
> 2013/3/4 Edward J. Yoon 
>
>> Sure I can. :)
>>
>> Of course, we'll welcome more mentors from incubator IPMC if there're
>> volunteers.
>>
>> On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu 
>> wrote:
>> > On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz <
>> bdelacre...@apache.org
>> >> wrote:
>> >
>> >> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras 
>> >> wrote:
>> >> > == Champion ==
>> >> > * Edward J. Yoon 
>> >> > == Nominated Mentors ==
>> >> > * Alex Karasulu 
>> >> >...
>> >>
>> >> Is Edward going to stay on as a mentor as well?
>> >>
>> >> Two (active) mentors is the bare minimum IMO.
>> >>
>> >>
>> > I suspect so but let's hear from Edward himself.
>> >
>> > Best Regards,
>> > -- Alex
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>>
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon


Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-04 Thread Tommaso Teofili

Nice proposal indeed, I'd say having 3 mentors is usually better to avoid
release headaches.
Regards,
Tommaso


2013/3/4 Edward J. Yoon 

> Sure I can. :)
>
> Of course, we'll welcome more mentors from incubator IPMC if there're
> volunteers.
>
> On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu 
> wrote:
> > On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz <
> bdelacre...@apache.org
> >> wrote:
> >
> >> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras 
> >> wrote:
> >> > == Champion ==
> >> > * Edward J. Yoon 
> >> > == Nominated Mentors ==
> >> > * Alex Karasulu 
> >> >...
> >>
> >> Is Edward going to stay on as a mentor as well?
> >>
> >> Two (active) mentors is the bare minimum IMO.
> >>
> >>
> > I suspect so but let's hear from Edward himself.
> >
> > Best Regards,
> > -- Alex
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
>
>

Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-04 Thread Edward J. Yoon

Sure I can. :)

Of course, we'll welcome more mentors from incubator IPMC if there're
volunteers.

On Mon, Mar 4, 2013 at 7:34 PM, Alex Karasulu wrote:
> On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz wrote:
>
>> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras 
>> wrote:
>> > == Champion ==
>> > * Edward J. Yoon 
>> > == Nominated Mentors ==
>> > * Alex Karasulu 
>> >...
>>
>> Is Edward going to stay on as a mentor as well?
>>
>> Two (active) mentors is the bare minimum IMO.
>>
>>
> I suspect so but let's hear from Edward himself.
>
> Best Regards,
> -- Alex



-- 
Best Regards, Edward J. Yoon
@eddieyoon


Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-04 Thread Alex Karasulu

On Mon, Mar 4, 2013 at 12:31 PM, Bertrand Delacretaz wrote:

> On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras 
> wrote:
> > == Champion ==
> > * Edward J. Yoon 
> > == Nominated Mentors ==
> > * Alex Karasulu 
> >...
>
> Is Edward going to stay on as a mentor as well?
>
> Two (active) mentors is the bare minimum IMO.
>
>
I suspect so but let's hear from Edward himself.

Best Regards,
-- Alex

Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-04 Thread Bertrand Delacretaz

On Sat, Mar 2, 2013 at 7:12 AM, Leonidas Fegaras wrote:
> == Champion ==
> * Edward J. Yoon 
> == Nominated Mentors ==
> * Alex Karasulu 
>...

Is Edward going to stay on as a mentor as well?

Two (active) mentors is the bare minimum IMO.

-Bertrand


Re: [PROPOSAL] MRQL for the Apache Incubator

2013-03-02 Thread Mattmann, Chris A (388J)

Sounds awesome guys look forward to the VOTE.

Cheers,
Chris


[PROPOSAL] MRQL for the Apache Incubator

2013-03-02 Thread Leonidas Fegaras


Dear ASF members,

We would like to propose a new project to the incubator, called MRQL.
Edward J. Yoon has volunteered to be the champion for this project.
The proposal draft is available at:

http://wiki.apache.org/incubator/MRQLProposal

We are very excited about having this opportunity to work with ASF to
create an incubator project. We are looking forward to your feedback
and suggestions.
Best regards
Leonidas Fegaras


= Abstract =

MRQL is a query processing and optimization system for large-scale,
distributed data analysis, built on top of Apache Hadoop and Hama.

= Proposal =

MRQL (pronounced ''miracle'') is a query processing and optimization
system for large-scale, distributed data analysis. MRQL (the MapReduce
Query Language) is an SQL-like query language for large-scale data
analysis on a cluster of computers. The MRQL query processing system
can evaluate MRQL queries in two modes: in MapReduce mode on top of
Apache Hadoop or in Bulk Synchronous Parallel (BSP) mode on top of
Apache Hama. The MRQL query language is powerful enough to express
most common data analysis tasks over many forms of raw ''in-situ''
data, such as XML and JSON documents, binary files, and CSV
documents. MRQL is more powerful than other current high-level
MapReduce languages, such as Hive and PigLatin, since it can operate
on more complex data and supports more powerful query constructs, thus
eliminating the need for using explicit MapReduce code. With MRQL,
users will be able to express complex data analysis tasks, such as
PageRank, k-means clustering, matrix factorization, etc., using
SQL-like queries exclusively, while the MRQL query processing system
will be able to compile these queries to efficient Java code.

= Background =

The initial code was developed at the University of Texas at Arlington
(UTA) by a research team, led by Leonidas Fegaras. The software was
first released in May 2011. The original goal of this project was to
build a query processing system that translates SQL-like data analysis
queries to efficient workflows of MapReduce jobs. A design goal was to
use HDFS as the physical storage layer, without any indexing, data
partitioning, or data normalization, and to use Hadoop (without
extensions) as the run-time engine. The motivation behind this work
was to build a platform to test new ideas on query processing and
optimization techniques applicable to the MapReduce framework.

A year ago, MRQL was extended to run on Hama. The motivation for this
extension was that Hadoop MapReduce jobs were required to read their
input and write their output on HDFS. This simplifies reliability and
fault tolerance, but it imposes a high overhead on complex MapReduce
workflows and graph algorithms, such as PageRank, which require
repetitive jobs. In addition, Hadoop does not preserve data in memory
across consecutive MapReduce jobs. This restriction requires data to be
read at every step, even when the data is constant. BSP, on the other
hand, does not suffer from this restriction, and, under certain
circumstances, allows complex repetitive algorithms to run entirely in
the collective memory of a cluster. Thus, the goal was to be able to
run the same MRQL queries in both modes, MapReduce and BSP, without
modifying the queries: If there are enough resources available, and
low latency and speed are more important than resilience, queries may
run in BSP mode; otherwise, the same queries may run in MapReduce
mode. BSP evaluation was found to be a good choice when fault
tolerance is not critical, data (both input and intermediate) can fit
in the cluster memory, and data processing requires complex/repetitive
steps.

The research results of this ongoing work have already been published
in conferences (WebDB'11, EDBT'12, and DataCloud'12) and the authors
have already received positive feedback from researchers in academia
and industry who were attending these conferences.

= Rationale =

* MRQL will be the first general-purpose, SQL-like query language for
data analysis based on BSP.
Currently, many programmers prefer to code their MapReduce
applications in a higher-level query language, rather than an
algorithmic language. For instance, Pig is used for 60% of Yahoo
MapReduce jobs, while Hive is used for 90% of Facebook MapReduce
jobs. This, we believe, will also be the trend for BSP applications,
because, even though, in principle, the BSP model is very simple to
understand, it is hard to develop, optimize, and maintain non-trivial
BSP applications coded in a general-purpose programming
language. Currently, there is no widely accepted declarative BSP
query language, although there are a few special-purpose BSP systems
for graph analysis, such as Google Pregel and Apache Giraph, for
machine learning, such as BSML, and for scientific data analysis.

* MRQL can capture many complex data analysis algorithms in
declarative form.
Existing MapReduce query languages, such as HiveQL and PigLatin,
provide a limited syntax for operating
