Re: [DISCUSS] the future direction of MRQL

Edward J. Yoon Wed, 11 Dec 2013 16:50:55 -0800

Thanks for your clarification.

The only thing I worry about is that most people perceive MRQL as
another SQL on Hadoop. For example, look at the first sentence on our
website: 'MRQL is a query processing and optimization system for
large-scale, distributed data analysis, built on top of Apache Hadoop,
Hama, and Spark'. I think we need to push the scientific and complex
analysis side of our project.


> big data analysis but soon this may change. I think people will start
> using fault-tolerant in-memory distributed systems for data analysis,

Agreed. BTW, should we continue to support MapReduce in the future?

> difference from others. The fact that it can run on multiple platforms
> is a big plus, but is secondary. Currently, most people use Hadoop for

A flexible and extensible execution engine architecture can be a
plus. However, that model depends on contributions from a diverse
set of contributors. Since we're at a very early stage, I think we
should focus on the main execution engine. That's why I think
supporting multiple platforms is meaningless (at this time).

WDYT?

On Thu, Dec 12, 2013 at 12:15 AM, Leonidas Fegaras <[email protected]> wrote:
> Thanks Edward,
> Our biggest concern is that there is no activity in the user@mrql
> list. Does this mean that there is no one using MRQL or that nobody
> posts any messages? Is there a way to get the number of people
> registered in this list? Can we also get the number of times MRQL has
> been downloaded from Apache mirrors after its first release? It was
> hoped that after the first release people would start downloading MRQL
> and register on the user@mrql list to ask questions, report bugs, ask
> for new features, etc. It hasn't happened yet. Maybe it's too soon.
>
> There are other query languages for big data analysis in ASF. All
> except MRQL are SQL-based data warehousing systems for Hadoop
> (e.g., Hive and Tajo). MRQL is a query system for complex data analysis,
> including machine learning and scientific computing. This is the main
> difference from others. The fact that it can run on multiple platforms
> is a big plus, but is secondary. Currently, most people use Hadoop for
> big data analysis but soon this may change. I think people will start
> using fault-tolerant in-memory distributed systems for data analysis,
> such as Spark. Hama too may play a big role. So supporting multiple
> platforms will allow users to deploy applications using MRQL very fast
> and experiment with all these platforms without having to change the
> query. The whole idea of expressing distributed applications using an
> SQL-like query system is rapid and easy prototyping, without
> sacrificing performance. So performance is a very important factor.
> If MRQL is slow, nobody will use it. I think in this area, we are doing
> an excellent job because of the very advanced optimizer that allows
> operations such as matrix multiplication to be done using very fast
> algorithms.
>
> Leonidas Fegaras
>
>
>
> On 12/10/2013 08:33 PM, Edward J. Yoon wrote:
>>
>> All,
>>
>> Since there are too many similar projects, I'd like to suggest that we
>> change the future direction of MRQL to a powerful *analytics* query
>> language on top of Hadoop beyond ETL processing. In my eyes,
>> supporting multiple platforms (MapReduce, Hama, Spark, etc.) also
>> seems pointless. WDYT?
>>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
