Thanks for your clarification. The only thing I worry about is that most people perceive MRQL as just another SQL-on-Hadoop system. For example, look at the first sentence on our website: 'MRQL is a query processing and optimization system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark'. I think we need to push the scientific and complex side of our project.
> big data analysis but soon this may change. I think people will start
> using fault-tolerant in-memory distributed systems for data analysis,

Agree. BTW, should we continue to support MapReduce in the future?

> difference from others. The fact that it can run on multiple platforms
> is a big plus, but is secondary. Currently, most people use Hadoop for

A flexible and extensible execution engine architecture can be a plus. However, that model should be based on contributions from diverse assistants. Since we're at a very early stage, I think we should focus on the main execution engine. That's why I think it's meaningless (at this time). WDYT?

On Thu, Dec 12, 2013 at 12:15 AM, Leonidas Fegaras <[email protected]> wrote:
> Thanks Edward,
> Our biggest concern is that there is no activity in the user@mrql
> list. Does this mean that there is no one using MRQL or that nobody
> posts any messages? Is there a way to get the number of people
> registered in this list? Can we also get the number of times MRQL has
> been downloaded from Apache mirrors after its first release? It was
> hoped that after the first release people will start downloading MRQL
> and will register at user@mrql list to ask questions, report bugs, ask
> for new features, etc. It hasn't happened yet. Maybe it's too soon.
>
> There are other query languages for big data analysis in ASF. All
> except MRQL are SQL-based data warehousing systems for Hadoop
> (eg, Hive and Tajo). MRQL is a query system for complex data analysis,
> including machine learning and scientific computing. This is the main
> difference from others. The fact that it can run on multiple platforms
> is a big plus, but is secondary. Currently, most people use Hadoop for
> big data analysis but soon this may change. I think people will start
> using fault-tolerant in-memory distributed systems for data analysis,
> such as Spark. Hama too may play a big role. So supporting multiple
> platforms will allow users to deploy applications using MRQL very fast
> and experiment with all these platforms without having to change the
> query. The whole idea of expressing distributed applications using an
> SQL-like query system is rapid and easy prototyping, without
> sacrificing performance. So performance is a very important factor.
> If MRQL is slow, nobody will use it. I think in this area, we are doing
> an excellent job because of the very advanced optimizer that allows
> operations such as matrix multiplication to be done using very fast
> algorithms.
>
> Leonidas Fegaras
>
>
>
> On 12/10/2013 08:33 PM, Edward J. Yoon wrote:
>>
>> All,
>>
>> Since there are too many similar projects, I'd like to suggest that we
>> change the future direction of MRQL to a powerful *analytics* query
>> language on top of Hadoop beyond ETL processing. In my eyes,
>> supporting multi-platforms (MapReduce, Hama, Spark, ...,etc) also
>> seems pointless. WDYT?
>>
>

--
Best Regards, Edward J. Yoon
@eddieyoon
