Awesome, indeed! Looking forward to trying it out. :)
On Wed, Aug 27, 2014 at 10:52 PM, Sebastian Schelter <s...@apache.org> wrote:

> Awesome!
>
> 2014-08-27 13:49 GMT-07:00 Leonidas Fegaras <fega...@cse.uta.edu>:
>
> > Hello,
> > I would like to let you know that Apache MRQL can now run queries on Flink.
> > MRQL is a query processing and optimization system for large-scale,
> > distributed data analysis, built on top of Apache Hadoop/map-reduce,
> > Hama, Spark, and now Flink. MRQL queries are SQL-like but not SQL.
> > They can work on complex, user-defined data (such as JSON and XML) and
> > can express complex queries (such as pagerank and matrix factorization).
> >
> > MRQL on Flink has been tested on local mode and on a small Yarn cluster.
> >
> > Here are the directions on how to build the latest MRQL snapshot:
> >
> >     git clone https://git-wip-us.apache.org/repos/asf/incubator-mrql.git mrql
> >     cd mrql
> >     mvn -Pyarn clean install
> >
> > To make it run on your cluster, edit conf/mrql-env.sh and set the
> > Java, the Hadoop, and the Flink installation directories.
> >
> > Here is how to run PageRank. First, you need to generate a random
> > graph and store it in a file using the MRQL query RMAT.mrql:
> >
> >     bin/mrql.flink -local queries/RMAT.mrql 1000 10000
> >
> > This will create a graph with 1K nodes and 10K edges using the RMAT
> > algorithm, will remove duplicate edges, and will store the graph in
> > the binary file graph.bin. Then, run PageRank on Flink mode using:
> >
> >     bin/mrql.flink -local queries/pagerank.mrql
> >
> > To run MRQL/Flink on a Yarn cluster, first start the Flink container
> > on Yarn by running the script yarn-session.sh, such as:
> >
> >     ${FLINK_HOME}/bin/yarn-session.sh -n 8
> >
> > This will print the name of the Flink JobManager, which can be used in:
> >
> >     export FLINK_MASTER=name-of-the-Flink-JobManager
> >     bin/mrql.flink -dist -nodes 16 queries/RMAT.mrql 1000000 10000000
> >
> > This will create a graph with 1M nodes and 10M edges using RMAT on 16
> > nodes (slaves). You can adjust these numbers to fit your cluster.
> > Then, run PageRank using:
> >
> >     bin/mrql.flink -dist -nodes 16 queries/pagerank.mrql
> >
> > The MRQL project page is at: http://mrql.incubator.apache.org/
> >
> > Let me know if you have any questions.
> > Leonidas Fegaras
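
For anyone who wants to rehearse the steps quoted above before touching a cluster, here is a rough sketch that wraps the same bin/mrql.flink invocations in two shell functions. It is only an illustration: the function names (run, pagerank_local, pagerank_dist) and the DRY_RUN switch are made up for this example and are not part of MRQL or Flink. It assumes you are in the mrql checkout directory and, for the distributed case, that FLINK_MASTER is already exported as described in the original message.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and the function names below are hypothetical helpers,
# not part of MRQL. By default nothing is executed, commands are just printed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Local mode: generate an RMAT graph, then run PageRank on it.
# $1 = number of graph nodes, $2 = number of graph edges
pagerank_local() {
    graph_nodes="$1"
    graph_edges="$2"
    run bin/mrql.flink -local queries/RMAT.mrql "$graph_nodes" "$graph_edges" &&
    run bin/mrql.flink -local queries/pagerank.mrql
}

# Distributed mode: same two queries with -dist and -nodes.
# $1 = cluster nodes to use, $2 = graph nodes, $3 = graph edges
pagerank_dist() {
    workers="$1"
    graph_nodes="$2"
    graph_edges="$3"
    # FLINK_MASTER must hold the JobManager name printed by yarn-session.sh.
    if [ -z "${FLINK_MASTER:-}" ]; then
        echo "FLINK_MASTER is not set" >&2
        return 1
    fi
    run bin/mrql.flink -dist -nodes "$workers" queries/RMAT.mrql "$graph_nodes" "$graph_edges" &&
    run bin/mrql.flink -dist -nodes "$workers" queries/pagerank.mrql
}
```

After sourcing the file, `pagerank_local 1000 10000` prints the two local-mode commands from the message, and `pagerank_dist 16 1000000 10000000` prints the two Yarn-mode commands. Setting DRY_RUN=0 runs them for real instead of only printing them.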