That's really cool!

I'm also curious about your experience with Flink. Did you find major
obstacles that you needed to overcome for the integration?
Is there some write-up / report available somewhere (maybe in JIRA) that
discusses the integration? Are you using Flink's full operator set or do
you compile everything into Map and Reduce?

Best, Fabian


2014-08-28 7:37 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:

> Very nice indeed! How well is this tested? Can it already run all the
> example queries you have? Can you say anything about the performance
> of the different underlying execution engines?
>
> On Thu, Aug 28, 2014 at 12:58 AM, Stephan Ewen <se...@apache.org> wrote:
> > Wow, that is impressive!
> >
> >
> > On Thu, Aug 28, 2014 at 12:06 AM, Ufuk Celebi <u...@apache.org> wrote:
> >
> >> Awesome, indeed! Looking forward to trying it out. :)
> >>
> >>
> >> On Wed, Aug 27, 2014 at 10:52 PM, Sebastian Schelter <s...@apache.org>
> >> wrote:
> >>
> >> > Awesome!
> >> >
> >> >
> >> > 2014-08-27 13:49 GMT-07:00 Leonidas Fegaras <fega...@cse.uta.edu>:
> >> >
> >> > > Hello,
> >> > > I would like to let you know that Apache MRQL can now run queries on Flink.
> >> > > MRQL is a query processing and optimization system for large-scale,
> >> > > distributed data analysis, built on top of Apache Hadoop/map-reduce,
> >> > > Hama, Spark, and now Flink. MRQL queries are SQL-like but not SQL.
> >> > > They can work on complex, user-defined data (such as JSON and XML) and
> >> > > can express complex queries (such as pagerank and matrix factorization).
> >> > >
> >> > > MRQL on Flink has been tested in local mode and on a small Yarn cluster.
> >> > >
> >> > > Here are the directions on how to build the latest MRQL snapshot:
> >> > >
> >> > > git clone https://git-wip-us.apache.org/repos/asf/incubator-mrql.git mrql
> >> > > cd mrql
> >> > > mvn -Pyarn clean install
> >> > >
> >> > > To make it run on your cluster, edit conf/mrql-env.sh and set the
> >> > > Java, the Hadoop, and the Flink installation directories.
> >> > >
> >> > > Here is how to run PageRank. First, you need to generate a random
> >> > > graph and store it in a file using the MRQL query RMAT.mrql:
> >> > >
> >> > > bin/mrql.flink -local queries/RMAT.mrql 1000 10000
> >> > >
> >> > > This will create a graph with 1K nodes and 10K edges using the RMAT
> >> > > algorithm, will remove duplicate edges, and will store the graph in
> >> > > the binary file graph.bin. Then, run PageRank on Flink mode using:
> >> > >
> >> > > bin/mrql.flink -local queries/pagerank.mrql
> >> > >
> >> > > To run MRQL/Flink on a Yarn cluster, first start the Flink container
> >> > > on Yarn by running the script yarn-session.sh, such as:
> >> > >
> >> > > ${FLINK_HOME}/bin/yarn-session.sh -n 8
> >> > >
> >> > > This will print the name of the Flink JobManager, which can be used in:
> >> > >
> >> > > export FLINK_MASTER=name-of-the-Flink-JobManager
> >> > > bin/mrql.flink -dist -nodes 16 queries/RMAT.mrql 1000000 10000000
> >> > >
> >> > > This will create a graph with 1M nodes and 10M edges using RMAT on 16
> >> > > nodes (slaves). You can adjust these numbers to fit your cluster.
> >> > > Then, run PageRank using:
> >> > >
> >> > > bin/mrql.flink -dist -nodes 16 queries/pagerank.mrql
> >> > >
> >> > > The MRQL project page is at: http://mrql.incubator.apache.org/
> >> > >
> >> > > Let me know if you have any questions.
> >> > > Leonidas Fegaras
> >> > >
> >> > >
> >> >
> >>
>
