Leonidas Fegaras created MRQL-12:
------------------------------------
Summary: Support query evaluation in Spark mode
Key: MRQL-12
URL: https://issues.apache.org/jira/browse/MRQL-12
Project: MRQL
Issue Type: Improvement
Components: Run-Time Data
Affects Versions: 0.9.0
Environment: Apache Spark http://spark-project.org/
Reporter: Leonidas Fegaras
Assignee: Leonidas Fegaras
Spark provides primitives for in-memory cluster computing
(http://spark-project.org/). It was developed at UC Berkeley and was
recently accepted as an ASF incubating project. It has already attracted many
developers and I think it will play a major role in the Hadoop ecosystem, so I
thought it would be nice to be able to evaluate MRQL queries on a Spark cluster.
Spark already supports Hive queries (through Shark). Like Hama, Spark can
evaluate queries in memory but, unlike Hama, it supports full fault tolerance.
I have already written all the code, but I have only tested it in local mode
(on a single multi-core node). This task turned out to be easier than I thought
because MRQL plans are similar to Spark operations. The only annoyance was that
I had to make all data structures Serializable. I also had to include the Gen
source code (the Java preprocessor) under the ASF license, which will make the
transition to Maven easier.
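
To illustrate the Serializable requirement mentioned above: Spark moves
objects between JVMs, by default using Java serialization, so every value that
travels with a task has to implement java.io.Serializable. The following is a
minimal sketch only. The class names SerializableSketch and KeyValue are
hypothetical and are not taken from the MRQL code; the round trip through a
byte array merely imitates what happens when an object is shipped to another
node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableSketch {

    // Hypothetical record type (an assumption, not MRQL's actual data structure).
    // Without "implements Serializable", writeObject below would fail with
    // java.io.NotSerializableException.
    static final class KeyValue implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        final int value;

        KeyValue(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Serialize the object to bytes and read it back, imitating the copy
    // that is made when an object is sent from one JVM to another.
    static KeyValue roundTrip(KeyValue in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream src = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (KeyValue) src.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        KeyValue copy = roundTrip(new KeyValue("a", 1));
        System.out.println(copy.key + " " + copy.value);
    }
}
```

The copy returned by roundTrip is a distinct object with the same field
values, which is why every field reachable from such a structure must itself
be serializable.
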
I am attaching the patch below. The Spark evaluator itself is in the file
Evaluator.gen, which is attached separately.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira