Leonidas Fegaras created MRQL-25:
------------------------------------
Summary: Changed Translator/Evaluator interface to improve Spark efficiency
Key: MRQL-25
URL: https://issues.apache.org/jira/browse/MRQL-25
Project: MRQL
Issue Type: Improvement
Reporter: Leonidas Fegaras
Assignee: Leonidas Fegaras
Priority: Minor
Attachments: MRQL-25.patch
The attached patch extends the old interface between the translator and the
three evaluators to make Spark evaluation more efficient. The old interface
relied solely on the method collect, which lazily collects data from a
distributed dataset and returns an Iterator. But there is no public method in
Spark to pull data from an RDD in the form of an Iterator, so I previously had
to dump the RDD into a binary file and create an Iterator reader over it, which
was inefficient. Now, in addition to collect, the interface includes take and
reduce: take returns the first elements of a dataset, and reduce combines the
values of a dataset using an associative accumulator. Some translator methods,
such as print and dump to binary or text files, had to be rewritten to use the
new interface.
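
To illustrate the shape of the extended interface, here is a minimal sketch in
plain Java with no Spark dependency. The names DataSetOps, LocalDataSet, and
printFirst, as well as the exact method signatures, are assumptions made for
this example and are not taken from the patch. An in-memory list stands in for
a distributed dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical interface: the three operations named in the description.
// The signatures are assumed for illustration only.
interface DataSetOps<T> {
    // Old operation: iterate over all elements of the dataset.
    Iterator<T> collect();

    // New operation: return at most the first n elements.
    List<T> take(int n);

    // New operation: fold all values with an associative accumulator.
    T reduce(T zero, BinaryOperator<T> acc);
}

// Hypothetical in-memory implementation standing in for a distributed dataset.
final class LocalDataSet<T> implements DataSetOps<T> {
    private final List<T> items;

    LocalDataSet(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public Iterator<T> collect() {
        return items.iterator();
    }

    @Override
    public List<T> take(int n) {
        // Clamp n to the valid range so short datasets are handled safely.
        int end = Math.min(Math.max(n, 0), items.size());
        return new ArrayList<>(items.subList(0, end));
    }

    @Override
    public T reduce(T zero, BinaryOperator<T> acc) {
        T result = zero;
        for (T x : items) {
            result = acc.apply(result, x);
        }
        return result;
    }
}

public class InterfaceSketch {
    // A print-style method written against take, so it never needs to
    // iterate over the whole dataset just to show its first n elements.
    static <T> String printFirst(DataSetOps<T> ds, int n) {
        StringBuilder sb = new StringBuilder();
        for (T x : ds.take(n)) {
            sb.append(x).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DataSetOps<Integer> ds = new LocalDataSet<>(Arrays.asList(1, 2, 3, 4));
        System.out.print(printFirst(ds, 2));
        System.out.println(ds.reduce(0, Integer::sum));
    }
}
```

In this sketch, a caller that only needs a prefix of the data or a single
aggregate value can ask for exactly that. A Spark-backed implementation could
presumably delegate take and reduce to the corresponding RDD actions instead of
going through a binary file.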