[ https://issues.apache.org/jira/browse/MRQL-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonidas Fegaras updated MRQL-25:
---------------------------------
Attachment: MRQL-25.patch
> Changed Translator/Evaluator interface to improve Spark efficiency
> ------------------------------------------------------------------
>
> Key: MRQL-25
> URL: https://issues.apache.org/jira/browse/MRQL-25
> Project: MRQL
> Issue Type: Improvement
> Reporter: Leonidas Fegaras
> Assignee: Leonidas Fegaras
> Priority: Minor
> Attachments: MRQL-25.patch
>
>
> The following patch extends the old interface between the translator and the
> three evaluators to make Spark evaluation more efficient. More specifically,
> the old interface used only the method collect, which lazily collects data
> from a distributed dataset by returning an Iterator. However, there is no
> public method in Spark to pull data from an RDD in the form of an Iterator,
> so I previously had to dump the RDD into a binary file and create an
> Iterator reader over it, which was inefficient. Now, in addition to collect,
> the interface includes take and reduce: take returns the first elements of a
> dataset, and reduce combines the values of a dataset using an associative
> accumulator. Some translator methods, such as print and dump to binary or
> text files, had to be rewritten to use this new interface.
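
To make the shape of the extended interface concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not the code in MRQL-25.patch: the names DatasetOps, PartitionedDataset, EvaluatorSketch and preview are hypothetical, and a list of in-memory partitions stands in for a distributed dataset. It shows how a print-style method can rely on take alone, and why reduce needs an associative accumulator (partial results are computed per partition and then combined).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical evaluator-side view of a dataset; not MRQL's real interface. */
interface DatasetOps<T> {
    Iterator<T> collect();                                // lazily iterate over all values
    List<T> take(int n);                                  // first n values only
    T reduce(T identity, BinaryOperator<T> accumulator);  // fold with an associative function
}

/** Toy stand-in for a distributed dataset: a list of in-memory partitions. */
final class PartitionedDataset<T> implements DatasetOps<T> {
    private final List<List<T>> partitions;

    PartitionedDataset(List<List<T>> partitions) {
        this.partitions = partitions;
    }

    @Override
    public Iterator<T> collect() {
        // Walk the partitions in order, yielding elements as they are requested.
        return partitions.stream().flatMap(List::stream).iterator();
    }

    @Override
    public List<T> take(int n) {
        // Stop as soon as n values are gathered; later partitions are never touched.
        List<T> out = new ArrayList<>();
        for (List<T> part : partitions) {
            for (T x : part) {
                if (out.size() >= n) return out;
                out.add(x);
            }
        }
        return out;
    }

    @Override
    public T reduce(T identity, BinaryOperator<T> accumulator) {
        // Reduce each partition on its own, then combine the partial results.
        // This regrouping is only valid because the accumulator is associative
        // and identity is its neutral element.
        T total = identity;
        for (List<T> part : partitions) {
            T partial = identity;
            for (T x : part) partial = accumulator.apply(partial, x);
            total = accumulator.apply(total, partial);
        }
        return total;
    }
}

final class EvaluatorSketch {
    /** A print-style helper that needs only take, so no full scan or temp file. */
    static <T> String preview(DatasetOps<T> data, int limit) {
        return data.take(limit).toString();
    }

    public static void main(String[] args) {
        DatasetOps<Integer> data = new PartitionedDataset<>(Arrays.asList(
            Arrays.asList(1, 2, 3), Arrays.asList(4, 5), Arrays.asList(6)));
        System.out.println(preview(data, 4));              // prints [1, 2, 3, 4]
        System.out.println(data.reduce(0, Integer::sum));  // prints 21
    }
}
```

In a Spark-backed implementation, take and reduce would presumably delegate to the RDD operations of the same names, which is what removes the need for the binary-file detour; that mapping is an assumption here, not something stated in the issue text.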
--
This message was sent by Atlassian JIRA
(v6.1#6144)