[ https://issues.apache.org/jira/browse/MRQL-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leonidas Fegaras updated MRQL-25:
---------------------------------

    Attachment: MRQL-25.patch

> Changed Translator/Evaluator interface to improve Spark efficiency
> ------------------------------------------------------------------
>
>                 Key: MRQL-25
>                 URL: https://issues.apache.org/jira/browse/MRQL-25
>             Project: MRQL
>          Issue Type: Improvement
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>            Priority: Minor
>         Attachments: MRQL-25.patch
>
>
> The following patch extends the old interface between the translator and the
> three evaluators to make Spark evaluation more efficient. More specifically,
> the old interface used only the method collect, which lazily collects the data
> of a distributed dataset and returns an Iterator. But Spark has no public
> method to pull data from an RDD in the form of an Iterator, so previously I
> had to dump the RDD into a binary file and create an Iterator reader over that
> file, which was inefficient. Now, in addition to collect, the interface
> includes take and reduce: take returns the first n elements of a dataset, and
> reduce combines the values of a dataset using an associative accumulator. Some
> translator methods, such as print and dump to binary or text files, had to be
> rewritten to use this new interface.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
