Leonidas Fegaras created MRQL-25:
------------------------------------

             Summary: Changed Translator/Evaluator interface to improve Spark efficiency
                 Key: MRQL-25
                 URL: https://issues.apache.org/jira/browse/MRQL-25
             Project: MRQL
          Issue Type: Improvement
            Reporter: Leonidas Fegaras
            Assignee: Leonidas Fegaras
            Priority: Minor
         Attachments: MRQL-25.patch

The attached patch extends the interface between the translator and the three 
evaluators to make Spark evaluation more efficient. The old interface offered 
only the collect method, which lazily pulls data from a distributed dataset by 
returning an Iterator. Spark, however, has no public method for pulling data 
from an RDD as an Iterator, so I previously had to dump the RDD into a binary 
file and create an Iterator reader over it, which was inefficient. In addition 
to collect, the interface now includes take and reduce: take returns the first 
elements of a dataset, and reduce combines the values of a dataset using an 
associative accumulator. Some translator methods, such as print and dump to 
binary or text files, had to be rewritten to use the new interface.
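
To make the shape of the extended interface concrete, here is a minimal Java 
sketch. It is not the code in MRQL-25.patch: the names DatasetEvaluator, 
LocalEvaluator and EvaluatorSketch, the explicit zero argument to reduce, and 
the in-memory list standing in for a distributed dataset are all assumptions 
made for illustration. A Spark-backed evaluator could presumably delegate take 
and reduce to JavaRDD.take and JavaRDD.fold instead of writing the RDD to a 
file first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

/** Small driver showing how a translator method might call the interface. */
final class EvaluatorSketch {
    public static void main(String[] args) {
        DatasetEvaluator<Integer> ds =
            new LocalEvaluator<>(Arrays.asList(3, 1, 4, 1, 5));
        System.out.println(ds.take(2));                 // prints [3, 1]
        System.out.println(ds.reduce(0, Integer::sum)); // prints 14
    }
}

/** Hypothetical translator/evaluator contract; NOT MRQL's actual API. */
interface DatasetEvaluator<T> {
    /** Iterate over every element of the dataset. */
    Iterator<T> collect();

    /** Return at most n elements from the start of the dataset. */
    List<T> take(int n);

    /** Combine all elements with an associative accumulator, starting at zero. */
    T reduce(T zero, BinaryOperator<T> accumulator);
}

/** In-memory evaluator (an assumption) used only to illustrate the contract. */
final class LocalEvaluator<T> implements DatasetEvaluator<T> {
    private final List<T> data;

    LocalEvaluator(List<T> data) {
        this.data = new ArrayList<>(data);
    }

    @Override
    public Iterator<T> collect() {
        return data.iterator();
    }

    @Override
    public List<T> take(int n) {
        // Clamp n so that negative or oversized requests are safe.
        int end = Math.min(Math.max(n, 0), data.size());
        return new ArrayList<>(data.subList(0, end));
    }

    @Override
    public T reduce(T zero, BinaryOperator<T> accumulator) {
        T result = zero;
        for (T x : data) {
            result = accumulator.apply(result, x);
        }
        return result;
    }
}
```

With such an interface, a method like print needs only take to show the first 
few results, and an aggregation needs only reduce, so neither has to iterate 
over the whole dataset on the client.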



