[jira] [Commented] (PIG-5157) Upgrade to Spark 2.0

Jeff Zhang (JIRA) Wed, 24 May 2017 01:17:22 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022499#comment-16022499
 ]


Jeff Zhang commented on PIG-5157:
---------------------------------

bq.  I think (and correct me if I'm wrong) we don't have to change the physical 
and logical plans, but we have to modify how the plan is mapped to Spark: change 
the converters from RDD converters to DataSet converters.
That's correct.

bq. We should try to migrate to the DataSet API only for Spark 2.1. As far as I 
know, Spark 1.6 has the DataFrames API, but since it was experimental at that 
time, I think we shouldn't change that; RDDs are fine for Spark 1.6.
The DataFrame API is not experimental in Spark 1.6; it is pretty stable there. I 
guess you mean the DataSet API rather than the DataFrame API. In Spark 2.x, 
DataFrame is just an alias of DataSet[Row]. I think Pig doesn't need DataSet; it 
only needs DataFrame. DataSet is for strong typing, such as Java beans, but Pig 
seems to use only Tuple, so Pig doesn't need the DataSet features; DataFrame is 
sufficient for Pig.
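
To make the last point concrete, here is a rough sketch in plain Java with no 
Spark or Pig dependency. It only illustrates why an untyped row plus a schema 
known at run time is enough for tuple-shaped data, which is the shape a 
DataFrame (written Dataset<Row> in Spark's Java API) works with. The names 
RowSketch and project are hypothetical and exist only for this illustration; 
they are not Spark or Pig APIs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not a Spark or Pig class.
final class RowSketch {

    // Project named columns out of a positional row, resolving the names
    // against a schema that is only known at run time. No compile-time
    // bean class is needed, which is the situation with tuple-based data.
    static Object[] project(List<String> schema, Object[] row, String... wanted) {
        Object[] out = new Object[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            int idx = schema.indexOf(wanted[i]);
            if (idx < 0) {
                throw new IllegalArgumentException("unknown column: " + wanted[i]);
            }
            out[i] = row[idx];
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("name", "age");
        Object[] row = {"alice", 30};
        // prints [30]
        System.out.println(Arrays.toString(project(schema, row, "age")));
    }
}
```

A strongly typed variant would need a bean class with name and age fields 
declared at compile time, which is what the typed DataSet API is aimed at.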

> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>
> Upgrade to Spark 2.0 (or latest)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
