[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022499#comment-16022499 ]
Jeff Zhang commented on PIG-5157:
---------------------------------
bq. I think (and correct me if I'm wrong) we don't have to change the physical
and logical plans, but we have to modify how the plan is mapped to Spark: change
the converters from RDD converters to Dataset converters.
That's correct.
bq. we should try to migrate to the Dataset API only for Spark 2.1. As far as I
know, Spark 1.6 has the DataFrame API, but since it was experimental at that
time, I think we shouldn't change that; RDDs are fine for Spark 1.6.
The DataFrame API is not experimental in Spark 1.6; it is pretty stable there. I
guess you mean the Dataset API rather than the DataFrame API. In Spark 2.x,
DataFrame is just an alias for Dataset[Row]. I think Pig doesn't need Dataset;
it only needs DataFrame. Dataset is for strong typing, such as Java beans, but
Pig seems to use only Tuple, so Pig doesn't need the features of Dataset.
DataFrame is sufficient for Pig.
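To illustrate the distinction, here is a minimal Scala sketch against the
Spark 2.x API. It is not Pig code: the Person class, the column names, and the
sample rows are made up for illustration, and it assumes a local Spark 2.x
installation on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DataFrameVsDataset {
  // Hypothetical record type, used only to show the typed Dataset API.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-vs-dataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Untyped rows whose schema is only known at run time,
    // similar in spirit to how Pig passes Tuples around.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)))
    val rows = spark.sparkContext.parallelize(Seq(Row("alice", 30), Row("bob", 25)))
    val df: DataFrame = spark.createDataFrame(rows, schema)

    // In Spark 2.x DataFrame is a type alias for Dataset[Row],
    // so this assignment compiles without any conversion.
    val sameThing: Dataset[Row] = df

    // Typed view: requires a compile-time record type such as a
    // case class or Java bean.
    val people: Dataset[Person] = df.as[Person]

    df.filter($"age" > 26).show()     // column name resolved at run time
    people.filter(_.age > 26).show()  // field access checked at compile time

    spark.stop()
  }
}
```

If Pig only ever handles generic Tuples, the untyped df path above is the
relevant one; the typed people path is shown only for contrast.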
> Upgrade to Spark 2.0
> --------------------
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
> Issue Type: Improvement
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: 0.18.0
>
>
> Upgrade to Spark 2.0 (or latest)
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)