[
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022446#comment-16022446
]
liyunzhang_intel commented on PIG-5157:
---------------------------------------
[~nkollar]:
bq. the optimizations offered (project Tungsten and Catalyst optimizer) looks
promising
If we use the Catalyst optimizer, do we still need
{{org.apache.pig.newplan.logical.relational.LogicalPlan}} and {{org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan}}?
The Catalyst optimizer optimizes the Spark plan generated by Spark SQL.
bq. however it seems that it is build around Java beans
I guess the Dataset/DataFrame API also provides row-based operations. See the
[patch|https://issues.apache.org/jira/secure/attachment/12847623/PIG-5080-1.patch]
for PIG-5080:
{code}
SparkContext context = SparkContext.getOrCreate();
SQLContext sqlContext = SQLContext.getOrCreate(context);
DataFrame df = sqlContext.table("complex_data");
Row[] rows = df.collect();
assertEquals(10, rows.length);
for (int i = 0; i < rows.length; i++) {
    assertEquals(i, rows[i].getJavaMap(0).get("key_" + i));
}
{code}
[~zjffdu]: we would appreciate your suggestions, as you are more
familiar with Spark.
> Upgrade to Spark 2.0
> --------------------
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
> Issue Type: Improvement
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: 0.18.0
>
>
> Upgrade to Spark 2.0 (or latest)