[jira] [Commented] (PIG-5157) Upgrade to Spark 2.0

liyunzhang_intel (JIRA) Wed, 24 May 2017 00:27:48 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022446#comment-16022446
 ]


liyunzhang_intel commented on PIG-5157:
---------------------------------------

[~nkollar]:
bq. the optimizations offered (project Tungsten and Catalyst optimizer) looks 
promising
If we use the Catalyst optimizer, do we still need 
{{org.apache.pig.newplan.logical.relational.LogicalPlan}} and {{org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan}}?
 The Catalyst optimizer optimizes the Spark plan generated by Spark SQL.
bq. however it seems that it is build around Java beans
  I guess the Dataset/DataFrame API provides row-based operations. See the 
[patch|https://issues.apache.org/jira/secure/attachment/12847623/PIG-5080-1.patch]
 of PIG-5080:
{code}
SparkContext context = SparkContext.getOrCreate();
SQLContext sqlContext = SQLContext.getOrCreate(context);
DataFrame df = sqlContext.table("complex_data");
Row[] rows = df.collect();
assertEquals(10, rows.length);
for (int i = 0; i < rows.length; i++) {
    assertEquals(i, rows[i].getJavaMap(0).get("key_" + i));
}
{code}
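
To make the "row-based" point concrete without depending on Spark, here is a minimal sketch in plain Java. {{SimpleRow}}, {{sampleRow}} and the field names {{attrs}} and {{label}} are hypothetical and are not part of Spark or Pig; the sketch only shows that fields can be read by position or by name against a schema supplied at runtime, with no bean class per record type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowAccessSketch {

    // Hypothetical stand-in for a generic row: values are read by position
    // or by field name against a schema passed in at runtime.
    static final class SimpleRow {
        private final List<String> fieldNames;
        private final Object[] values;

        SimpleRow(List<String> fieldNames, Object... values) {
            if (fieldNames.size() != values.length) {
                throw new IllegalArgumentException("schema and values differ in length");
            }
            this.fieldNames = fieldNames;
            this.values = values.clone();
        }

        // Positional access, similar in spirit to reading column 0 of a row.
        Object get(int index) {
            return values[index];
        }

        // Access by field name, resolved through the runtime schema.
        Object getAs(String name) {
            int index = fieldNames.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            return values[index];
        }
    }

    // Builds one row whose first column is a map, like the test data above.
    static SimpleRow sampleRow(int i) {
        Map<String, Integer> attrs = new HashMap<>();
        attrs.put("key_" + i, i);
        return new SimpleRow(Arrays.asList("attrs", "label"), attrs, "row-" + i);
    }

    public static void main(String[] args) {
        SimpleRow row = sampleRow(3);
        Map<?, ?> attrs = (Map<?, ?>) row.get(0);
        System.out.println(attrs.get("key_3"));
        System.out.println(row.getAs("label"));
    }
}
```

Whether this style maps cleanly onto Pig tuples is an assumption that still needs to be checked against the actual Spark API.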

[~zjffdu]: I would appreciate your suggestions, as you are more 
familiar with Spark.

> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>
> Upgrade to Spark 2.0 (or latest)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
