[jira] [Updated] (PIG-4667) Enable Pig on Spark to run on Yarn Client/Cluster mode

Srikanth Sundarrajan (JIRA) Wed, 02 Sep 2015 02:05:29 -0700

     [ https://issues.apache.org/jira/browse/PIG-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Srikanth Sundarrajan updated PIG-4667:
--------------------------------------
    Attachment: PIG-4667-logs.tgz

I was able to get Pig on Spark to run on YARN (client mode).

Here is the simple script that I used for testing:

{noformat}
A = LOAD '/tmp/x' USING PigStorage('\t') AS (line);
STORE A INTO '/tmp/y' USING PigStorage(',');
{noformat}

I used the following environment settings before launching the script:
{noformat}
declare -x HADOOP_CONF_DIR="/opt/hadoop-2.6.0.2.2.0.0-2041/etc/hadoop/"
declare -x HADOOP_HOME="/opt/hadoop-2.6.0.2.2.0.0-2041/"
declare -x SPARK_HOME="/opt/spark-1.4.1-bin-without-hadoop"
declare -x SPARK_JARS="/projects/pig/lib/spark-assembly-1.4.1-hadoop2.2.0.jar,/projects/pig/lib/joda-time-2.5.jar"
declare -x SPARK_MASTER="yarn-client"
declare -x SPARK_PIG_JAR="/projects/pig/pig-0.15.0-SNAPSHOT-core-h2.jar"
{noformat}

Pig was launched via: hadoop fs -rmr /tmp/y; bin/pig -x spark -4 conf/log4j.properties test.pig
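
The settings above could be guarded with a small pre-flight check before the launch. This is only a sketch: check_pig_spark_env is a hypothetical helper name (not part of Pig or Spark), and it assumes the six variables listed above are the only ones needed for this setup.

```shell
#!/bin/sh
# Pre-flight check for the variables exported above.
# check_pig_spark_env is a hypothetical helper name, not part of Pig or Spark.
check_pig_spark_env() {
  missing=0
  for var in HADOOP_CONF_DIR HADOOP_HOME SPARK_HOME SPARK_JARS SPARK_MASTER SPARK_PIG_JAR; do
    # Look up the value of the variable whose name is stored in $var.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  # Returns 0 when every variable is set and non-empty, 1 otherwise.
  return "$missing"
}
```

It could then be chained in front of the launch command, for example: check_pig_spark_env && bin/pig -x spark -4 conf/log4j.properties test.pig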

I also had to make the following changes:
* Removed the direct kryo dependency from Pig
* Removed the direct spark-* dependencies from Pig and added spark-assembly (without Hadoop, but including YARN) instead

I will now use this as a springboard to integrate it better.

Attached are logs from the AM, the containers, and the Pig launchers, in case anyone is curious.

> Enable Pig on Spark to run on Yarn Client/Cluster mode
> ------------------------------------------------------
>
>                 Key: PIG-4667
>                 URL: https://issues.apache.org/jira/browse/PIG-4667
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Srikanth Sundarrajan
>            Assignee: Srikanth Sundarrajan
>             Fix For: spark-branch
>
>         Attachments: PIG-4667-logs.tgz
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
