[
https://issues.apache.org/jira/browse/PIG-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Srikanth Sundarrajan updated PIG-4667:
--------------------------------------
Attachment: PIG-4667-logs.tgz
I was able to get Pig/Spark to run on YARN (client mode).
Here is the simple script I used for testing:
{noformat}
A = LOAD '/tmp/x' USING PigStorage('\t') AS (line);
STORE A INTO '/tmp/y' USING PigStorage(',');
{noformat}
I used the following environment settings before launching the script:
{noformat}
declare -x HADOOP_CONF_DIR="/opt/hadoop-2.6.0.2.2.0.0-2041/etc/hadoop/"
declare -x HADOOP_HOME="/opt/hadoop-2.6.0.2.2.0.0-2041/"
declare -x SPARK_HOME="/opt/spark-1.4.1-bin-without-hadoop"
declare -x SPARK_JARS="/projects/pig/lib/spark-assembly-1.4.1-hadoop2.2.0.jar,/projects/pig/lib/joda-time-2.5.jar"
declare -x SPARK_MASTER="yarn-client"
declare -x SPARK_PIG_JAR="/projects/pig/pig-0.15.0-SNAPSHOT-core-h2.jar"
{noformat}
Pig was launched via: hadoop fs -rmr /tmp/y; bin/pig -x spark -4 conf/log4j.properties test.pig
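For anyone repeating this, a small pre-flight check can catch an unset variable before Pig is launched. The following is only a sketch: the variable names are the ones from the environment listing above, while the function name missing_pig_spark_vars is hypothetical and not part of Pig or Spark.

```shell
#!/bin/sh
# Sketch of a pre-flight check for the launch described above.
# Assumption: these six variables are the ones the launch relies on,
# as listed in the environment settings in this comment.
# missing_pig_spark_vars is a hypothetical helper name.

# Prints the name of every required variable that is unset or empty,
# one per line. Runs in a subshell so it does not touch caller variables.
missing_pig_spark_vars() (
    for name in HADOOP_CONF_DIR HADOOP_HOME SPARK_HOME SPARK_JARS SPARK_MASTER SPARK_PIG_JAR; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
)

# Example use before launching (left commented out so that sourcing
# this file has no side effects):
#   missing=$(missing_pig_spark_vars)
#   if [ -n "$missing" ]; then
#       echo "unset variables: $missing" >&2
#       exit 1
#   fi
#   bin/pig -x spark -4 conf/log4j.properties test.pig
```

The check only verifies that the variables are non-empty; it does not confirm that the paths exist or that the jars are the right versions.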
I also had to make the following changes:
* Removed the direct kryo dependency from Pig
* Removed the direct spark-* dependencies from Pig and added spark-assembly (without
hadoop, but including yarn) instead
I will now use this as a springboard to get it to integrate better.
I have attached logs from the AM, containers, and Pig launchers in case anyone is curious.
> Enable Pig on Spark to run on Yarn Client/Cluster mode
> ------------------------------------------------------
>
> Key: PIG-4667
> URL: https://issues.apache.org/jira/browse/PIG-4667
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Srikanth Sundarrajan
> Assignee: Srikanth Sundarrajan
> Fix For: spark-branch
>
> Attachments: PIG-4667-logs.tgz
>
>