[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057075#comment-16057075 ]
liyunzhang_intel commented on PIG-5157:
---------------------------------------
[~nkollar]: after applying the patch, I tested a simple query in a yarn-client environment.
build jar:
{noformat}ant clean -v -Dhadoopversion=2 jar-spark12{noformat}
testJoin.pig
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name) parallel 10;
store D into './testJoin.out';
{code}
spark1:
export SPARK_HOME=xxxx
export SPARK_JAR=hdfs://xxxx:8020/user/root/spark-assembly-1.6.1-hadoop2.6.0.jar
$PIG_HOME/bin/pig -x spark -logfile $PIG_HOME/logs/pig.log testJoin.pig
error in logs/pig.log:
{noformat}
java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListenerInterface
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecutionEngine.<init>(SparkExecutionEngine.java:35)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecType.getExecutionEngine(SparkExecType.java:42)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:269)
    at org.apache.pig.impl.PigContext.<init>(PigContext.java:256)
    at org.apache.pig.Main.run(Main.java:389)
    at org.apache.pig.Main.main(Main.java:175)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerInterface
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more
{noformat}
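The trace above says that `org.apache.spark.scheduler.SparkListenerInterface` was not on the classpath when `SparkExecutionEngine` was loaded. As far as I know this interface only exists in Spark 2.x, so a Spark 1.6.1 assembly would not provide it, but that is an assumption and should be checked against the jars actually in use. A minimal sketch of such a check follows; the class name `ClasspathCheck` and the helper `isLoadable` are hypothetical and not part of Pig or Spark.

```java
class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing or one of its dependencies failed to link
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; any other name can be passed in
        String name = args.length > 0
                ? args[0]
                : "org.apache.spark.scheduler.SparkListenerInterface";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running it with the same classpath that the `pig` launcher builds (Pig jar plus the Spark jars under `SPARK_HOME`) should show whether the Spark version on the classpath matches the one Pig was compiled against.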
spark2 (patch PIG-5246_2.patch):
export SPARK_HOME=xxxx
$PIG_HOME/bin/pig -x spark -logfile $PIG_HOME/logs/pig.log testJoin.pig
error in logs/pig.log:
{noformat}
[main] 2017-06-21 14:14:05,791 ERROR spark.JobGraphBuilder (JobGraphBuilder.java:sparkOperToRDD(187)) - throw exception in sparkOperToRDD:
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:763)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:762)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:762)
    at org.apache.spark.api.java.JavaRDDLike$class.mapPartitions(JavaRDDLike.scala:166)
    at org.apache.spark.api.java.AbstractJavaRDDLike.mapPartitions(JavaRDDLike.scala:45)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter.convert(ForEachConverter.java:64)
    at org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter.convert(ForEachConverter.java:45)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:292)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:182)
    at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
    at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
    at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
    at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:233)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
{noformat}
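"Task not serializable" is raised when Spark cannot serialize the function object passed to an RDD operation, here the one handed to `mapPartitions` from `ForEachConverter.convert`. Which field is responsible is not visible in the trace above. The sketch below only illustrates the general failure mode with plain Java serialization, without Spark: a function object that is itself `Serializable` still fails if it holds a reference to something that is not. All class names (`SerializableCheck`, `Helper`, `BadFunction`, `GoodFunction`) are hypothetical stand-ins, not Pig classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableCheck {
    /** Hypothetical helper that does NOT implement Serializable. */
    static class Helper {
        int offset = 1;
    }

    /** Serializable on paper, but it drags the non-serializable Helper along. */
    static class BadFunction implements Serializable {
        private final Helper helper = new Helper();
        int apply(int x) { return x + helper.offset; }
    }

    /** Same logic, holding only serializable state. */
    static class GoodFunction implements Serializable {
        private final int offset = 1;
        int apply(int x) { return x + offset; }
    }

    /** Returns true if plain Java serialization of obj succeeds. */
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("BadFunction serializable: " + isSerializable(new BadFunction()));
        System.out.println("GoodFunction serializable: " + isSerializable(new GoodFunction()));
    }
}
```

A similar check applied to the function object created in `ForEachConverter` may help narrow down which referenced object breaks serialization under Spark 2; typical remedies are marking such a field `transient` or making its type `Serializable`, depending on whether the executors need it.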
I will investigate the cause, but please retest it in your environment. If I have
misunderstood something, please tell me.
> Upgrade to Spark 2.0
> --------------------
>
> Key: PIG-5157
> URL: https://issues.apache.org/jira/browse/PIG-5157
> Project: Pig
> Issue Type: Improvement
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: 0.18.0
>
> Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)