[ https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290655#comment-15290655 ]
Alexandre Linte commented on OOZIE-2482:
----------------------------------------

Hi [~satishsaley], sorry for the delay. I tried your solution (PYSPARK_ARCHIVES_PATH plus py4j-0.9-src.zip and pyspark.zip). The result is better, but it is still not working 100% of the time. When it fails, I see the following logs on the Oozie server.
{noformat}
2016-05-11 08:43:28,391 WARN CoordActionReadyXCommand:523 - USER[czfv1086] GROUP[-] TOKEN[] APP[coord_app_2ip_loadicxip] JOB[0000024-160426195954711-oozie-C] ACTION[] No actions to start for jobId=0000024-160426195954711-oozie-C as max concurrency reached!
2016-05-11 08:43:30,971 INFO SparkActionExecutor:520 - USER[czfv1086] GROUP[-] TOKEN[] APP[wf_app_2ip_loadicxip] JOB[0000660-160510172237486-oozie-W] ACTION[0000660-160510172237486-oozie-W@launch_streaming] checking action, hadoop job ID [job_1461692698792_19420] status [RUNNING]
2016-05-11 08:43:47,663 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2016-05-11 08:43:47,664 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status service from last instance time = 2016-05-11T06:42Z
2016-05-11 08:43:47,671 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running bundle status service from last instance time = 2016-05-11T06:42Z
2016-05-11 08:43:47,674 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2016-05-11 08:43:52,311 INFO PauseTransitService:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2016-05-11 08:43:52,338 INFO PauseTransitService:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2016-05-11 08:43:54,799 INFO CallbackServlet:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] callback for action [0000819-160510172237486-oozie-W@spark-node]
2016-05-11 08:43:55,130 INFO SparkActionExecutor:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] action completed, external ID [job_1461692698792_19524]
2016-05-11 08:43:55,136 WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1461692698792_19525 finished with failed status
2016-05-11 08:43:55,136 WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] Launcher exception: Application application_1461692698792_19525 finished with failed status
org.apache.spark.SparkException: Application application_1461692698792_19525 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2016-05-11 08:43:55,193 INFO ActionEndXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] ERROR is considered as FAILED for SLA
2016-05-11 08:43:55,319 INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] Start action [0000819-160510172237486-oozie-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2016-05-11 08:43:55,321 INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] [***0000819-160510172237486-oozie-W@fail***]Action status=DONE
2016-05-11 08:43:55,321 INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] [***0000819-160510172237486-oozie-W@fail***]Action updated in DB!
2016-05-11 08:43:55,401 INFO WorkflowNotificationXCommand:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000819-160510172237486-oozie-W@fail
2016-05-11 08:43:55,401 INFO WorkflowNotificationXCommand:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000819-160510172237486-oozie-W
2016-05-11 08:43:55,402 INFO WorkflowNotificationXCommand:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] No Notification URL is defined. Therefore nothing to notify for job 0000819-160510172237486-oozie-W@spark-node
{noformat}

> Pyspark job fails with Oozie
> ----------------------------
>
>                 Key: OOZIE-2482
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2482
>             Project: Oozie
>          Issue Type: Bug
>          Components: core, workflow
>    Affects Versions: 4.2.0
>         Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
>                      Cluster secured with Kerberos
>            Reporter: Alexandre Linte
>            Assignee: Satish Subhashrao Saley
>         Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, OOZIE-2482-zip.patch, py4j-0.9-src.zip, pyspark.zip
>
> Hello,
> I'm trying to run the pi.py example in a pyspark job with Oozie. Every try I made failed for the same reason: key not found: SPARK_HOME.
> Note: A Scala job works well in the same environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_000001 (Is a directory)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
>         at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
>         at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
>         at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
>         at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
>         at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
>         at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
>         at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
>         at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
>         at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
>         at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
>         at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
>         at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
>         at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
>         at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
>         at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master                  yarn-master
>   deployMode              cluster
>   executorMemory          null
>   executorCores           null
>   totalExecutorCores      null
>   propertiesFile          null
>   driverMemory            null
>   driverCores             null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise               false
>   queue                   null
>   numExecutors            null
>   files                   null
>   pyFiles                 null
>   archives                null
>   mainClass               null
>   primaryResource         hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   name                    Pysparkpi example
>   childArgs               [100]
>   jars                    null
>   packages                null
>   packagesExclusions      null
>   repositories            null
>   verbose                 true
> Spark properties used, including those specified through --conf and those from the properties file null:
>   spark.executorEnv.SPARK_HOME -> /opt/application/Spark/current
>   spark.executorEnv.PYTHONPATH -> /opt/application/Spark/current/python
>   spark.yarn.appMasterEnv.SPARK_HOME -> /opt/application/Spark/current
> Main class:
> org.apache.spark.deploy.yarn.Client
> Arguments:
> --name
> Pysparkpi example
> --primary-py-file
> hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
> --class
> org.apache.spark.deploy.PythonRunner
> --arg
> 100
> System properties:
> spark.executorEnv.SPARK_HOME -> /opt/application/Spark/current
> spark.executorEnv.PYTHONPATH -> /opt/application/Spark/current/python
> SPARK_SUBMIT -> true
> spark.app.name -> Pysparkpi example
> spark.submit.deployMode -> cluster
> spark.yarn.appMasterEnv.SPARK_HOME -> /opt/application/Spark/current
> spark.yarn.isPython -> true
> spark.master -> yarn-cluster
> Classpath elements:
> Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
> java.util.NoSuchElementException: key not found: SPARK_HOME
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.MapLike$class.apply(MapLike.scala:141)
>         at scala.collection.AbstractMap.apply(Map.scala:58)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1045)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1044)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:1044)
>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:717)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:1016)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>         at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
>         at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
>         at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>         at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> {noformat}
> The workflow used for Oozie is the following:
> {noformat}
> <workflow-app xmlns='uri:oozie:workflow:0.5' name='PysparkPi-test'>
>     <start to='spark-node' />
>     <action name='spark-node'>
>         <spark xmlns="uri:oozie:spark-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <master>${master}</master>
>             <mode>${mode}</mode>
>             <name>Pysparkpi example</name>
>             <class></class>
>             <jar>${nameNode}/User/toto/WORK/Oozie/pyspark/lib/pi.py</jar>
>             <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/opt/application/Spark/current --conf spark.executorEnv.SPARK_HOME=/opt/application/Spark/current --conf spark.executorEnv.PYTHONPATH=/opt/application/Spark/current/python</spark-opts>
>             <arg>100</arg>
>         </spark>
>         <ok to="end" />
>         <error to="fail" />
>     </action>
>     <kill name="fail">
>         <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <end name='end' />
> </workflow-app>
> {noformat}
> I also created a JIRA for Spark: [https://issues.apache.org/jira/browse/SPARK-13679]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
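For readers hitting the same "key not found: SPARK_HOME" error: the stack trace above shows the failure comes from org.apache.spark.deploy.yarn.Client.findPySparkArchives, which only falls back to SPARK_HOME when the PySpark archives are not supplied with the job. The workaround being tested in this thread therefore ships pyspark.zip and py4j-0.9-src.zip (the attachments on this issue, normally found under $SPARK_HOME/python/lib in a Spark 1.6.0 install) alongside the workflow. A minimal sketch of that layout — the lib/ placement and the use of --py-files are assumptions based on the discussion, not a confirmed fix:
{noformat}
# HDFS layout of the workflow application (hypothetical paths)
/User/toto/WORK/Oozie/pyspark/workflow.xml
/User/toto/WORK/Oozie/pyspark/lib/pi.py
/User/toto/WORK/Oozie/pyspark/lib/pyspark.zip
/User/toto/WORK/Oozie/pyspark/lib/py4j-0.9-src.zip

# In the spark action, pass the archives explicitly so Client
# does not need to resolve them from SPARK_HOME:
<spark-opts>--py-files pyspark.zip,py4j-0.9-src.zip</spark-opts>
{noformat}
Setting the PYSPARK_ARCHIVES_PATH environment variable in the launcher's environment is the other route mentioned above, since findPySparkArchives checks it before SPARK_HOME.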