[ https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290655#comment-15290655 ]
Alexandre Linte commented on OOZIE-2482:
----------------------------------------

Hi [~satishsaley], sorry for the delay. I tried your solution (PYSPARK_ARCHIVES_PATH plus py4j-0.9-src.zip and pyspark.zip). The result is better, but it is still not working 100% of the time. When it fails, I see the following logs on the Oozie server.
{noformat}
2016-05-11 08:43:28,391 WARN CoordActionReadyXCommand:523 - USER[czfv1086] GROUP[-] TOKEN[] APP[coord_app_2ip_loadicxip] JOB[0000024-160426195954711-oozie-C] ACTION[] No actions to start for jobId=0000024-160426195954711-oozie-C as max concurrency reached!
2016-05-11 08:43:30,971 INFO SparkActionExecutor:520 - USER[czfv1086] GROUP[-] TOKEN[] APP[wf_app_2ip_loadicxip] JOB[0000660-160510172237486-oozie-W] ACTION[0000660-160510172237486-oozie-W@launch_streaming] checking action, hadoop job ID [job_1461692698792_19420] status [RUNNING]
2016-05-11 08:43:47,663 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2016-05-11 08:43:47,664 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status service from last instance time = 2016-05-11T06:42Z
2016-05-11 08:43:47,671 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running bundle status service from last instance time = 2016-05-11T06:42Z
2016-05-11 08:43:47,674 INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2016-05-11 08:43:52,311 INFO PauseTransitService:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2016-05-11 08:43:52,338 INFO PauseTransitService:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2016-05-11 08:43:54,799 INFO CallbackServlet:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] callback for action [0000819-160510172237486-oozie-W@spark-node]
2016-05-11 08:43:55,130 INFO SparkActionExecutor:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] action completed, external ID [job_1461692698792_19524]
2016-05-11 08:43:55,136 WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1461692698792_19525 finished with failed status
2016-05-11 08:43:55,136 WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] Launcher exception: Application application_1461692698792_19525 finished with failed status
org.apache.spark.SparkException: Application application_1461692698792_19525 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
        at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2016-05-11 08:43:55,193 INFO ActionEndXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] ERROR is considered as FAILED for SLA
2016-05-11 08:43:55,319 INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] Start action [0000819-160510172237486-oozie-W@fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2016-05-11 08:43:55,321 INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] [***0000819-160510172237486-oozie-W@fail***]Action status=DONE
2016-05-11 08:43:55,321 INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] TOKEN[] APP[PysparkPi-test] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] [***0000819-160510172237486-oozie-W@fail***]Action updated in DB!
2016-05-11 08:43:55,401 INFO WorkflowNotificationXCommand:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@fail] No Notification URL is defined. Therefore nothing to notify for job 0000819-160510172237486-oozie-W@fail
2016-05-11 08:43:55,401 INFO WorkflowNotificationXCommand:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000819-160510172237486-oozie-W
2016-05-11 08:43:55,402 INFO WorkflowNotificationXCommand:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000819-160510172237486-oozie-W] ACTION[0000819-160510172237486-oozie-W@spark-node] No Notification URL is defined. Therefore nothing to notify for job 0000819-160510172237486-oozie-W@spark-node
{noformat}

> Pyspark job fails with Oozie
> ----------------------------
>
>                 Key: OOZIE-2482
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2482
>             Project: Oozie
>          Issue Type: Bug
>          Components: core, workflow
>    Affects Versions: 4.2.0
>         Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
>                      Cluster secured with Kerberos
>            Reporter: Alexandre Linte
>            Assignee: Satish Subhashrao Saley
>         Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, OOZIE-2482-zip.patch, py4j-0.9-src.zip, pyspark.zip
>
> Hello,
> I'm trying to run the pi.py example in a pyspark job with Oozie. Every try I made failed for the same reason: key not found: SPARK_HOME.
> Note: A Scala job works well in the same environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_000001 (Is a directory)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
>         at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
>         at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
>         at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
>         at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
>         at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
>         at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
>         at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
>         at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
>         at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
>         at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
>         at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
>         at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
>         at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
>         at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
>         at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
>         at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
>         at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master                  yarn-master
>   deployMode              cluster
>   executorMemory          null
>   executorCores           null
>   totalExecutorCores      null
>   propertiesFile          null
>   driverMemory            null
>   driverCores             null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise               false
>   queue                   null
>   numExecutors            null
>   files                   null
>   pyFiles                 null
>   archives                null
>   mainClass               null
>   primaryResource         hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   name                    Pysparkpi example
>   childArgs               [100]
>   jars                    null
>   packages                null
>   packagesExclusions      null
>   repositories            null
>   verbose                 true
> Spark properties used, including those specified through --conf and those from the properties file null:
>   spark.executorEnv.SPARK_HOME -> /opt/application/Spark/current
>   spark.executorEnv.PYTHONPATH -> /opt/application/Spark/current/python
>   spark.yarn.appMasterEnv.SPARK_HOME -> /opt/application/Spark/current
> Main class:
> org.apache.spark.deploy.yarn.Client
> Arguments:
> --name
> Pysparkpi example
> --primary-py-file
> hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
> --class
> org.apache.spark.deploy.PythonRunner
> --arg
> 100
> System properties:
> spark.executorEnv.SPARK_HOME -> /opt/application/Spark/current
> spark.executorEnv.PYTHONPATH -> /opt/application/Spark/current/python
> SPARK_SUBMIT -> true
> spark.app.name -> Pysparkpi example
> spark.submit.deployMode -> cluster
> spark.yarn.appMasterEnv.SPARK_HOME -> /opt/application/Spark/current
> spark.yarn.isPython -> true
> spark.master -> yarn-cluster
> Classpath elements:
> Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, key not found: SPARK_HOME
> java.util.NoSuchElementException: key not found: SPARK_HOME
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.MapLike$class.apply(MapLike.scala:141)
>         at scala.collection.AbstractMap.apply(Map.scala:58)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1045)
>         at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1044)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:1044)
>         at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:717)
>         at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:1016)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>         at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
>         at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
>         at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>         at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
>         at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> {noformat}
> The workflow used for Oozie is the following:
> {noformat}
> <workflow-app xmlns='uri:oozie:workflow:0.5' name='PysparkPi-test'>
>     <start to='spark-node' />
>     <action name='spark-node'>
>         <spark xmlns="uri:oozie:spark-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <master>${master}</master>
>             <mode>${mode}</mode>
>             <name>Pysparkpi example</name>
>             <class></class>
>             <jar>${nameNode}/User/toto/WORK/Oozie/pyspark/lib/pi.py</jar>
>             <spark-opts>--conf spark.yarn.appMasterEnv.SPARK_HOME=/opt/application/Spark/current --conf spark.executorEnv.SPARK_HOME=/opt/application/Spark/current --conf spark.executorEnv.PYTHONPATH=/opt/application/Spark/current/python</spark-opts>
>             <arg>100</arg>
>         </spark>
>         <ok to="end" />
>         <error to="fail" />
>     </action>
>     <kill name="fail">
>         <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <end name='end' />
> </workflow-app>
> {noformat}
> I also created a JIRA for Spark: [https://issues.apache.org/jira/browse/SPARK-13679]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
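For readers hitting the same "key not found: SPARK_HOME" error: the stack trace above shows the failure comes from org.apache.spark.deploy.yarn.Client.findPySparkArchives, which only falls back to SPARK_HOME when the PySpark archives are not supplied with the job. The workaround being tested in this thread therefore ships pyspark.zip and py4j-0.9-src.zip (the attachments on this issue, normally found under $SPARK_HOME/python/lib in a Spark 1.6.0 install) alongside the workflow. A minimal sketch of that layout — the lib/ placement and the use of --py-files are assumptions based on the discussion, not a confirmed fix:
{noformat}
# HDFS layout of the workflow application (hypothetical paths)
/User/toto/WORK/Oozie/pyspark/workflow.xml
/User/toto/WORK/Oozie/pyspark/lib/pi.py
/User/toto/WORK/Oozie/pyspark/lib/pyspark.zip
/User/toto/WORK/Oozie/pyspark/lib/py4j-0.9-src.zip

# In the spark action, pass the archives explicitly so Client
# does not need to resolve them from SPARK_HOME:
<spark-opts>--py-files pyspark.zip,py4j-0.9-src.zip</spark-opts>
{noformat}
Setting the PYSPARK_ARCHIVES_PATH environment variable in the launcher's environment is the other route mentioned above, since findPySparkArchives checks it before SPARK_HOME.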