[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-25 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301134#comment-15301134
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Hi [~rkanter] and [~gezapeti],
The approach of setting {{SPARK_HOME}} is nice and saves us from passing the zip files 
through the --py-files option. 
But I am worried about [copying over the zip files again|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L323-L327] 
by Spark, since they reside on the local file system. We are also copying them 
from lib/ to python/lib/.
If we pass pyspark.zip and py4j-0.9-src.zip through the --py-files option using 
HDFS paths, this copy could be avoided (i.e. if these files are part of the 
sharelib, we can pass in that path). In local mode, we can just use the files 
available in the launcher job's local directory. 
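A minimal sketch of the --py-files alternative described above. It assumes the archives sit in a sharelib directory on HDFS; the example directory is the one from the PYSPARK_ARCHIVES_PATH log further down in this comment. {{PyFilesArgs}} and {{pyFilesArgs}} are hypothetical names and not part of Oozie. In local mode, the bare file names are expected to resolve against the launcher's working directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PyFilesArgs {
    // PySpark dependency archives for Spark 1.6, as named in this issue.
    static final List<String> PY_ARCHIVES = Arrays.asList("pyspark.zip", "py4j-0.9-src.zip");

    /**
     * Hypothetical helper: builds the "--py-files" option for spark-submit.
     * For non-local masters the archives are referenced inside an (assumed)
     * sharelib directory on HDFS; for local masters the bare file names are
     * used, expecting the files in the launcher's working directory.
     */
    static List<String> pyFilesArgs(String master, String sharelibDir) {
        boolean localMode = master.startsWith("local");
        List<String> paths = new ArrayList<>();
        for (String archive : PY_ARCHIVES) {
            paths.add(localMode ? archive : sharelibDir + "/" + archive);
        }
        return Arrays.asList("--py-files", String.join(",", paths));
    }

    public static void main(String[] args) {
        String sharelibDir = "hdfs:/tmp/sharelib_dir/spark_yarn/share/spark/python/lib";
        System.out.println(pyFilesArgs("yarn-cluster", sharelibDir));
        System.out.println(pyFilesArgs("local[2]", sharelibDir));
    }
}
```

This keeps the choice between HDFS and local references in one place. That is roughly the "more involved code" side of the trade-off.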

I think it's a trade-off between simpler code with extra copying vs. more 
involved code that avoids copying. 

With SPARK_HOME in yarn-cluster mode:
{code}
2016-05-25 15:44:22,873 INFO [uber-SubtaskRunner] org.apache.spark.deploy.yarn.Client: Uploading resource file:/private/tmp/hadoop-saley/nm-local-dir/usercache/saley/appcache/application_1464215511846_0003/container_1464215511846_0003_01_01/python/lib/pyspark.zip -> hdfs://localhost:8020/user/saley/.sparkStaging/application_1464215511846_0004/pyspark.zip
2016-05-25 15:44:22,880 INFO [uber-SubtaskRunner] org.apache.spark.deploy.yarn.Client: Uploading resource file:/private/tmp/hadoop-saley/nm-local-dir/usercache/saley/appcache/application_1464215511846_0003/container_1464215511846_0003_01_01/python/lib/py4j-0.9-src.zip -> hdfs://localhost:8020/user/saley/.sparkStaging/application_1464215511846_0004/py4j-0.9-src.zip

{code}

With PYSPARK_ARCHIVES_PATH in yarn-cluster mode (I have set up the zip files 
inside the sharelib):
{code}
2016-05-25 23:35:43,440 INFO [uber-SubtaskRunner] org.apache.spark.deploy.yarn.Client: Source and destination file systems are the same. Not copying hdfs:/tmp/sharelib_dir/spark_yarn/share/spark/python/lib/pyspark.zip

2016-05-25 23:35:43,460 INFO [uber-SubtaskRunner] org.apache.spark.deploy.yarn.Client: Source and destination file systems are the same. Not copying hdfs:/tmp/sharelib_dir/spark_yarn/share/spark/python/lib/py4j-0.9-src.zip

{code}
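For reference, here is a Java re-sketch of how the linked Spark 1.6 YARN {{Client}} appears to locate these archives, as I read it. A comma-separated PYSPARK_ARCHIVES_PATH takes precedence; otherwise it falls back to $SPARK_HOME/python/lib. That fallback is plausibly where the "key not found: SPARK_HOME" failure comes from when the variable is unset. The class is made up, the environment is passed in as a map so the logic can be tried without touching the real process environment, and the file-existence checks are left out.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class PySparkArchiveLookup {
    /**
     * Sketch loosely modeled on Spark's YARN client (not Spark's code):
     * PYSPARK_ARCHIVES_PATH, if set, is split on commas; otherwise the archives
     * are expected under $SPARK_HOME/python/lib. Existence checks are omitted.
     */
    static List<String> findPySparkArchives(Map<String, String> env) {
        String archivesPath = env.get("PYSPARK_ARCHIVES_PATH");
        if (archivesPath != null) {
            return Arrays.asList(archivesPath.split(","));
        }
        String sparkHome = env.get("SPARK_HOME");
        if (sparkHome == null) {
            // Same message as the failure reported in this issue.
            throw new NoSuchElementException("key not found: SPARK_HOME");
        }
        String pyLibPath = String.join(File.separator, sparkHome, "python", "lib");
        return Arrays.asList(
                new File(pyLibPath, "pyspark.zip").getPath(),
                new File(pyLibPath, "py4j-0.9-src.zip").getPath());
    }

    public static void main(String[] args) {
        String lib = "hdfs:/tmp/sharelib_dir/spark_yarn/share/spark/python/lib";
        // Sharelib case: archives referenced directly on HDFS.
        System.out.println(findPySparkArchives(Map.of(
                "PYSPARK_ARCHIVES_PATH", lib + "/pyspark.zip," + lib + "/py4j-0.9-src.zip")));
        // SPARK_HOME case: archives expected on the local file system (hypothetical path).
        System.out.println(findPySparkArchives(Map.of("SPARK_HOME", "/opt/spark")));
    }
}
```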

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-3.patch, OOZIE-2482-4.patch, OOZIE-2482-5.patch, 
> OOZIE-2482-6.patch, OOZIE-2482-zip.patch, py4j-0.9-src.zip, pyspark.zip
>
>
> Hello,
> I'm trying to run pi.py example in a pyspark job with Oozie. Every try I made 
> failed for the same reason: key not found: SPARK_HOME.
> Note: A scala job works well in the environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: 
> /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01
>  (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at 
> org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at 
> org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at 
> org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at 
> 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300990#comment-15300990
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 
characters
.{color:green}+1{color} the patch does adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1781
.Tests failed: 2
.Tests errors: 0

.The patch failed the following testcases:

.  testIDGeneration(org.apache.oozie.service.TestZKUUIDService)
.  testMultipleIDGeneration(org.apache.oozie.service.TestZKUUIDService)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2904/


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-25 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300765#comment-15300765
 ] 

Peter Cseh commented on OOZIE-2482:
---

Fixed up the documentation to explain that py files should be referenced 
locally. PySpark does not work with full HDFS paths in every running mode. 
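To illustrate referring to a py file locally, here is a hypothetical helper (not Oozie code). It reduces a full HDFS URI to its base name, assuming the file has already been shipped into the launcher's working directory, for example from the workflow's lib/ directory. It does not handle strings that are not valid URIs.

```java
import java.net.URI;

public class LocalPyFile {
    /**
     * Hypothetical helper: "hdfs://nn/user/x/lib/pi.py" becomes "pi.py".
     * Assumes the file is present in the working directory under its base
     * name; a bare file name is returned unchanged.
     */
    static String localName(String pathOrUri) {
        String path = URI.create(pathOrUri).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(localName("hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py"));
        System.out.println(localName("pi.py"));
    }
}
```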


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300706#comment-15300706
 ] 

Robert Kanter commented on OOZIE-2482:
--

One last thing on the -5 patch: a dependency on commons-io is added to 
{{SparkMain}}, so we should add it to the spark sharelib explicitly instead of 
relying on Spark to pull it in.

I tried running a PySpark job and a Spark job on a nonsecure and a secure 
cluster, with and without the zip files, and everything seems to be behaving 
as expected.  However, it's not working in {{yarn-client}} mode, only in 
{{yarn-cluster}}.  :(

Also, [~satishsaley], what do you think of the latest patch (other than the 
{{yarn-client}} issue)?  The approach is roughly the same, but it's setting 
different env vars.
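Both approaches work by setting environment variables for the launcher. Here is a minimal sketch of adding one NAME=VALUE pair to Hadoop's comma-separated child-env format (for example "A=foo,B=bar"), but only if that name is not already defined. The helper name and the example SPARK_HOME value are hypothetical. It matches on the exact variable name rather than a substring, and it assumes values contain no commas.

```java
public class ChildEnv {
    /**
     * Hypothetical helper: appends NAME=VALUE to a comma-separated child-env
     * string unless NAME is already defined there. Assumes values contain
     * no commas.
     */
    static String addEnvIfAbsent(String childEnv, String name, String value) {
        String pair = name + "=" + value;
        if (childEnv == null || childEnv.trim().isEmpty()) {
            return pair;
        }
        for (String entry : childEnv.split(",")) {
            // Compare the exact variable name, so MY_SPARK_HOME does not count as SPARK_HOME.
            String key = entry.split("=", 2)[0].trim();
            if (key.equals(name)) {
                return childEnv; // keep the value the user already supplied
            }
        }
        return childEnv + "," + pair;
    }

    public static void main(String[] args) {
        System.out.println(addEnvIfAbsent(null, "SPARK_HOME", "/opt/spark"));
        System.out.println(addEnvIfAbsent("A=foo", "SPARK_HOME", "/opt/spark"));
        System.out.println(addEnvIfAbsent("SPARK_HOME=/custom,A=foo", "SPARK_HOME", "/opt/spark"));
    }
}
```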


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299883#comment-15299883
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 
characters
.{color:green}+1{color} the patch does adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1781
.Tests failed: 3
.Tests errors: 0

.The patch failed the following testcases:

.  
testActionKillCommandDate(org.apache.oozie.command.coord.TestCoordActionsKillXCommand)
.  
testMemoryUsageAndSpeed(org.apache.oozie.service.TestPartitionDependencyManagerService)
.  
testBundleStatusTransitWithLock(org.apache.oozie.service.TestStatusTransitService)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2902/


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-25 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299627#comment-15299627
 ] 

Peter Cseh commented on OOZIE-2482:
---

Thank you for the review [~rkanter].
1) Changed to: These files can be added either to workflow's lib/ directory, 
to the sharelib or in sharelib mapping file.
2) Link added with anchor (I could not find a way to verify that the link 
works).
3) Duplications removed.
4) Fixed: just like in JavaActionExecutor.injectLauncherProperties(), we now 
add both the original keys and the ones with the oozie.launcher. prefix removed. 
5) Deleted.
6) Typo fixed.
7) Good idea, will do!
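A minimal sketch of the key handling described in point 4. It uses a plain Map instead of Hadoop's Configuration so it runs standalone. The class and method names are hypothetical. As an assumption of this sketch, a stripped key overwrites an existing unprefixed key of the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LauncherProps {
    static final String LAUNCHER_PREFIX = "oozie.launcher.";

    /**
     * Hypothetical helper: keeps every original key and, for each key that
     * starts with "oozie.launcher.", also adds a copy with the prefix removed.
     * A stripped key overwrites an existing unprefixed key (sketch assumption).
     */
    static Map<String, String> withLauncherKeysStripped(Map<String, String> props) {
        Map<String, String> result = new LinkedHashMap<>(props);
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(LAUNCHER_PREFIX)) {
                result.put(e.getKey().substring(LAUNCHER_PREFIX.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("oozie.launcher.mapreduce.map.memory.mb", "2048");
        props.put("foo", "bar");
        System.out.println(withLauncherKeysStripped(props));
    }
}
```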



[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299448#comment-15299448
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 line(s) longer than 132 
characters
.{color:green}+1{color} the patch does adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1781
.Tests failed: 1
.Tests errors: 0

.The patch failed the following testcases:

.  testIDGeneration(org.apache.oozie.service.TestZKUUIDService)

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2901/


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299376#comment-15299376
 ] 

Robert Kanter commented on OOZIE-2482:
--

Thanks [~satishsaley] and [~gezapeti] for working on this.  It's proved to be 
very tricky.
Here's some feedback on the -4 patch:
# In AG_Install.twiki, where it says {quote}These files can be added either to 
workflow's lib/ directory or in sharelib mapping file.{quote} I think it should 
also mention that they can be added to the spark directory if using the 
"old" sharelib configuration.
# In DG_SparkActionExtension.twiki, where it says {quote}please refer to 
installation document{quote}, that should link to the AG_Install.twiki section 
about the sharelib (there should be an anchor there already because it’s a 
heading).
# I don’t think it’s necessary to repeat all of the standard "The prepare 
element...", "The job-xml element...", etc. in the PySpark section in 
DG_SparkActionExtension.twiki.  That’s already mentioned earlier.  It’s only 
necessary to mention that the python file goes in the  element.
# In {{SparkActionExecutor}}, something seems funny here:
{code:java}
String mapredChildEnv = conf.get("oozie.launcher." + MAPRED_CHILD_ENV);

if (mapredChildEnv == null) {
    conf.set(MAPRED_CHILD_ENV, sparkHome);
}
else if (!mapredChildEnv.contains("SPARK_HOME")) {
    conf.set(MAPRED_CHILD_ENV, mapredChildEnv + "," + sparkHome);
}
return conf;
{code}
We’re getting {{oozie.launcher.mapred.child.env}} from {{conf}}, but we’re 
setting {{mapred.child.env}} in {{conf}}.  Shouldn’t these match?
# There’s an extra blank line added in {{SparkMain}} before the if statement 
for the {{VERBOSE_OPTION}}
# In {{SparkMain}}, there’s a Javadoc that says "... pyspark.zip an 
py4j-VERSION-src.zip files…".  "an" should be "and" here.
# We don’t need to do it for this JIRA, but it might be nice to have a new 
PySpark example workflow.  Can you file a new related JIRA for that?
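One way to resolve item 4 is to read and write the same key when merging SPARK_HOME into the child environment. This is a standalone sketch, not the patch: it uses a plain Map instead of Hadoop's Configuration, it assumes the launcher-prefixed key oozie.launcher.mapred.child.env is the intended one (the review only asks whether the two should match), and the class name, method names and the "SPARK_HOME=." value are hypothetical.

```java
// Hypothetical sketch: read and write the SAME key so the SPARK_HOME entry is
// merged into the env string that is actually used. A plain Map stands in for
// Hadoop's Configuration to keep the example self-contained.
import java.util.HashMap;
import java.util.Map;

public class ChildEnvMerge {
    // Assumption: the launcher-prefixed key is the intended one.
    static final String CHILD_ENV_KEY = "oozie.launcher.mapred.child.env";

    /** Appends sparkHomeEntry (e.g. "SPARK_HOME=.") unless SPARK_HOME is already present. */
    static String merge(String existing, String sparkHomeEntry) {
        if (existing == null || existing.isEmpty()) {
            return sparkHomeEntry;
        }
        if (existing.contains("SPARK_HOME")) {
            return existing; // keep a user-provided SPARK_HOME untouched
        }
        return existing + "," + sparkHomeEntry;
    }

    /** Reads and writes the same key, unlike the snippet under review. */
    static void setSparkHome(Map<String, String> conf, String sparkHomeEntry) {
        conf.put(CHILD_ENV_KEY, merge(conf.get(CHILD_ENV_KEY), sparkHomeEntry));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CHILD_ENV_KEY, "FOO=bar");
        setSparkHome(conf, "SPARK_HOME=.");
        System.out.println(conf.get(CHILD_ENV_KEY)); // prints FOO=bar,SPARK_HOME=.
    }
}
```

Whichever key is chosen, the point is that the lookup and the update use the same name, so an existing user value is extended rather than silently shadowed.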

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-3.patch, OOZIE-2482-4.patch, OOZIE-2482-zip.patch, 
> py4j-0.9-src.zip, pyspark.zip

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299351#comment-15299351
 ] 

Robert Kanter commented on OOZIE-2482:
--

Now that OOZIE-2532 is in, the patch should apply, so I've kicked off Jenkins.

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-3.patch, OOZIE-2482-4.patch, OOZIE-2482-zip.patch, 
> py4j-0.9-src.zip, pyspark.zip
>
>
> Hello,
> I'm trying to run the pi.py example as a PySpark job with Oozie. Every attempt 
> failed for the same reason: key not found: SPARK_HOME.
> Note: a Scala job works well in the same environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01 (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
> at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master                  yarn-master
>   deployMode              cluster
>   executorMemory          null
>   executorCores           null
>   totalExecutorCores      null
>   propertiesFile          null
>   driverMemory            null
>   driverCores             null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise               false
>   queue                   null
>   numExecutors            null
>   files                   null
>   pyFiles                 null
>   archives                null
>   mainClass               null
>   primaryResource         hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   name                    Pysparkpi example
>   childArgs               [100]
>   jars                    null
>   packages                null
>   packagesExclusions      null
>   repositories            null
>   verbose

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297418#comment-15297418
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch



> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-3.patch, OOZIE-2482-4.patch, OOZIE-2482-zip.patch, 
> py4j-0.9-src.zip, pyspark.zip

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-23 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297354#comment-15297354
 ] 

Peter Cseh commented on OOZIE-2482:
---

Thank you for the great work [~satishsaley]! 
I managed to modify your patch and get it to work on my machine. I changed the 
following:
- SparkActionExecutor sets SPARK_HOME instead of PYSPARK_ARCHIVES_PATH 
- SparkMain creates the folder python/lib under the current working directory 
and copies the needed zip files there.
I've attached my solution. I've included your documentation changes in it.
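The SparkMain change described above could look roughly like the following Java sketch. It assumes pyspark.zip and a py4j-*-src.zip are already present in some local directory of the launcher (for example its working directory); the class and method names are hypothetical. As a hedged explanation of why this helps: Spark 1.6's YARN client appears to look for these archives under $SPARK_HOME/python/lib when PYSPARK_ARCHIVES_PATH is unset, which would also explain the "key not found: SPARK_HOME" failure when SPARK_HOME is missing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class PySparkLibSetup {
    /**
     * Hypothetical helper: copies pyspark.zip and any py4j-*-src.zip found in
     * sourceDir into workDir/python/lib, creating that directory if needed.
     * Returns the paths of the copied files.
     */
    static List<Path> copyPySparkZips(Path sourceDir, Path workDir) throws IOException {
        Path libDir = workDir.resolve("python").resolve("lib");
        Files.createDirectories(libDir);
        List<Path> copied = new ArrayList<>();
        // The glob matches pyspark.zip exactly and any py4j version, e.g. py4j-0.9-src.zip.
        try (DirectoryStream<Path> zips =
                 Files.newDirectoryStream(sourceDir, "{pyspark.zip,py4j-*-src.zip}")) {
            for (Path zip : zips) {
                Path target = libDir.resolve(zip.getFileName());
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the zip files were localized into the current working directory,
        // so SPARK_HOME could then point at that directory.
        Path cwd = Paths.get("").toAbsolutePath();
        for (Path p : copyPySparkZips(cwd, cwd)) {
            System.out.println("Copied " + p);
        }
    }
}
```

Matching py4j with a glob rather than a fixed name keeps the sketch independent of the py4j version bundled with a given Spark release.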

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Peter Cseh
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-3.patch, OOZIE-2482-4.patch, OOZIE-2482-zip.patch, 
> py4j-0.9-src.zip, pyspark.zip

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296530#comment-15296530
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch



> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-3.patch, OOZIE-2482-zip.patch, py4j-0.9-src.zip, pyspark.zip

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-20 Thread Alexandre Linte (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293604#comment-15293604
 ] 

Alexandre Linte commented on OOZIE-2482:


Hi [~satishsaley], I can't provide the logs for application 
1461692698792_19525; they have already been purged.

Here are the logs for a pyspark job that fails with the same error (application 
1461692698792_29704 / 1461692698792_29705).

OOZIE LOGS
{noformat}
2016-05-20 17:42:00,627  INFO CallbackServlet:520 - USER[-] GROUP[-] TOKEN[-] 
APP[-] JOB[0012689-160510172237486-oozie-W] 
ACTION[0012689-160510172237486-oozie-W@spark-node] callback for action 
[0012689-160510172237486-oozie-W@spark-node]
2016-05-20 17:42:00,892  INFO SparkActionExecutor:520 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[0012689-160510172237486-oozie-W] 
ACTION[0012689-160510172237486-oozie-W@spark-node] action completed, external 
ID [job_1461692698792_29704]
2016-05-20 17:42:00,897  WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[0012689-160510172237486-oozie-W] 
ACTION[0012689-160510172237486-oozie-W@spark-node] Launcher ERROR, reason: Main 
class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, 
Application application_1461692698792_29705 finished with failed status
2016-05-20 17:42:00,897  WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[0012689-160510172237486-oozie-W] 
ACTION[0012689-160510172237486-oozie-W@spark-node] Launcher exception: 
Application application_1461692698792_29705 finished with failed status
org.apache.spark.SparkException: Application application_1461692698792_29705 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2016-05-20 17:42:01,017  INFO ActionEndXCommand:520 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[0012689-160510172237486-oozie-W] 
ACTION[0012689-160510172237486-oozie-W@spark-node] ERROR is considered as 
FAILED for SLA
2016-05-20 17:42:01,080  INFO ActionStartXCommand:520 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[0012689-160510172237486-oozie-W] 
ACTION[0012689-160510172237486-oozie-W@fail] Start action 
[0012689-160510172237486-oozie-W@fail] with 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-19 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292628#comment-15292628
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Thank you for the review, Robert. I set 
spark.executorEnv.PYTHONPATH=pyspark.zip:py4j-0.9-src.zip and it started 
working. I am checking whether we should set it by default as well. 
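If this PYTHONPATH setting becomes a default, SparkMain could append it only when the user has not already supplied one. A minimal sketch, assuming the archives are shipped under their plain file names into each executor's working directory; the class and method names are hypothetical, while spark.executorEnv.<NAME> is Spark's standard property for setting an executor environment variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PythonPathConf {
    static final String PYTHONPATH_PREFIX = "spark.executorEnv.PYTHONPATH=";

    /**
     * Hypothetical helper: returns a copy of sparkArgs with
     * "--conf spark.executorEnv.PYTHONPATH=<a>:<b>" appended, unless the
     * user already passed a value for that property.
     */
    static List<String> withDefaultPythonPath(List<String> sparkArgs, List<String> archiveNames) {
        List<String> result = new ArrayList<>(sparkArgs);
        for (String arg : sparkArgs) {
            if (arg.startsWith(PYTHONPATH_PREFIX)) {
                return result; // respect the user's explicit setting
            }
        }
        result.add("--conf");
        // Relative names are resolved against the executor's working directory.
        result.add(PYTHONPATH_PREFIX + String.join(":", archiveNames));
        return result;
    }

    public static void main(String[] args) {
        List<String> sparkArgs = Arrays.asList("--master", "yarn-cluster");
        System.out.println(withDefaultPythonPath(sparkArgs,
                Arrays.asList("pyspark.zip", "py4j-0.9-src.zip")));
        // prints [--master, yarn-cluster, --conf, spark.executorEnv.PYTHONPATH=pyspark.zip:py4j-0.9-src.zip]
    }
}
```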

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-zip.patch, py4j-0.9-src.zip, pyspark.zip

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-19 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291537#comment-15291537
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Could you please share the logs for application_1461692698792_19525?

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-2.patch, 
> OOZIE-2482-zip.patch, py4j-0.9-src.zip, pyspark.zip
>
>
> Hello,
> I'm trying to run the pi.py example as a PySpark job with Oozie. Every attempt 
> failed for the same reason: key not found: SPARK_HOME.
> Note: a Scala job works fine in the same environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: 
> /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01
>  (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
> at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master  yarn-master
>   deployMode  cluster
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemory    null
>   driverCores null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutors    null
>   files   null
>   pyFiles null
>   archives    null
>   mainClass   null
>   primaryResource hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   name    Pysparkpi example
>   childArgs   [100]
>   jars    null
>   packages    null
>   packagesExclusions  null
>   repositories    null
>   verbose true
> Spark properties 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-19 Thread Alexandre Linte (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290655#comment-15290655
 ] 

Alexandre Linte commented on OOZIE-2482:


Hi [~satishsaley], sorry for the delay. 
I tried your solution (PYSPARK_ARCHIVES_PATH with py4j-0.9-src.zip and 
pyspark.zip). The result is better, but it doesn't work every time. When it fails, 
I get the following logs on the Oozie server.
{noformat}
2016-05-11 08:43:28,391  WARN CoordActionReadyXCommand:523 - USER[czfv1086] 
GROUP[-] TOKEN[] APP[coord_app_2ip_loadicxip] 
JOB[024-160426195954711-oozie-C] ACTION[] No actions to start for 
jobId=024-160426195954711-oozie-C as max concurrency reached!
2016-05-11 08:43:30,971  INFO SparkActionExecutor:520 - USER[czfv1086] GROUP[-] 
TOKEN[] APP[wf_app_2ip_loadicxip] JOB[660-160510172237486-oozie-W] 
ACTION[660-160510172237486-oozie-W@launch_streaming] checking action, 
hadoop job ID [job_1461692698792_19420] status [RUNNING]
2016-05-11 08:43:47,663  INFO StatusTransitService$StatusTransitRunnable:520 - 
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for 
[org.apache.oozie.service.StatusTransitService]
2016-05-11 08:43:47,664  INFO StatusTransitService$StatusTransitRunnable:520 - 
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status 
service from last instance time =  2016-05-11T06:42Z
2016-05-11 08:43:47,671  INFO StatusTransitService$StatusTransitRunnable:520 - 
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running bundle status service 
from last instance time =  2016-05-11T06:42Z
2016-05-11 08:43:47,674  INFO StatusTransitService$StatusTransitRunnable:520 - 
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for 
[org.apache.oozie.service.StatusTransitService]
2016-05-11 08:43:52,311  INFO PauseTransitService:520 - USER[-] GROUP[-] 
TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for 
[org.apache.oozie.service.PauseTransitService]
2016-05-11 08:43:52,338  INFO PauseTransitService:520 - USER[-] GROUP[-] 
TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for 
[org.apache.oozie.service.PauseTransitService]
2016-05-11 08:43:54,799  INFO CallbackServlet:520 - USER[-] GROUP[-] TOKEN[-] 
APP[-] JOB[819-160510172237486-oozie-W] 
ACTION[819-160510172237486-oozie-W@spark-node] callback for action 
[819-160510172237486-oozie-W@spark-node]
2016-05-11 08:43:55,130  INFO SparkActionExecutor:520 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[819-160510172237486-oozie-W] 
ACTION[819-160510172237486-oozie-W@spark-node] action completed, external 
ID [job_1461692698792_19524]
2016-05-11 08:43:55,136  WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[819-160510172237486-oozie-W] 
ACTION[819-160510172237486-oozie-W@spark-node] Launcher ERROR, reason: Main 
class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, 
Application application_1461692698792_19525 finished with failed status
2016-05-11 08:43:55,136  WARN SparkActionExecutor:523 - USER[shfs3453] GROUP[-] 
TOKEN[] APP[PysparkPi-test] JOB[819-160510172237486-oozie-W] 
ACTION[819-160510172237486-oozie-W@spark-node] Launcher exception: 
Application application_1461692698792_19525 finished with failed status
org.apache.spark.SparkException: Application application_1461692698792_19525 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290255#comment-15290255
 ] 

Robert Kanter commented on OOZIE-2482:
--

Here's some feedback on the OOZIE-2482-2.patch:
# Rename TestPyspark to TestPySpark
# The SparkActionExecutor should only add {{PYSPARK_ARCHIVES_PATH}} if the user 
is running a PySpark job (SparkMain already has a check for this and only does 
its PySpark setup in that case)
# Docs
## Add a section to the Install docs page about adding the two zip files to the 
sharelib dir or via the mapping file
## Update the Spark Action docs page to explain how to use PySpark, and add a 
note about the zip files that links to the Install docs page
# TestPyspark fails.  The stdout from one of the launcher jobs shows this:
{noformat}
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /Users/rkanter/.m2/repository/org/apache/spark/spark-core_2.10/1.6.1/spark-core_2.10-1.6.1.jar
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:164)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more

Intercepting System.exit(1)
{noformat}
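Items 2 and 4 of this feedback can be sketched together: set {{PYSPARK_ARCHIVES_PATH}} only when the job is a PySpark job, and fail early with a clear message when the zip files are missing, rather than surfacing later as "No module named pyspark". This is a hypothetical Python illustration, not Oozie's SparkActionExecutor code; the {{.py}} suffix check, the {{launcher_env}} helper, and its {{lib_files}} argument are assumptions made for the example.

```python
import os
from typing import Dict, List

# The two archives discussed in this thread (Spark 1.6 ships py4j 0.9).
PYSPARK_ZIPS = ("pyspark.zip", "py4j-0.9-src.zip")


def is_pyspark_job(primary_resource: str) -> bool:
    """Assumption: a job counts as PySpark when its primary resource is a .py file."""
    return primary_resource.lower().endswith(".py")


def launcher_env(primary_resource: str, lib_files: List[str]) -> Dict[str, str]:
    """Return extra environment variables for the launcher (hypothetical helper).

    lib_files: paths of files localized from the workflow lib/ dir or the sharelib.
    Non-PySpark jobs get an empty dict, so Scala/Java jobs are left untouched.
    """
    env: Dict[str, str] = {}
    if not is_pyspark_job(primary_resource):
        return env
    # Index the available files by base name so lib/ or sharelib paths both work.
    found = {os.path.basename(p): p for p in lib_files}
    missing = [name for name in PYSPARK_ZIPS if name not in found]
    if missing:
        # A clear, early error instead of a Python worker failing on import.
        raise FileNotFoundError("PySpark job is missing: " + ", ".join(missing))
    # Comma-separated list, in a fixed order: pyspark.zip first, then py4j.
    env["PYSPARK_ARCHIVES_PATH"] = ",".join(found[name] for name in PYSPARK_ZIPS)
    return env
```

For example, {{launcher_env("hdfs://nn/app/pi.py", ["lib/pyspark.zip", "lib/py4j-0.9-src.zip"])}} yields a single {{PYSPARK_ARCHIVES_PATH}} entry, while a {{.jar}} primary resource yields an empty dict.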



[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-17 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287015#comment-15287015
 ] 

Marcelo Vanzin commented on OOZIE-2482:
---

`PYSPARK_ARCHIVES_PATH` won't be removed unless there's a replacement for it. 
So if you want to use it, you can.

I'm not sure about the location of the zip file, but I see no reason for it to 
change.


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-17 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286997#comment-15286997
 ] 

Peter Cseh commented on OOZIE-2482:
---

It seems like the patch command does not handle binary files. I've created 
OOZIE-2532 with the details. 


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285851#comment-15285851
 ] 

Robert Kanter commented on OOZIE-2482:
--

By the way, [~satishsaley] and [~gezapeti], if you use the {{--binary}} 
argument when generating the patch, it should include the binary content as 
part of the patch, e.g. {{git diff --no-prefix --binary ...}}

{quote}
Robert Kanter, I agree with you regarding documenting the change and adding 
appropriate error messages. 
Also, if users are already using oozie.service.ShareLibService.mapping.file for 
the Spark sharelib, then we can encourage them to add the paths for the pyspark 
and py4j zip files there. That way, individual users do not need to copy the zip 
files into the workflow lib/ directory.
{quote}
Right, though that only covers users of the mapping file.  We should 
make sure to also document how to do this if you don't use the mapping file.  
In CDH, for instance, I'm planning on having us (somehow) put the zip files in 
the Spark sharelib as part of our build so users don't even have to worry about 
this; but we should document the three ways (mapping file, lib/, and sharelib) 
for other users.
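The mapping-file route can be sketched as follows. This assumes the mapping file is a properties-style file with lines of the form {{oozie.<sharelib-name>=<comma-separated paths>}}, and that zip file paths may be listed there as Satish suggests; the parser is deliberately simplified (no Java properties escapes or line continuations), and the helper names and HDFS paths are hypothetical.

```python
from typing import Dict, List

# The two archives discussed in this thread.
PYSPARK_ZIPS = ("pyspark.zip", "py4j-0.9-src.zip")


def parse_mapping(text: str) -> Dict[str, List[str]]:
    """Parse 'oozie.<sharelib>=<path>,<path>' lines (format assumed, simplified)."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and lines without a key/value pair
        key, value = line.split("=", 1)
        mapping[key.strip()] = [p.strip() for p in value.split(",") if p.strip()]
    return mapping


def add_pyspark_zips(paths: List[str], zip_dir: str) -> List[str]:
    """Append the two zips under zip_dir unless an entry with that file name exists."""
    names = {p.rstrip("/").rsplit("/", 1)[-1] for p in paths}
    extra = [zip_dir.rstrip("/") + "/" + z for z in PYSPARK_ZIPS if z not in names]
    return paths + extra
```

With a hypothetical entry {{oozie.spark=hdfs:///share/lib/spark/}}, calling {{add_pyspark_zips}} with {{hdfs:///share/lib/spark/python}} keeps the original directory and appends the two zip paths; entries that already name both zips are returned unchanged.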


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285831#comment-15285831
 ] 

Peter Cseh commented on OOZIE-2482:
---

Regarding environment variables:
I think the solution with PYSPARK_ARCHIVES_PATH is nicer, because you don't 
have to care about the python/lib directory structure, but I can't find any 
documentation about it. 

SPARK_HOME will likely stick around, but the paths needed relative to it or the 
zip file names may change, since they are not documented either.

[~vanzin], which one do you think is more robust? Are there any plans to 
change, drop, or officially support PYSPARK_ARCHIVES_PATH or the 
python/lib/*zip structure?
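The difference between the two approaches can be made concrete with a rough Python mirror of how Spark 1.6's YARN client appears to locate the archives (following {{Client.findPySparkArchives}} as we understand it; this is not Spark's code). {{PYSPARK_ARCHIVES_PATH}} wins when set and needs no directory layout; otherwise the client derives {{python/lib}} from {{SPARK_HOME}}, and a missing {{SPARK_HOME}} fails the lookup, which matches the "key not found: SPARK_HOME" error in the original report.

```python
import os
from typing import List, Mapping


def find_pyspark_archives(env: Mapping[str, str]) -> List[str]:
    """Sketch of the archive lookup order; env stands in for the process environment."""
    explicit = env.get("PYSPARK_ARCHIVES_PATH")
    if explicit is not None:
        # Explicit comma-separated list: no assumption about python/lib layout.
        return explicit.split(",")
    # Fallback: a plain lookup raises KeyError when SPARK_HOME is unset,
    # the Python analogue of Scala's "key not found: SPARK_HOME".
    lib_dir = os.path.join(env["SPARK_HOME"], "python", "lib")
    # The zip file names are hard-coded for this Spark version.
    return [
        os.path.join(lib_dir, "pyspark.zip"),
        os.path.join(lib_dir, "py4j-0.9-src.zip"),
    ]
```

As far as we can tell, Spark's fallback branch also checks that both files exist before returning them; that check is omitted here. The hard-coded {{py4j-0.9-src.zip}} name is exactly the kind of undocumented detail Peter is worried about.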

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-zip.patch, 
> py4j-0.9-src.zip, pyspark.zip
>
>
> Hello,
> I'm trying to run pi.py example in a pyspark job with Oozie. Every try I made 
> failed for the same reason: key not found: SPARK_HOME.
> Note: A scala job works well in the environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: 
> /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01
>  (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at 
> org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at 
> org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at 
> org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.(LogManager.java:127)
> at 
> org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at 
> org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at 
> org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at 
> org.apache.hadoop.service.AbstractService.(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master  yarn-master
>   deployMode  cluster
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemory    null
>   driverCores null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutors    null
>   files   null
>   pyFiles 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285778#comment-15285778
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch



> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2482-1.patch, OOZIE-2482-zip.patch, 
> py4j-0.9-src.zip, pyspark.zip
>
>
> Hello,
> I'm trying to run pi.py example in a pyspark job with Oozie. Every try I made 
> failed for the same reason: key not found: SPARK_HOME.
> Note: A scala job works well in the environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: 
> /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01
>  (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at 
> org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at 
> org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at 
> org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.(LogManager.java:127)
> at 
> org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at 
> org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at 
> org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at 
> org.apache.hadoop.service.AbstractService.(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master  yarn-master
>   deployMode  cluster
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemory    null
>   driverCores null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutors    null
>   files   null
>   pyFiles null
>   archives    null
>   mainClass   null
>   primaryResource hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   name    Pysparkpi example
>   childArgs   [100]
>   jars    null
>   packages    null
>   packagesExclusions  null
>   repositories  

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285702#comment-15285702
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Setting SPARK_HOME=. will work as well, but we need to make sure that the 
pyspark and py4j zip files are under the $SPARK_HOME/python/lib/ directory, as 
Spark will look for them there in 
[this code|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1049-L1051].
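
A rough Python re-expression of that lookup, as we read the linked branch-1.6 {{Client.scala}}, may help show why either variable works. It is a sketch, not Spark's code: the function name {{find_pyspark_archives}} is hypothetical, and the hard-coded {{py4j-0.9-src.zip}} name is assumed from the py4j version bundled with Spark 1.6. Indexing the environment without a default is meant to mirror a Scala {{sys.env("SPARK_HOME")}} lookup, which (if that is indeed what the Scala code does) fails with "key not found", consistent with the {{key not found: SPARK_HOME}} error in the original report.

```python
import os
from typing import List, Mapping

# Assumption: the py4j version bundled with Spark 1.6.x; other releases differ.
PY4J_ZIP = "py4j-0.9-src.zip"


def find_pyspark_archives(env: Mapping[str, str]) -> List[str]:
    """Sketch of how the Spark 1.6 YARN client appears to locate the PySpark zips."""
    # 1. An explicit, comma-separated PYSPARK_ARCHIVES_PATH takes precedence.
    if "PYSPARK_ARCHIVES_PATH" in env:
        return env["PYSPARK_ARCHIVES_PATH"].split(",")
    # 2. Otherwise fall back to $SPARK_HOME/python/lib. Indexing without a default
    #    means an unset SPARK_HOME raises KeyError ("key not found").
    py_lib = os.path.join(env["SPARK_HOME"], "python", "lib")
    archives = [os.path.join(py_lib, "pyspark.zip"), os.path.join(py_lib, PY4J_ZIP)]
    for path in archives:
        if not os.path.exists(path):
            raise FileNotFoundError(f"{os.path.basename(path)} not found in {py_lib}")
    # Absolute paths, so SPARK_HOME=. resolves against the current working directory.
    return [os.path.abspath(p) for p in archives]
```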

The main reason for moving to Spark 1.6.1 is the version mismatch errors I ran 
into while writing the tests:
{code}
Exception: Python in worker has different version 2.7 than that in driver /Users/saley/src/oozie/sharelib/spark/target/test-data/minicluster/mapred/local/1_0/taskTracker/test/jobcache/job_0001/attempt_0001_m_00_0/work/tmp/spark-f71bd1cd-72f6-458d-b3c2-930c5a0eeb00, PySpark cannot run with different minor versions
{code}

[~rkanter] I agree with you regarding documenting the change and adding 
appropriate error messages. 
Also, if users are already using {{oozie.service.ShareLibService.mapping.file}} 
for the Spark sharelib, then we can encourage them to add the paths of the 
pyspark and py4j zip files there. That way individual users do not need to copy 
the zip files into the workflow lib/ directory. 
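
To make the mapping-file suggestion concrete, here is a hedged Python sketch that parses a sharelib mapping file and checks whether the {{spark}} entry already lists both zips. It assumes the line format as we understand it from the Oozie install docs ({{oozie.<sharelib name>=}} followed by comma-separated HDFS paths) and simplifies Java properties parsing to plain {{key=value}} lines; the function names and the sample paths are hypothetical.

```python
from typing import Dict, List


def parse_sharelib_mapping(text: str) -> Dict[str, List[str]]:
    """Parse 'oozie.<name>=path1,path2' lines into {name: [paths]} (simplified)."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("oozie."):
            key = key[len("oozie."):]
        mapping[key] = [p.strip() for p in value.split(",") if p.strip()]
    return mapping


def spark_sharelib_has_pyspark(mapping: Dict[str, List[str]]) -> bool:
    """True if the spark entry lists pyspark.zip and some py4j-*-src.zip."""
    names = [p.rstrip("/").rsplit("/", 1)[-1] for p in mapping.get("spark", [])]
    has_pyspark = "pyspark.zip" in names
    has_py4j = any(n.startswith("py4j-") and n.endswith("-src.zip") for n in names)
    return has_pyspark and has_py4j
```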


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285594#comment-15285594
 ] 

Robert Kanter commented on OOZIE-2482:
--

I suppose the other option is to make it so that the user has to manually add 
the two zip files into the Spark sharelib.  Given the complexities here, and 
how Spark keeps changing its packaging, we're probably best off just leaving 
that up to the user.  We can make it clear in the Oozie setup docs; also, if 
the user specifies a Python file but the zips are not there, the Spark 
action (executor?) could fail fast with a specific message about adding those 
zips.  We might even be able to have Oozie reject the workflow at submission 
time if the requirements are not met.
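
The fail-fast idea might look roughly like the following Python sketch: if the primary resource is a {{.py}} file, verify that both zips are among the files visible to the launcher, and otherwise raise an error naming what is missing. The function name, the exception class, and the message text are hypothetical and not part of Oozie.

```python
import os
from typing import Iterable


class MissingPySparkDependencies(Exception):
    """Raised when a Python Spark app is submitted without the required zips."""


def check_pyspark_dependencies(app_path: str, available_files: Iterable[str]) -> None:
    """Fail fast if a .py application lacks pyspark.zip or a py4j-*-src.zip."""
    if not app_path.endswith(".py"):
        return  # Scala/Java applications do not need the PySpark zips
    names = {os.path.basename(f) for f in available_files}
    missing = []
    if "pyspark.zip" not in names:
        missing.append("pyspark.zip")
    if not any(n.startswith("py4j-") and n.endswith("-src.zip") for n in names):
        missing.append("py4j-<version>-src.zip")
    if missing:
        raise MissingPySparkDependencies(
            f"{app_path} is a Python application but {', '.join(missing)} could not "
            "be found; add them to the Spark sharelib or the workflow lib/ directory."
        )
```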


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285505#comment-15285505
 ] 

Peter Cseh commented on OOZIE-2482:
---

I have an ugly way to extract the .py files and create the appropriate zip, 
which could make this work with Spark 1.1.0, since in that version the Python 
files were packed into the spark-core jar. 
Unfortunately, none of the jars in 1.6.1 contain the .py or zip files, so that 
solution won't work there.
We might try to convince the Spark project to publish the Python files to Maven 
somehow in a future release. Until then we could either stick with 1.1.0 and 
grab the files from the old jar, or upgrade and put the zips into the 
repository. I would prefer the former solution.
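
A minimal sketch of the extract-and-repackage step, assuming the Spark 1.1.0 {{spark-core}} jar keeps the Python sources under a top-level {{pyspark/}} directory (an assumption to verify against the actual jar). Since jars are zip archives, Python's standard {{zipfile}} module can copy the matching entries into a fresh {{pyspark.zip}}; the function name and paths are hypothetical.

```python
import zipfile


def repackage_pyspark(jar_path: str, out_zip: str, prefix: str = "pyspark/") -> int:
    """Copy every file entry under `prefix` from the jar into a new zip; return the count."""
    copied = 0
    with zipfile.ZipFile(jar_path) as jar:
        with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as out:
            for info in jar.infolist():
                if info.filename.startswith(prefix) and not info.is_dir():
                    # Keep the same relative path so `import pyspark` works from the zip.
                    out.writestr(info.filename, jar.read(info))
                    copied += 1
    return copied
```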


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285023#comment-15285023
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Tests are failing because Jenkins is unable to find the {{py4j}} and {{pyspark}} 
zip files in the test resources.  Attaching them here, as discussed with [~rohini].  
I have added them under {{sharelib/spark/src/test/resources}}.


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-16 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284965#comment-15284965
 ] 

Robert Kanter commented on OOZIE-2482:
--

Thanks [~satishsaley] for working on this.  [~gezapeti] has been working on 
this too and finally got a working version the other day, which I think is very 
similar to what you have, though IIRC, he was setting {{SPARK_HOME}} to {{.}} 
(i.e. the working dir) instead of setting {{PYSPARK_ARCHIVES_PATH}}.  I'm not 
sure which is the better env var to set.

Another concern I have is about the two zip files themselves.  [~gezapeti] was 
working on a way to automatically include them in the Spark sharelib using the 
Maven assembly part of the build.  The current patch you posted adds them as 
test dependencies and otherwise seems to leave them up to the user.  

[~gezapeti], have you been able to figure out the maven assembly stuff?  
Perhaps we can combine your and [~satishsaley]'s efforts.


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283775#comment-15283775
 ] 

Hadoop QA commented on OOZIE-2482:
--

Testing JIRA OOZIE-2482

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch does adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warnings
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color}
.Tests run: 1779
.Tests failed: 0
.Tests errors: 1

.The patch failed the following testcases:

.  

{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/2884/


[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-02 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267198#comment-15267198
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Earlier I tried pyspark with {{yarn-cluster}} on a single-node cluster on my Mac
and it was very easy. But running pyspark in {{yarn-cluster}} mode on a
multi-node cluster needs a few more things.

1. When we submit a spark job, [Spark code | 
https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1047]
 checks for {{PYSPARK_ARCHIVES_PATH}}. If {{PYSPARK_ARCHIVES_PATH}} is not 
present then it looks for {{SPARK_HOME}}. Therefore, we must have at least one 
of them set up correctly.
We can set this environment variable using the {{oozie.launcher.mapred.child.env}}
property.

2. The py4j-0.9-src.zip and pyspark.zip archives (versions may vary with the Spark
version) are necessary to run a Python script in Spark, so both must be present
on the classpath while the script executes. A simple way is to put them under
the lib/ directory of the workflow.

3. The [--py-files option | 
https://github.com/apache/spark/blob/30e980ad8e644354f3c2d48b3904499545cf/docs/submitting-applications.md#bundling-your-applications-dependencies]
 must be configured and passed in {{<spark-opts>}}.

The settings would look like this:

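{code}
<spark xmlns="uri:oozie:spark-action:0.1">
    .
    .
    <configuration>
        <property>
            <name>oozie.launcher.mapred.child.env</name>
            <value>PYSPARK_ARCHIVES_PATH=pyspark.zip</value>
        </property>
    </configuration>
    <master>yarn-cluster</master>
    <name>pyspark example</name>
    <jar>/hdfs/path/to/pi.py</jar>
    <spark-opts>--queue satishq --conf spark.yarn.historyServer.address=http://spark.yarn.hsaddress.com:#port --conf spark.ui.view.acls=* --conf spark.eventLog.dir=hdfs://hdfspath/mapred/sparkhistory --py-files pyspark.zip,py4j-0.9-src.zip</spark-opts>
</spark>
{code}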
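The lookup order in point 1 can be sketched in Python as follows. This approximates the linked Scala code in Spark 1.6's YARN Client; it is not that code. The function name resolve_pyspark_archives is hypothetical, and the Scala code also appears to check that both files exist, which is omitted here. A plain dict lookup that raises KeyError stands in for Scala's "key not found: SPARK_HOME" error from this issue.

```python
import os


def resolve_pyspark_archives(env):
    """Approximate the PySpark archive lookup in Spark 1.6's YARN Client (hypothetical name)."""
    archives = env.get("PYSPARK_ARCHIVES_PATH")
    if archives:
        # An explicit, comma-separated list is used as given.
        return archives.split(",")
    # Otherwise SPARK_HOME is required; a missing key raises KeyError('SPARK_HOME'),
    # the Python analogue of Scala's "key not found: SPARK_HOME".
    lib_dir = os.path.join(env["SPARK_HOME"], "python", "lib")
    # File names as shipped with Spark 1.6; the py4j version differs across releases.
    return [os.path.join(lib_dir, "pyspark.zip"),
            os.path.join(lib_dir, "py4j-0.9-src.zip")]
```

With the {{PYSPARK_ARCHIVES_PATH=pyspark.zip}} setting from the workflow above, only pyspark.zip comes from this lookup. That is presumably why py4j-0.9-src.zip also appears in --py-files.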

Oozie could do some extra work to make users' lives easier: set
{{PYSPARK_ARCHIVES_PATH}} and add the --py-files option automatically, locating
pyspark.zip and py4j-0.9-src.zip from the mapping file provided by the user in
{{oozie.service.ShareLibService.mapping.file}}, or from the default sharelib
location if the user has not provided a mapping file.
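Here is a rough sketch of that extra work, under a few assumptions: the sharelib directory listing is already known, the py4j archive matches py4j-*-src.zip, and both archives go through PYSPARK_ARCHIVES_PATH and --py-files, as in the example workflow. build_pyspark_settings and the hdfs://nn/user/oozie/share/lib/spark path used to exercise it are hypothetical. This is not how Oozie resolves sharelib paths or reads the mapping file.

```python
import fnmatch
import posixpath


def build_pyspark_settings(sharelib_dir, file_names, spark_opts=""):
    """Hypothetical helper: derive the env setting and --py-files option from a sharelib listing."""
    py4j = sorted(f for f in file_names if fnmatch.fnmatch(f, "py4j-*-src.zip"))
    if "pyspark.zip" not in file_names or not py4j:
        raise ValueError("pyspark.zip or py4j-*-src.zip not found in " + sharelib_dir)
    # posixpath keeps '/' separators, which is what HDFS URIs use.
    paths = [posixpath.join(sharelib_dir, "pyspark.zip"),
             posixpath.join(sharelib_dir, py4j[0])]
    joined = ",".join(paths)
    env_setting = "PYSPARK_ARCHIVES_PATH=" + joined
    # Appends to the user's options; an existing --py-files value is not merged.
    opts = (spark_opts + " --py-files " + joined).strip()
    return env_setting, opts
```

Appending to the existing spark-opts keeps whatever the user already passed. A real implementation would also need to merge with a user-supplied --py-files value, which this sketch does not do.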


> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
>

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-05-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266919#comment-15266919
 ] 

Robert Kanter commented on OOZIE-2482:
--

The Spark action will actually have the spark-defaults.conf content as long as
you provide it to the Oozie server (see OOZIE-2170).
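Once the spark-defaults.conf content reaches the Oozie server, its entries are just key/value pairs that could equally be passed as --conf options. Below is a simplified reader for that file format, plus a conversion to --conf arguments. parse_spark_defaults and to_conf_args are hypothetical helpers and do not reflect the OOZIE-2170 implementation. As far as I know, Spark itself loads the file with java.util.Properties, which also handles escapes and line continuations that this sketch ignores.

```python
import re

# Separator between key and value: '=' or ':' (optionally padded) or whitespace.
_SEPARATOR = re.compile(r"\s*[=:]\s*|\s+")


def parse_spark_defaults(text):
    """Simplified reader for spark-defaults.conf style lines (hypothetical helper)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, as java.util.Properties does.
        if not line or line.startswith(("#", "!")):
            continue
        parts = _SEPARATOR.split(line, maxsplit=1)
        key = parts[0]
        # Spark trims property values after loading them.
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


def to_conf_args(props):
    """Turn properties into spark-submit style --conf arguments, sorted by key."""
    args = []
    for key in sorted(props):
        args += ["--conf", f"{key}={props[key]}"]
    return args
```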

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
>
> Hello,
> I'm trying to run pi.py example in a pyspark job with Oozie. Every try I made 
> failed for the same reason: key not found: SPARK_HOME.
> Note: A scala job works well in the environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01 (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
> at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master                  yarn-master
>   deployMode              cluster
>   executorMemory          null
>   executorCores           null
>   totalExecutorCores      null
>   propertiesFile          null
>   driverMemory            null
>   driverCores             null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise               false
>   queue                   null
>   numExecutors            null
>   files                   null
>   pyFiles                 null
>   archives                null
>   mainClass               null
>   primaryResource         hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   name                    Pysparkpi example
>   childArgs               [100]
>   jars                    null
>   packages                null
>   packagesExclusions      null
>   repositories            null
>   verbose                 true
> Spark properties used, including those specified through
>  --conf and those from the 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-20 Thread Mike Grimes (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250512#comment-15250512
 ] 

Mike Grimes commented on OOZIE-2482:


Based on some of the comments above, it looks like the issue is that
spark-defaults.conf is not being pulled in (note "Using properties file: null"
in the output). This is because Oozie launches the Spark action in a container
on an arbitrary node, assuming that Spark and its required configuration are
set up correctly on every node. Is this a fair assumption to make? I feel this
goes against how Spark is currently used in the community; it seems much more
common to have Spark installed on the master, with all necessary configuration,
and to run jobs from there.

Would it be better to re-implement the Spark action as an extension of the
SshAction rather than the JavaAction, to ensure it runs on the master node?
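The "Using properties file: null" line follows from how spark-submit looks for a default properties file when --properties-file is not given. As far as I can tell from Spark's Utils.getDefaultPropertiesFile, it tries SPARK_CONF_DIR first, then SPARK_HOME/conf, and uses spark-defaults.conf only if that file exists. The Python below restates that logic without claiming to match it exactly. default_properties_file and its is_file parameter are hypothetical names, added for illustration and testing.

```python
import os


def default_properties_file(env, is_file=os.path.isfile):
    """Approximate spark-submit's default properties file lookup (hypothetical name)."""
    # SPARK_CONF_DIR wins; otherwise fall back to SPARK_HOME/conf.
    conf_dir = env.get("SPARK_CONF_DIR")
    if conf_dir is None and "SPARK_HOME" in env:
        conf_dir = os.path.join(env["SPARK_HOME"], "conf")
    if conf_dir is None:
        # Neither variable set: spark-submit reports "Using properties file: null".
        return None
    candidate = os.path.join(conf_dir, "spark-defaults.conf")
    # Only an existing regular file is used.
    return os.path.abspath(candidate) if is_file(candidate) else None
```

Consider a launcher container where neither variable points at a Spark installation with a conf directory. There this returns None, which matches the "null" in the output above.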

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Ferenc Denes
>

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-18 Thread Ferenc Denes (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245547#comment-15245547
 ] 

Ferenc Denes commented on OOZIE-2482:
-

Please feel free to work on it. I'm not close to a solution yet and am tied up
with other issues.

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Ferenc Denes
>

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-15 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243898#comment-15243898
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Hi [~BigDataOrange],
I set SPARK_HOME incorrectly in hadoop-env.sh and faced the same issue. After
setting it correctly, I was able to execute pi.py.
{{export SPARK_HOME=/Users/saley/hadoop-stuff/spark-1.6.1-bin-hadoop2.6}}

Try setting {{export 
PYSPARK_ARCHIVES_PATH=$SPARK_HOME/python/lib/pyspark.zip,$SPARK_HOME/python/lib/py4j-0.9-src.zip}}

It should work even if you don't set the {{PYSPARK_ARCHIVES_PATH}} variable,
because the [else block|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1049-L1058]
in the code will then be executed.
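One way to catch a wrong SPARK_HOME, like the hadoop-env.sh case above, is to check that both archives exist before building the PYSPARK_ARCHIVES_PATH value. The sketch assumes the Spark 1.6 layout (python/lib/pyspark.zip and py4j-0.9-src.zip). pyspark_archives_path and its is_file parameter are hypothetical, and the py4j file name varies by Spark version.

```python
import os


def pyspark_archives_path(spark_home, py4j_name="py4j-0.9-src.zip", is_file=os.path.isfile):
    """Build a PYSPARK_ARCHIVES_PATH value, failing fast if SPARK_HOME looks wrong (hypothetical helper)."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    archives = [os.path.join(lib_dir, "pyspark.zip"), os.path.join(lib_dir, py4j_name)]
    missing = [path for path in archives if not is_file(path)]
    if missing:
        # A misconfigured SPARK_HOME shows up here instead of failing later inside YARN.
        raise FileNotFoundError("missing PySpark archives: " + ", ".join(missing))
    # Comma-separated, the same shape as the export suggested above.
    return ",".join(archives)
```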

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Ferenc Denes
>

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-15 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242873#comment-15242873
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


[~fdenes] Have you already resolved the issue (I saw the ticket was reassigned)?
If not, I am willing to work on it.

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Ferenc Denes
>

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-15 Thread Alexandre Linte (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242608#comment-15242608
 ] 

Alexandre Linte commented on OOZIE-2482:


Hi [~satishsaley],

Thank you for the reply. My bad, the argument "yarn-master" was a mistake. I
corrected it by setting "yarn-cluster" in my job configuration.

I checked the comments on SPARK-10795. I can successfully run the
command:
{noformat}
[toto@client pysparkpi]$ spark-submit -v --master yarn-client ./pi.py 100
Using properties file: /opt/application/Spark/current/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.executor.extraJavaOptions=-Djava.library.path=/opt/application/Hadoop/current/lib/native/
Adding default property: spark.broadcast.compress=true
Adding default property: spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.maxResultSize=1200m
Adding default property: spark.io.compression.snappy.blockSize=32k
Adding default property: spark.kryoserializer.buffer.max=1500m
Adding default property: spark.sql.hive.metastore.jars=builtin
Adding default property: spark.driver.memory=2g
Adding default property: spark.executor.instances=4
Adding default property: spark.kryo.referenceTracking=false
Adding default property: spark.default.parallelism=10
Adding default property: spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
Adding default property: spark.kryoserializer.buffer=100m
Adding default property: spark.master=yarn-client
Adding default property: spark.broadcast.blockSize=4096
Adding default property: spark.executor.memory=4g
Adding default property: spark.eventLog.dir=hdfs:///Products/SPARK/logs/
Adding default property: spark.eventLog.compress=true
Adding default property: spark.executor.cores=2
Adding default property: spark.yarn.scheduler.heartbeat.interval-ms=3000
Adding default property: spark.akka.frameSize=100
Adding default property: spark.sql.hive.metastore.version=1.2.1
Parsed arguments:
  master                  yarn-client
  deployMode              null
  executorMemory          4g
  executorCores           2
  totalExecutorCores      null
  propertiesFile          /opt/application/Spark/current/conf/spark-defaults.conf
  driverMemory            2g
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            4
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         file:/home/toto/workspace/oozie/pyspark/pysparkpi/./pi.py
  name                    pi.py
  childArgs               [100]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /opt/application/Spark/current/conf/spark-defaults.conf:
  spark.io.compression.codec -> org.apache.spark.io.SnappyCompressionCodec
  spark.default.parallelism -> 10
  spark.executor.memory -> 4g
  spark.driver.memory -> 2g
  spark.kryo.referenceTracking -> false
  spark.broadcast.blockSize -> 4096
  spark.executor.instances -> 4
  spark.eventLog.compress -> true
  spark.eventLog.enabled -> true
  spark.kryo.classesToRegister -> org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
  spark.kryoserializer.buffer -> 100m
  spark.serializer -> org.apache.spark.serializer.KryoSerializer
  spark.executor.extraJavaOptions -> -Djava.library.path=/opt/application/Hadoop/current/lib/native/
  spark.akka.frameSize -> 100
  spark.yarn.scheduler.heartbeat.interval-ms -> 3000
  spark.sql.hive.metastore.version -> 1.2.1
  spark.kryoserializer.buffer.max -> 1500m
  spark.broadcast.compress -> true
  spark.eventLog.dir -> hdfs:///Products/SPARK/logs/
  spark.driver.maxResultSize -> 1200m
  spark.master -> yarn-client
  spark.io.compression.snappy.blockSize -> 32k
  spark.executor.cores -> 2
  spark.sql.hive.metastore.jars -> builtin


Main class:
org.apache.spark.deploy.PythonRunner
Arguments:
file:/home/toto/workspace/oozie/pyspark/pysparkpi/./pi.py
null
100
System properties:
spark.io.compression.codec -> org.apache.spark.io.SnappyCompressionCodec
spark.default.parallelism -> 10
spark.kryo.referenceTracking -> false
spark.driver.memory -> 2g
spark.executor.memory -> 4g
spark.broadcast.blockSize -> 4096
spark.executor.instances -> 4
spark.eventLog.compress -> true
spark.eventLog.enabled 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-14 Thread Satish Subhashrao Saley (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242102#comment-15242102
 ] 

Satish Subhashrao Saley commented on OOZIE-2482:


Hi [~BigDataOrange],
Could you please check if you are facing 
https://issues.apache.org/jira/browse/SPARK-10795? 
[This|https://issues.apache.org/jira/browse/SPARK-10795?focusedCommentId=15180011=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15180011]
 and 
[this|https://issues.apache.org/jira/browse/SPARK-10795?focusedCommentId=15157683=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15157683]
 comment describe the conditions under which that issue was seen. Internally, Oozie uses 
{{spark-submit}} to submit the Spark job.

{quote}
Parsed arguments:
  master  yarn-master
{quote}
{{yarn-master}} is not a valid value for master; the [Spark 
doc|http://spark.apache.org/docs/latest/submitting-applications.html#master-urls]
 does not list it.
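For illustration only: the master URL forms listed in the Spark 1.6 "Master URLs" table (local, local[K], local[*], spark://HOST:PORT, mesos://HOST:PORT, yarn-client, yarn-cluster) can be approximated with a few regular expressions. The helper name `is_documented_master` and the exact patterns are my own assumptions, not Spark's parser, and the "latest" docs page may list different values for newer Spark versions.

```python
import re

# Approximation (an assumption, not Spark's own parser) of the master URL
# forms documented for Spark 1.6 on the submitting-applications page.
_DOCUMENTED_MASTER_PATTERNS = [
    r"local",                                   # one local worker thread
    r"local\[(\d+|\*)\]",                       # local[K] or local[*]
    r"spark://[^:/,]+:\d+(,[^:/,]+:\d+)*",      # standalone master(s)
    r"mesos://\S+",                             # Mesos (host:port or zk:// URL)
    r"yarn-client",                             # YARN, driver runs locally
    r"yarn-cluster",                            # YARN, driver runs in the cluster
]


def is_documented_master(master: str) -> bool:
    """Return True if `master` matches one of the documented Spark 1.6 forms."""
    return any(re.fullmatch(p, master) for p in _DOCUMENTED_MASTER_PATTERNS)


for value in ["yarn-cluster", "yarn-client", "local[4]", "yarn-master"]:
    print(value, is_documented_master(value))
```

With these patterns, yarn-cluster, yarn-client and local[4] match, while yarn-master does not.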

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>Assignee: Satish Subhashrao Saley
>
> Hello,
> I'm trying to run the pi.py example as a PySpark job with Oozie. Every attempt 
> failed for the same reason: key not found: SPARK_HOME.
> Note: a Scala job works fine in the same environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01 (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.(LogManager.java:127)
> at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at org.apache.hadoop.service.AbstractService.(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master                  yarn-master
>   deployMode              cluster
>   executorMemory          null
>   executorCores           null
>   totalExecutorCores      null
>   propertiesFile          null
>   driverMemory            null
>   driverCores             null
>   driverExtraClassPath    null
>   driverExtraLibraryPath  null
>   

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-04-04 Thread Alexandre Linte (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223847#comment-15223847
 ] 

Alexandre Linte commented on OOZIE-2482:


Hi [~murali.msse], I tried your solution this morning. I set SPARK_HOME in 
hadoop-env.sh. I no longer get the error "key not found: SPARK_HOME", but now I 
get the following:
{noformat}
Using properties file: null
Parsed arguments:
  master                  yarn-master
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         hdfs://sandbox/User/zzqj3827/WORK/Oozie/pyspark/lib/pi.py
  name                    Pysparkpi example
  childArgs               [100]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:



Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
Pysparkpi example
--primary-py-file
hdfs://sandbox/User/zzqj3827/WORK/Oozie/pyspark/lib/pi.py
--class
org.apache.spark.deploy.PythonRunner
--arg
100
System properties:
SPARK_SUBMIT -> true
spark.app.name -> Pysparkpi example
spark.submit.deployMode -> cluster
spark.yarn.isPython -> true
spark.master -> yarn-cluster
Classpath elements:



Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, requirement failed: pyspark.zip not found; cannot run pyspark application in YARN mode.
java.lang.IllegalArgumentException: requirement failed: pyspark.zip not found; cannot run pyspark application in YARN mode.
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1047)
at org.apache.spark.deploy.yarn.Client$$anonfun$findPySparkArchives$2.apply(Client.scala:1044)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:1044)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:717)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1016)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1076)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
at 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-03-31 Thread Murali Ramasami (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221150#comment-15221150
 ] 

Murali Ramasami commented on OOZIE-2482:


[~BigDataOrange] Please set SPARK_HOME in your hadoop-env.sh, restart the 
services, and try again.
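A sketch of why SPARK_HOME matters for PySpark on YARN, assuming the Spark 1.6 YARN client behaves as I read {{Client.findPySparkArchives}} (the method that appears in the stack trace in this thread): it first honours a comma-separated PYSPARK_ARCHIVES_PATH; otherwise it needs SPARK_HOME in the launcher's environment and expects python/lib/pyspark.zip and python/lib/py4j-0.9-src.zip under it. Both errors reported here ("key not found: SPARK_HOME" and "pyspark.zip not found") would fall out of that lookup. This is a Python re-creation for diagnosis, not Spark code, and `find_pyspark_archives` is a hypothetical name.

```python
import os


def find_pyspark_archives(env=None):
    """Diagnostic re-creation (not Spark code) of how the Spark 1.6 YARN
    client appears to locate the PySpark archives. Returns a list of paths."""
    env = os.environ if env is None else env

    # 1. An explicit, comma-separated PYSPARK_ARCHIVES_PATH is used as-is.
    if "PYSPARK_ARCHIVES_PATH" in env:
        return env["PYSPARK_ARCHIVES_PATH"].split(",")

    # 2. Otherwise SPARK_HOME is required; Scala's sys.env("SPARK_HOME")
    #    fails with "key not found: SPARK_HOME" when the variable is absent.
    if "SPARK_HOME" not in env:
        raise KeyError("key not found: SPARK_HOME")

    # 3. Both archives must exist under $SPARK_HOME/python/lib.
    lib_dir = os.path.join(env["SPARK_HOME"], "python", "lib")
    archives = []
    for name in ("pyspark.zip", "py4j-0.9-src.zip"):
        path = os.path.join(lib_dir, name)
        if not os.path.exists(path):
            raise FileNotFoundError(
                f"{name} not found; cannot run pyspark application in YARN mode.")
        archives.append(os.path.abspath(path))
    return archives
```

For example, `find_pyspark_archives({})` raises the KeyError, and pointing SPARK_HOME at a directory without python/lib/pyspark.zip raises the FileNotFoundError, mirroring the two failures reported in this issue.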

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-03-24 Thread Alexandre Linte (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209920#comment-15209920
 ] 

Alexandre Linte commented on OOZIE-2482:


Hi [~murali.msse],

Thank you for your feedback; I'm going to try your solution today.

I tried both yarn-cluster and yarn-client modes, but the result was the same 
(error: key not found: SPARK_HOME). 

I also have a question: did you set SPARK_HOME in spark-env.sh? 

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-03-21 Thread Murali Ramasami (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204386#comment-15204386
 ] 

Murali Ramasami commented on OOZIE-2482:


Also, can you tell me which mode you tried? I tried with 
yarn-cluster.
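On the mode question: the verbose spark-submit output posted in this issue shows master yarn-master with deployMode cluster ending up as spark.master -> yarn-cluster. That is consistent with Spark 1.x spark-submit folding any master that starts with "yarn", plus the deploy mode, into yarn-client or yarn-cluster. Below is a hedged Python sketch of that rule, reconstructed from my reading of SparkSubmit rather than copied from it; `resolve_yarn_master` is a hypothetical name.

```python
def resolve_yarn_master(master, deploy_mode=None):
    """Combine a YARN master string and an optional deploy mode ("client" or
    "cluster") into "yarn-client" or "yarn-cluster".

    Sketch of the Spark 1.x spark-submit behaviour seen in the logs;
    not Spark's actual implementation.
    """
    if not master.startswith("yarn"):
        raise ValueError("this sketch only handles masters starting with 'yarn'")

    # An explicit yarn-cluster / yarn-client must not contradict the deploy mode.
    if master == "yarn-cluster" and deploy_mode == "client":
        raise ValueError("client deploy mode conflicts with master yarn-cluster")
    if master == "yarn-client" and deploy_mode == "cluster":
        raise ValueError("cluster deploy mode conflicts with master yarn-client")
    if master == "yarn-cluster" and deploy_mode is None:
        return "yarn-cluster"

    # Any other yarn* value (e.g. "yarn" or "yarn-master") is rewritten from
    # the deploy mode, which defaults to client.
    return "yarn-" + (deploy_mode or "client")
```

For the values in the posted log, `resolve_yarn_master("yarn-master", "cluster")` returns "yarn-cluster", matching spark.master -> yarn-cluster.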

> Pyspark job fails with Oozie
> 
>
> Key: OOZIE-2482
> URL: https://issues.apache.org/jira/browse/OOZIE-2482
> Project: Oozie
>  Issue Type: Bug
>  Components: core, workflow
>Affects Versions: 4.2.0
> Environment: Hadoop 2.7.2, Spark 1.6.0 on Yarn, Oozie 4.2.0
> Cluster secured with Kerberos
>Reporter: Alexandre Linte
>
> Hello,
> I'm trying to run pi.py example in a pyspark job with Oozie. Every try I made 
> failed for the same reason: key not found: SPARK_HOME.
> Note: A scala job works well in the environment with Oozie.
> The logs on the executors are:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/mnt/hd4/hadoop/yarn/local/filecache/145/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/mnt/hd2/hadoop/yarn/local/filecache/155/spark-assembly-1.6.0-hadoop2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/application/Hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: 
> /mnt/hd7/hadoop/yarn/log/application_1454673025841_13136/container_1454673025841_13136_01_01
>  (Is a directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.(FileOutputStream.java:221)
> at java.io.FileOutputStream.(FileOutputStream.java:142)
> at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
> at 
> org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
> at 
> org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
> at 
> org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
> at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
> at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
> at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
> at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
> at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
> at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
> at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
> at org.apache.log4j.LogManager.(LogManager.java:127)
> at 
> org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
> at 
> org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
> at 
> org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:275)
> at 
> org.apache.hadoop.service.AbstractService.(AbstractService.java:43)
> Using properties file: null
> Parsed arguments:
>   master  yarn-master
>   deployMode  cluster
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemorynull
>   driverCores null
>   driverExtraClassPathnull
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutorsnull
>   files   null
>   pyFiles null
>   archivesnull
>   mainClass   null
>   primaryResource 
> hdfs://hadoopsandbox/User/toto/WORK/Oozie/pyspark/lib/pi.py
>   namePysparkpi example
>   childArgs   [100]
>   jarsnull
>   packagesnull
>   packagesExclusions  null
>   repositoriesnull
>   verbose true
> Spark properties used, including those specified through
>  --conf and those from the properties file null:
>   spark.executorEnv.SPARK_HOME -> /opt/application/Spark/current
>   

[jira] [Commented] (OOZIE-2482) Pyspark job fails with Oozie

2016-03-21 Thread Murali Ramasami (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204383#comment-15204383
 ] 

Murali Ramasami commented on OOZIE-2482:


Can you try setting SPARK_HOME in your hadoop-env.sh file and running again? I 
faced a similar issue, and setting SPARK_HOME in hadoop-env.sh resolved the 
problem. 
