[GitHub] nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for 
YARN and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191065376
 
 

 ##
 File path: 
executor/src/main/scala/org/apache/spark/repl/amaterasu/runners/spark/SparkRunnerHelper.scala
 ##
 @@ -154,7 +154,7 @@ object SparkRunnerHelper extends Logging {
   .set("spark.master", master)
   .set("spark.executor.instances", "1") // TODO: change this
 
 Review comment:
   Change this to take the value from spark.opts.
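   A minimal sketch of what that change could look like, assuming the parsed spark.opts section is surfaced as a `Map[String, String]` (the `sparkOpts` parameter and the fallback are assumptions, not the merged code):

```scala
import org.apache.spark.SparkConf

// hypothetical sketch: take executor instances from spark.opts instead of
// hard-coding it, keeping "1" only as the fallback default
def applyExecutorInstances(conf: SparkConf, sparkOpts: Map[String, String]): SparkConf =
  conf.set("spark.executor.instances",
    sparkOpts.getOrElse("spark.executor.instances", "1"))
```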




[GitHub] nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for 
YARN and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191065242
 
 

 ##
 File path: executor/src/main/resources/spark_intp.py
 ##
 @@ -21,20 +21,10 @@
 import os
 import sys
 import zipimport
+sys.path.append(os.getcwd())
 from runtime import AmaContext, Environment
 
-# os.chdir(os.getcwd() + '/build/resources/test/')
 
 Review comment:
   Bring back the comments for testing.
   Add the sys.path.append to the spark_intp test.
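   A rough sketch of the test-side change, assuming the spark_intp test drives the interpreter as an external process (the helper below is hypothetical, not the actual test code):

```scala
import java.io.File
import scala.sys.process.Process

// hypothetical: launch spark_intp.py with the test's working directory on
// PYTHONPATH, mirroring the sys.path.append(os.getcwd()) added above
def runIntpForTest(python: String, script: String): Int = {
  val cwd = new File(".").getAbsolutePath
  val pythonPath = sys.env.get("PYTHONPATH").map(p => s"$p:$cwd").getOrElse(cwd)
  Process(Seq(python, script), None, "PYTHONPATH" -> pythonPath).!
}
```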




[GitHub] nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for 
YARN and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191065225
 
 

 ##
 File path: 
common/src/main/scala/org/apache/amaterasu/common/configuration/ClusterConfig.scala
 ##
 @@ -209,7 +209,7 @@ class ClusterConfig extends Logging {
 if (props.containsKey("timeout")) timeout = 
props.getProperty("timeout").asInstanceOf[Double]
 if (props.containsKey("mode")) mode = props.getProperty("mode")
 if (props.containsKey("workingFolder")) workingFolder = 
props.getProperty("workingFolder", s"/user/$user")
-
+if (props.containsKey("pysparkPath")) pysparkPath = 
props.getProperty("pysparkPath")
 
 Review comment:
   Check if needed




[GitHub] nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
nadav-har-tzvi commented on a change in pull request #20: PySpark fixes for 
YARN and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191065217
 
 

 ##
 File path: 
executor/src/main/scala/org/apache/amaterasu/executor/execution/actions/runners/spark/SparkRunnersProvider.scala
 ##
 @@ -83,9 +84,15 @@ class SparkRunnersProvider extends RunnersProvider with 
Logging {
 sparkScalaRunner.initializeAmaContext(execData.env)
 
 runners.put(sparkScalaRunner.getIdentifier, sparkScalaRunner)
-
+var pypath = ""
 // TODO: get rid of hard-coded version
-lazy val pySparkRunner = PySparkRunner(execData.env, jobId, notifier, 
spark, 
s"${config.spark.home}/python:${config.spark.home}/python/pyspark:${config.spark.home}/python/pyspark/build:${config.spark.home}/python/pyspark/lib/py4j-0.10.4-src.zip",
 execData.pyDeps, config)
+config.mode match {
+  case "yarn" =>
+pypath = 
s"$$PYTHONPATH:$$SPARK_HOME/python:$$SPARK_HOME/python/build:${config.spark.home}/python:${config.spark.home}/python/pyspark:${config.spark.home}/python/pyspark/build:${config.spark.home}/python/pyspark/lib/py4j-0.10.4-src.zip:${new
 File(".").getAbsolutePath}"
 
 Review comment:
   Test whether removing the ${config.spark.home} entries changes anything.
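   For reference, a stripped-down sketch of that experiment, with the ${config.spark.home} entries dropped on YARN where $PYTHONPATH and $SPARK_HOME are expanded cluster-side (names here are stand-ins, not the final code):

```scala
import java.io.File

// hypothetical sketch: per-mode PYTHONPATH without the duplicated
// ${config.spark.home} entries on YARN
def buildPyPath(mode: String, sparkHome: String): String = mode match {
  case "yarn" =>
    s"$$PYTHONPATH:$$SPARK_HOME/python:$$SPARK_HOME/python/build:${new File(".").getAbsolutePath}"
  case _ => // e.g. Mesos: resolve against the local Spark home
    s"$sparkHome/python:$sparkHome/python/pyspark:$sparkHome/python/pyspark/lib/py4j-0.10.4-src.zip"
}
```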




Re: AMATERASU-24

2018-05-26 Thread Nadav Har Tzvi
I agree with Yaniv that frameworks should be plugins.
Think about it like this: in the future, hopefully, you will be able to do
something like "sudo yum install amaterasu".
After installing the "core" Amaterasu using yum, you will be able to use the
new CLI like this: "ama frameworks add <framework>" to add a
framework.
Alternatively, we could do something like "sudo yum install amaterasu-spark".
I mean, this is what I think anyhow.

As I write this, I've just realized that we should open a thread to discuss
packaging options that we'd like to see implemented.

On 26 May 2018 at 22:53, Yaniv Rodenski  wrote:

> Hi Arun,
>
> You are correct, Spark is the first framework, and in my mind,
> frameworks should be treated as plugins. Also, we need to consider that not
> all frameworks will run under the JVM.
> Last, each framework has two modules: a runner (used by both the executor
> and the leader) and a runtime, used by the actions themselves.
> I would suggest the following structure to start with:
> frameworks
>   |-> spark
>   |-> runner
>   |-> runtime
>
> As for the shell scripts, I will leave that for @Nadav, but please have a
> look at PR #17 containing the CLI that will replace the scripts as of
> 0.2.1-incubating.
>
> Cheers,
> Yaniv
>
> On Sat, May 26, 2018 at 5:16 PM, Arun Manivannan  wrote:
>
> > Gentlemen,
> >
> > I am looking into Amaterasu-24 and would like to run the intended changes
> > by you before I make them.
> >
> > Refactor Spark out of Amaterasu executor to its own project
> > <https://issues.apache.org/jira/browse/AMATERASU-24?filter=allopenissues>
> >
> > I understand Spark is just the first of many frameworks that have been
> > lined up for support by Amaterasu.
> >
> > These are the intended changes :
> >
> > 1. Create a new module called "runners" and have the Spark runners under
> > executor pulled into this project
> > (org.apache.amaterasu.executor.execution.actions.runners.spark). We could
> > call it "frameworks" if "runners" is not a great name for this.
> > 2. Also pull the Spark dependencies away from the Executor into the
> > respective sub-sub-projects (at the moment, just Spark).
> > 3. Since the framework modules would produce different bundles, the
> > pattern I am considering for naming them is "runner-spark". So, it would
> > be "runners:runner-spark" in Gradle.
> > 4. As for the shell scripts ("miniconda" and "load-spark-env") and the
> > "-cp" passed as commands to the ActionsExecutorLauncher, I could pull
> > them out as separate properties of Spark (inside the runner), so that the
> > ApplicationMaster can use them.
> >
> > Is it okay if I rename the Miniconda install file to miniconda-install
> > using "wget -O"? This change is proposed to avoid hardcoding the conda
> > version inside the code, and possibly to pull it away into the
> > amaterasu.properties file. (The changes are in the ama-start shell
> > scripts and a couple of places inside the code.)
> >
> > Please let me know if this would work.
> >
> > Cheers,
> > Arun
> >
>
>
>
> --
> Yaniv Rodenski
>
> +61 477 778 405
> ya...@shinto.io
>


Re: AMATERASU-24

2018-05-26 Thread Yaniv Rodenski
Hi Arun,

You are correct, Spark is the first framework, and in my mind,
frameworks should be treated as plugins. Also, we need to consider that not
all frameworks will run under the JVM.
Last, each framework has two modules: a runner (used by both the executor
and the leader) and a runtime, used by the actions themselves.
I would suggest the following structure to start with:
frameworks
  |-> spark
  |-> runner
  |-> runtime

As for the shell scripts, I will leave that for @Nadav, but please have a
look at PR #17 containing the CLI that will replace the scripts as of
0.2.1-incubating.

Cheers,
Yaniv

On Sat, May 26, 2018 at 5:16 PM, Arun Manivannan  wrote:

> Gentlemen,
>
> I am looking into Amaterasu-24 and would like to run the intended changes
> by you before I make them.
>
> Refactor Spark out of Amaterasu executor to its own project
> <https://issues.apache.org/jira/browse/AMATERASU-24?filter=allopenissues>
>
> I understand Spark is just the first of many frameworks that have been lined
> up for support by Amaterasu.
>
> These are the intended changes :
>
> 1. Create a new module called "runners" and have the Spark runners under
> executor pulled into this project
> (org.apache.amaterasu.executor.execution.actions.runners.spark). We could
> call it "frameworks" if "runners" is not a great name for this.
> 2. Also pull the Spark dependencies away from the Executor into the
> respective sub-sub-projects (at the moment, just Spark).
> 3. Since the framework modules would produce different bundles, the pattern
> I am considering for naming them is "runner-spark". So, it would be
> "runners:runner-spark" in Gradle.
> 4. As for the shell scripts ("miniconda" and "load-spark-env") and the "-cp"
> passed as commands to the ActionsExecutorLauncher, I could pull them out as
> separate properties of Spark (inside the runner), so that the
> ApplicationMaster can use them.
>
> Is it okay if I rename the Miniconda install file to miniconda-install
> using "wget -O"? This change is proposed to avoid hardcoding the conda
> version inside the code, and possibly to pull it away into the
> amaterasu.properties file. (The changes are in the ama-start shell scripts
> and a couple of places inside the code.)
>
> Please let me know if this would work.
>
> Cheers,
> Arun
>



-- 
Yaniv Rodenski

+61 477 778 405
ya...@shinto.io


[GitHub] nadav-har-tzvi commented on issue #17: Amaterasu CLI for V0.2.1

2018-05-26 Thread GitBox
nadav-har-tzvi commented on issue #17: Amaterasu CLI for V0.2.1
URL: 
https://github.com/apache/incubator-amaterasu/pull/17#issuecomment-389258985
 
 
   Also, this feature requires manual testing in addition to the automated
tests.
   Please clone my
[branch](https://github.com/nadav-har-tzvi/incubator-amaterasu/tree/feature/ama-cli)
and follow the README.
   It has to work on the following:
   - Standalone Mesos cluster (use the Vagrant image for this)
   - Hortonworks HDP
   - Amazon EMR
   
   This is extremely important!




[GitHub] arunma opened a new pull request #21: AMATERASU-28 Miniconda version pulling away from code

2018-05-26 Thread GitBox
arunma opened a new pull request #21: AMATERASU-28 Miniconda version pulling 
away from code
URL: https://github.com/apache/incubator-amaterasu/pull/21
 
 
   




[jira] [Updated] (AMATERASU-28) Pull Miniconda version away from compiled code

2018-05-26 Thread Arun Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Manivannan updated AMATERASU-28:
-
Summary: Pull Miniconda version away from compiled code  (was: Pull 
Miniconda version away from compilable code)

> Pull Miniconda version away from compiled code
> --
>
> Key: AMATERASU-28
> URL: https://issues.apache.org/jira/browse/AMATERASU-28
> Project: AMATERASU
>  Issue Type: Improvement
>Affects Versions: 0.2.1-incubating
>Reporter: Arun Manivannan
>Assignee: Arun Manivannan
>Priority: Minor
> Fix For: 0.2.1-incubating
>
>
> The Miniconda version is hard-coded in a couple of places in the code at the
> moment. Pulling this out so that the version info lives in the shell scripts
> alone (ama-start-yarn.sh and ama-start-mesos.sh).
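
If the version were also surfaced via amaterasu.properties, as floated on the dev list, a minimal sketch following the props.getProperty pattern ClusterConfig already uses (the "minicondaVersion" key name is an assumption, not the final design):

```scala
import java.util.Properties

// hypothetical: read the Miniconda version from amaterasu.properties instead
// of hard-coding it; None means the shell scripts' version applies
def minicondaVersion(props: Properties): Option[String] =
  Option(props.getProperty("minicondaVersion"))
```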





[jira] [Created] (AMATERASU-28) Pull Miniconda version away from compilable code

2018-05-26 Thread Arun Manivannan (JIRA)
Arun Manivannan created AMATERASU-28:


 Summary: Pull Miniconda version away from compilable code
 Key: AMATERASU-28
 URL: https://issues.apache.org/jira/browse/AMATERASU-28
 Project: AMATERASU
  Issue Type: Improvement
Affects Versions: 0.2.1-incubating
Reporter: Arun Manivannan
Assignee: Arun Manivannan
 Fix For: 0.2.1-incubating


The Miniconda version is hard-coded in a couple of places in the code at the
moment. Pulling this out so that the version info lives in the shell scripts
alone (ama-start-yarn.sh and ama-start-mesos.sh).





[GitHub] eyalbenivri closed pull request #19: AMATERASU-21 Fix Spark scala tests

2018-05-26 Thread GitBox
eyalbenivri closed pull request #19: AMATERASU-21 Fix Spark scala tests
URL: https://github.com/apache/incubator-amaterasu/pull/19
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/executor/src/test/resources/simple-spark.scala 
b/executor/src/test/resources/simple-spark.scala
index 797235d..34798e7 100755
--- a/executor/src/test/resources/simple-spark.scala
+++ b/executor/src/test/resources/simple-spark.scala
@@ -1,11 +1,13 @@
+
 import org.apache.amaterasu.executor.runtime.AmaContext
-import org.apache.spark.sql.{DataFrame, SaveMode}
+import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
+
+val data = Seq(1,3,4,5,6)
 
-val data = Array(1, 2, 3, 4, 5)
 
 val sc = AmaContext.sc
 val rdd = sc.parallelize(data)
-val sqlContext = AmaContext.sqlContext
+val sqlContext = AmaContext.spark
 
 import sqlContext.implicits._
 val x: DataFrame = rdd.toDF()
diff --git a/executor/src/test/resources/step-2.scala 
b/executor/src/test/resources/step-2.scala
index 34ad839..189701f 100755
--- a/executor/src/test/resources/step-2.scala
+++ b/executor/src/test/resources/step-2.scala
@@ -1,7 +1,5 @@
 import org.apache.amaterasu.executor.runtime.AmaContext
 
-val oddRdd = AmaContext.getRDD[Int]("start", "rdd").filter(x=>x/2 == 0)
-oddRdd.take(5).foreach(println)
 
-val highNoDf = AmaContext.getDataFrame("start", "x").where("_1 > 3")
+val highNoDf = AmaContext.getDataFrame("start", "x").where("age > 20")
 highNoDf.show
diff --git 
a/executor/src/test/resources/tmp/job/start/x/part-r-0-c370548d-1371-4c08-81d6-f2ba72defa12.gz.parquet
 
b/executor/src/test/resources/tmp/job/start/x/part-r-0-c370548d-1371-4c08-81d6-f2ba72defa12.gz.parquet
new file mode 100644
index 000..e1b0d2e
Binary files /dev/null and 
b/executor/src/test/resources/tmp/job/start/x/part-r-0-c370548d-1371-4c08-81d6-f2ba72defa12.gz.parquet
 differ
diff --git 
a/executor/src/test/resources/tmp/job/start/x/part-r-1-c370548d-1371-4c08-81d6-f2ba72defa12.gz.parquet
 
b/executor/src/test/resources/tmp/job/start/x/part-r-1-c370548d-1371-4c08-81d6-f2ba72defa12.gz.parquet
new file mode 100644
index 000..d807ba9
Binary files /dev/null and 
b/executor/src/test/resources/tmp/job/start/x/part-r-1-c370548d-1371-4c08-81d6-f2ba72defa12.gz.parquet
 differ
diff --git 
a/executor/src/test/scala/org/apache/amaterasu/spark/SparkScalaRunnerTests.scala
 
b/executor/src/test/scala/org/apache/amaterasu/spark/SparkScalaRunnerTests.scala
index d41feea..68c06ce 100755
--- 
a/executor/src/test/scala/org/apache/amaterasu/spark/SparkScalaRunnerTests.scala
+++ 
b/executor/src/test/scala/org/apache/amaterasu/spark/SparkScalaRunnerTests.scala
@@ -1,54 +1,56 @@
-//package org.apache.amaterasu.spark
-//
-//import java.io.File
-//
-//import org.apache.amaterasu.common.runtime._
-//import org.apache.amaterasu.common.configuration.ClusterConfig
-//import org.apache.amaterasu.utilities.TestNotifier
-//
-//import scala.collection.JavaConverters._
-//import org.apache.commons.io.FileUtils
-//import java.io.ByteArrayOutputStream
-//
-//import org.apache.spark.SparkConf
-//import org.apache.spark.repl.Main
-//import org.apache.spark.repl.amaterasu.runners.spark.{SparkRunnerHelper, 
SparkScalaRunner}
-//import org.apache.spark.sql.SparkSession
-//import org.scalatest.{BeforeAndAfterAll, FlatSpec, Matchers}
-//
-//class SparkScalaRunnerTests extends FlatSpec with Matchers with 
BeforeAndAfterAll {
-//
-//  var runner: SparkScalaRunner = _
-//
-//  override protected def beforeAll(): Unit = {
-//
-//FileUtils.deleteQuietly(new File("/tmp/job_5/"))
-//
-//val env = Environment()
-//env.workingDir = "file:///tmp"
-//env.master = "local[*]"
-//
-//
-//val spark = SparkRunnerHelper.createSpark(env, "job_5", 
Seq.empty[String], Map.empty)
-//
-//
-//val notifier = new TestNotifier()
-//val strm = new ByteArrayOutputStream()
-//runner = SparkScalaRunner(env, "job_5", spark, strm, notifier, 
Seq.empty[String])
-//super.beforeAll()
-//  }
-//
-//  "SparkScalaRunner" should "execute the simple-spark.scala" in {
-//
-//val script = getClass.getResource("/simple-spark.scala").getPath
-//runner.executeSource(script, "start", Map.empty[String, String].asJava)
-//
-//  }
-//
-//  "SparkScalaRunner" should "execute step-2.scala and access data from 
simple-spark.scala" in {
-//
-//val script = getClass.getResource("/step-2.scala").getPath
-//runner.executeSource(script, "cont", Map.empty[String, String].asJava)
-//
-//  }
-//}
\ No newline at end of file
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this 

[jira] [Commented] (AMATERASU-21) Fix Spark scala tests

2018-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AMATERASU-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491564#comment-16491564
 ] 

ASF GitHub Bot commented on AMATERASU-21:
-

eyalbenivri closed pull request #19: AMATERASU-21 Fix Spark scala tests
URL: https://github.com/apache/incubator-amaterasu/pull/19
 
 
   

[GitHub] eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN 
and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191042876
 
 

 ##
 File path: 
executor/src/main/scala/org/apache/amaterasu/executor/execution/actions/runners/spark/PySpark/PySparkRunner.scala
 ##
 @@ -28,7 +29,7 @@ import org.apache.amaterasu.sdk.AmaterasuRunner
 import org.apache.spark.SparkEnv
 import org.apache.spark.sql.SparkSession
 
-import scala.sys.process.Process
+import scala.sys.process._
 
 Review comment:
   As a convention, it is recommended to avoid importing entire packages (just
as in Python it is a code smell to do 'from foo import *'). Please specify the
specific classes/objects etc. to import.
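   For instance, the wildcard could be narrowed to just the members the runner actually uses (the exact set would be confirmed against the file):

```scala
// explicit imports instead of scala.sys.process._
import scala.sys.process.{Process, ProcessLogger}
```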




[GitHub] eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN 
and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191042836
 
 

 ##
 File path: 
common/src/main/scala/org/apache/amaterasu/common/configuration/ClusterConfig.scala
 ##
 @@ -68,7 +68,7 @@ class ClusterConfig extends Logging {
 
 class Master {
   var cores: Int = 1
-  var memoryMB: Int = 1024
+  var memoryMB: Int = 2048
 
 Review comment:
   Why are we changing these defaults?




[GitHub] eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN 
and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191042874
 
 

 ##
 File path: 
executor/src/main/scala/org/apache/amaterasu/executor/execution/actions/runners/spark/PySpark/PySparkRunner.scala
 ##
 @@ -16,8 +16,9 @@
  */
 package org.apache.amaterasu.executor.execution.actions.runners.spark.PySpark
 
-import java.io.{File, PrintWriter, StringWriter}
+import java.io._
 
 Review comment:
   As a convention, it is recommended to avoid importing entire packages (just
as in Python it is a code smell to do 'from foo import *'). Please specify the
specific classes/objects etc. to import.




[GitHub] eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN and Mesos

2018-05-26 Thread GitBox
eyalbenivri commented on a change in pull request #20: PySpark fixes for YARN 
and Mesos
URL: https://github.com/apache/incubator-amaterasu/pull/20#discussion_r191042917
 
 

 ##
 File path: 
executor/src/main/scala/org/apache/spark/repl/amaterasu/runners/spark/SparkRunnerHelper.scala
 ##
 @@ -154,7 +154,7 @@ object SparkRunnerHelper extends Logging {
   .set("spark.master", master)
   .set("spark.executor.instances", "1") // TODO: change this
   .set("spark.yarn.jars", s"spark/jars/*")
-  .set("spark.executor.memory", "1g")
+  .set("spark.executor.memory", "2g")
 
 Review comment:
   Why are we changing these defaults?




[GitHub] eyalbenivri closed pull request #18: AMATERASU-26 Pipeline tasks runs as "yarn" user instead of inheriting…

2018-05-26 Thread GitBox
eyalbenivri closed pull request #18: AMATERASU-26 Pipeline tasks runs as "yarn" 
user instead of inheriting…
URL: https://github.com/apache/incubator-amaterasu/pull/18
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/leader/src/main/java/org/apache/amaterasu/leader/yarn/Client.java 
b/leader/src/main/java/org/apache/amaterasu/leader/yarn/Client.java
index dc4f15e..e3c2812 100644
--- a/leader/src/main/java/org/apache/amaterasu/leader/yarn/Client.java
+++ b/leader/src/main/java/org/apache/amaterasu/leader/yarn/Client.java
@@ -29,6 +29,7 @@
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.yarn.api.ApplicationConstants;
 import org.apache.hadoop.yarn.api.records.*;
 import org.apache.hadoop.yarn.client.api.YarnClient;
@@ -110,8 +111,9 @@ private void run(JobOpts opts, String[] args) throws 
Exception {
 
 
List<String> commands = Collections.singletonList(
-"env AMA_NODE=" + System.getenv("AMA_NODE") + " " +
-"$JAVA_HOME/bin/java" +
+"env AMA_NODE=" + System.getenv("AMA_NODE") +
+" env HADOOP_USER_NAME=" + 
UserGroupInformation.getCurrentUser().getUserName() +
+" $JAVA_HOME/bin/java" +
 " -Dscala.usejavacp=false" +
 " -Xmx1G" +
 " org.apache.amaterasu.leader.yarn.ApplicationMaster " 
+
diff --git 
a/leader/src/main/scala/org/apache/amaterasu/leader/yarn/ApplicationMaster.scala
 
b/leader/src/main/scala/org/apache/amaterasu/leader/yarn/ApplicationMaster.scala
index a44202a..1828100 100644
--- 
a/leader/src/main/scala/org/apache/amaterasu/leader/yarn/ApplicationMaster.scala
+++ 
b/leader/src/main/scala/org/apache/amaterasu/leader/yarn/ApplicationMaster.scala
@@ -21,8 +21,8 @@ import java.net.{InetAddress, ServerSocket, URLEncoder}
 import java.nio.ByteBuffer
 import java.util
 import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}
-import javax.jms.Session
 
+import javax.jms.Session
 import org.apache.activemq.ActiveMQConnectionFactory
 import org.apache.activemq.broker.BrokerService
 import org.apache.amaterasu.common.configuration.ClusterConfig
@@ -153,21 +153,14 @@ class ApplicationMaster extends 
AMRMClientAsync.CallbackHandler with Logging {
 // TODO: awsEnv currently set to empty string. should be changed to read 
values from (where?).
 allocListener = new YarnRMCallbackHandler(nmClient, jobManager, env, 
awsEnv = "", config, executorJar)
 
-rmClient = AMRMClientAsync.createAMRMClientAsync(1000, this)
-rmClient.init(conf)
-rmClient.start()
-
-// Register with ResourceManager
-log.info("Registering application")
-val registrationResponse = rmClient.registerApplicationMaster("", 0, "")
-log.info("Registered application")
+rmClient = startRMClient()
+val registrationResponse = registerAppMaster("", 0, "")
 val maxMem = registrationResponse.getMaximumResourceCapability.getMemory
 log.info("Max mem capability of resources in this cluster " + maxMem)
 val maxVCores = 
registrationResponse.getMaximumResourceCapability.getVirtualCores
 log.info("Max vcores capability of resources in this cluster " + maxVCores)
 log.info(s"Created jobManager. jobManager.registeredActions.size: 
${jobManager.registeredActions.size}")
 
-
 // Resource requirements for worker containers
 this.capability = Records.newRecord(classOf[Resource])
 val frameworkFactory = FrameworkProvidersFactory.apply(env, config)
@@ -194,6 +187,21 @@ class ApplicationMaster extends 
AMRMClientAsync.CallbackHandler with Logging {
 log.info("Finished asking for containers")
   }
 
+  private def startRMClient(): AMRMClientAsync[ContainerRequest] = {
+val client = AMRMClientAsync.createAMRMClientAsync[ContainerRequest](1000, 
this)
+client.init(conf)
+client.start()
+client
+  }
+
+  private def registerAppMaster(host: String, port: Int, url: String) = {
+// Register with ResourceManager
+log.info("Registering application")
+val registrationResponse = rmClient.registerApplicationMaster(host, port, 
url)
+log.info("Registered application")
+registrationResponse
+  }
+
   private def setupMessaging(jobId: String): Unit = {
 
 val cf = new ActiveMQConnectionFactory(address)
@@ -225,20 +233,6 @@ class ApplicationMaster extends 
AMRMClientAsync.CallbackHandler with Logging {
 
   override def onContainersAllocated(containers: util.List[Container]): Unit = 
{
 
-// creating the credentials for container execution
-val credentials = 

[jira] [Commented] (AMATERASU-26) Pipeline tasks (sub-Yarn jobs) runs as "yarn" user instead of inheriting the user in which the amaterasu job was submitted

2018-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/AMATERASU-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491558#comment-16491558
 ] 

ASF GitHub Bot commented on AMATERASU-26:
-

eyalbenivri closed pull request #18: AMATERASU-26 Pipeline tasks runs as "yarn" 
user instead of inheriting…
URL: https://github.com/apache/incubator-amaterasu/pull/18
 
 
   
