[jira] [Resolved] (AMATERASU-45) PySpark: refactor the PySpark runtime into its own component

2019-06-08 Thread Nadav Har Tzvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMATERASU-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi resolved AMATERASU-45.
-
Resolution: Fixed

> PySpark: refactor the PySpark runtime into its own component
> -
>
> Key: AMATERASU-45
> URL: https://issues.apache.org/jira/browse/AMATERASU-45
> Project: AMATERASU
> Issue Type: Task
> Reporter: Yaniv Rodenski
> Assignee: Nadav Har Tzvi
> Priority: Major
> Fix For: 0.2.1-incubating
>
>
> The PySpark Runtime should be extracted to its own component and available as 
> a dependency for PySpark actions developers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Discuss] datasets input file in user's repository

2019-03-18 Thread Nadav Har Tzvi
I assumed as much.
I will change the leader to load the data from the env; this will become
part of the PR for Amaterasu-46.
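
To make the per-environment idea concrete, here is a minimal Python sketch of
how datasets could be looked up once they sit under the env. It is only an
illustration: the ENVIRONMENTS dict stands in for already-parsed env.yaml
files, and the "datasets" key, the load_datasets function and the sample
values are hypothetical, not the actual leader code.

```python
# Hypothetical stand-in for parsed env.yaml files, keyed by environment name.
# The "datasets" key and all values below are illustrative assumptions.
ENVIRONMENTS = {
    "dev": {
        "datasets": [
            {"key": "transactions", "type": "hive",
             "database": "transactions_dev", "table": "transx"},
        ],
    },
    "prod": {
        "datasets": [
            {"key": "transactions", "type": "hive",
             "database": "transactions_prod", "table": "transx"},
        ],
    },
}


def load_datasets(environments, env_name):
    """Return the dataset definitions of one environment, indexed by key."""
    if env_name not in environments:
        raise ValueError("unknown environment: {}".format(env_name))
    datasets = {}
    for definition in environments[env_name].get("datasets", []):
        key = definition["key"]
        if key in datasets:
            raise ValueError("duplicate dataset key: {}".format(key))
        datasets[key] = definition
    return datasets
```

With this shape, the same action code can ask for "transactions" and get a
different database depending on which environment the job runs in.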

Cheers,
Nadav



On Tue, 19 Mar 2019 at 01:36, Yaniv Rodenski  wrote:

> Hi Nadav,
>
> I think datasets should be per environment (for example, it is very common
> to use different databases for dev/test/prod), so I think that datasets as
> configurations in Amaterasu should sit under env.
>
> Cheers,
> Yaniv
>
> On Tue, Mar 19, 2019 at 5:13 AM Nadav Har Tzvi 
> wrote:
>
> > Hi,
> >
> > Just wanna open this up for discussion as it seems we somehow skipped
> this
> > point.
> > Basically, by now we pretty much have the new datasets APIs in place in
> the
> > Python SDK and in implementing frameworks. (amaterasu-pyspark,
> > amaterasu-pandas, amaterasu-python)
> > The only question left is regarding the way we get the datasets
> > definitions.
> > Currently, we still look up the datasets definitions in the maki file,
> > under the action's exports.
> > Do we intend to keep it that way? I assume not as I think that every
> action
> > needs access to all defined datasets.
> > In that case, how will the user submit datasets configuration? Is it
> > another file next to the maki.yaml? Is it a file that resides in the
> > environment, e.g. next to the env.yaml? Is it not even a file on its own
> > but a part of the env.yaml?
> > Ideas, anyone?
> >
> > Let's discuss this please!
> >
> > Cheers,
> > Nadav
> >
>
>
> --
> Yaniv Rodenski
>
> +61 477 778 405
> ya...@shinto.io
>


[Discuss] datasets input file in user's repository

2019-03-18 Thread Nadav Har Tzvi
Hi,

Just wanna open this up for discussion as it seems we somehow skipped this
point.
Basically, by now we pretty much have the new datasets APIs in place in the
Python SDK and in implementing frameworks. (amaterasu-pyspark,
amaterasu-pandas, amaterasu-python)
The only question left is regarding the way we get the datasets definitions.
Currently, we still look up the datasets definitions in the maki file,
under the action's exports.
Do we intend to keep it that way? I assume not as I think that every action
needs access to all defined datasets.
In that case, how will the user submit datasets configuration? Is it
another file next to the maki.yaml? Is it a file that resides in the
environment, e.g. next to the env.yaml? Is it not even a file on its own
but a part of the env.yaml?
Ideas, anyone?

Let's discuss this please!

Cheers,
Nadav


[jira] [Created] (AMATERASU-74) amaterasu-vagrant is broken

2019-03-04 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-74:
---

 Summary: amaterasu-vagrant is broken
 Key: AMATERASU-74
 URL: https://issues.apache.org/jira/browse/AMATERASU-74
 Project: AMATERASU
  Issue Type: Bug
Reporter: Nadav Har Tzvi


VirtualBox 6 basically killed the option to use symlinks in shared folders, 
thus it is impossible to use jGit to clone repositories into the 
"/ama/repo" directory.

We need to fix this somehow.

Also, I have a local Vagrant-AWS branch that is broken too, for a different 
reason: Amazon's ENA requirement for any AMI that runs on the standard 
instance types, which rendered all the CentOS AMIs unusable. 
I did manage to set up Amaterasu manually on an Amazon Linux AMI with minor 
changes, but it needs to be automated by Vagrant as well.

To sum up, amaterasu-vagrant is totally broken; we need to revive it.





[DISCUSS] Resolving dependencies with the new architecture

2019-02-22 Thread Nadav Har Tzvi
Hey everyone,

With Amaterasu-8 in place and Amaterasu-45 nearing completion, there is one
more critical thing we need to attend to before we could call the new
architecture complete.

Dependencies resolution, management, distribution.

One piece of feedback I received at PyCon-IL last year was that we do not use
the standard Python dependency resolution mechanisms. For example, we
currently require users to supply a "python.yml" file instead of a
"requirements.txt" file.
This could also be applied to the Scala counterpart. Instead of defining
the dependencies in the "jars.yml", we could leverage one or more of the
available build tools to manage this (maybe by supplying a pom.xml as part
of the job repository, somehow, or a build.gradle file)

In regard to Python specifically, going forward, we will definitely let the
user provide a requirements.txt file.

Dependencies distribution is also a problem I would like to think about.
Option 1:
Leader will merge job level and action level dependencies into one
dependencies definition file (e.g. requirements.txt) and will distribute
that file to the executors.
Pros:
Easy to implement.

Cons:
Will not work in networks without outgoing internet connection as the
executors will need access to an external package repository.

Option 2:
Leader will do a full dependencies resolution and distribute the already
downloaded and packaged dependencies to the executors, where they will be
installed.

Pros:
Will work in any environment, only requires the leader to have outgoing
internet connection.

Cons:
1. Much more difficult to implement. Requires implementation of wrappers to
language specific package management systems.
2. Introduces more state to the system that has to be managed. Another
"weak link" in my opinion.
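
For reference, the merge step in Option 1 could look roughly like the Python
sketch below. It is a simplified illustration, not a proposed implementation:
merge_requirements is a hypothetical name, only plain "name<specifier>" lines
are handled (no extras, markers or URLs), and letting the action level
override the job level is an assumption that is still open for discussion.

```python
import re

# Matches the leading package name of a simple requirement line.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def merge_requirements(job_level, action_level):
    """Merge two lists of requirement lines into one sorted list.

    Assumption: when both levels name the same package, the action-level
    line replaces the job-level one.
    """
    merged = {}
    for line in list(job_level) + list(action_level):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _NAME.match(line)
        if match is None:
            raise ValueError("unsupported requirement line: {}".format(line))
        merged[match.group(1).lower()] = line  # later entries win
    return [merged[name] for name in sorted(merged)]


if __name__ == "__main__":
    job = ["numpy==1.15.0", "requests"]
    action = ["numpy==1.16.1", "pandas>=0.23"]
    print("\n".join(merge_requirements(job, action)))
```

The resulting lines can be written out as a single requirements.txt and
shipped to the executors, which is where the need for access to a package
repository comes in.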

Anyhow, let's discuss this, I would like to hear suggestions, thoughts, etc.

Cheers,
Nadav


Re: [DISCUSS] Review and discussions thread for Amaterasu-45 PR

2019-02-20 Thread Nadav Har Tzvi
Ok,
I will do an E2E test on a pyspark only flow tonight. I think that after
rebasing onto master with the changes from Amaterasu-8, I will have to make
some adjustments in regard to the integration between the leader and the
executor. If there are any, we should coordinate.

Cheers,
Nadav



On Wed, 20 Feb 2019 at 13:10, Arun Manivannan  wrote:

> Hi Nadav,
>
> Had a look at the datasets.yml and the bindings for it.  This aligns
> perfectly with the consensus that we had on the call that we had the day
> before.
>
> I'll have it implemented for the JVM by the end of this week.
>
> Cheers,
> Arun
>
> On Wed, Feb 20, 2019 at 3:12 AM Nadav Har Tzvi 
> wrote:
>
> > Hey everyone,
> >
> > I opened a draft PR for Amaterasu 45 (Python SDK, pyspark runtime, python
> > runtime). Do mind that this is still a WIP. I would like you to review
> this
> > PR from time to time as it evolves. The sooner I get inputs, the sooner
> > this feature will be out.
> >
> > PR: https://github.com/apache/incubator-amaterasu/pull/44
> >
> > Yaniv, Eyal, Arun. Please let's now define the final MVP of this feature
> so
> > I can know which reviews to defer and which to attend to before this PR
> is
> > closed and merged.
> >
> > Arun, this is most important, if I ended up implementing a design that is
> > vastly different than the scala runtime design, we should talk ASAP.
> >
> > I would like the community to give any feedback. Did I miss something?
> Did
> > I make some stupid mistake? Does the design look awkward? etc.
> >
> > Also a note on the python runtime. It is currently February 2019. As of
> > January 1st 2020 Python2 is EOL. Many third party Python packages already
> > dropped support entirely for Python2. Because of that, the entire SDK and
> > runtimes are implemented in Python >=3.4 (Python <= 3.3.x is EOL)
> > If you think this is problematic, please raise a flag.
> >
> > Cheers,
> > Nadav
> >
>


[jira] [Created] (AMATERASU-71) Add Python tests to Travis CI

2019-02-20 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-71:
---

 Summary: Add Python tests to Travis CI
 Key: AMATERASU-71
 URL: https://issues.apache.org/jira/browse/AMATERASU-71
 Project: AMATERASU
  Issue Type: Task
Affects Versions: 0.2.1-incubating
Reporter: Nadav Har Tzvi
 Fix For: 0.2.1-incubating


We need to add the upcoming python tests to be automatically invoked by travis.

The first feature that will need this is [AMATERASU-45]

 





[DISCUSS] Review and discussions thread for Amaterasu-45 PR

2019-02-19 Thread Nadav Har Tzvi
Hey everyone,

I opened a draft PR for Amaterasu 45 (Python SDK, pyspark runtime, python
runtime). Do mind that this is still a WIP. I would like you to review this
PR from time to time as it evolves. The sooner I get inputs, the sooner
this feature will be out.

PR: https://github.com/apache/incubator-amaterasu/pull/44

Yaniv, Eyal, Arun. Please let's now define the final MVP of this feature so
I can know which reviews to defer and which to attend to before this PR is
closed and merged.

Arun, this is most important, if I ended up implementing a design that is
vastly different than the scala runtime design, we should talk ASAP.

I would like the community to give any feedback. Did I miss something? Did
I make some stupid mistake? Does the design look awkward? etc.

Also a note on the python runtime. It is currently February 2019. As of
January 1st 2020 Python2 is EOL. Many third party Python packages already
dropped support entirely for Python2. Because of that, the entire SDK and
runtimes are implemented in Python >=3.4 (Python <= 3.3.x is EOL)
If you think this is problematic, please raise a flag.

Cheers,
Nadav


Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

2019-01-28 Thread Nadav Har Tzvi
Hey Arun,

I kinda feel like the datastores yaml is somewhat obscure. I propose the
following structure.

Instead of

datasets:
  hive:
    - key: transactions
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx

    - key: second_transactions
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table
  file:
    - key: users
      uri: s3://filestore
      format: parquet
      mode: overwrite

I would have

datasets:
  - key: transactions
    uri: /user/somepath
    format: parquet
    database: transations_daily
    table: transx
    type: hive
  - key: second_transactions
    uri: /seconduser/somepath
    format: avro
    database: transations_monthly
    table: avro_table
    type: hive
  - key: users
    uri: s3://filestore
    format: parquet
    mode: overwrite
    type: file

In my opinion it is more straightforward and uniform. I think it is also
more straightforward code-wise.
What do you think?
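
To illustrate the code-wise point, here is a small Python sketch that works
on already-parsed YAML (plain dicts and lists). The function names are
hypothetical and this is not the SDK's actual API; it only shows that the
flat layout can be indexed with a single comprehension, while the type-keyed
layout needs an extra level of iteration first.

```python
# The type-keyed layout from above, as it would look after YAML parsing.
NESTED = {
    "datasets": {
        "hive": [
            {"key": "transactions", "uri": "/user/somepath",
             "format": "parquet", "database": "transations_daily",
             "table": "transx"},
            {"key": "second_transactions", "uri": "/seconduser/somepath",
             "format": "avro", "database": "transations_monthly",
             "table": "avro_table"},
        ],
        "file": [
            {"key": "users", "uri": "s3://filestore",
             "format": "parquet", "mode": "overwrite"},
        ],
    }
}


def flatten_datasets(nested):
    """Convert the type-keyed layout into the flat list layout."""
    flat = []
    for dataset_type, entries in nested["datasets"].items():
        for entry in entries:
            item = dict(entry)  # copy so the input is not mutated
            item["type"] = dataset_type
            flat.append(item)
    return {"datasets": flat}


def datasets_by_key(flat):
    """With the flat layout, a lookup table is a single comprehension."""
    return {item["key"]: item for item in flat["datasets"]}
```

For example, datasets_by_key(flatten_datasets(NESTED))["users"]["type"]
evaluates to "file".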

Cheers,
Nadav



On Mon, 14 Jan 2019 at 00:57, Yaniv Rodenski  wrote:

> Hi Arun,
>
> I've added my comments to the PR, but good call, I agree @Nadav Har Tzvi
>  should at least review as you both need to
> maintain compatible APIs.
>
> Cheers,
> Yaniv
>
> On Sun, Jan 13, 2019 at 10:21 PM Arun Manivannan  wrote:
>
>> Hi Guy, Yaniv and Nadiv,
>>
>> This PR <https://github.com/apache/incubator-amaterasu/pull/39> just
>> captures part of the issue - the datasets.yaml, ConfigManager and the
>> testcases. The Integration with the AmaContext is yet to be done but I
>> would like to get your thoughts on the implementation.
>>
>> Guy - Would it be okay if you could help throw some light on the syntax
>> and
>> the idiomatic part of Kotlin itself. Newbie here.
>>
>> Cheers,
>> Arun
>>
>> On Fri, Oct 12, 2018 at 7:15 PM Yaniv Rodenski (JIRA) 
>> wrote:
>>
>> > Yaniv Rodenski created AMATERASU-52:
>> > ---
>> >
>> >  Summary: Implement AmaContext.datastores
>> >  Key: AMATERASU-52
>> >  URL:
>> https://issues.apache.org/jira/browse/AMATERASU-52
>> >  Project: AMATERASU
>> >   Issue Type: Task
>> > Reporter: Yaniv Rodenski
>> > Assignee: Arun Manivannan
>> >  Fix For: 0.2.1-incubating
>> >
>> >
>> > AmaContext.datastores should contain the data from datastores.yaml
>> >
>> >
>> >
>> > --
>> > This message was sent by Atlassian JIRA
>> > (v7.6.3#76005)
>> >
>>
>
>
> --
> Yaniv Rodenski
>
> +61 477 778 405
> ya...@shinto.io
>
>


Re: [DISCUSS] podling report

2019-01-03 Thread Nadav Har Tzvi
Yeah, I give it a +1

Cheers,
Nadav



On Tue, 1 Jan 2019 at 17:12, Yaniv Rodenski  wrote:

> Hi All,
>
> I propose the following report to be submitted.
>
> Amaterasu
>
>
> Apache Amaterasu is a framework providing configuration management and
>
> deployment for Big Data Pipelines.
>
>
> It provides the following capabilities:
>
>
> Continuous integration tools to package pipelines and run tests.
>
> A repository to store those packaged applications: the applications
>
> repository.
>
> A repository to store the pipelines, and engine configuration (for
>
> instance, the location of the Spark master, etc.): per environment - the
>
> configuration repository.
>
> A dashboard to monitor the pipelines.
>
> A DSL and integration hooks allowing third parties to easily integrate.
>
>
> Amaterasu has been incubating since 2017-09.
>
>
> Three most important issues to address in the move towards graduation:
>
>
>   1. Grow up user and contributor communities
>
>   2. Prepare documentation
>
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
>
> aware of?
>
>
> How has the community developed since the last report?
>
>
>   Two new contributors have contributed code that has been merged. In
> addition, we are actively looking for more use cases and organizations to
> use Amaterasu.
>
>
> How has the project developed since the last report?
>
>
>   * 5 pull requests have been opened since the last report and 4 have been
> merged
>
>   * Since last report 9 more issues have been created and 4 out of them
> have been assigned
>
>
> Date of the last release:
>
>
>   12 July 2018
>
>
> When were the last committers or PMC members elected?
>
>
>   N/A
>
>
> Have your mentors been helpful and responsive or are things falling through
> the cracks? In the latter case, please list any open issues that need to be
> addressed.
>
>
>
>  N/A
>
>
> Signed-off-by:
>
>
>   [](amaterasu) Jean-Baptiste Onofré
>
>   [](amaterasu) Olivier Lamy
>
>   [](amaterasu) Davor Bonaci
>
> --
> Yaniv Rodenski
>


[DISCUSS] Common dependencies formats

2018-10-20 Thread Nadav Har Tzvi
Hey everyone,

At both conferences I spoke at (PyCon-IL and SDP), participants asked why
we don't use common formats to specify job dependencies
(e.g. requirements.txt in Python).

I want to bring this matter up for discussion. What can we do to align
better with the developer community? Is there any good reason for us to
stick with the current dependencies specification format (through yaml)
other than technical reasons?

Cheers,
Nadav


[DISCUSS] Dependencies resolution and action level dependencies

2018-10-20 Thread Nadav Har Tzvi
Hey everyone,

Yaniv and I were just discussing how to resolve dependencies in the new
frameworks architecture and integrate the dependencies with the concrete
cluster resource manager (Mesos/YARN)
We rolled with the idea of each runner (or base runner) performing the
dependencies resolution on its own.
So for example, the Spark Scala runner would resolve the required JARs and
do whatever it needs to do with them (e.g. spark-submit --jars --packages
--repositories, etc).
The base Python provider will resolve dependencies and dynamically generate
a requirements.txt file that will be deployed to the executor.
The handling of the requirements.txt file differs between different
concrete Python runners. For example, a regular Python runner would simply
run pip install, while the pyspark runner would need to rearrange the
dependencies in a way that would be acceptable to spark-submit (
https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7
sounds like a decent idea, comment if you have a better idea please)
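
As a rough sketch of the runner side, the Python function below assembles a
spark-submit command line from already-resolved dependencies. The function
name and the sample values are hypothetical. --jars, --packages and
--repositories are the spark-submit options mentioned above; passing Python
dependencies as an archive through --py-files is an assumption about how the
pyspark runner might rearrange them, not a settled design.

```python
def build_spark_submit(app, jars=(), packages=(), repositories=(), py_files=()):
    """Build a spark-submit argument list from resolved dependencies.

    Each option takes a comma-separated list, so empty groups are skipped.
    """
    command = ["spark-submit"]
    options = (
        ("--jars", jars),
        ("--packages", packages),
        ("--repositories", repositories),
        ("--py-files", py_files),
    )
    for flag, values in options:
        if values:
            command += [flag, ",".join(values)]
    command.append(app)
    return command


if __name__ == "__main__":
    # Illustrative values only.
    print(" ".join(build_spark_submit(
        "job.py",
        packages=["org.apache.spark:spark-avro_2.11:2.4.0"],
        py_files=["dependencies.zip"],
    )))
```

Returning a list rather than a single string keeps the result usable with
subprocess without any shell quoting.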

So far I hope it makes sense.

The next item I want to discuss is as follows:
In the new architecture, we do hierarchical runtime environment resolution,
starting at the top job level and drilling down to the action level,
outputting one unified environment configuration file that is deployed to
the executor.
I suggest doing the same with dependencies.
Currently, we only have job level dependencies. I suggest that we provide
action level dependencies and resolve them in exactly the same manner as we
resolve the environment.
There should be quite a few benefits to this approach:

   1. It will give the option to have different versions of the same
   package in different actions. This is especially important if you have 2+
   pipeline developers working independently, as it would reduce the
   integration costs by letting each action be more self-contained.
   2. It should lower the startup time per action. The more dependencies
   you have, the longer it takes to resolve and install them. Actions will no
   longer get any unnecessary dependencies.
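
The hierarchical resolution described above can be sketched in Python as a
recursive merge in which the more specific level wins. This is a minimal
illustration under assumptions: resolve_config, the key names and the sample
values are hypothetical, and representing dependencies as a name-to-version
mapping is only one possible choice.

```python
import copy


def resolve_config(job_level, action_level):
    """Merge action-level settings over job-level ones, recursively.

    Nested dicts are merged key by key; any other value at the action level
    replaces the job-level value. Neither input is modified.
    """
    resolved = copy.deepcopy(job_level)
    for key, value in action_level.items():
        if isinstance(value, dict) and isinstance(resolved.get(key), dict):
            resolved[key] = resolve_config(resolved[key], value)
        else:
            resolved[key] = copy.deepcopy(value)
    return resolved


# Illustrative inputs: one job-level definition and one action override.
JOB = {
    "spark": {"master": "yarn", "memory": "1g"},
    "dependencies": {"numpy": "1.15.0"},
}
ACTION = {
    "spark": {"memory": "4g"},
    "dependencies": {"numpy": "1.16.1", "pandas": "0.23.4"},
}
```

Here resolve_config(JOB, ACTION) keeps the job-level Spark master, takes the
action-level memory setting and numpy version, and adds pandas for that
action only.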


What do you think? Does it make sense?

Cheers,
Nadav


Re: Project status

2018-10-02 Thread Nadav Har Tzvi
Hey,
I was away on vacation and had some pressure at my day job before
that; all of that has now cleared up. Right before my vacation, Yaniv and I
started integrating the recent re-implementation of the Python SDK and
the PySpark SDK. We are working on integration at the level of the
configuration files prepared by the leader and pulled by the executor; these
files are used to configure storage, logging and the generation of the
Amaterasu runtime.
We are really a few steps away from finishing it.
I expect that if we can put an effort into this in the upcoming weekend, we
can finally close this feature and move on to the next task.

Cheers,
Nadav



On Tue, 2 Oct 2018 at 10:28, Davor Bonaci  wrote:

> Any comments? Anyone?
>
> Option 1: start a vote to retire the podling and move the project into your
> own repository.
> Option 2: keep things as-is for a few months and re-assess.
>
> I'd say Option 2 requires a minimum of 3 people explicitly saying that they
> want to continue trying and contributing.
>
> On Sun, Sep 23, 2018 at 8:13 PM Davor Bonaci  wrote:
>
> > Thanks Yaniv for your comments.
> >
> >- After the release of 0.2.0 the community became very quiet. I think
> >>that at this point in the life of the project it is natural, as we
> all
> >>doing this in our free time and the release was a major effort that
> >> all of
> >>us (after talking to members in the community) had to compensate for
> >> in our
> >>day jobs and families.
> >>With that said, we shouldn't have gone so quiet. I think we can all
> >>agree this is not acceptable for so long (if at all).
> >>
> >
> > Not sure I agree: it is not natural for projects in the Incubator to be
> > quiet. It does happen to projects that are getting obsolete/irrelevant,
> > often after many years as TLPs. The release usually *increases* activity
> > around the project as new users come, ask questions, start contributing,
> > etc.
> >
> > On the other hand, totally fine for people to go quiet. The problem isn't
> > around anybody going quiet, but the fact of nobody new arriving. Is there
> > any evidence of any usage of the release? Anybody hitting any problem?
> Any
> > lack of documentation? Any bugfixes? That's the core of the problem.
> >
> >
> >>- It is very critical at this point to grow the community. Going back
> >> to
> >>my first point, as long as we are such a small community, efforts
> like
> >>releasing a version will set us back, and the last release is a good
> >>example for that danger.
> >>
> >
> > Not sure I agree: releases usually pick up the activity, pick up new
> > users, as new features now make the project more attractive. I don't
> think
> > I've ever seen an argument where "releasing a version sets us back".
> > Especially the *first* one.
> >
> >- Grow the community. BTW I think this is one reason we should
> consider
> >>staying an Apache project, I think that with the release, we should
> >> also
> >>shift some focus to growing the community. This is an issue I see
> other
> >>projects struggling with, this includes TLPs such as Apache Arrow
> (in a
> >>recent thread on their dev list) and I don't think there is one
> answer
> >> on
> >>how to do it, and I spent some time on other lists to see if they
> have
> >>solutions. I think we can do many things to fix this, and it's more
> of
> >> a
> >>trial and error process for most projects. Things we can (and should
> >> start
> >>doing immediately) includes doing more public presentations (and I
> >> have to
> >>give a shout-out @Nadav Har Tzvi  that
> >> presented
> >>in two conferences recently), write blog posts, and we should all
> >> invest
> >>time in doing so. But one thing we also need to do is actively
> looking
> >> for
> >>more contributors. If anyone here has someone they think is a good
> fit,
> >>let's try to get them onboard.
> >>
> >
> > Outreach (blogs, talks, etc.) can help, but they help you *scale*. I
> think
> > the project hasn't demonstrated early user fit -- and trying to scale
> > before establishing that often doesn't yield results. For example, if you
> > were to throw Amaterasu in front of 1000 people, how many would join the
> > community? If only a few, it is probably a bad idea to do it. (I worry it
> > is less than

[jira] [Commented] (AMATERASU-23) Add executor classpath from framework configuration - Mesos

2018-09-15 Thread Nadav Har Tzvi (JIRA)


[ 
https://issues.apache.org/jira/browse/AMATERASU-23?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616235#comment-16616235
 ] 

Nadav Har Tzvi commented on AMATERASU-23:
-

Why is resolution "Invalid"?

> Add executor classpath from framework configuration - Mesos
> ---
>
> Key: AMATERASU-23
> URL: https://issues.apache.org/jira/browse/AMATERASU-23
> Project: AMATERASU
> Issue Type: Improvement
> Affects Versions: 0.2.1-incubating
> Reporter: Yaniv Rodenski
> Assignee: Yaniv Rodenski
> Priority: Major
> Fix For: 0.2.1-incubating
>
>
> Currently, the executor classpath is hardcoded in the app-master/scheduler, 
> should be loaded from





Re: [DISCUSS] podling report

2018-07-09 Thread Nadav Har Tzvi
+1

On Mon, Jul 9, 2018, 14:46 Jean-Baptiste Onofré  wrote:

> +1
>
> Regards
> JB
>
> On 09/07/2018 11:55, Eyal Ben-Ivri wrote:
> > +1
> >
> >
> > On 6. July 2018 at 18:50:37, Davor Bonaci (da...@apache.org) wrote:
> >
> > +1
> >
> > "have been released" --> "have been built and voted upon"
> >
> > On Fri, Jul 6, 2018 at 12:51 AM, Yaniv Rodenski  wrote:
> >
> >> Hi All,
> >>
> >> Sorry for doing this late again, but I propose the following report to
> be
> >> submitted:
> >>
> >> "
> >> Apache Amaterasu is a framework providing configuration management and
> >> deployment for Big Data Pipelines.
> >>
> >> It provides the following capabilities:
> >>
> >> Continuous integration tools to package pipelines and run tests.
> >> A repository to store those packaged applications: the applications
> >> repository.
> >> A repository to store the pipelines, and engine configuration (for
> >> instance, the location of the Spark master, etc.): per environment - the
> >> configuration repository.
> >> A dashboard to monitor the pipelines.
> >> A DSL and integration hooks allowing third parties to easily integrate.
> >>
> >> Amaterasu has been incubating since 2017-09.
> >>
> >> Three most important issues to address in the move towards graduation:
> >>
> >> 1. Prepare the first release
> >> 2. Grow up user and contributor communities
> >> 3. Prepare documentation
> >>
> >> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> >> aware of?
> >>
> >> None
> >>
> >> How has the community developed since the last report?
> >>
> >> * Two conference talks have been delivered (PyCon il and SDP)
> >> * Initial documentation has been created, targeted for Amaterasu's next
> >> release
> >>
> >> How has the project developed since the last report?
> >>
> >> * since the last report 4 release candidates have been released, at the
> >> time of this report the last RC is being voted on in the
> general@incubator
> >> mailing list
> >> * Two additional contributors started contributing to the code base
> >> * One more organization we are aware of have started a POC with
> Amaterasu
> >>
> >> Date of the last release:
> >>
> >> N/A
> >>
> >> When were the last committers or PMC members elected?
> >>
> >> N/A
> >> "
> >>
> >> If there are no objections I will update the wiki.
> >>
> >> Cheers,
> >> Yaniv
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[jira] [Created] (AMATERASU-43) Make the notifier available for job code

2018-07-07 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-43:
---

 Summary: Make the notifier available for job code
 Key: AMATERASU-43
 URL: https://issues.apache.org/jira/browse/AMATERASU-43
 Project: AMATERASU
  Issue Type: Improvement
Reporter: Nadav Har Tzvi








[jira] [Created] (AMATERASU-42) Amaterasu needs to respect Maven classifiers

2018-07-07 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-42:
---

 Summary: Amaterasu needs to respect Maven classifiers
 Key: AMATERASU-42
 URL: https://issues.apache.org/jira/browse/AMATERASU-42
 Project: AMATERASU
  Issue Type: Improvement
Reporter: Nadav Har Tzvi








[jira] [Updated] (AMATERASU-41) Support nested custom configurations in jobs environments in PySpark

2018-07-07 Thread Nadav Har Tzvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMATERASU-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi updated AMATERASU-41:

Summary: Support nested custom configurations in jobs environments in 
PySpark  (was: Support nested custom configurations in jobs environments)

> Support nested custom configurations in jobs environments in PySpark
> 
>
> Key: AMATERASU-41
> URL: https://issues.apache.org/jira/browse/AMATERASU-41
> Project: AMATERASU
> Issue Type: Improvement
> Reporter: Nadav Har Tzvi
> Priority: Major
>






[jira] [Created] (AMATERASU-41) Support nested custom configurations in jobs environments

2018-07-07 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-41:
---

 Summary: Support nested custom configurations in jobs environments
 Key: AMATERASU-41
 URL: https://issues.apache.org/jira/browse/AMATERASU-41
 Project: AMATERASU
  Issue Type: Improvement
Reporter: Nadav Har Tzvi








[jira] [Created] (AMATERASU-40) Change Spark-Scala runner to use spark-submit

2018-07-07 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-40:
---

 Summary: Change Spark-Scala runner to use spark-submit
 Key: AMATERASU-40
 URL: https://issues.apache.org/jira/browse/AMATERASU-40
 Project: AMATERASU
  Issue Type: Improvement
Reporter: Nadav Har Tzvi








[jira] [Created] (AMATERASU-39) Support SSH for Git

2018-07-07 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-39:
---

 Summary: Support SSH for Git
 Key: AMATERASU-39
 URL: https://issues.apache.org/jira/browse/AMATERASU-39
 Project: AMATERASU
  Issue Type: Task
Reporter: Nadav Har Tzvi








[jira] [Updated] (AMATERASU-38) Failure to interpret multiline Scala code blocks

2018-07-07 Thread Nadav Har Tzvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMATERASU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi updated AMATERASU-38:

Priority: Blocker  (was: Major)

> Failure to interpret multiline Scala code blocks
> 
>
> Key: AMATERASU-38
> URL: https://issues.apache.org/jira/browse/AMATERASU-38
> Project: AMATERASU
> Issue Type: Bug
> Reporter: Nadav Har Tzvi
> Priority: Blocker
>






[jira] [Created] (AMATERASU-38) Failure to interpret multiline Scala code blocks

2018-07-07 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-38:
---

 Summary: Failure to interpret multiline Scala code blocks
 Key: AMATERASU-38
 URL: https://issues.apache.org/jira/browse/AMATERASU-38
 Project: AMATERASU
  Issue Type: Bug
Reporter: Nadav Har Tzvi








Re: [VOTE] Release Apache Amaterasu (incubating) 0.2.0 (rc4)

2018-06-24 Thread Nadav Har Tzvi
Tested on standalone Mesos, EMR and HDP.
Spark-Scala and PySpark both work on each of the environments.
Thus, I vote +1

Cheers,
Nadav



On Sat, 23 Jun 2018 at 15:35, Yaniv Rodenski  wrote:

> Hi everyone,
>
> After cancelling the vote in the general@ list we've fixed the following:
> * Headers added to all non-code files where applicable as remarked by Davor
> * Sources now match the released version + no keys are present
> * The gradle-wrapper.jar was removed and instructions were added to the
> readme.md file on how to add it.
> * Missing licenses found by Justin Mclean during the general@ vote were
> added when applicable.
>
> So, hoping for the best, please review and vote on the release candidate #4
> for the version 0.2.0-incubating, as follows
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
>
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which is signed with the key with fingerprint [3],
> * source code tag "version-0.2.0-incubating-rc4" [4],
> * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK
> 1.8.0_151
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Yaniv
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12321521&version=12342793
> [2] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc4/
> [3] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> [4] https://github.com/apache/incubator-amaterasu/tags
>
> Thanks everyone
> Yaniv
>


[Discussion] Changing Amaterasu deployment strategy

2018-06-15 Thread Nadav Har Tzvi
Hey everyone,

While working on the new CLI (rather, fixing the damage caused by the
rebase on top of RC3), I came across some issues and questions that, at
least in my opinion, should be addressed when we deploy Amaterasu.


   1. Configuration -
      1. *Status at 0.2.0-rc3:*
         The amaterasu.properties file is bundled in the deployment package
         (the .tar file) and resides in the same directory as the entry point
         (ama-start-mesos.sh, ama-start-yarn.sh).
      2. *Suggestion for 0.2.1:*
         1. The new CLI provides the "ama setup" command that generates the
            configuration file based on a set of questions (TODO - change to
            try and detect the cluster type and vendor, e.g. Hortonworks HDP,
            AWS EMR, Standalone Mesos, DC/OS, etc).
         2. Change amaterasu.properties to amaterasu.conf and have it
            located at /etc/amaterasu by default, to conform to how things
            are usually done in many other Apache (and non-Apache) projects.
   2. Amaterasu assets -
      1. *Status at 0.2.0-rc3:*
         When ama-start-mesos.sh or ama-start-yarn.sh is invoked for the first
         time, the relevant dependencies are downloaded (mesos - Spark,
         Miniconda | yarn - Miniconda) into the {AMATERASU_HOME}/dist
         directory. By convention, AMATERASU_HOME=/ama
      2. *Suggestion for 0.2.1:*
         1. Again, the new CLI "ama setup" command also takes care of
            downloading the relevant dependencies.
         2. We need a default path for {AMATERASU_HOME}. {AMATERASU_HOME}
            will hold the relevant Amaterasu JARs and any files that are
            required for Amaterasu to work properly (Spark, Miniconda, etc).
            What should this path be?
            My suggestion based on other projects:
            /usr/share - Apache Marathon
            /usr/lib - Apache Zookeeper

            I tend to go with /usr/share, simply because /usr/lib is intended
            for shared objects. So for example we will have
            /usr/share/amaterasu. Any thoughts?
   3. Distribution methods -
      1. *Status at 0.2.0-rc3:*
         Users who wish to use Amaterasu need to get the distributable TAR
         and manually install and configure Amaterasu.
      2. *Suggestion for 0.2.1:*
         1. The new CLI takes care of some of the things during "ama
            setup", but not everything. (See my fork for details:
            https://github.com/nadav-har-tzvi/incubator-amaterasu/tree/feature/ama-cli
            )
         2. Create an RPM package, so users will be able to add a repository
            and simply "yum install amaterasu".
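
To make the configuration suggestion concrete, here is a minimal sketch of
how a CLI could locate and read the file. It assumes the proposed default of
/etc/amaterasu/amaterasu.conf and the same key=value format that
amaterasu.properties uses today. The AMATERASU_CONF override variable and
both function names are hypothetical and only serve the illustration.

```python
import os
from pathlib import Path

# Default location proposed in this thread. AMATERASU_CONF is a hypothetical
# override variable, not something the CLI is known to support.
DEFAULT_CONF = Path("/etc/amaterasu/amaterasu.conf")


def resolve_conf_path(environ=None):
    """Return the config path, preferring the (hypothetical) env override."""
    environ = os.environ if environ is None else environ
    return Path(environ.get("AMATERASU_CONF", str(DEFAULT_CONF)))


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props
```

Splitting on the first "=" matters for entries such as the
spark.opts.* lines, whose values contain "=" themselves.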


If you have any thoughts you'd like to share in the matter, please do.

Cheers,
Nadav


[jira] [Updated] (AMATERASU-34) Running Amaterasu during development without a cluster

2018-06-12 Thread Nadav Har Tzvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMATERASU-34?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi updated AMATERASU-34:

Issue Type: New Feature  (was: Task)

> Running Amaterasu during development without a cluster
> --
>
> Key: AMATERASU-34
> URL: https://issues.apache.org/jira/browse/AMATERASU-34
> Project: AMATERASU
>  Issue Type: New Feature
>        Reporter: Nadav Har Tzvi
>Priority: Major
>
> As a pipeline developer, I'd like to be able to run Amaterasu with a local 
> configuration, so that I don't need to set up a cluster.
> The reasoning behind this is that I want a quick turnaround for 
> the pipeline as a whole while developing its different components.
> Currently the closest solution is to use the amaterasu-vagrant repository to 
> set up a local Mesos cluster on VirtualBox, but in my opinion it is still not 
> good enough.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMATERASU-34) Running Amaterasu during development without a cluster

2018-06-12 Thread Nadav Har Tzvi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMATERASU-34?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi updated AMATERASU-34:

Issue Type: Improvement  (was: New Feature)

> Running Amaterasu during development without a cluster
> --
>
> Key: AMATERASU-34
> URL: https://issues.apache.org/jira/browse/AMATERASU-34
> Project: AMATERASU
>  Issue Type: Improvement
>        Reporter: Nadav Har Tzvi
>Priority: Major
>
> As a pipeline developer, I'd like to be able to run Amaterasu with a local 
> configuration, so that I don't need to set up a cluster.
> The reasoning behind this is that I want a quick turnaround for 
> the pipeline as a whole while developing its different components.
> Currently the closest solution is to use the amaterasu-vagrant repository to 
> set up a local Mesos cluster on VirtualBox, but in my opinion it is still not 
> good enough.





[jira] [Created] (AMATERASU-34) Running Amaterasu during development without a cluster

2018-06-12 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-34:
---

 Summary: Running Amaterasu during development without a cluster
 Key: AMATERASU-34
 URL: https://issues.apache.org/jira/browse/AMATERASU-34
 Project: AMATERASU
  Issue Type: Task
Reporter: Nadav Har Tzvi


As a pipeline developer, I'd like to be able to run Amaterasu with a local 
configuration, so that I don't need to set up a cluster.

The reasoning behind this is that I want a quick turnaround for the 
pipeline as a whole while developing its different components.

Currently the closest solution is to use the amaterasu-vagrant repository to 
set up a local Mesos cluster on VirtualBox, but in my opinion it is still not 
good enough.





[jira] [Created] (AMATERASU-32) Investigate why Amaterasu requires minimum of 2G memory to run on DC/OS

2018-06-04 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-32:
---

 Summary: Investigate why Amaterasu requires minimum of 2G memory 
to run on DC/OS
 Key: AMATERASU-32
 URL: https://issues.apache.org/jira/browse/AMATERASU-32
 Project: AMATERASU
  Issue Type: Task
Affects Versions: 0.2.1-incubating
Reporter: Nadav Har Tzvi
 Fix For: 0.2.1-incubating


This is even weirder than the problem we have in EMR. In DC/OS we can't do 
anything without requesting 2G of memory from Mesos, and that's for the 
job-samples.

Why do we need 1G of memory on a standalone deployment of Mesos, but 2G on 
DC/OS?





[jira] [Created] (AMATERASU-31) Investigate why PySpark actions on EMR require minimum of 2G memory

2018-06-04 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-31:
---

 Summary: Investigate why PySpark actions on EMR require minimum of 
2G memory
 Key: AMATERASU-31
 URL: https://issues.apache.org/jira/browse/AMATERASU-31
 Project: AMATERASU
  Issue Type: Task
Affects Versions: 0.2.0-incubating
Reporter: Nadav Har Tzvi
 Fix For: 0.2.1-incubating








[jira] [Created] (AMATERASU-30) Job level spark memory requirements aren't respected in mesos

2018-06-04 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-30:
---

 Summary: Job level spark memory requirements aren't respected in 
mesos
 Key: AMATERASU-30
 URL: https://issues.apache.org/jira/browse/AMATERASU-30
 Project: AMATERASU
  Issue Type: Bug
Affects Versions: 0.2.0-incubating
Reporter: Nadav Har Tzvi
 Fix For: 0.2.1-incubating


When trying to override the default spark memory requirements using job.yml, 
the entry is completely ignored.
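
The expected precedence can be sketched as follows: job-level values from
job.yml should win over the defaults. This is only an illustration of the
intended behavior, not the actual fix. The default values and the function
name are hypothetical; the two keys are standard Spark properties.

```python
# Hypothetical defaults; the real default values in Amaterasu may differ.
DEFAULT_SPARK_CONF = {
    "spark.executor.memory": "1g",
    "spark.driver.memory": "1g",
}


def effective_spark_conf(job_overrides=None, defaults=DEFAULT_SPARK_CONF):
    """Return a new dict of defaults updated with job-level overrides."""
    merged = dict(defaults)            # copy, so the defaults stay untouched
    merged.update(job_overrides or {})  # job.yml entries take precedence
    return merged
```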





[jira] [Created] (AMATERASU-29) PySpark breaks for jobs without extra configurations

2018-06-04 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-29:
---

 Summary: PySpark breaks for jobs without extra configurations
 Key: AMATERASU-29
 URL: https://issues.apache.org/jira/browse/AMATERASU-29
 Project: AMATERASU
  Issue Type: Bug
Affects Versions: 0.2.0-incubating
Reporter: Nadav Har Tzvi
Assignee: Nadav Har Tzvi
 Fix For: 0.2.1-incubating


PySpark execution breaks when the job.yml is missing the "configurationa" key. 
There is an ugly "if" statement in the code that is used for testing; this 
could probably be solved in a more elegant way.
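
One way to avoid a special-case "if" is to treat the entry as optional and
fall back to an empty mapping. A minimal sketch, assuming the parsed job.yml
is available as a dict; the key name "configurations" and the function name
are assumptions made for the example.

```python
def extra_configurations(job_conf, key="configurations"):
    """Return the optional extra-configuration mapping, or {} when absent.

    `key` is an assumed name for the job.yml entry; adjust it as needed.
    A missing key, a null value, or a missing job_conf all yield {}.
    """
    value = (job_conf or {}).get(key)
    return dict(value) if value else {}
```

Callers can then iterate over the result unconditionally, whether or not the
job defines any extra configuration.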





Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #3

2018-05-29 Thread Nadav Har Tzvi
Yaniv and I just tested it. It worked flawlessly on my end (HDP docker on
AWS). Both Spark-Scala and PySpark.
It worked on Yaniv's HDP cluster as well.
Worth noting:
1. HDP 2.6.4
2. Cluster has total of 32GB memory available
3. Each container is allocated 1G memory.
4. Amaterasu.properties:

zk=sandbox-hdp.hortonworks.com
version=0.2.0-incubating-rc3
master=192.168.33.11
user=root
mode=yarn
webserver.port=8000
webserver.root=dist
spark.version=2.6.4.0-91
yarn.queue=default
yarn.jarspath=hdfs:///apps/amaterasu
spark.home=/usr/hdp/current/spark2-client
#spark.home=/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2
yarn.hadoop.home.dir=/etc/hadoop
spark.opts.spark.yarn.am.extraJavaOptions="-Dhdp.version=2.6.4.0-91"
spark.opts.spark.driver.extraJavaOptions="-Dhdp.version=2.6.4.0-91"


Arun, please share:
1. YARN memory configurations
2. amaterasu.properties content
3. HDP version.

Cheers,
Nadav


On 30 May 2018 at 07:11, Arun Manivannan  wrote:

> The pmem disabling is just temporary. I'll do a detailed analysis and get
> back with a proper solution.
>
> Any hints on this front are highly appreciated.
>
> Cheers
> Arun
>
> On Wed, May 30, 2018, 01:10 Nadav Har Tzvi  wrote:
>
> > Yaniv, Eyal, this might be related to the same issue you faced with HDP.
> > Can you confirm?
> >
> > On Tue, May 29, 2018, 17:58 Arun Manivannan  wrote:
> >
> > > +1 from me
> > >
> > > Unit Tests and Build ran fine.
> > >
> > > Tested on HDP (VM) but had trouble allocating containers (didn't have
> > that
> > > before).  Apparently Centos VMs are known to have this problem.
> Disabled
> > > physical memory check  (yarn.nodemanager.pmem-check-enabled) and ran
> jobs
> > > successfully.
> > >
> > >
> > >
> > >
> > >
> > > On Tue, May 29, 2018 at 10:42 PM Kirupa Devarajan <
> > kirupagara...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Unit tests passing and build was successful on the branch
> > > > "version-0.2.0-incubating-rc3"
> > > >
> > > > +1 from me
> > > >
> > > > Cheers,
> > > > Kirupa
> > > >
> > > >
> > > > On Tue, May 29, 2018 at 3:06 PM, guy peleg 
> wrote:
> > > >
> > > > > +1 looks good to me
> > > > >
> > > > > On Tue, May 29, 2018, 14:39 Nadav Har Tzvi  >
> > > > wrote:
> > > > >
> > > > > > +1 approve. Tested multiple times and after a long round of
> fixing
> > > and
> > > > > > testing over and over.
> > > > > >
> > > > > > Cheers,
> > > > > > Nadav
> > > > > >
> > > > > >
> > > > > > On 29 May 2018 at 07:38, Yaniv Rodenski  wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > We have fixed the legal issues, as well as a bug found by
> @Nadav
> > > > please
> > > > > > > review and vote on the release candidate #3 for the version
> > > > > > > 0.2.0-incubating, as follows
> > > > > > >
> > > > > > > [ ] +1, Approve the release
> > > > > > > [ ] -1, Do not approve the release (please provide specific
> > > comments)
> > > > > > >
> > > > > > > The complete staging area is available for your review, which
> > > > includes:
> > > > > > >
> > > > > > > * JIRA release notes [1],
> > > > > > > * the official Apache source release to be deployed to
> > > > dist.apache.org
> > > > > > > [2],
> > > > > > > which is signed with the key with fingerprint [3],
> > > > > > > * source code tag "version-0.2.0-incubating-rc3" [4],
> > > > > > > * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle
> > JDK
> > > > > > > 1.8.0_151
> > > > > > >
> > > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > > majority
> > > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yaniv
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > > > > > projectId=12321521&version=12342793
> > > > > > > [2] https://dist.apache.org/repos/
> dist/dev/incubator/amaterasu/
> > > > > 0.2.0rc3/
> > > > > > > [3]
> > > https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> > > > > > > [4] https://github.com/apache/incubator-amaterasu/tags
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #3

2018-05-29 Thread Nadav Har Tzvi
Yaniv, Eyal, this might be related to the same issue you faced with HDP.
Can you confirm?

On Tue, May 29, 2018, 17:58 Arun Manivannan  wrote:

> +1 from me
>
> Unit Tests and Build ran fine.
>
> Tested on HDP (VM) but had trouble allocating containers (didn't have that
> before).  Apparently Centos VMs are known to have this problem. Disabled
> physical memory check  (yarn.nodemanager.pmem-check-enabled) and ran jobs
> successfully.
>
>
>
>
>
> On Tue, May 29, 2018 at 10:42 PM Kirupa Devarajan  >
> wrote:
>
> > Unit tests passing and build was successful on the branch
> > "version-0.2.0-incubating-rc3"
> >
> > +1 from me
> >
> > Cheers,
> > Kirupa
> >
> >
> > On Tue, May 29, 2018 at 3:06 PM, guy peleg  wrote:
> >
> > > +1 looks good to me
> > >
> > > On Tue, May 29, 2018, 14:39 Nadav Har Tzvi 
> > wrote:
> > >
> > > > +1 approve. Tested multiple times and after a long round of fixing
> and
> > > > testing over and over.
> > > >
> > > > Cheers,
> > > > Nadav
> > > >
> > > >
> > > > On 29 May 2018 at 07:38, Yaniv Rodenski  wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > We have fixed the legal issues, as well as a bug found by @Nadav
> > please
> > > > > review and vote on the release candidate #3 for the version
> > > > > 0.2.0-incubating, as follows
> > > > >
> > > > > [ ] +1, Approve the release
> > > > > [ ] -1, Do not approve the release (please provide specific
> comments)
> > > > >
> > > > > The complete staging area is available for your review, which
> > includes:
> > > > >
> > > > > * JIRA release notes [1],
> > > > > * the official Apache source release to be deployed to
> > dist.apache.org
> > > > > [2],
> > > > > which is signed with the key with fingerprint [3],
> > > > > * source code tag "version-0.2.0-incubating-rc3" [4],
> > > > > * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK
> > > > > 1.8.0_151
> > > > >
> > > > > The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > > > approval, with at least 3 PMC affirmative votes.
> > > > >
> > > > > Thanks,
> > > > > Yaniv
> > > > >
> > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > > > projectId=12321521&version=12342793
> > > > > [2] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/
> > > 0.2.0rc3/
> > > > > [3]
> https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> > > > > [4] https://github.com/apache/incubator-amaterasu/tags
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #3

2018-05-28 Thread Nadav Har Tzvi
+1 approve. Tested multiple times and after a long round of fixing and
testing over and over.

Cheers,
Nadav


On 29 May 2018 at 07:38, Yaniv Rodenski  wrote:

> Hi everyone,
>
> We have fixed the legal issues, as well as a bug found by @Nadav please
> review and vote on the release candidate #3 for the version
> 0.2.0-incubating, as follows
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
>
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which is signed with the key with fingerprint [3],
> * source code tag "version-0.2.0-incubating-rc3" [4],
> * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK
> 1.8.0_151
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Yaniv
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12321521&version=12342793
> [2] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc3/
> [3] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> [4] https://github.com/apache/incubator-amaterasu/tags
>


Re: Speeding up Anaconda deployment on executors

2018-05-27 Thread Nadav Har Tzvi
I am kinda inclined towards a solution that could be performed at setup
time. Let's try to explore in that direction. If we manage to nail this at
setup time, it means that the runtime will be lightning fast (compared to
what it is now :))

Cheers,
Nadav


On 28 May 2018 at 00:17, Nadav Har Tzvi  wrote:

> Hey everyone,
>
> So we have this issue, Anaconda takes forever to deploy on the executors,
> whether it is YARN or Mesos.
>
> Let's first discuss why it is like this right now.
>
> First, let's see, for each platform, how Apache Amaterasu interacts with
> the underlying platform, and in particular what the smallest independent
> unit is that gets its own isolated execution environment.
>
> *Apache Mesos:*
> In Apache Mesos, we get our own nifty set of instances and executors. An
> instance can obviously host multiple executors, depending on its capacity.
> Thus the smallest independent unit here is the executor itself.
>
> *Apache Hadoop YARN*:
> On YARN, we have a similar set of resources, we have nodes, each node is a
> host to containers.
>
> Great, so far it sounds similar, right? Here is where Apache Amaterasu
> takes things a bit differently for each platform.
>
> In Apache Mesos, everything is run on the same executor, regardless of how
> many actions the job has. So if the job has 20 actions, they will run
> sequentially on the same executor, resulting in the smallest independent
> unit being the job itself, as only the job deserves its own running
> environment.
>
> On Hadoop, things are different, a lot.
> To start, each action is treated by YARN as a different application, with
> its own set of containers. This means that on YARN, action is the smallest
> independent unit.
>
> So what's the problem actually? So the problem in general is that we
> cannot rely on the existence of 3rd party utilities, libraries, you name
> it, on the target execution environment. This forces us to bundle anything
> we need along with the job execution process.
> Anaconda is exactly such a 3rd party utility that we desperately need in
> order to run PySpark code that has dependencies on more than PySpark itself
> and pure Python. (Pandas, numpy, sklearn, there are more than enough
> examples out there)
> We need to install Anaconda once for each execution environment. In Apache
> Mesos our smallest reliable execution environment is the executor itself,
> thus we need to install Anaconda once per job.
> In YARN, our smallest execution environment is the container, hence we
> need to install Anaconda over and over for each action.
> This obviously poses a problem for numerous reasons:
> 1. While we can make an excuse in the first action that it is setup time,
> it is obvious that for the second action we are wasting time, a lot. To
> compare Mesos and YARN, starting the second action on Mesos is a matter of
> seconds. In YARN it is measured in minutes.
> 2. We do the same thing over and over again, even if we run on the same
> machine. This makes no sense whatsoever! We are losing the ability to cache
> things. So for example, if I need numpy and that takes about 20-30 seconds
> to download and install, why do I need to install it from scratch over and
> over again?
> 3. It causes code reliability issues. If Miniconda isn't there and I need
> to roll a PySpark job, I now have to set up guards and fallbacks and what
> not. Even worse, I have to find weird tricks to even get access to the
> Miniconda environment, and that is different on Mesos and YARN, so now I
> have a jungle in the code!
> 4. On YARN, PySpark runs on yet a different container! Guess what?! This
> container has no access to Miniconda! We currently use --py-files to send a
> list of a gazillion packages. This is different in Mesos, where PySpark
> itself runs in the same executor as the main Amaterasu process.
> So guess what? I now have a jungle in my PySpark invocation code too!
>
> Also note that the current implementation for Python 3rd party
> dependency resolution is Anaconda. This gives us an isolated environment
> that doesn't rely on the existing Python (because maybe, for some reason, you
> have Python 2.5 on your cluster, which is not supported by new versions of
> data libraries such as pandas, numpy and so forth). In addition, it gives us
> the nifty Conda package manager.
> However, it doesn't mean that it has to stay that way. If the need or
> reason arises, we may need to also support pip and support using the native
> Python version (instead of the one supplied by Anaconda).
>
> I want to discuss the possible solutions to this. Please feel free to
> bring up your ideas.
>
> Cheers,
> Nadav
>
>


Speeding up Anaconda deployment on executors

2018-05-27 Thread Nadav Har Tzvi
Hey everyone,

So we have this issue, Anaconda takes forever to deploy on the executors,
whether it is YARN or Mesos.

Let's first discuss why it is like this right now.

First, let's see, for each platform, how Apache Amaterasu interacts with the
underlying platform, and in particular what the smallest independent unit is
that gets its own isolated execution environment.

*Apache Mesos:*
In Apache Mesos, we get our own nifty set of instances and executors. An
instance can obviously host multiple executors, depending on its capacity.
Thus the smallest independent unit here is the executor itself.

*Apache Hadoop YARN*:
On YARN, we have a similar set of resources, we have nodes, each node is a
host to containers.

Great, so far it sounds similar, right? Here is where Apache Amaterasu
takes things a bit differently for each platform.

In Apache Mesos, everything is run on the same executor, regardless of how
many actions the job has. So if the job has 20 actions, they will run
sequentially on the same executor, resulting in the smallest independent
unit being the job itself, as only the job deserves its own running
environment.

On Hadoop, things are different, a lot.
To start, each action is treated by YARN as a different application, with
its own set of containers. This means that on YARN, action is the smallest
independent unit.

So what's the problem actually? So the problem in general is that we cannot
rely on the existence of 3rd party utilities, libraries, you name it, on
the target execution environment. This forces us to bundle anything we need
along with the job execution process.
Anaconda is exactly such a 3rd party utility that we desperately need in
order to run PySpark code that has dependencies on more than PySpark itself
and pure Python. (Pandas, numpy, sklearn, there are more than enough
examples out there)
We need to install Anaconda once for each execution environment. In Apache
Mesos our smallest reliable execution environment is the executor itself,
thus we need to install Anaconda once per job.
In YARN, our smallest execution environment is the container, hence we need
to install Anaconda over and over for each action.
This obviously poses a problem for numerous reasons:
1. While we can make an excuse in the first action that it is setup time,
it is obvious that for the second action we are wasting time, a lot. To
compare Mesos and YARN, starting the second action on Mesos is a matter of
seconds. In YARN it is measured in minutes.
2. We do the same thing over and over again, even if we run on the same
machine. This makes no sense whatsoever! We are losing the ability to cache
things. So for example, if I need numpy and that takes about 20-30 seconds
to download and install, why do I need to install it from scratch over and
over again?
3. It causes code reliability issues. If Miniconda isn't there and I need
to roll a PySpark job, I now have to set up guards and fallbacks and what
not. Even worse, I have to find weird tricks to even get access to the
Miniconda environment, and that is different on Mesos and YARN, so now I
have a jungle in the code!
4. On YARN, PySpark runs on yet a different container! Guess what?! This
container has no access to Miniconda! We currently use --py-files to send a
list of a gazillion packages. This is different in Mesos, where PySpark
itself runs in the same executor as the main Amaterasu process.
So guess what? I now have a jungle in my PySpark invocation code too!

Also note that the current implementation for Python 3rd party
dependency resolution is Anaconda. This gives us an isolated environment
that doesn't rely on the existing Python (because maybe, for some reason, you
have Python 2.5 on your cluster, which is not supported by new versions of
data libraries such as pandas, numpy and so forth). In addition, it gives us
the nifty Conda package manager.
However, it doesn't mean that it has to stay that way. If the need or
reason arises, we may need to also support pip and support using the native
Python version (instead of the one supplied by Anaconda).

I want to discuss the possible solutions to this. Please feel free to bring
up your ideas.
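
As a starting point for the caching problem in item 2, here is a rough sketch
of reusing an environment per machine instead of reinstalling it per action.
It is one possible direction only, not an agreed design. The cache layout,
the ".ready" marker and the create_env hook are all hypothetical; the hook
stands in for whatever actually runs the Conda installation.

```python
import hashlib
from pathlib import Path


def env_key(dependencies):
    """Stable key for a dependency set: order and duplicates do not matter."""
    canonical = "\n".join(sorted(set(dependencies)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def ensure_env(cache_root, dependencies, create_env):
    """Return the environment directory, creating it only on a cache miss.

    `create_env(path, dependencies)` is a caller-supplied (hypothetical) hook
    that would perform the real installation.
    """
    env_dir = Path(cache_root) / env_key(dependencies)
    marker = env_dir / ".ready"  # written last, so partial installs are retried
    if not marker.exists():
        env_dir.mkdir(parents=True, exist_ok=True)
        create_env(env_dir, list(dependencies))
        marker.touch()
    return env_dir
```

With something like this, a second action that needs the same packages on the
same machine would find the marker and skip the installation. Concurrent
actions on one node would still need some form of locking, which the sketch
leaves out.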

Cheers,
Nadav


Re: AMATERASU-24

2018-05-26 Thread Nadav Har Tzvi
I agree with Yaniv that Frameworks should be plugins.
Think about it like this, in the future, hopefully, you will be able to do
something like "sudo yum install amaterasu"
After installing the "core" Amaterasu using yum, you will be able to use the
new CLI like this: "ama frameworks add " to add a
framework.
Alternatively we could do something like "sudo yum install amaterasu-spark"
I mean, this is what I think anyhow.

As I write this, I've just realized that we should open a thread to discuss
packaging options that we'd like to see implemented.

On 26 May 2018 at 22:53, Yaniv Rodenski  wrote:

> Hi Arun,
>
> You are correct Spark is the first framework, and in my mind,
> frameworks should be treated as plugins. Also, we need to consider that not
> all frameworks will run under the JVM.
> Last, each framework has two modules, a runner (used by both the executor
> and the leader) and a runtime, to be used by the actions themselves.
> I would suggest the following structure to start with:
> frameworks
>   |-> spark
>         |-> runner
>         |-> runtime
>
> As for the shell scripts, I will leave that for @Nadav, but please have a
> look at PR #17 containing the CLI that will replace the scripts as of
> 0.2.1-incubating.
>
> Cheers,
> Yaniv
>
> On Sat, May 26, 2018 at 5:16 PM, Arun Manivannan  wrote:
>
> > Gentlemen,
> >
> > I am looking into Amaterasu-24 and would like to run the intended changes
> > by you before I make them.
> >
> > Refactor Spark out of Amaterasu executor to its own project
> >  > issues/AMATERASU-24?filter=allopenissues>
> >
> > I understand Spark is just the first of many frameworks that have been lined
> > up for support by Amaterasu.
> >
> > These are the intended changes :
> >
> > 1. Create a new module called "runners" and have the Spark runners under
> > executor pulled into this project
> > (org.apache.executor.execution.actions.runners.spark). We could call it
> > "frameworks" if "runners" is not a great name for this.
> > 2. Will also pull away the Spark dependencies from the Executor to the
> > respective sub-sub-projects (at the moment, just Spark).
> > 3. Since the result of the framework modules would be different bundles,
> > the pattern that I am considering to name the bundle is -
> "runner-spark".
> >  So, it would be "runners:runner-spark" in gradle.
> > 4. On the shell scripts (miniconda and load-spark-env") and the "-cp"
> > passed as commands for the ActionsExecutorLauncher, I could pull them as
> a
> > separate properties of Spark (inside the runner), so that the Application
> > master can use it.
> >
> > Is it okay if I rename the Miniconda install file to miniconda-install
> > using "wget -O"? The reason for this change is to avoid hardcoding the
> > conda version inside the code and possibly move it into the
> > amaterasu.properties file. (The changes are in the ama-start shell scripts
> > and a couple of places inside the code.)
> >
> > Please let me know if this would work.
> >
> > Cheers,
> > Arun
> >
>
>
>
> --
> Yaniv Rodenski
>
> +61 477 778 405
> ya...@shinto.io
>


[jira] [Resolved] (AMATERASU-4) Run an Amaterasu pipeline

2018-05-25 Thread Nadav Har Tzvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi resolved AMATERASU-4.

   Resolution: Implemented
Fix Version/s: 0.2.1-incubating

> Run an Amaterasu pipeline
> -
>
> Key: AMATERASU-4
> URL: https://issues.apache.org/jira/browse/AMATERASU-4
> Project: AMATERASU
>  Issue Type: Sub-task
>        Reporter: Nadav Har Tzvi
>        Assignee: Nadav Har Tzvi
>Priority: Major
> Fix For: 0.2.1-incubating
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The user will invoke "ama run"
> "ama run" will take in the following parameters (based on ama-start.sh):
> -r, --repo = 
> -b, --branch = , the default is "master"
> -e, --env = , this should correspond to a path under  /env 
> directory, e.g. /env/default, /env/test, etc. The default value is "default"
> -n, --name = 
> -i, --job-id = TBD
> -r, --report = 
> Invocation will start Amaterasu on demand.
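
The option list above could be expressed with argparse roughly as follows.
This is a sketch under assumptions, not the CLI's actual code: treating
--repo as required is a guess, and the function name is made up for the
example.

```python
import argparse


def build_run_parser():
    """Argument parser mirroring the "ama run" options listed in the issue."""
    parser = argparse.ArgumentParser(prog="ama run")
    parser.add_argument("-r", "--repo", required=True)  # required: assumption
    parser.add_argument("-b", "--branch", default="master")
    parser.add_argument("-e", "--env", default="default")
    parser.add_argument("-n", "--name")
    parser.add_argument("-i", "--job-id")
    # The issue lists "-r" for both --repo and --report. argparse rejects
    # duplicate option strings, so this sketch keeps only the long form here.
    parser.add_argument("--report")
    return parser
```

Note that the short flag "-r" appears twice in the issue text, so one of the
two options needs a different short form (or none) in any real parser.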





[jira] [Resolved] (AMATERASU-3) Scaffolding based on maki.yml

2018-05-25 Thread Nadav Har Tzvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi resolved AMATERASU-3.

   Resolution: Implemented
Fix Version/s: 0.2.1-incubating

> Scaffolding based on maki.yml
> -
>
> Key: AMATERASU-3
> URL: https://issues.apache.org/jira/browse/AMATERASU-3
> Project: AMATERASU
>  Issue Type: Sub-task
>        Reporter: Nadav Har Tzvi
>        Assignee: Nadav Har Tzvi
>Priority: Major
>  Labels: CLI
> Fix For: 0.2.1-incubating
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the user invokes "ama init -m "
> An amaterasu repository will be created/updated with the src files as 
> specified in the maki.
> e.g.:
> Given a maki.yml file:
> job-name: amaterasu-test # Replace this with your job's name
> flow:
>   - name: start # Name of this step
>     runner:
>       group: spark # Currently supporting spark only, but expect more here in the future!
>       type: scala # scala, sql, r, python
>     file: file.scala # Source code for the step
>     exports:
>       odd: parquet
>   - name: step2
>     runner:
>       group: spark
>       type: scala
>     file: file2.scala
> Then an Amaterasu job repository will be created with src/file.scala and 
> src/file2.scala
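
The scaffolding step can be sketched like this, assuming the maki file has
already been parsed into a dict and that each flow step carries its "file"
entry at the step level. The function name is hypothetical, and parsing the
YAML itself is left out.

```python
from pathlib import Path


def scaffold_sources(maki, repo_root):
    """Create an empty src/<file> for every flow step that names a file.

    `maki` is the already-parsed maki.yml as a dict. Existing files are left
    untouched, which covers the "created/updated" case from the issue.
    Returns the list of files that were newly created.
    """
    src = Path(repo_root) / "src"
    src.mkdir(parents=True, exist_ok=True)
    created = []
    for step in maki.get("flow", []):
        name = step.get("file")
        if not name:
            continue  # a step without a source file needs no scaffolding
        target = src / name
        if not target.exists():
            target.touch()
            created.append(target)
    return created
```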





[jira] [Resolved] (AMATERASU-2) Initialization of an Amaterasu job repository via the CLI

2018-05-25 Thread Nadav Har Tzvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi resolved AMATERASU-2.

   Resolution: Implemented
Fix Version/s: 0.2.1-incubating

> Initialization of an Amaterasu job repository via the CLI
> -
>
> Key: AMATERASU-2
> URL: https://issues.apache.org/jira/browse/AMATERASU-2
> Project: AMATERASU
>  Issue Type: Sub-task
>        Reporter: Nadav Har Tzvi
>        Assignee: Nadav Har Tzvi
>Priority: Major
>  Labels: CLI
> Fix For: 0.2.1-incubating
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The user will invoke "ama init" to create an Amaterasu job repository.
> There are 2 invocation modes:
> # "ama init" creates a repository at CWD
> # "ama init " creates a repository at 
> The created repository will consist of:
> # A git repository
> # maki file
> # env + env/default directories, where env/default contains a spark.yml and 
> job.yml
> # empty src directory (maybe we would like to include one scala example and 
> one python example)
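
A minimal sketch of the directory layout described above. It only creates
the files and directories; initializing the git repository and writing any
real content into the files is deliberately left out. The function name and
the maki file name "maki.yml" at the repository root are assumptions.

```python
from pathlib import Path


def init_repository(path="."):
    """Create the skeleton of an Amaterasu job repository at `path`.

    Safe to call on an existing repository: nothing is overwritten.
    """
    root = Path(path)
    (root / "src").mkdir(parents=True, exist_ok=True)
    env_default = root / "env" / "default"
    env_default.mkdir(parents=True, exist_ok=True)
    for placeholder in (
        root / "maki.yml",
        env_default / "spark.yml",
        env_default / "job.yml",
    ):
        placeholder.touch(exist_ok=True)
    return root
```

Calling init_repository() with no argument covers the "ama init" at CWD mode,
and passing a path covers the second mode.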





[jira] [Updated] (AMATERASU-27) ama CLI doesn't take into account amaterasu.properties changes

2018-05-25 Thread Nadav Har Tzvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har Tzvi updated AMATERASU-27:

Summary: ama CLI doesn't take into account amaterasu.properties changes  
(was: ama CLI doesn't take into account amaterasu.properties changes in YARN)

> ama CLI doesn't take into account amaterasu.properties changes
> --
>
> Key: AMATERASU-27
> URL: https://issues.apache.org/jira/browse/AMATERASU-27
> Project: AMATERASU
>  Issue Type: Bug
>Affects Versions: 0.2.1-incubating
> Environment: any hadoop cluster
>Reporter: Nadav Har Tzvi
>Assignee: Nadav Har Tzvi
>Priority: Major
>  Labels: cli, yarn
> Fix For: 0.2.1-incubating
>
>
> To reproduce:
>  # On a hadoop cluster
>  # Setup Amaterasu
>  # Run a job
>  # Run ama setup again and change something
>  # Run a job. The changed setting will not be taken into account.
> How to fix:
> We need an indication that amaterasu.properties has changed. Any mechanism
> will do (a boolean flag, a record of the last 2 file hashes, etc.).
> When we execute {{ama run}}, the CLI should check whether there is 
> a new version of amaterasu.properties. If there is, it should upload it 
> to HDFS.
>  
> Existing workarounds:
> executing {{ama run}} with {{--force-bin}} will completely remove the 
> existing Amaterasu HDFS assets and will upload everything again. While it is 
> not amazing and consumes tons of time (has to upload the Spark client again), 
> it works.





[jira] [Created] (AMATERASU-27) ama CLI doesn't take into account amaterasu.properties changes in YARN

2018-05-25 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-27:
---

 Summary: ama CLI doesn't take into account amaterasu.properties 
changes in YARN
 Key: AMATERASU-27
 URL: https://issues.apache.org/jira/browse/AMATERASU-27
 Project: AMATERASU
  Issue Type: Bug
Affects Versions: 0.2.1-incubating
 Environment: any hadoop cluster
Reporter: Nadav Har Tzvi
Assignee: Nadav Har Tzvi
 Fix For: 0.2.1-incubating


To reproduce:
 # On a hadoop cluster
 # Setup Amaterasu
 # Run a job
 # Run ama setup again and change something
 # Run a job. The changed setting will not be taken into account.

How to fix:

We need an indication that amaterasu.properties has changed. Any mechanism
will do (a boolean flag, a record of the last 2 file hashes, etc.).

When we execute {{ama run}}, the CLI should check whether there is a 
new version of amaterasu.properties. If there is, it should upload it to 
HDFS.

 

Existing workarounds:

executing {{ama run}} with {{--force-bin}} will completely remove the existing 
Amaterasu HDFS assets and will upload everything again. While it is not amazing 
and consumes tons of time (has to upload the Spark client again), it works.
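
The "file hash" mechanism suggested above could look roughly like the following. This is only a sketch under assumptions: the marker file that stores the last digest, and the function names, are hypothetical and not part of the ama CLI.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def properties_changed(properties_path, digest_path):
    """Report whether the properties file differs from the last recorded digest.

    `digest_path` is a hypothetical marker file holding the previous digest.
    The new digest is written back, so a second call without further edits
    returns False.
    """
    current = file_digest(properties_path)
    marker = Path(digest_path)
    previous = marker.read_text().strip() if marker.exists() else None
    if current == previous:
        return False
    marker.write_text(current)
    return True
```

With something like this, {{ama run}} would upload amaterasu.properties to HDFS only when the function returns True. The first call always returns True because no digest has been recorded yet.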





[jira] [Created] (AMATERASU-25) Create documentation with ReadTheDocs

2018-05-16 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-25:
---

 Summary: Create documentation with ReadTheDocs
 Key: AMATERASU-25
 URL: https://issues.apache.org/jira/browse/AMATERASU-25
 Project: AMATERASU
  Issue Type: Task
Affects Versions: 0.2.1-incubating
Reporter: Nadav Har Tzvi
Assignee: Nadav Har Tzvi
 Fix For: 0.2.1-incubating


We need to start filling in documentation for Apache Amaterasu.

We will use readthedocs for this purpose.

We need to set up a /docs directory with rst files.





Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #2

2018-04-25 Thread Nadav Har Tzvi
+1 looks good.

On Thu, Apr 26, 2018, 01:50 guy peleg  wrote:

> +1 everything indicates that the previous issues were resolved , in
> addition we need short feedback loop from users.
>
> On Thu, Apr 26, 2018, 08:48 Kirupa Devarajan 
> wrote:
>
> > +1 - Test and Build successful for release candidate #2
> >
> > On Wed, 25 Apr 2018, 13:32 Yaniv Rodenski,  wrote:
> >
> > > Hi everyone,
> > >
> > > After fixing issues with the build found in RC1 please review and vote
> on
> > > the release candidate #2 for the version 0.2.0-incubating, as follows:
> > >
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > >
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which is signed with the key with fingerprint [3],
> > > * source code tag "version-0.2.0-incubating-rc2" [4],
> > > * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK
> > > 1.8.0_151
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > >
> > > Yaniv Rodenski
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12321521&version=12342793
> > >
> > > [2]
> https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc2/
> > > [3] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> > > [4] https://github.com/apache/incubator-amaterasu/tags
> > >
> > > --
> > > Yaniv Rodenski
> > >
> > > +61 477 778 405
> > > ya...@shinto.io
> > >
> >
>


Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #1

2018-04-16 Thread Nadav Har Tzvi
Shad,
I confirm that when there is a ZooKeeper server running on the host
machine, the tests fail. I manually stopped the ZK server and reran the
tests; this time they ran successfully.

Yaniv,
I am building it from the GH 0.2.0 release branch.

On 17 Apr 2018, at 8:20, Yaniv Rodenski  wrote:

Hi Shad,

Thanks for that, I actually missed this during tests.
Nadav, can you confirm? Also, are you building the code from dist.apache.org?

I’m facing another issue regarding a missing dependency (scopt), which I
suspect wasn’t packaged by shadowJar.

I’ll keep investigating; maybe it’s an environment issue as well.

Cheers,
Yaniv

On Tue, 17 Apr 2018 at 2:44 pm, Shad Amez  wrote:

Hi Nadav/Team,

FYI, I have faced these test failures:

[Thread-3] ERROR org.apache.curator.test.TestingZooKeeperServer - .
java.net.BindException: Address already in use

This is caused by another instance of ZooKeeper running on
port 2181. If that ZooKeeper instance is shut down, the tests run
successfully.
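
A quick way to check for this conflict before running the tests is to probe the port. The helper below is an illustrative sketch (the function name is made up; only the port number 2181 comes from the log above), not part of the Amaterasu build.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(2181):
        print("Port 2181 is taken; stop the local ZooKeeper before running the tests.")
    else:
        print("Port 2181 is free.")
```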

Regards,
Shad

On Tue, Apr 17, 2018 at 9:58 AM, Yaniv Rodenski  wrote:

Hi Nadav,

OK, I had a closer look this morning on a clean environment, and it
seems that there is something wrong with the build coming out of Travis.
I suggest we stop the vote for now, and Guy and I will investigate.

Thanks, everyone
Yaniv

On Tue, Apr 17, 2018 at 4:27 AM, Nadav Har Tzvi 
wrote:

-1

There are a few issues:

1. Travis doesn't invoke gradlew test, thus the tests don't run at all in
the CI environment.
2. When I deployed Amaterasu on both the mesos vagrant box and the HDP box,
the action tests failed.
Here are the stack traces:

[Thread-3] ERROR org.apache.curator.test.TestingZooKeeperServer - From
testing server (random state: false) for instance:
InstanceSpec{dataDirectory=/tmp/1523902177319-0, port=2181,
electionPort=38827, quorumPort=40951, deleteDataDirectoryOnClose=true,
serverId=3, tickTime=-1, maxClientCnxns=-1}
org.apache.curator.test.InstanceSpec@885
java.net.BindException: Address already in use
   at sun.nio.ch.Net.bind0(Native Method)
   at sun.nio.ch.Net.bind(Net.java:433)
   at sun.nio.ch.Net.bind(Net.java:425)
   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
   at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
   at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
   at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:73)
   at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:148)
   at java.lang.Thread.run(Thread.java:748)

org.apache.amaterasu.common.execution.ActionTests *** ABORTED *** (3 milliseconds)
 java.lang.RuntimeException: Unable to load a Suite class that was discovered in the runpath: org.apache.amaterasu.common.execution.ActionTests
 at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
 at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
 at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
 at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at scala.collection.Iterator$class.foreach(Iterator.scala:891)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
 at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
 ...
 Cause: java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind0(Native Method)
 at sun.nio.ch.Net.bind(Net.java:433)
 at sun.nio.ch.Net.bind(Net.java:425)
 at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
 at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
 at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
 at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:73)
 at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:148)

3. I received the above error also when testing in the local development
environment.

Do other committers manage to reproduce this? Eyal? Kirupa?




On 16 Apr 2018, at 16:46, Yaniv Rodenski  wrote:

Hi everyone,

Please review and vote on the release candidate #1 for the version
0.2.0-incubating, 

Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #1

2018-04-16 Thread Nadav Har Tzvi
-1

There are a few issues:

1. Travis doesn't invoke gradlew test, thus the tests don't run at all in
the CI environment.
2. When I deployed Amaterasu on both the mesos vagrant box and the HDP box,
the action tests failed.
Here are the stack traces:

[Thread-3] ERROR org.apache.curator.test.TestingZooKeeperServer - From
testing server (random state: false) for instance:
InstanceSpec{dataDirectory=/tmp/1523902177319-0, port=2181,
electionPort=38827, quorumPort=40951, deleteDataDirectoryOnClose=true,
serverId=3, tickTime=-1, maxClientCnxns=-1}
org.apache.curator.test.InstanceSpec@885
java.net.BindException: Address already in use
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:433)
  at sun.nio.ch.Net.bind(Net.java:425)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
  at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
  at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
  at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:73)
  at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:148)
  at java.lang.Thread.run(Thread.java:748)

org.apache.amaterasu.common.execution.ActionTests *** ABORTED *** (3 milliseconds)
  java.lang.RuntimeException: Unable to load a Suite class that was discovered in the runpath: org.apache.amaterasu.common.execution.ActionTests
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:81)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:38)
  at org.scalatest.tools.DiscoverySuite$$anonfun$1.apply(DiscoverySuite.scala:37)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  ...
  Cause: java.net.BindException: Address already in use
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:433)
  at sun.nio.ch.Net.bind(Net.java:425)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
  at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
  at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
  at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:73)
  at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:148)

3. I received the above error also when testing in the local development
environment.

Do other committers manage to reproduce this? Eyal? Kirupa?




On 16 Apr 2018, at 16:46, Yaniv Rodenski  wrote:

Hi everyone,

Please review and vote on the release candidate #1 for the version
0.2.0-incubating, as follows:

[ ] +1, Approve the release

[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:

* JIRA release notes [1],

* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint [3],

* source code tag "version-0.2.0-incubating-rc1" [4],

* Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK 1.8.0_151


The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.


Thanks,
Yaniv Rodenski

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?
projectId=12321521&version=12342793

[2] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc1/
[3] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
[4] https://github.com/apache/incubator-amaterasu/tags




-- 
=========
Thanks,
Nadav Har Tzvi


Re: Project status

2018-04-02 Thread Nadav Har Tzvi
cket filed.
> >> > >
> >> > > Emailing one mentor directly (or any other community member)
> >isn't a
> >> way
> >> > to
> >> > > build the community. Things need to be discussed in public
> >whenever
> >> > > possible.
> >> > >
> >> > > Given the above, blaming a mentor (whomever you may be referring
> >to)
> >> > > doesn't make sense.
> >> > >
> >> > > * We are ready to release version 0.2.0-incubating, the reason it
> >took
> >> > us a
> >> > > > month to initiate the process is the above automated build,
> >which I
> >> > > > suggested in prior discussion and had no rejections. We will
> >complete
> >> > > this
> >> > > > once build is enabled.
> >> > > >
> >> > >
> >> > > The release itself is a great milestone, but not the purpose to
> >itself.
> >> > >
> >> > >
> >> > > > * as for community growth, we are working with two
> >organizations on
> >> > > running
> >> > > > POCs (which will hopefully grow the user base) one of them is
> >due to
> >> > > start
> >> > > > very soon. I don't want to name them (first of all it's too
> >early,
> >> and
> >> > > also
> >> > > > it is for them to decide if they want to share) but a
> >representative
> >> > from
> >> > > > at least one of those organisations is on the list and is
> >welcomed to
> >> > > share
> >> > > > :)
> >> > > >
> >> > >
> >> > > Great!
> >> > >
> >> > >
> >> > > > * This year I've seen contributions from 4 contributors (not
> >much
> >> more
> >> > > than
> >> > > > 3, I know) but one of them is new (Guy Peleg) and AFAIK
> >additional
> >> > > > longer-term work is done by one more contributor on his local
> >fork
> >> > (Nadav
> >> > > > Har-Tzvi)
> >> > > >
> >> > >
> >> > > I think this is the crux of the problem. Why is longer-term work
> >going
> >> on
> >> > > in a local fork?
> >> > >
> >> > >
> >> > > > * We should be presenting more, and growing the community more
> >which
> >> is
> >> > > > hard to do starting out as a tiny community. Any advice given
> >there
> >> > would
> >> > > > be appreciated.
> >> > > >
> >> > >
> >> > > The first thing has to be do the basics well: on-list
> >communication,
> >> open
> >> > > discussions, no side channels, etc.
> >> > >
> >> > --
> >> > Yaniv Rodenski
> >> >
> >> > +61 477 778 405
> >> > ya...@shinto.io
> >> >
> >>
>



-- 
=
Nadav Har Tzvi
Committer@Amaterasu


Re: [DISCUSS] podling report

2018-04-02 Thread Nadav Har Tzvi
Hey,

I don't have confirmation for the SDP yet. It might be deferred to the next
SDP (December 2018 instead of June 2018).

On 2 April 2018 at 13:45, Yaniv Rodenski  wrote:

> Hi All,
>
> We have two days for submitting this report, and I hope we will have some
> additions but for now, I suggest the following report:
>
> "
> Apache Amaterasu is a framework providing configuration management and
> deployment for Big Data Pipelines.
>
> It provides the following capabilities:
>
> Continuous integration tools to package pipelines and run tests.
> A repository to store those packaged applications: the applications
> repository.
> A repository to store the pipelines, and engine configuration (for
> instance, the location of the Spark master, etc.): per environment - the
> configuration repository.
> A dashboard to monitor the pipelines.
> A DSL and integration hooks allowing third parties to easily integrate.
>
> Amaterasu has been incubating since 2017-09.
>
> Three most important issues to address in the move towards graduation:
>
>   1. Prepare the first release
>   2. Grow the user and contributor communities
>   3. Prepare documentation
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
>
>   None
>
> How has the community developed since the last report?
>
>   * Two conference talks are scheduled in the near future, Voxxed Days
> Melbourne (Yaniv Rodenski) and SDP in Tel-Aviv (Nadav Har-Tzvi)
>
> How has the project developed since the last report?
>
>   * Since the last report, 12 PRs were opened and 11 were merged
>   * One additional contributor started contributing to the code base
>   * One organization we are aware of has started a POC with Amaterasu
>   * We are in the process of automating the release process as
> preparation for the first release in the incubator.
>
> Date of last release:
>
>   N/A
>
> When were the last committers or PMC members elected?
>
>   N/A
> "
>
> If there are no objections I will update the wiki.
>
> Cheers,
> --
> Yaniv Rodenski
>
> +61 477 778 405
> ya...@shinto.io
>



-- 
=
Nadav Har Tzvi
Committer@Amaterasu


[jira] [Created] (AMATERASU-5) Allocated resources aren't cleaned up properly on crash/unexpected halt

2017-11-13 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-5:
--

 Summary: Allocated resources aren't cleaned up properly on 
crash/unexpected halt
 Key: AMATERASU-5
 URL: https://issues.apache.org/jira/browse/AMATERASU-5
 Project: AMATERASU
  Issue Type: Bug
 Environment: Centos 7 in Parallels
2 CPUs allocated
8 GB memory
Reporter: Nadav Har Tzvi
Priority: Critical
 Attachments: Screen Shot 2017-11-13 at 20.44.34.png, Screen Shot 
2017-11-13 at 20.45.24.png

Alright, it goes like this:
Given you have a slave with N CPUs and M memory.
Given that each job requires 1 CPU and X memory.
When you run a job using ama-start
When you hit ctrl-c in the middle.
Then the next time you start executing Amaterasu, you will have N-1 CPUs.
And the next time you start executing Amaterasu, you will have M-X memory.

The missing resources come back only after a reboot of the machine. This is 
pretty darn problematic, as it will kill slaves in no time.

I attached images displaying some execution trace logs from a VM with 2 
CPUs and 8 GB memory. You will see the number of CPUs drop from 2 to 1 
in the first image and then from 1 to 0 (or rather, to not being mentioned 
at all) in the second image. Available memory behaves in a similar way.

I accidentally discovered it while developing ama-cli, where I screwed up 
execution quite a bit.
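
The expected behaviour in the scenario above can be written down as a toy model. This is not the Mesos or Amaterasu API, and it says nothing about the actual cause of the leak; the Slave class and run_job function are hypothetical and only illustrate that resources should be returned however the job ends.

```python
class Slave:
    """Toy model of a slave's free resources (not the Mesos API)."""

    def __init__(self, cpus, mem):
        self.cpus = cpus
        self.mem = mem

    def allocate(self, cpus, mem):
        self.cpus -= cpus
        self.mem -= mem

    def release(self, cpus, mem):
        self.cpus += cpus
        self.mem += mem


def run_job(slave, job, cpus=1, mem=1024):
    """Run `job` while holding resources, releasing them however it ends."""
    slave.allocate(cpus, mem)
    try:
        return job()
    finally:
        # Runs on normal return, on exceptions, and on KeyboardInterrupt (ctrl-c).
        slave.release(cpus, mem)
```

A finally block does not run if the process is killed outright (for example with SIGKILL), so a real fix would also need cleanup on the cluster side.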



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Moving forward

2017-10-28 Thread Nadav Har Tzvi
@yaniv, Indeed. I hope that by the end of next week I will have a PR for
ama-cli ready. Sorry for the delay, my day job got in the way.
I will wrap up the PR next weekend and open it.

On 28 October 2017 at 12:48, Jean-Baptiste Onofré  wrote:

> Thanks. I will take a look later today.
>
> Regards
> JB
>
> On Oct 28, 2017, 11:41, at 11:41, Yaniv Rodenski  wrote:
> >Hi All
> >
> >OK, I've pushed master to the apache/incubator-amaterasu repo.
> >@nadav and @kirupa I believe you two have PRs awaiting.
> >
> >On Fri, Oct 27, 2017 at 1:25 AM, Nadav Har Tzvi
> >
> >wrote:
> >
> >> Sure, no objections here. I will open PR for ama-cli after you are
> >done.
> >> Actually, after I am done :)
> >>
> >> > On 26 Oct 2017, at 2:48, Yaniv Rodenski  wrote:
> >> >
> >> > OK so GitHub doesn't allow forking empty branches, so I think I'll
> >just
> >> > push the shintoio master to the amaterasu-incubator master as
> >Olivier
> >> > suggested if that's ok with everyone, we have a couple of PRs
> >coming and
> >> I
> >> > think we should try to get them to the ASF repo.
> >> >
> >> > any objections?
> >> >
> >> > On Thu, Oct 26, 2017 at 4:50 AM, Davor Bonaci 
> >wrote:
> >> >
> >> >> Many options -- you can simply compare two repositories, and issue
> >a
> >> pull
> >> >> request in the GitHub UI.
> >> >>
> >> >> Alternatively, you can create a fork, clone it locally, set up
> >multiple
> >> >> remotes (to both old and new repositories), rebase and push.
> >> >>
> >> >> On Wed, Oct 25, 2017 at 1:17 AM, Yaniv Rodenski 
> >> wrote:
> >> >>
> >> >>> Hi Davor,
> >> >>>
> >> >>> looks like something worth our while.
> >> >>> One question though, how do we issue a PR from a non-forked repo?
> >> >>>
> >> >>> Cheers,
> >> >>> Yaniv
> >> >>>
> >> >>> On Tue, Oct 24, 2017 at 6:47 PM, Davor Bonaci 
> >> wrote:
> >> >>>
> >> >>>> (You are also welcome to retain any source history that the
> >original
> >> >>>> repository has -- feel free to push a pull request containing
> >all
> >> >>> commits.
> >> >>>> See Beam's pull request #1 as an example.)
> >> >>>>
> >> >>>> On Sun, Oct 22, 2017 at 5:29 PM, Yaniv Rodenski
> >
> >> >> wrote:
> >> >>>>
> >> >>>>> Thanks Olivier!
> >> >>>>> I prefer doing things myself, I’ll push it today.
> >> >>>>>
> >> >>>>> On Mon, 23 Oct 2017 at 11:25 am, Olivier Lamy
> >
> >> >>> wrote:
> >> >>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> On 23 October 2017 at 11:20, Yaniv Rodenski 
> >> >> wrote:
> >> >>>>>>
> >> >>>>>>> Hi All,
> >> >>>>>>>
> >> >>>>>>> The Podling Report reminder is a really great motivation for
> >> >> moving
> >> >>>>>> forward
> >> >>>>>>> with Amaterasu :)
> >> >>>>>>> I think that on the bootstrapping end we still need to move
> >the
> >> >>> repo
> >> >>>> to
> >> >>>>>> the
> >> >>>>>>> ASF one.
> >> >>>>>>>
> >> >>>>>>> Mentors, do I just do this manually (push local clone) or is
> >> >> there
> >> >>> a
> >> >>>>> way
> >> >>>>>> to
> >> >>>>>>> pull it directly from github?
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> You can do it yourself (i.e push the branches you want to save
> >to
> >> >> ASF
> >> >>>>>> repo).
> >> >>>>>> Or create an INFRA ticket if you want to pull it directly from
> >> >>> github.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>> Also we are very close to version 0.2.0-incubating.
> >> >>>>>>> not a lot of open tasks, but I think we should also get our
> >JIRA
> >> >> in
> >> >>>>>> order,
> >> >>>>>>> I can port everything we have on our old trello, but maybe a
> >> >>> hangouts
> >> >>>>>> sync
> >> >>>>>>> to go over the open tasks is a good way to push things
> >forward?
> >> >>>>>>>
> >> >>>>>>> Cheers,
> >> >>>>>>> --
> >> >>>>>>> Yaniv Rodenski
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Olivier Lamy
> >> >>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> >> >>>>>>
> >> >>>>> --
> >> >>>>> Yaniv Rodenski
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Yaniv Rodenski
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Yaniv Rodenski
> >>
> >>
> >
> >
> >--
> >Yaniv Rodenski
>


Re: Moving forward

2017-10-26 Thread Nadav Har Tzvi
Sure, no objections here. I will open a PR for ama-cli after you are done. 
Actually, after I am done :)

> On 26 Oct 2017, at 2:48, Yaniv Rodenski  wrote:
> 
> OK so GitHub doesn't allow forking empty branches, so I think I'll just
> push the shintoio master to the amaterasu-incubator master as Olivier
> suggested if that's ok with everyone, we have a couple of PRs coming and I
> think we should try to get them to the ASF repo.
> 
> any objections?
> 
> On Thu, Oct 26, 2017 at 4:50 AM, Davor Bonaci  wrote:
> 
>> Many options -- you can simply compare two repositories, and issue a pull
>> request in the GitHub UI.
>> 
>> Alternatively, you can create a fork, clone it locally, set up multiple
>> remotes (to both old and new repositories), rebase and push.
>> 
>> On Wed, Oct 25, 2017 at 1:17 AM, Yaniv Rodenski  wrote:
>> 
>>> Hi Davor,
>>> 
>>> looks like something worth our while.
>>> One question though, how do we issue a PR from a non-forked repo?
>>> 
>>> Cheers,
>>> Yaniv
>>> 
>>> On Tue, Oct 24, 2017 at 6:47 PM, Davor Bonaci  wrote:
>>> 
>>>> (You are also welcome to retain any source history that the original
>>>> repository has -- feel free to push a pull request containing all
>>> commits.
>>>> See Beam's pull request #1 as an example.)
>>>>
>>>> On Sun, Oct 22, 2017 at 5:29 PM, Yaniv Rodenski 
>> wrote:
>>>>
>>>>> Thanks Olivier!
>>>>> I prefer doing things myself, I’ll push it today.
>>>>>
>>>>> On Mon, 23 Oct 2017 at 11:25 am, Olivier Lamy 
>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 23 October 2017 at 11:20, Yaniv Rodenski 
>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> The Podling Report reminder is a really great motivation for
>> moving
>>>>>> forward
>>>>>>> with Amaterasu :)
>>>>>>> I think that on the bootstrapping end we still need to move the
>>> repo
>>>> to
>>>>>> the
>>>>>>> ASF one.
>>>>>>>
>>>>>>> Mentors, do I just do this manually (push local clone) or is
>> there
>>> a
>>>>> way
>>>>>> to
>>>>>>> pull it directly from github?
>>>>>>
>>>>>>
>>>>>> You can do it yourself (i.e push the branches you want to save to
>> ASF
>>>>>> repo).
>>>>>> Or create an INFRA ticket if you want to pull it directly from
>>> github.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Also we are very close to version 0.2.0-incubating.
>>>>>>> not a lot of open tasks, but I think we should also get our JIRA
>> in
>>>>>> order,
>>>>>>> I can port everything we have on our old trello, but maybe a
>>> hangouts
>>>>>> sync
>>>>>>> to go over the open tasks is a good way to push things forward?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> --
>>>>>>> Yaniv Rodenski
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Olivier Lamy
>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
>>>>>>
>>>>> --
>>>>> Yaniv Rodenski
>>>>>
>>>>
>>> 
>>> 
>>> 
>>> --
>>> Yaniv Rodenski
>>> 
>> 
> 
> 
> 
> -- 
> Yaniv Rodenski



[jira] [Created] (AMATERASU-4) Run an Amaterasu pipeline

2017-10-09 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-4:
--

 Summary: Run an Amaterasu pipeline
 Key: AMATERASU-4
 URL: https://issues.apache.org/jira/browse/AMATERASU-4
 Project: AMATERASU
  Issue Type: Sub-task
Reporter: Nadav Har Tzvi


The user will invoke "ama run"
"ama run" will take in the following parameters (based on ama-start.sh):
-r, --repo = 
-b, --branch = , the default is "master"
-e, --env = , this should correspond to a path under  /env 
directory, e.g. /env/default, /env/test, etc. The default value is "default"
-n, --name = 
-i, --job-id = TBD
-r, --report = 

Invocation will start Amaterasu on demand.
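
For discussion, the parameters listed above could be declared with argparse roughly as follows. This is a sketch, not the final CLI: the help texts are guesses, and because the list above uses -r for both --repo and --report (argparse rejects a short flag registered twice), --report is given only a long form here as an assumption.

```python
import argparse


def build_parser():
    """Build a hypothetical `ama` parser with a `run` sub-command."""
    parser = argparse.ArgumentParser(prog="ama")
    commands = parser.add_subparsers(dest="command")

    run = commands.add_parser("run", help="run an Amaterasu pipeline")
    run.add_argument("-r", "--repo", help="job repository")
    run.add_argument("-b", "--branch", default="master")
    run.add_argument("-e", "--env", default="default",
                     help="name of a directory under env/")
    run.add_argument("-n", "--name")
    run.add_argument("-i", "--job-id")  # semantics are still TBD
    # Long form only, to avoid clashing with -r/--repo (an assumption).
    run.add_argument("--report")
    return parser
```

Calling build_parser().parse_args(["run", "--repo", "x"]) yields branch "master" and env "default", matching the defaults stated above.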





[jira] [Created] (AMATERASU-3) Scaffolding based on maki.yml

2017-10-08 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-3:
--

 Summary: Scaffolding based on maki.yml
 Key: AMATERASU-3
 URL: https://issues.apache.org/jira/browse/AMATERASU-3
 Project: AMATERASU
  Issue Type: Sub-task
Reporter: Nadav Har Tzvi


When the user invokes "ama init -m ", an Amaterasu repository will be 
created or updated with the src files specified in the maki.

e.g.:
Given a maki.yml file:
job-name: amaterasu-test # Replace this with your job's name
flow:
- name: start # Name of this step
  runner:
    group: spark # Currently supporting spark only, but expect more here in the future!
    type: scala # scala, sql, r, python
  file: file.scala # Source code for the step
  exports:
    odd: parquet
- name: step2
  runner:
    group: spark
    type: scala
  file: file2.scala

Then an Amaterasu job repository will be created with src/file.scala and 
src/file2.scala
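
A minimal sketch of the scaffolding step, assuming the maki.yml has already been parsed into a plain dict (for example by a YAML library). The function name is hypothetical, and since the exact nesting of the file key is an assumption, the sketch accepts it either on the step or under its runner.

```python
from pathlib import Path


def scaffold(maki, repo_dir):
    """Create an empty src/<file> for every flow step that names a file.

    `maki` is the parsed maki.yml as a plain dict. Existing files are left
    untouched. Returns the list of paths that were created.
    """
    src = Path(repo_dir) / "src"
    src.mkdir(parents=True, exist_ok=True)
    created = []
    for step in maki.get("flow", []):
        # Look for `file` on the step first, then under its runner.
        name = step.get("file") or step.get("runner", {}).get("file")
        if not name:
            continue
        target = src / name
        if not target.exists():
            target.touch()
            created.append(target)
    return created
```

Running it twice on the same repository creates nothing the second time, which matches the "created/updated" wording above.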





[jira] [Created] (AMATERASU-2) Initialization of an Amaterasu job repository via the CLI

2017-10-08 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-2:
--

 Summary: Initialization of an Amaterasu job repository via the CLI
 Key: AMATERASU-2
 URL: https://issues.apache.org/jira/browse/AMATERASU-2
 Project: AMATERASU
  Issue Type: Sub-task
Reporter: Nadav Har Tzvi


The user will invoke "ama init" to create an Amaterasu job repository.

There are 2 invocation modes:
# "ama init" creates a repository at CWD
# "ama init " creates a repository at 

The created repository will consist of:
# A git repository
# maki file
# env + env/default directories, where env/default contains a spark.yml and 
job.yml
# empty src directory (maybe we would like to include one scala example and one 
python example)
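
The layout described above could be created along these lines. This is a sketch only: the maki file name (maki.yml), the function name, and the git switch are assumptions, and the files are created empty.

```python
import subprocess
from pathlib import Path


def init_repo(path=".", git=True):
    """Create the job repository layout at `path` (defaults to the CWD)."""
    root = Path(path)
    (root / "env" / "default").mkdir(parents=True, exist_ok=True)
    (root / "src").mkdir(exist_ok=True)
    for relative in ("maki.yml", "env/default/spark.yml", "env/default/job.yml"):
        (root / relative).touch()
    if git:
        # Requires the git executable on PATH; raises if `git init` fails.
        subprocess.run(["git", "init", "--quiet", str(root)], check=True)
    return root
```

Passing git=False skips the git step, which is handy in tests or on machines without git installed.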





[jira] [Created] (AMATERASU-1) Amaterasu CLI

2017-10-08 Thread Nadav Har Tzvi (JIRA)
Nadav Har Tzvi created AMATERASU-1:
--

 Summary: Amaterasu CLI
 Key: AMATERASU-1
 URL: https://issues.apache.org/jira/browse/AMATERASU-1
 Project: AMATERASU
  Issue Type: New Feature
 Environment: Linux, Python 2.7/3.3+
Reporter: Nadav Har Tzvi


We need a CLI that offers the following functionality:
# Initialization of a new Amaterasu compliant git repository
# Scaffolding based on a maki.yml file
# Running an Amaterasu pipeline (need to bring ama-start.sh functionality into 
the CLI)

The CLI needs to be packaged so it is easily installed via one command (e.g. 
setup.py)
It is a given, but each piece of functionality must be backed by tests


