[jira] [Commented] (AMATERASU-52) Implement AmaContext.datastores

2019-06-01 Thread Arun Manivannan (JIRA)


[ 
https://issues.apache.org/jira/browse/AMATERASU-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853741#comment-16853741
 ] 

Arun Manivannan commented on AMATERASU-52:
--

[~yaniv] [~nadavha]  - This PR (53) contains the Dataset config manager that 
parses the datasets and exposes an API for accessing them.  Integration with 
AmaContext is WIP. 

> Implement AmaContext.datastores
> ---
>
> Key: AMATERASU-52
> URL: https://issues.apache.org/jira/browse/AMATERASU-52
> Project: AMATERASU
>  Issue Type: Task
>Reporter: Yaniv Rodenski
>    Assignee: Arun Manivannan
>Priority: Major
> Fix For: 0.2.1-incubating
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> AmaContext.datastores should contain the data from datastores.yaml



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

2019-01-29 Thread Arun Manivannan
Makes sense, Nadav. I have been toying with the idea of a structure like
this, though I am still trying to make it work with Konf (argggh!!).
Do you think this sounds reasonable?


datasets:
  hive:
    transactions:
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx

    second_transactions:
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table

  file:
    users:
      uri: s3://filestore
      format: parquet
      mode: overwrite
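For comparison with the flat, uniform list Nadav proposes in the quoted message below, the type-grouped layout above can be flattened mechanically. A quick Python sketch (illustrative only — the project itself is Kotlin/Scala, and the names here are hypothetical):

```python
# Arun's nested, type-grouped layout, as the Python data a YAML parser
# would produce for the snippet above.
nested = {
    "datasets": {
        "hive": {
            "transactions": {"uri": "/user/somepath", "format": "parquet",
                             "database": "transations_daily", "table": "transx"},
            "second_transactions": {"uri": "/seconduser/somepath", "format": "avro",
                                    "database": "transations_monthly", "table": "avro_table"},
        },
        "file": {
            "users": {"uri": "s3://filestore", "format": "parquet", "mode": "overwrite"},
        },
    }
}

def flatten(cfg):
    """Convert the type-grouped mapping into the uniform list form,
    tagging each entry with its dataset type."""
    out = []
    for ds_type, entries in cfg["datasets"].items():
        for key, attrs in entries.items():
            out.append({"key": key, "type": ds_type, **attrs})
    return out

flat = flatten(nested)
# flat[0] -> {'key': 'transactions', 'type': 'hive', 'uri': '/user/somepath', ...}
```

Either shape carries the same information; the flat form just trades one level of nesting for an explicit `type` field.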



Cheers,
Arun


On Tue, Jan 29, 2019 at 1:45 PM Nadav Har Tzvi 
wrote:

> Hey Arun,
>
> I kinda feel like the datastores yaml is somewhat obscure. I propose the
> following structure.
>
> Instead of
>
> datasets:
>   hive:
>     - key: transactions
>       uri: /user/somepath
>       format: parquet
>       database: transations_daily
>       table: transx
>
>     - key: second_transactions
>       uri: /seconduser/somepath
>       format: avro
>       database: transations_monthly
>       table: avro_table
>
>   file:
>     - key: users
>       uri: s3://filestore
>       format: parquet
>       mode: overwrite
>
> I would have
>
> datasets:
>   - key: transactions
>     uri: /user/somepath
>     format: parquet
>     database: transations_daily
>     table: transx
>     type: hive
>   - key: second_transactions
>     uri: /seconduser/somepath
>     format: avro
>     database: transations_monthly
>     table: avro_table
>     type: hive
>   - key: users
>     uri: s3://filestore
>     format: parquet
>     mode: overwrite
>     type: file
>
> In my opinion it is more uniform and straightforward. I think it is also
> simpler to handle code-wise.
> What do you think?
>
> Cheers,
> Nadav
>
>
>
> On Mon, 14 Jan 2019 at 00:57, Yaniv Rodenski  wrote:
>
> > Hi Arun,
> >
> > I've added my comments to the PR, but good call, I agree @Nadav Har Tzvi
> >  should at least review as you both need to
> > maintain compatible APIs.
> >
> > Cheers,
> > Yaniv
> >
> > On Sun, Jan 13, 2019 at 10:21 PM Arun Manivannan 
> wrote:
> >
> >> Hi Guy, Yaniv and Nadav,
> >>
> >> This PR <https://github.com/apache/incubator-amaterasu/pull/39> just
> >> captures part of the issue - the datasets.yaml, ConfigManager and the
> >> testcases. The Integration with the AmaContext is yet to be done but I
> >> would like to get your thoughts on the implementation.
> >>
> >> Guy - Would it be okay if you could help throw some light on the syntax
> >> and
> >> the idiomatic part of Kotlin itself. Newbie here.
> >>
> >> Cheers,
> >> Arun
> >>
> >> On Fri, Oct 12, 2018 at 7:15 PM Yaniv Rodenski (JIRA) 
> >> wrote:
> >>
> >> > Yaniv Rodenski created AMATERASU-52:
> >> > ---
> >> >
> >> >  Summary: Implement AmaContext.datastores
> >> >  Key: AMATERASU-52
> >> >  URL:
> >> https://issues.apache.org/jira/browse/AMATERASU-52
> >> >  Project: AMATERASU
> >> >   Issue Type: Task
> >> > Reporter: Yaniv Rodenski
> >> > Assignee: Arun Manivannan
> >> >  Fix For: 0.2.1-incubating
> >> >
> >> >
> >> > AmaContext.datastores should contain the data from datastores.yaml
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> > --
> > Yaniv Rodenski
> >
> > +61 477 778 405
> > ya...@shinto.io
> >
> >
>


Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

2019-01-13 Thread Arun Manivannan
Hi Guy, Yaniv and Nadav,

This PR <https://github.com/apache/incubator-amaterasu/pull/39> just
captures part of the issue - the datasets.yaml, ConfigManager and the
test cases. The integration with AmaContext is yet to be done, but I
would like to get your thoughts on the implementation.

Guy - would you be able to throw some light on the syntax and the
idiomatic aspects of Kotlin itself? Newbie here.

Cheers,
Arun

On Fri, Oct 12, 2018 at 7:15 PM Yaniv Rodenski (JIRA) 
wrote:

> Yaniv Rodenski created AMATERASU-52:
> ---
>
>  Summary: Implement AmaContext.datastores
>  Key: AMATERASU-52
>  URL: https://issues.apache.org/jira/browse/AMATERASU-52
>  Project: AMATERASU
>   Issue Type: Task
> Reporter: Yaniv Rodenski
>     Assignee: Arun Manivannan
>  Fix For: 0.2.1-incubating
>
>
> AmaContext.datastores should contain the data from datastores.yaml
>
>
>
>


Unable to read config with konf

2019-01-12 Thread Arun Manivannan
All,

As part of Amaterasu 52, I am trying to come up with a format for datasets
configuration. Here's what I have

datasets:
  hive:
    - key: transactions
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx

    - key: second_transactions
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table

  file:
    - key: users
      uri: s3://filestore
      format: parquet
      mode: overwrite

For the konf spec, I have


object DataSetsSpec : ConfigSpec("datasets") {
    // NOTE: the generic type parameters were stripped by the list archive;
    // List<FileDS>, List<HiveDS> and String below are assumed reconstructions.
    val fileDataSets by optional<List<FileDS>>(default = emptyList())
    val hive by optional<List<HiveDS>>(default = emptyList())
}

object FileDS : ConfigSpec("file") {
    val key by required<String>(description = "Name of the dataset")
    val uri by required<String>(description = "Target path of the file with the protocol qualifier")
    val format by optional("parquet", "File format", "Format in which the file must be written")
    val mode by optional("append", "Save Mode", "Mode in which the file would be written - (overwrite, append)")
}

object HiveDS : ConfigSpec("hive") {
    val key by required<String>()
    val uri by required<String>()
    val database by required<String>()
    val table by required<String>()
    val format by optional("parquet", "File format", "Format in which the file must be written")
    val mode by optional("append", "Save Mode", "Mode in which the file would be written - (overwrite, append)")
}
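For readers unfamiliar with Konf, the behaviour the spec above encodes — `key` and `uri` required, `format` and `mode` defaulted — can be sketched in plain Python (illustrative only; the project itself uses Konf in Kotlin):

```python
# Defaults and required fields mirroring the FileDS spec above.
FILE_DS_DEFAULTS = {"format": "parquet", "mode": "append"}
FILE_DS_REQUIRED = ("key", "uri")

def load_file_ds(raw):
    """Validate one 'file' dataset entry: reject missing required
    fields, fill in defaults for the optional ones."""
    missing = [k for k in FILE_DS_REQUIRED if k not in raw]
    if missing:
        raise ValueError("missing required dataset fields: %s" % missing)
    return {**FILE_DS_DEFAULTS, **raw}

ds = load_file_ds({"key": "users", "uri": "s3://filestore", "mode": "overwrite"})
# 'format' falls back to the default; 'mode' keeps the explicit value.
```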



New to Kotlin and Konf, I have been breaking my head over this for a couple
of hours now and finally created the issue
https://github.com/uchuhimo/konf/issues/17

Please let me know if you have any hints around this.

Cheers,
Arun


AMATERASU-44

2018-10-10 Thread Arun Manivannan
Hi Yaniv and all,

I am trying to take a stab at AMATERASU-44 and would like to get some
background on this issue - specifically the current state and the things
that ought to be considered before making the change.


Cheers,
Arun


[jira] [Resolved] (AMATERASU-28) Pull Miniconda version away from compiled code

2018-10-10 Thread Arun Manivannan (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMATERASU-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Manivannan resolved AMATERASU-28.
--
Resolution: Fixed
  Assignee: Yaniv Rodenski  (was: Arun Manivannan)

> Pull Miniconda version away from compiled code
> --
>
> Key: AMATERASU-28
> URL: https://issues.apache.org/jira/browse/AMATERASU-28
> Project: AMATERASU
>  Issue Type: Improvement
>Affects Versions: 0.2.1-incubating
>    Reporter: Arun Manivannan
>Assignee: Yaniv Rodenski
>Priority: Minor
> Fix For: 0.2.1-incubating
>
>
> Miniconda version is hard-coded in a couple of places in the code at the 
> moment.  Pulling this out to have the version info in the shell scripts alone 
> (ama-start-yarn and ama-start-mesos.sh).





[jira] [Commented] (AMATERASU-28) Pull Miniconda version away from compiled code

2018-10-10 Thread Arun Manivannan (JIRA)


[ 
https://issues.apache.org/jira/browse/AMATERASU-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645834#comment-16645834
 ] 

Arun Manivannan commented on AMATERASU-28:
--

This issue is already fixed as part of PR: 
[https://github.com/apache/incubator-amaterasu/pull/30]

 

[~yaniv] [~nadavha]

 

> Pull Miniconda version away from compiled code
> --
>
> Key: AMATERASU-28
> URL: https://issues.apache.org/jira/browse/AMATERASU-28
> Project: AMATERASU
>  Issue Type: Improvement
>Affects Versions: 0.2.1-incubating
>    Reporter: Arun Manivannan
>    Assignee: Arun Manivannan
>Priority: Minor
> Fix For: 0.2.1-incubating
>
>
> Miniconda version is hard-coded in a couple of places in the code at the 
> moment.  Pulling this out to have the version info in the shell scripts alone 
> (ama-start-yarn and ama-start-mesos.sh).





Re: [DISCUSS] podling report

2018-07-09 Thread Arun Manivannan
+1

Looks great.


On Mon, Jul 9, 2018, 17:22 Nadav Har Tzvi  wrote:

> +1
>
> On Mon, Jul 9, 2018, 14:46 Jean-Baptiste Onofré  wrote:
>
> > +1
> >
> > Regards
> > JB
> >
> > On 09/07/2018 11:55, Eyal Ben-Ivri wrote:
> > > +1
> > >
> > >
> > > On 6. July 2018 at 18:50:37, Davor Bonaci (da...@apache.org) wrote:
> > >
> > > +1
> > >
> > > "have been released" --> "have been built and voted upon"
> > >
> > > On Fri, Jul 6, 2018 at 12:51 AM, Yaniv Rodenski 
> wrote:
> > >
> > >> Hi All,
> > >>
> > >> Sorry for doing this late again, but I propose the following report to
> > be
> > >> submitted:
> > >>
> > >> "
> > >> Apache Amaterasu is a framework providing configuration management and
> > >> deployment for Big Data Pipelines.
> > >>
> > >> It provides the following capabilities:
> > >>
> > >> Continuous integration tools to package pipelines and run tests.
> > >> A repository to store those packaged applications: the applications
> > >> repository.
> > >> A repository to store the pipelines, and engine configuration (for
> > >> instance, the location of the Spark master, etc.): per environment -
> the
> > >> configuration repository.
> > >> A dashboard to monitor the pipelines.
> > >> A DSL and integration hooks allowing third parties to easily
> integrate.
> > >>
> > >> Amaterasu has been incubating since 2017-09.
> > >>
> > >> Three most important issues to address in the move towards graduation:
> > >>
> > >> 1. Prepare the first release
> > >> 2. Grow up user and contributor communities
> > >> 3. Prepare documentation
> > >>
> > >> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> > >> aware of?
> > >>
> > >> None
> > >>
> > >> How has the community developed since the last report?
> > >>
> > >> * Two conference talks have been delivered (PyCon il and SDP)
> > >> * Initial documentation has been created, targeted for Amaterasu's
> next
> > >> release
> > >>
> > >> How has the project developed since the last report?
> > >>
> > >> * since the last report 4 release candidates have been released, at
> the
> > >> time of this report the last RC is being voted on in the
> > general@incubator
> > >> mailing list
> > >> * Two additional contributors started contributing to the code base
> > >> * One more organization we are aware of have started a POC with
> > Amaterasu
> > >>
> > >> Date of the last release:
> > >>
> > >> N/A
> > >>
> > >> When were the last committers or PMC members elected?
> > >>
> > >> N/A
> > >> "
> > >>
> > >> If there are no objections I will update the wiki.
> > >>
> > >> Cheers,
> > >> Yaniv
> > >>
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [VOTE] Release Apache Amaterasu (incubating) 0.2.0 (rc4)

2018-06-24 Thread Arun Manivannan
Clean build; test cases run fine. Ran it on HDP 2.6.5.0-292 with a
minor change on the leader AM. Works perfectly.

+1

Cheers,
Arun

On Sun, Jun 24, 2018 at 5:10 PM guy peleg  wrote:

> Downloaded the source code and the artifact, everything looks good on my
> side.
> +1
>
> Guy
>
> On Sun, Jun 24, 2018, 19:09 Kirupa Devarajan 
> wrote:
>
> > gradle build and assemble successful on the branch
> > version-0.2.0-incubating-rc4
> > <
> >
> https://github.com/apache/incubator-amaterasu/tree/version-0.2.0-incubating-rc4
> > >
> > .
> >
> > +1
> >
> > Regards,
> > Kirupa
> >
> >
> > On Sun, Jun 24, 2018 at 6:57 PM, Nadav Har Tzvi 
> > wrote:
> >
> > > Tested on standalone Mesos, EMR and HDP.
> > > Spark-Scala and PySpark both work on each of the environments.
> > > Thus, I vote +1
> > >
> > > Cheers,
> > > Nadav
> > >
> > >
> > >
> > > On Sat, 23 Jun 2018 at 15:35, Yaniv Rodenski  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > After cancelling the vote in the general@ list we've fixed the
> > > following:
> > > > * Headers added to all non-code files where applicable as remarked by
> > > Davor
> > > > * Sources now match the released version + no keys are present
> > > > * The gradle-wrapper.jar was removed and instructions were added to the
> > > > readme.md file on how to add it.
> > > > * Missing licenses found by Justin Mclean during the general@ vote were
> > > > added when applicable.
> > > >
> > > > So, hoping for the best, please review and vote on the release
> > candidate
> > > #4
> > > > for the version 0.2.0-incubating, as follows
> > > >
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > >
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> dist.apache.org
> > > > [2],
> > > > which is signed with the key with fingerprint [3],
> > > > * source code tag "version-0.2.0-incubating-rc4" [4],
> > > > * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK
> > > > 1.8.0_151
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Yaniv
> > > >
> > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?p
> > > > rojectId=12321521=12342793
> > > > [2]
> > https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc4/
> > > > [3] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> > > > [4] https://github.com/apache/incubator-amaterasu/tags
> > > >
> > > > Thanks everyone
> > > > Yaniv
> > > >
> > >
> >
>


Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #3

2018-06-03 Thread Arun Manivannan
Ah. Missed the part where we could configure parameters at a job level.
Thanks a lot for clarifying, Yaniv.

Cheers,
Arun

On Sun, Jun 3, 2018 at 6:37 PM Yaniv Rodenski  wrote:

> Hi Arun,
>
> Fair point, but I think it can be configured via env/[env_name]/spark.yml
> am I wrong?
> Anyway, ideally I think we should try and have per job configurations in
> the environment rather than in the amaterasu.properties
>
> Cheers,
> Yaniv
>
> On Sun, 3 Jun 2018 at 1:32 am, Arun Manivannan  wrote:
>
> > Gentlemen,
> >
> > Apologies for coming back late.  The issue was just the minimum container
> > size that was configured in my cluster
> > (yarn.scheduler.maximum-allocation-mb).  It was set at 1 GB.
> >
> > I didn't specify any spark specific memory parameters during my run (the
> > memory defaults that the SparkSetupProvider was looking at) and to top it
> > the code was setting the Xmx at 1 GB causing the overallocation and
> > failure.
> >
> > I have one minor proposal.  If this is agreeable, I can raise a quick PR.
> >
> > Can we pull out the executor java options as a property in the
> > amaterasu.properties?
> >
> > amaterasu.executor.extra.java.opts = "-Xmx1G -Dscala.usejavacp=true
> > -Dhdp.version=2.6.5.0-292"
> >
> >
> > As a side effect, we must provide the flexibility to allow quotes around
> > the parameter but passing the quotes to the java command would fail.  I
> > have stripped off the extra quotes in a dirty way at the moment. Should
> we
> > consider proper command parsing (and possibly convert them to be bash
> > compatible strings)?
> >
> > s"java -cp
> > spark/jars/*:executor.jar:spark/conf/:${config.YARN.hadoopHomeDir}/conf/
> > " +
> >   s" ${config.amaterasuExecutorJavaOpts.replaceAll("\"","")} "+
> >
> >
> > Meanwhile, I'll also update my PR for AMATERASU-24 after pulling the
> > latest from the branch.
> >
> > Cheers,
> > Arun
> >
> > On Wed, May 30, 2018 at 1:25 PM Arun Manivannan  wrote:
> >
> > > Thanks a lot, Nadav. Will get home and spend some more time on this. I
> > was
> > > in a rush and did this poor workaround. My VM is just 8 GB.
> > >
> > > Cheers
> > > Arun
> > >
> > >
> > > On Wed, May 30, 2018, 12:27 Nadav Har Tzvi 
> > wrote:
> > >
> > >> Yaniv and I just tested it. It worked flawlessly on my end (HDP docker
> > on
> > >> AWS). Both Spark-Scala and PySpark.
> > >> It worked on Yaniv's HDP cluster as well.
> > >> Worth noting:
> > >> 1. HDP 2.6.4
> > >> 2. Cluster has total of 32GB memory available
> > >> 3. Each container is allocated 1G memory.
> > >> 4. Amaterasu.properties:
> > >>
> > >> zk=sandbox-hdp.hortonworks.com
> > >> version=0.2.0-incubating-rc3
> > >> master=192.168.33.11
> > >> user=root
> > >> mode=yarn
> > >> webserver.port=8000
> > >> webserver.root=dist
> > >> spark.version=2.6.4.0-91
> > >> yarn.queue=default
> > >> yarn.jarspath=hdfs:///apps/amaterasu
> > >> spark.home=/usr/hdp/current/spark2-client
> > >>
> > >>
> >
> #spark.home=/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2
> > >> yarn.hadoop.home.dir=/etc/hadoop
> > >> spark.opts.spark.yarn.am.extraJavaOptions="-Dhdp.version=2.6.4.0-91"
> > >> spark.opts.spark.driver.extraJavaOptions="-Dhdp.version=2.6.4.0-91"
> > >>
> > >>
> > >> Arun, please share:
> > >> 1. YARN memory configurations
> > >> 2. amaterasu.properties content
> > >> 3. HDP version.
> > >>
> > >> Cheers,
> > >> Nadav
> > >>
> > >>
> > >> On 30 May 2018 at 07:11, Arun Manivannan  wrote:
> > >>
> > >> > The pmem disabling is just temporary. I'll do a detailed analysis
> and
> > >> get
> > >> > back with a proper solution.
> > >> >
> > >> > Any hints on this front is highly appreciated.
> > >> >
> > >> > Cheers
> > >> > Arun
> > >> >
> > >> > On Wed, May 30, 2018, 01:10 Nadav Har Tzvi 
> > >> wrote:
> > >> >
> > >> > > Yaniv, Eyal, this might be related to the same issue you faced

Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #3

2018-06-02 Thread Arun Manivannan
Gentlemen,

Apologies for coming back late.  The issue was just the minimum container
size that was configured in my cluster
(yarn.scheduler.maximum-allocation-mb).  It was set at 1 GB.

I didn't specify any Spark-specific memory parameters during my run (so
the memory defaults that the SparkSetupProvider was looking at applied),
and to top it off, the code was setting the Xmx at 1 GB, causing the
over-allocation and failure.

I have one minor proposal.  If this is agreeable, I can raise a quick PR.

Can we pull out the executor java options as a property in the
amaterasu.properties?

amaterasu.executor.extra.java.opts = "-Xmx1G -Dscala.usejavacp=true
-Dhdp.version=2.6.5.0-292"


As a side effect, we must allow quotes around the parameter value, but
passing the quotes through to the java command would fail.  I have
stripped off the extra quotes in a crude way for the moment. Should we
consider proper command parsing (and possibly converting the values to
bash-compatible strings)?

s"java -cp 
spark/jars/*:executor.jar:spark/conf/:${config.YARN.hadoopHomeDir}/conf/
" +
  s" ${config.amaterasuExecutorJavaOpts.replaceAll("\"","")} "+
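On the question of proper command parsing: shell-style tokenization handles the quoting correctly instead of stripping it with replaceAll. A Python sketch of the idea using the stdlib shlex module (illustrative only; the snippet above is the project's Scala, and the property value shown is hypothetical):

```python
import shlex

# Hypothetical property value as it might appear in amaterasu.properties,
# surrounding quotes included.
raw = '"-Xmx1G -Dscala.usejavacp=true -Dhdp.version=2.6.5.0-292"'

# The first split consumes the enclosing quotes and yields the value as a
# single token; the second split breaks it into individual JVM arguments.
value = shlex.split(raw)[0]
jvm_args = shlex.split(value)
# jvm_args -> ['-Xmx1G', '-Dscala.usejavacp=true', '-Dhdp.version=2.6.5.0-292']
```

The same two-step approach works for any shell-quoted value, including arguments that themselves contain spaces.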


Meanwhile, I'll also update my PR for AMATERASU-24 after pulling the
latest from the branch.

Cheers,
Arun

On Wed, May 30, 2018 at 1:25 PM Arun Manivannan  wrote:

> Thanks a lot, Nadav. Will get home and spend some more time on this. I was
> in a rush and did this poor workaround. My VM is just 8 GB.
>
> Cheers
> Arun
>
>
> On Wed, May 30, 2018, 12:27 Nadav Har Tzvi  wrote:
>
>> Yaniv and I just tested it. It worked flawlessly on my end (HDP docker on
>> AWS). Both Spark-Scala and PySpark.
>> It worked on Yaniv's HDP cluster as well.
>> Worth noting:
>> 1. HDP 2.6.4
>> 2. Cluster has total of 32GB memory available
>> 3. Each container is allocated 1G memory.
>> 4. Amaterasu.properties:
>>
>> zk=sandbox-hdp.hortonworks.com
>> version=0.2.0-incubating-rc3
>> master=192.168.33.11
>> user=root
>> mode=yarn
>> webserver.port=8000
>> webserver.root=dist
>> spark.version=2.6.4.0-91
>> yarn.queue=default
>> yarn.jarspath=hdfs:///apps/amaterasu
>> spark.home=/usr/hdp/current/spark2-client
>>
>> #spark.home=/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2
>> yarn.hadoop.home.dir=/etc/hadoop
>> spark.opts.spark.yarn.am.extraJavaOptions="-Dhdp.version=2.6.4.0-91"
>> spark.opts.spark.driver.extraJavaOptions="-Dhdp.version=2.6.4.0-91"
>>
>>
>> Arun, please share:
>> 1. YARN memory configurations
>> 2. amaterasu.properties content
>> 3. HDP version.
>>
>> Cheers,
>> Nadav
>>
>>
>> On 30 May 2018 at 07:11, Arun Manivannan  wrote:
>>
>> > The pmem disabling is just temporary. I'll do a detailed analysis and
>> get
>> > back with a proper solution.
>> >
>> > Any hints on this front is highly appreciated.
>> >
>> > Cheers
>> > Arun
>> >
>> > On Wed, May 30, 2018, 01:10 Nadav Har Tzvi 
>> wrote:
>> >
>> > > Yaniv, Eyal, this might be related to the same issue you faced with
>> HDP.
>> > > Can you confirm?
>> > >
>> > > On Tue, May 29, 2018, 17:58 Arun Manivannan  wrote:
>> > >
>> > > > +1 from me
>> > > >
>> > > > Unit Tests and Build ran fine.
>> > > >
>> > > > Tested on HDP (VM) but had trouble allocating containers (didn't
>> have
>> > > that
>> > > > before).  Apparently Centos VMs are known to have this problem.
>> > Disabled
>> > > > physical memory check  (yarn.nodemanager.pmem-check-enabled) and ran
>> > jobs
>> > > > successfully.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Tue, May 29, 2018 at 10:42 PM Kirupa Devarajan <
>> > > kirupagara...@gmail.com
>> > > > >
>> > > > wrote:
>> > > >
>> > > > > Unit tests passing and build was successful on the branch
>> > > > > "version-0.2.0-incubating-rc3"
>> > > > >
>> > > > > +1 from me
>> > > > >
>> > > > > Cheers,
>> > > > > Kirupa
>> > > > >
>> > > > >
>> > > > > On Tue, May 29, 2018 at 3:06 PM, guy peleg 
>> > wrote:
>> > > > >
>> > > > > > +1 looks good to me
>>

Re: [VOTE] Amaterasu release 0.2.0-incubating, release candidate #3

2018-05-29 Thread Arun Manivannan
+1 from me

Unit Tests and Build ran fine.

Tested on HDP (VM) but had trouble allocating containers (didn't have
that before).  Apparently CentOS VMs are known to have this problem.
Disabled the physical memory check (yarn.nodemanager.pmem-check-enabled)
and ran jobs successfully.
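For anyone hitting the same container failures: the pmem check mentioned above is toggled in yarn-site.xml (a standard Hadoop YARN property; disabling it is a workaround, not a fix):

```xml
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
```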





On Tue, May 29, 2018 at 10:42 PM Kirupa Devarajan 
wrote:

> Unit tests passing and build was successful on the branch
> "version-0.2.0-incubating-rc3"
>
> +1 from me
>
> Cheers,
> Kirupa
>
>
> On Tue, May 29, 2018 at 3:06 PM, guy peleg  wrote:
>
> > +1 looks good to me
> >
> > On Tue, May 29, 2018, 14:39 Nadav Har Tzvi 
> wrote:
> >
> > > +1 approve. Tested multiple times and after a long round of fixing and
> > > testing over and over.
> > >
> > > Cheers,
> > > Nadav
> > >
> > >
> > > On 29 May 2018 at 07:38, Yaniv Rodenski  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > We have fixed the legal issues, as well as a bug found by @Nadav
> please
> > > > review and vote on the release candidate #3 for the version
> > > > 0.2.0-incubating, as follows
> > > >
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > >
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> dist.apache.org
> > > > [2],
> > > > which is signed with the key with fingerprint [3],
> > > > * source code tag "version-0.2.0-incubating-rc3" [4],
> > > > * Java artifacts were built with Gradle 3.1 and OpenJDK/Oracle JDK
> > > > 1.8.0_151
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Yaniv
> > > >
> > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > > > projectId=12321521=12342793
> > > > [2] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/
> > 0.2.0rc3/
> > > > [3] https://dist.apache.org/repos/dist/dev/incubator/amaterasu/KEYS
> > > > [4] https://github.com/apache/incubator-amaterasu/tags
> > > >
> > >
> >
>


[jira] [Assigned] (AMATERASU-24) Refactor Spark out of Amaterasu executor to it's own project

2018-05-28 Thread Arun Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Manivannan reassigned AMATERASU-24:


Assignee: Arun Manivannan

> Refactor Spark out of Amaterasu executor to it's own project
> 
>
> Key: AMATERASU-24
> URL: https://issues.apache.org/jira/browse/AMATERASU-24
> Project: AMATERASU
>  Issue Type: Improvement
>Reporter: Yaniv Rodenski
>    Assignee: Arun Manivannan
>Priority: Major
> Fix For: 0.2.1-incubating
>
>
> The Spark framework is a part of the Amaterasu executor and leader; it needs 
> to be under its own project under a new frameworks folder





Re: AMATERASU-24

2018-05-27 Thread Arun Manivannan
Hi Yaniv, Nadiv and all,

Not sure if a PR is the best way to initiate a discussion on the code, but I
just managed to raise one based on my forked remote branch.

https://github.com/apache/incubator-amaterasu/pull/22
https://github.com/arunma/incubator-amaterasu/tree/AMATERASU-24-FrameworkRefactor

I am still working on my Gradle skills, but I've made all the sub-modules
compile and the test cases run.  I am pretty sure proper runs wouldn't work
yet because, at the moment, only the executor is set in the classpath.

Let the comment games begin :-)

Regards,
Arun

On Sun, May 27, 2018 at 4:43 PM Arun Manivannan <a...@arunma.com> wrote:

> Hi Yaniv,
>
> Makes perfect sense. Non-JVM frameworks are something I hadn't considered.
> I haven't done something like this in the past and would require your
> guidance.  Would it be okay if we had a discussion over a meeting?
>
> Two modules - Yes. I have made changes in that direction but have kept
> both the runner and the runtime in the same module under different
> packages.  I'll make some final touches and get you the branch for review
> and discussion.  For now, these are the primary focus areas:
>
> 1. The executor still has the YARN and Mesos dependencies but not the Spark
> ones (I believe that needs some work as well, considering the non-JVM
> frameworks)
> 2. The runner and runtime modules have the Spark dependencies pulled into
> them (at the moment, both are unified under the same module but in different
> packages)
>
> I still see Scala (libs/reflect/compiler) bundled in both the modules.
> This is a concern and needs some work on gradle.
>
> On the shell changes, I wasn't very sure whether I am on the right track.
> Thanks for clarifying.  I'll probably close the shell script PR after
> discussing with Nadav.
>
> Cheers,
> Arun
>
>
> On Sun, May 27, 2018 at 3:53 AM Yaniv Rodenski <ya...@shinto.io> wrote:
>
>> Hi Arun,
>>
>> You are correct Spark is the first framework, and in my mind,
>> frameworks should be treated as plugins. Also, we need to consider that
>> not
>> all frameworks will run under the JVM.
>> Last, each framework has two modules, a runner (used by both the executor
>> and the leader) and runtime, to be used by the actions themselves
>> I would suggest the following structure to start with:
>> frameworks
>>   |-> spark
>>         |-> runner
>>         |-> runtime
>>
>> As for the shell scripts, I will leave that for @Nadav, but please have a
>> look at PR #17 containing the CLI that will replace the scripts as of
>> 0.2.1-incubating.
>>
>> Cheers,
>> Yaniv
>>
>> On Sat, May 26, 2018 at 5:16 PM, Arun Manivannan <a...@arunma.com> wrote:
>>
>> > Gentlemen,
>> >
>> > I am looking into Amaterasu-24 and would like to run the intended
>> changes
>> > by you before I make them.
>> >
>> > Refactor Spark out of Amaterasu executor to it's own project
>> > <https://issues.apache.org/jira/projects/AMATERASU/
>> > issues/AMATERASU-24?filter=allopenissues>
>> >
>> > I understand Spark is just the first of many frameworks that has been
>> lined
>> > up for support by Amaterasu.
>> >
>> > These are the intended changes :
>> >
>> > 1. Create a new module called "runners" and have the Spark runners under
>> > executor pulled into this project
>> > (org.apache.executor.execution.actions.runners.spark). We could call it
>> > "frameworks" if "runners" is not a great name for this.
>> > 2. Will also pull away the Spark dependencies from the Executor to the
>> > respective sub-sub-projects (at the moment, just Spark).
>> > 3. Since the result of the framework modules would be different bundles,
>> > the pattern that I am considering to name the bundle is -
>> "runner-spark".
>> >  So, it would be "runners:runner-spark" in gradle.
>> > 4. On the shell scripts (miniconda and load-spark-env") and the "-cp"
>> > passed as commands for the ActionsExecutorLauncher, I could pull them
>> as a
>> > separate properties of Spark (inside the runner), so that the
>> Application
>> > master can use it.
>> >
>> > Is it okay if I rename the Miniconda install file to miniconda-install
>> > using "wget -O"?  The reason this change is proposed is to avoid
>> > hardcoding the Conda version inside the code and to possibly pull it out
>> > into the amaterasu.properties file. (The changes are in the ama-start
>> > shell scripts and a couple of places inside the code.)
>> >
>> > Please let me know if this would work.
>> >
>> > Cheers,
>> > Arun
>> >
>>
>>
>>
>> --
>> Yaniv Rodenski
>>
>> +61 477 778 405 <+61%20477%20778%20405>
>> ya...@shinto.io
>>
>
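For reference, the frameworks/spark layout suggested in this thread would map to a Gradle multi-project build along these lines (illustrative sketch only; the actual module names in the project may differ):

```groovy
// settings.gradle -- hypothetical sketch of the proposed module layout
include 'frameworks:spark:runner'
include 'frameworks:spark:runtime'
```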


Re: Amaterasu PyCon il Presentation

2018-05-27 Thread Arun Manivannan
Looks great!!  I suppose this will be accompanied by a demo.


On Sun, May 27, 2018 at 5:49 PM Yaniv Rodenski  wrote:

> Hi All,
>
> @Nadav is giving an intro to Amaterasu @ PyCon Israel next week and we
> thought everyone here might like to have the deck
> <
> https://drive.google.com/file/d/1rGLBPw3hkpd0ZVQrJGZWqHf6gtBk3Ztv/view?usp=sharing
> >
> and may be present somewhere :)
>
> Cheers,
>
> --
> Yaniv Rodenski
>
> +61 477 778 405 <+61%20477%20778%20405>
> ya...@shinto.io
>


Re: AMATERASU-24

2018-05-27 Thread Arun Manivannan
Hi Nadav,

Absolutely. This sounds fantastic.  I am really keen to understand more on
this and would be happy to help where I can to move this forward.

Yes, It would be great if we could have a different discussion thread on
this (or preferably a meeting).  Let me quickly push a branch for your
review and progressively make the necessary changes on this.

Cheers,
Arun

On Sun, May 27, 2018 at 12:29 PM Nadav Har Tzvi <nadavhart...@gmail.com>
wrote:

> I agree with Yaniv that Frameworks should be plugins.
> Think about it like this, in the future, hopefully, you will be able to do
> something like "sudo yum install amaterasu"
> After installing the "core" Amaterasu using yum, you will be able to use the
> new CLI like this: "ama frameworks add " to add a
> framework.
> Alternatively we could do something like "sudo yum install amaterasu-spark"
> I mean, this is what I think anyhow.
>
> As I write this, I've just realized that we should open a thread to discuss
> packaging options that we'd like to see implemented.
>
> On 26 May 2018 at 22:53, Yaniv Rodenski <ya...@shinto.io> wrote:
>
> > Hi Arun,
> >
> > You are correct Spark is the first framework, and in my mind,
> > frameworks should be treated as plugins. Also, we need to consider that
> not
> > all frameworks will run under the JVM.
> > Last, each framework has two modules: a runner (used by both the executor
> > and the leader) and a runtime, used by the actions themselves.
> > I would suggest the following structure to start with:
> > frameworks
> >   |-> spark
> >   |-> runner
> >   |-> runtime
> >
> > As for the shell scripts, I will leave that for @Nadav, but please have a
> > look at PR #17 containing the CLI that will replace the scripts as of
> > 0.2.1-incubating.
> >
> > Cheers,
> > Yaniv
> >
> > On Sat, May 26, 2018 at 5:16 PM, Arun Manivannan <a...@arunma.com>
> wrote:
> >
> > > Gentlemen,
> > >
> > > I am looking into Amaterasu-24 and would like to run the intended
> changes
> > > by you before I make them.
> > >
> > > Refactor Spark out of Amaterasu executor to it's own project
> > > <https://issues.apache.org/jira/projects/AMATERASU/
> > > issues/AMATERASU-24?filter=allopenissues>
> > >
> > > I understand Spark is just the first of many frameworks that has been
> > lined
> > > up for support by Amaterasu.
> > >
> > > These are the intended changes :
> > >
> > > 1. Create a new module called "runners" and have the Spark runners
> under
> > > executor pulled into this project
> > > (org.apache.executor.execution.actions.runners.spark). We could call it
> > > "frameworks" if "runners" is not a great name for this.
> > > 2. Will also pull away the Spark dependencies from the Executor to the
> > > respective sub-sub-projects (at the moment, just Spark).
> > > 3. Since the framework modules would produce different bundles, the
> > > naming pattern I am considering for the bundles is "runner-spark".
> > > So, it would be "runners:runner-spark" in gradle.
> > > 4. On the shell scripts (miniconda and load-spark-env") and the "-cp"
> > > passed as commands for the ActionsExecutorLauncher, I could pull them
> as
> > a
> > > separate properties of Spark (inside the runner), so that the
> Application
> > > master can use it.
> > >
> > > Is it okay if I rename the Miniconda install file to miniconda-install
> > > using "wget -O"?  The reason this change is proposed is to avoid
> > > hardcoding the conda version inside the code and possibly pull it away
> > > into the amaterasu.properties file. (The changes are in the ama-start
> > > shell scripts and a couple of places inside the code.)
> > >
> > > Please let me know if this would work.
> > >
> > > Cheers,
> > > Arun
> > >
> >
> >
> >
> > --
> > Yaniv Rodenski
> >
> > +61 477 778 405 <+61%20477%20778%20405>
> > ya...@shinto.io
> >
>
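A minimal sketch of point 4 in the proposal above: pulling the Miniconda version out of compiled code and into a properties file. The key name `miniconda.version`, the placeholder version, and the file layout here are assumptions for illustration, not the project's actual configuration:

```scala
// Sketch: read the Miniconda version from a properties file instead of
// hard-coding it.  The key name "miniconda.version" is assumed here.
import java.io.{File, FileInputStream, PrintWriter}
import java.util.Properties

// Stand-in for the real amaterasu.properties shipped with the home dir
val propsFile = File.createTempFile("amaterasu", ".properties")
new PrintWriter(propsFile) { write("miniconda.version=4.3.30"); close() }

val props = new Properties()
val in = new FileInputStream(propsFile)
try props.load(in) finally in.close()

val minicondaVersion = props.getProperty("miniconda.version")
// The shell side can then fetch to a fixed file name regardless of version,
// e.g. wget -O miniconda-install.sh ".../Miniconda2-$minicondaVersion-Linux-x86_64.sh"
println(s"Miniconda version: $minicondaVersion")
```

Note that `getProperty` returns null for a missing key, so the real change would want a default value or a validation step.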


[jira] [Updated] (AMATERASU-28) Pull Miniconda version away from compiled code

2018-05-26 Thread Arun Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Manivannan updated AMATERASU-28:
-
Summary: Pull Miniconda version away from compiled code  (was: Pull 
Miniconda version away from compilable code)

> Pull Miniconda version away from compiled code
> --
>
> Key: AMATERASU-28
> URL: https://issues.apache.org/jira/browse/AMATERASU-28
> Project: AMATERASU
>  Issue Type: Improvement
>Affects Versions: 0.2.1-incubating
>    Reporter: Arun Manivannan
>    Assignee: Arun Manivannan
>Priority: Minor
> Fix For: 0.2.1-incubating
>
>
> Miniconda version is hard-coded in a couple of places in the code at the 
> moment.  Pulling this out to have the version info in the shell scripts alone 
> (ama-start-yarn and ama-start-mesos.sh).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMATERASU-28) Pull Miniconda version away from compilable code

2018-05-26 Thread Arun Manivannan (JIRA)
Arun Manivannan created AMATERASU-28:


 Summary: Pull Miniconda version away from compilable code
 Key: AMATERASU-28
 URL: https://issues.apache.org/jira/browse/AMATERASU-28
 Project: AMATERASU
  Issue Type: Improvement
Affects Versions: 0.2.1-incubating
Reporter: Arun Manivannan
Assignee: Arun Manivannan
 Fix For: 0.2.1-incubating


Miniconda version is hard-coded in a couple of places in the code at the 
moment.  Pulling this out to have the version info in the shell scripts alone 
(ama-start-yarn and ama-start-mesos.sh).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMATERASU-26) Pipeline tasks (sub-Yarn jobs) run as "yarn" user instead of inheriting the user with which the amaterasu job was submitted

2018-05-16 Thread Arun Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/AMATERASU-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Manivannan updated AMATERASU-26:
-
Description: Referring to the screenshot, the original user with which the 
amaterasu job was submitted was "amaterasu".  However, the sub-jobs of the 
pipeline get submitted as the default user "yarn".

> Pipeline tasks (sub-Yarn jobs) run as "yarn" user instead of inheriting the 
> user with which the amaterasu job was submitted
> 
>
> Key: AMATERASU-26
> URL: https://issues.apache.org/jira/browse/AMATERASU-26
> Project: AMATERASU
>      Issue Type: Improvement
>    Reporter: Arun Manivannan
>Assignee: Arun Manivannan
>Priority: Major
> Fix For: 0.2.1-incubating
>
> Attachments: TaskJobsRunAsYarnUser.png
>
>
> Referring to the screenshot, the original user with which the amaterasu job 
> was submitted was "amaterasu".  However, the sub-jobs of the pipeline get 
> submitted as the default user "yarn".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMATERASU-26) Pipeline tasks (sub-Yarn jobs) run as "yarn" user instead of inheriting the user with which the amaterasu job was submitted

2018-05-16 Thread Arun Manivannan (JIRA)
Arun Manivannan created AMATERASU-26:


 Summary: Pipeline tasks (sub-Yarn jobs) run as "yarn" user 
instead of inheriting the user with which the amaterasu job was submitted
 Key: AMATERASU-26
 URL: https://issues.apache.org/jira/browse/AMATERASU-26
 Project: AMATERASU
  Issue Type: Improvement
Reporter: Arun Manivannan
Assignee: Arun Manivannan
 Fix For: 0.2.1-incubating
 Attachments: TaskJobsRunAsYarnUser.png





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


YARN deployment

2018-04-21 Thread Arun Manivannan
Hi Yaniv and Eyal,

Sorry about the hiatus.  Day job has been hectic the last couple of months.

I am really glad that we now have full-blown YARN support. Thanks a lot!

Is there a place where I could find rough documentation on how to submit
jobs on YARN?  If you could respond to this thread, I am more than happy to
contribute to the docs.

I would like to do a POC of sorts for one of my projects at work. A really
dumbed-down version of the application is at :

https://github.com/arunma/ama_datapopulator
https://github.com/arunma/ama_reconciler

The first Spark job populates the data in a bunch of Hive tables.
The second Spark job runs pre-configured queries against these tables and
compares the results against data in another Hive table (the reconciliation
table).


For now, we can safely assume that there's no data shared between these
dataframes.

Greatly appreciate your response on the YARN job submission.

Cheers,
Arun


ama-start and JobLauncher stuck (Solution found) + Running Spark jobs across several repo

2017-12-30 Thread Arun Manivannan
Hi,

Very Good morning and Wish you all a wonderful New Year ahead.

Sorry to bother you on New Year's eve but I really appreciate any hints.

*1. Setting up and running the basic setup (on MacOS High Sierra) (Solution
found): *

I remember having done this successfully before but it was strange this
time.

a. Cloned amaterasu and amaterasu-vagrant repos.
b. Built amaterasu using ./gradlew buildHomeDir test (and later tried with
buildDistribution)
c. Have a vagrant box up and running (have modified the location of the
sync folder to point to amaterasu's build/amaterasu directory)
d. Have installed mesos locally using brew and set the
MESOS_NATIVE_JAVA_LIBRARY to point to /usr/local/lib/libmesos.dylib
e. Did a

ama-start.sh --repo="https://github.com/shintoio/amaterasu-job-sample.git"
--branch="master" --env="test" --report="code"


I found from mesos that zookeeper wasn't running and from zookeeper logs
that java wasn't installed.  I manually installed java (sudo yum install
java-1.7.0-openjdk-devel) and then started zookeeper (service
zookeeper-server start). Mesos came up.

*Question 1 : Is it okay if I add the java installation as part of the
provision.sh or did I miss anything earlier in order to bump into this
issue?*

2. Upon submitting the job, I saw that it didn't run successfully.


*Client logs (./ama-start.sh)*

I1231 13:24:01.548068 193015808 sched.cpp:232] Version: 1.3.0
I1231 13:24:01.554518 226975744 sched.cpp:336] New master detected at
master@192.168.33.11:5050
I1231 13:24:01.554733 226975744 sched.cpp:352] No credentials provided.
Attempting to register without authentication
I1231 13:24:01.558101 226975744 sched.cpp:759] Framework registered with
70566146-bf07-4515-aa0c-ed7fd597efe3-0019
===> moving to err action null
2017-12-31 13:24:08.884:INFO:oejs.ServerConnector:Thread-22: Stopped
ServerConnector@424e1977{HTTP/1.1}{0.0.0.0:8000}
2017-12-31 13:24:08.886:INFO:oejsh.ContextHandler:Thread-22: Stopped
o.e.j.s.ServletContextHandler@7bedc48a
{/,file:/Users/arun/IdeaProjects/amaterasu/build/amaterasu/dist/,UNAVAILABLE}
2017-12-31 13:24:08.886:INFO:oejsh.ContextHandler:Thread-22: Stopped
o.e.j.s.h.ContextHandler@4802796d{/,null,UNAVAILABLE}
I1231 13:24:08.887260 229122048 sched.cpp:2021] Asked to stop the driver
I1231 13:24:08.887711 229122048 sched.cpp:1203] Stopping framework
70566146-bf07-4515-aa0c-ed7fd597efe3-0019

*(MESOS LOGS as ATTACHMENT MesosMaster-INFO.log)*

*Question 2 :  Hints please*

*2. Dev setup*
I would like to debug the program to understand the flow of code within
Amaterasu. I have attempted the following, which gives the same result as
above: I created a main program that invokes the job launcher with the
following parameters (this class is in the test directory, primarily to bring
in the "provided" libraries; not sure if that makes sense).

*Program arguments:*
-Djava.library.path=/usr/lib
*Environment variables: (not sure why System.setProperty doesn't get picked
up)*
AMA_NODE = MacBook-Pro.local
MESOS_NATIVE_JAVA_LIBRARY = /usr/local/lib/libmesos.dylib

object JobLauncherDebug extends App {
  JobLauncher.main(Array(
    "--home", "/Users/arun/IdeaProjects/amaterasu/build/amaterasu",
    "--repo", "https://github.com/shintoio/amaterasu-job-sample.git",
    "--branch", "master",
    "--env", "test",
    "--report", "code"
  ))
}
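On the parenthetical above about `System.setProperty` not being picked up: environment variables and JVM system properties live in separate namespaces, and a value set with `setProperty` is never visible through `System.getenv`. A small sketch (the key name is illustrative only):

```scala
// System.getenv reads the process environment, which cannot be modified
// from inside a running JVM; System.setProperty only touches the separate
// system-properties map.  Code that reads AMA_NODE or
// MESOS_NATIVE_JAVA_LIBRARY via getenv therefore never sees setProperty.
System.setProperty("AMA_NODE", "MacBook-Pro.local")

val viaProps = Option(System.getProperty("AMA_NODE")) // Some("MacBook-Pro.local")
val viaEnv   = Option(System.getenv("AMA_NODE"))      // None, unless exported before launch
```

This is why the values had to be set as real environment variables in the run configuration rather than from inside the main method.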

*Question 3 : Would this be a good idea to go about debugging into the
code?*

*3. Pipelining two Spark jobs*

My ultimate goal would be to run the following use-case using Amaterasu.
This is very similar to what I do at work.

a. Create a bunch of Hive tables (and hdfs directories) using a Spark job
(that will be used as a deployment script - one time setup but no harm in
running it again since it has the "if not exists" clause) (
https://github.com/arunma/ama_schemapreparer)
b. Run another Spark job that populates the data (this job runs on regular
intervals throughout the day) (https://github.com/arunma/ama_datapopulator)
c. Run a different Spark job that reconciles the populated data.

I am yet to create the "job" project for this one which I intend to do once
I have the default testcase running.

*Question 4 :*
A couple of hurdles that I believe I would have is that Amaterasu, at the
moment,

a. Expects the Spark jobs to be in the same repository.
b. The file that instantiates the Spark session, context etc has to be
explicitly given as a ".scala" file (we then use the IMain interpreter to
inject the AmaContext?)

Now, with two repositories in play and only the binaries/repository name
given for the repo, would it be a good idea to achieve the AmaContext
insertion using a compiler plugin?  I am pretty sure this has been
discussed before and it would be great if you could share your views on
this. I can come up with a POC PR of sorts if I get some ideas.

Best Regards,
Arun


Re: Initial setup of Amaterasu - Unable to run

2017-09-30 Thread Arun Manivannan
Hi Yaniv,

That was spot on.  Yes, that was the issue, and I am able to complete the
demo job successfully!  I wonder where I should have looked to figure out
this issue myself. :-(

Probably, I should have listened carefully to your presentation but I have
a few really basic questions.

1. Is the general way to test a deployment to run buildHomeDir (for
dev) and run it on Vagrant with Mesos?  Is there a way to bypass Mesos and
run my Spark jobs in local mode and Python jobs on my local machine?  We
discussed this earlier - I would like to use Amaterasu for running some
quick integration tests, and it would be easier to test on my local
machine than to have a VM running.

2. Probably, I am not seeing this right.  In amaterasu-jobs, I notice
that all the jobs are in the same repo.  As you know, most often the jobs
in a pipeline aren't in a single repo.  I also notice that the
SparkRunnerHelper interprets the file.scala/other file arguments that are
passed into it and binds the context variables.  However, once we are out
of the repository, all we would have is binaries.

 a. Is it a requirement at the moment to have all the driver source
files in a single repo, i.e. the job repo?
 b. If that's the case, then how do I add external dependencies to
the component Spark jobs?

3. I see that using AmaContext would enable a handshake between jobs in the
pipeline.  I realise, then, that we must have the sdk/amaterasu library on
the classpath of the Spark job.  Which one would that be?

I am terribly sorry if these questions don't make much sense in the
context of the project.  I would just like to know if I have misunderstood
the purpose of the project.  I absolutely realise that the project is just
incubating and it's too much to ask for all the bells and whistles on day 1.

Best Regards,
Arun



On Sat, Sep 30, 2017 at 8:00 PM Yaniv Rodenski <ya...@shinto.io> wrote:

> Hi Arun,
>
> I think you are hitting a bug that we’ve fixed but was in a pending PR,
> I've just merged the PR.
> Try to do a git pull and run again, let us know if it solves the problem.
>
> Cheers,
> Yaniv
>
> On Sat, 30 Sep 2017 at 7:36 pm, Arun Manivannan <a...@arunma.com> wrote:
>
> > Hi,
> >
> > I am trying to make an initial run on Amaterasu with
> > https://github.com/arunma/amaterasu-v2-demo (just an unmodified fork of
> > https://github.com/shintoio/amaterasu-v2-demo).  It seems the Spark
> > job fails with an error (as I see from the logs).  Not surprisingly, I am
> > unable to see a JSON file in /tmp/test1.
> >
> > I am not familiar with Mesos. Tried to check for clues on /var/log/mesos
> on
> > the vagrant box with no luck.
> >
> > I am just running a single node mesos on vagrant (
> > https://github.com/shintoio/amaterasu-vagrant).  Greatly appreciate if
> you
> > could help me with some hints.
> >
> > Earlier I ran a `./gradlew buildHomeDir` and modified the Vagrantfile to
> > point to my local build directory of Amaterasu.
> >
> > Cheers,
> > Arun
> >
> >
> >
> > [vagrant@node1 ama]$ ./ama-start.sh --repo="
> > https://github.com/arunma/amaterasu-v2-demo.git" --branch="master"
> > --env="test" --report="code"
> > serving amaterasu from /ama/lib on user supplied port
> > ./ama-start.sh: line 29: popd: directory stack empty
> >
> >
> >  /\
> >  /  \ /\
> > / /\ /  \
> >   _ _ / /  / /\ \
> >  /_\   _ __   __ _ | |_  ___  _ _  __(_( _(_(_ )_)
> > / _ \ | '  \ / _` ||  _|/ -_)| '_|/ _` |(_-<| || |
> >/_/ \_\|_|_|_|\__,_| \__|\___||_|  \__,_|/__/ \_,_|
> >
> > Continuously deployed data pipelines
> > Version 0.2.0-incubating
> >
> >
> > repo: https://github.com/arunma/amaterasu-v2-demo.git
> > java -cp ./bin/leader-0.2.0-incubating-all.jar
> > -Djava.library.path=/usr/lib
> > org.apache.amaterasu.leader.mesos.JobLauncher --home . --repo
> > https://github.com/arunma/amaterasu-v2-demo.git --branch master --env
> test
> > --report code
> > 2017-09-30 09:22:46.184:INFO::main: Logging initialized @688ms
> > 2017-09-30 09:22:46.262:INFO:oejs.Server:main: jetty-9.2.z-SNAPSHOT
> > 2017-09-30 09:22:46.300:INFO:oejsh.ContextHandler:main: Started
> > o.e.j.s.ServletContextHandler@385e9564{/,file:/ama/dist/,AVAILABLE}
> > 2017-09-30 09:22:46.317:INFO:oejs.ServerConnector:main: Started
> > ServerConnector@1dac5ef{HTTP/1.1}{0.0.0.0:8000}
> > 2017-09-30 09:22:46.317:INFO:oejs.Server:main: Started @822ms
> > SLF4J: Failed to load class "org.slf4j.impl.StaticLog