anyone attending Apache Big Data Miami?

2017-05-02 Thread Martin Serrano

Devs,

I'm going to Apache Big Data in a couple weeks.  Just wondering if 
anyone else on the list will be there.  Perhaps we can get together, do 
a BOF or something.


Cheers,
Martin Serrano



[jira] [Comment Edited] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application

2017-02-24 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883217#comment-15883217
 ] 

Martin Serrano edited comment on TWILL-217 at 2/24/17 6:09 PM:
---

Yeah, I will do what you suggest.  I considered that, but it seemed a little 
hacky so I was trying to think of other options.  

Regarding Guava, I see your point, and there are things in it that are no 
longer needed now that Java has them (functions, predicates, etc.).


was (Author: mserrano):
Yeah, I can do what you suggest.  I considered that, but it seemed a little 
hacky so I was trying to think of other options.  

Regarding Guava, I see your point, and there are things in it that are no 
longer needed now that Java has them (functions, predicates, etc.).

> AppMaster launcher should include eventHandler dependencies and nothing else 
> from application
> -
>
>                 Key: TWILL-217
>                 URL: https://issues.apache.org/jira/browse/TWILL-217
>             Project: Apache Twill
>          Issue Type: Improvement
>          Components: yarn
>    Affects Versions: 0.9.0
>            Reporter: Martin Serrano
>            Assignee: Martin Serrano
>
> Currently the launcher for the appmaster includes the application.jar 
> libraries.  This is to support user code that adds an EventHandler.  The 
> application may have many dependencies, and including them in the appmaster 
> classpath can lead to incompatibilities that cannot otherwise be addressed.
> In my case, something in my application's large dependency graph was 
> interfering with the Kafka server operation.  I was not able to determine 
> what it was, but tweaking the appmaster loader to not include my application 
> jars fixed the issue.
> Instead, the bundler that creates the twill.jar should include the 
> EventHandler extension (if any) as an explicit dependency.  In this way, only 
> the jars needed to support the event handler will be on the twill classpath.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application

2017-02-24 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883085#comment-15883085
 ] 

Martin Serrano commented on TWILL-217:
--

I know.  I'm just suggesting the use of ClassPath to generate the list of 
classes that are the source of the dependency walk.  Is upgrading Guava 
acceptable?  If so, to what version?



[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application

2017-02-24 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882642#comment-15882642
 ] 

Martin Serrano commented on TWILL-217:
--

I was referring to the need to construct the jar that will be used to create 
the _Base Classloader_.  It should only reference the api and its dependencies, 
right?




[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application

2017-02-23 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881207#comment-15881207
 ] 

Martin Serrano commented on TWILL-217:
--

Seems then that we will need some way to get the list of all 
org.apache.twill.api.* classes off of the classpath in order to do this 
dependency walk:

* We could do this ourselves with some custom code.
* Guava 14 introduced such a utility: 
[ClassPath|https://github.com/google/guava/wiki/ReflectionExplained#classpath]. 
 What is the history around using guava 13 versus later libraries?
* Could sort of cheat, and do something like what {{YarnTwillRunnerService}} 
does but just for the api package:
{code:language=java}
// Find all the classpaths for Twill classes. It is used for class
// filtering when building application jar in the YarnTwillPreparer
Dependencies.findClassDependencies(getClass().getClassLoader(), new ClassAcceptor() {
  @Override
  public boolean accept(String className, URL classUrl, URL classPathUrl) {
    if (!className.startsWith("org.apache.twill.")) {
      return false;
    }
    twillClassPaths.add(classPathUrl);
    return true;
  }
}, getClass().getName());
{code}

Thoughts?
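
For the first option in the list above (custom code), a minimal sketch might look like the following. It is not Twill code: the class PackageClassScanner and its methods are hypothetical names, it uses only the JDK (Java 8 or later is assumed), and it scans explicitly supplied classpath entries (directories or jars) for .class files under a package prefix such as org.apache.twill.api. The returned class names could then serve as the starting points of the dependency walk.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Stream;

/** Hypothetical helper: lists classes under a package prefix without Guava. */
public final class PackageClassScanner {

  private PackageClassScanner() { }

  /** Returns the sorted names of all classes under packagePrefix in the given entries. */
  public static Set<String> findClasses(Iterable<File> classPathEntries,
                                        String packagePrefix) throws IOException {
    String resourcePrefix = packagePrefix.replace('.', '/');
    Set<String> result = new TreeSet<>();
    for (File entry : classPathEntries) {
      if (entry.isDirectory()) {
        scanDirectory(entry.toPath(), resourcePrefix, result);
      } else if (entry.isFile() && entry.getName().endsWith(".jar")) {
        scanJar(entry, resourcePrefix, result);
      }
    }
    return result;
  }

  // Walks root/<resourcePrefix> and records every .class file found below it.
  private static void scanDirectory(Path root, String resourcePrefix,
                                    Set<String> result) throws IOException {
    Path start = root.resolve(resourcePrefix);
    if (!Files.isDirectory(start)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(start)) {
      paths.filter(Files::isRegularFile)
          .map(p -> root.relativize(p).toString().replace(File.separatorChar, '/'))
          .filter(name -> name.endsWith(".class"))
          .forEach(name -> result.add(toClassName(name)));
    }
  }

  // Records every jar entry named <resourcePrefix>/.../*.class.
  private static void scanJar(File jar, String resourcePrefix,
                              Set<String> result) throws IOException {
    try (JarFile jarFile = new JarFile(jar)) {
      Enumeration<JarEntry> entries = jarFile.entries();
      while (entries.hasMoreElements()) {
        String name = entries.nextElement().getName();
        if (name.startsWith(resourcePrefix + "/") && name.endsWith(".class")) {
          result.add(toClassName(name));
        }
      }
    }
  }

  // "org/apache/Foo.class" -> "org.apache.Foo"
  private static String toClassName(String resourceName) {
    return resourceName
        .substring(0, resourceName.length() - ".class".length())
        .replace('/', '.');
  }
}
```

Compared with the {{YarnTwillRunnerService}} approach, this only discovers class names; it does not follow bytecode references, so it would complement the existing dependency walk rather than replace it.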



[jira] [Comment Edited] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application

2017-02-23 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881207#comment-15881207
 ] 

Martin Serrano edited comment on TWILL-217 at 2/23/17 8:31 PM:
---

Seems then that we will need some way to get the list of all 
org.apache.twill.api.* classes off of the classpath in order to do this 
dependency walk:

* We could do this ourselves with some custom code.
* Guava 14 introduced such a utility: 
[ClassPath|https://github.com/google/guava/wiki/ReflectionExplained#classpath]. 
 What is the history around using guava 13 versus later libraries?
* Could sort of cheat, and do something like what {{YarnTwillRunnerService}} 
does but restricted to the api package:
{code:language=java}
// Find all the classpaths for Twill classes. It is used for class
// filtering when building application jar in the YarnTwillPreparer
Dependencies.findClassDependencies(getClass().getClassLoader(), new ClassAcceptor() {
  @Override
  public boolean accept(String className, URL classUrl, URL classPathUrl) {
    if (!className.startsWith("org.apache.twill.")) {
      return false;
    }
    twillClassPaths.add(classPathUrl);
    return true;
  }
}, getClass().getName());
{code}

Thoughts?





Re: Release of Twill 0.10.0

2017-02-19 Thread Martin Serrano
Terence is correct.  It is easily worked around by supplying the jar, so it 
shouldn't be considered a blocker.

Martin

Sent from my Verizon Wireless 4G LTE DROID
On Feb 19, 2017 3:30 AM, Henry Saputra  wrote:
sounds good to me, thanks

On Sat, Feb 18, 2017 at 11:40 PM, Terence Yim  wrote:

> I believe TWILL-215 comes from the missing logback library from the client
> as discussed on another email thread. I am already preparing the 0.10.0
> release and we can always have a new release that includes the fix when
> deemed necessary.
>
> Terence
>
> On Sat, Feb 18, 2017 at 11:25 PM, Henry Saputra 
> wrote:
>
> > The potential blocker issue is TWILL-215.
> >
> > @MartinSerrano, could you list steps to repro this issue?
> >
> > Want to figure out if this is a blocker for 0.10.0 release
> >
> > - Henry
> >
> > On Thu, Feb 16, 2017 at 9:38 AM, Terence Yim  wrote:
> >
> > > Hi all,
> > >
> > > We've accumulated quite some enhancements and bug fixes and I think
> it's
> > > good time to have a new twill release. I am planning to send out a vote
> > > tomorrow (2/17). Please let me know if there is any concern.
> > >
> > > Terence
> > >
> >
>


[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application

2017-02-16 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870007#comment-15870007
 ] 

Martin Serrano commented on TWILL-217:
--

That sounds reasonable.  Thanks!



[jira] [Created] (TWILL-218) The implicit jopt-simple dependency should be made explicit

2017-02-13 Thread Martin Serrano (JIRA)
Martin Serrano created TWILL-218:


         Summary: The implicit jopt-simple dependency should be made explicit
             Key: TWILL-218
             URL: https://issues.apache.org/jira/browse/TWILL-218
         Project: Apache Twill
      Issue Type: Improvement
      Components: core
Affects Versions: 0.9.0
        Reporter: Martin Serrano
        Assignee: Martin Serrano
         Fix For: 0.10.0


Kafka has a dependency on jopt-simple.  It seems that some of the scala code is 
opaque to twill dependency resolution because this dependency is not found.  In 
environments that do not ship with jopt-simple, this shows up as a class not 
found exception which prevents the kafka service from coming up completely.

While it would be better to have dependency resolution that discovered this on 
its own, explicitly adding the dependency solves the problem with minimal 
changes.





classpath for the appmaster

2017-02-10 Thread Martin Serrano

Devs,

In our full deployment environment I could not get the Kafka forwarding 
of logs to work.  I kept getting Kafka errors on the AM trying to look up 
the topic.   Since I had been able to get this same runnable working in a 
unit test environment, I figured it had to do with the classpath.  
Looking deeper, I saw that the AM runs with the application.jar contents 
on its classpath.  Why is that?  It seems to me that the runnable 
classpaths should never be part of the AM.


I changed the TwillLauncher to not use the application.jar for the AM 
classpath and got a CNFE in the AM for joptsimple.OptionSpec.  It seems 
this is an implicit dependency of Kafka that is not currently discovered 
by the dependency mechanism (presumably because Kafka is written in 
Scala).   When I added jopt-simple-3.2.jar to my classpath and as a 
dependency class for the AM, everything worked!  I was not getting the 
CNFE when application.jar/lib/* was on the classpath, so something in my 
application libs must have been picked up by Kafka initialization.


IMO, the AppMaster is internal Twill code and its dependencies should be 
fully provided by the distribution and self-contained. That may present 
some build challenges, but users should not ever run into this stuff.   
I'll file a ticket and submit a PR if there is agreement on this, but 
the application.jar/lib/* being on the AM classpath seems pretty 
intentional from looking at the code.


Cheers,
-Martin
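
To illustrate the kind of isolation argued for above, the sketch below builds a class loader that can see only an explicit list of jars plus the JDK's bootstrap classes, so nothing from application.jar/lib/* can leak in. This is an illustration under stated assumptions, not what TwillLauncher does: IsolatedLoaderDemo, isolatedLoader and canLoad are hypothetical names, and only java.net.URLClassLoader from the JDK is used.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical demo of a class loader that ignores the application classpath. */
public final class IsolatedLoaderDemo {

  private IsolatedLoaderDemo() { }

  /** Builds a loader that sees only the given URLs plus the JDK's bootstrap classes. */
  public static URLClassLoader isolatedLoader(URL... urls) {
    // A null parent means "bootstrap only": the loader does not delegate to the
    // application class loader, so application jars are invisible to it.
    return new URLClassLoader(urls, null);
  }

  /** Returns true if the loader can resolve the named class (without initializing it). */
  public static boolean canLoad(ClassLoader loader, String className) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    URLClassLoader loader = isolatedLoader();  // no jars at all, for the demo
    System.out.println(canLoad(loader, "java.lang.String"));
    System.out.println(canLoad(loader, "IsolatedLoaderDemo"));
  }
}
```

With such a loader, a missing dependency (like the jopt-simple case above) shows up immediately as a class that cannot be loaded, instead of being masked by whatever happens to be in the application's lib directory.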




Re: twill dependency on logback

2017-02-10 Thread Martin Serrano
Twill should not impose a logging framework on the runnables IMO.  I 
think slf4j is reasonable and, since twill is handling the Kafka hookup, 
the runnables themselves need know nothing about logback, correct?  I 
think ultimately, how logging is handled should be pluggable.   In our 
use case, the kafka tie-in is a nice idea, but in most cases it will not 
fare so well.  For instance:


* The log handlers are only relevant in the process which starts the 
application.  Once that jvm is gone, the kafka queue is never consumed 
again.  Our yarn clients are typically established for control purposes 
and are ephemeral.  I realize that the kafka queue could be consumed by 
a new process but it doesn't seem like that is part of the API.
* A different strategy may be appropriate for some 
applications/runnables.  For example, I may want to accumulate my logging 
in HDFS or just local to the AM.
* Right now our yarn applications dump logs locally and we use custom 
endpoints in the applications to pull logs for analysis. Otherwise they 
just accumulate and are then subject to the yarn default aggregation policy.
* We have several output log files to segregate different activity. I'd 
like to be able to replicate that in the twill environment.


To sum up, I'm not quite sure how to handle logging yet in the twill 
environment, but I'm pretty sure we'll need more flexibility and some 
type of pluggability.


Cheers,
Martin

On 02/10/2017 02:22 PM, Henry Saputra wrote:

Ah ok, thanks for clarifying your concern, Martin.

Twill currently does need logback in the runnables running on YARN to be
able to collect the logs and publish them to the embedded Kafka.

So in your case, you want to use slf4j backed by the log4j binding
instead, for both client and runnables?

- Henry

On Fri, Feb 10, 2017 at 9:24 AM, Martin Serrano <mar...@attivio.com> wrote:


Terence,

Correct, I don't want to use logback on the client side.  slf4j is okay
-- we ship with the log4j binding.  In the twill containers, the AM doesn't
matter for me, but the runnables do.  So yes, flexibility for the
runnables is important.

Thanks,
Martin



Re: Gaining Control of Runnable after client terminates

2017-02-10 Thread Martin Serrano

Caleb,

Sorry, made a mistake there.  In the simple case the application has the 
simple classname, not the full one.


-Martin
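
A small illustration of the correction above, under the assumption that the default application name is derived from Class.getSimpleName() (this thread does not show the Twill code that picks the name). AppNameDemo and its nested NotificationRunnable are hypothetical stand-ins for a user's runnable; only JDK reflection calls are used.

```java
/** Hypothetical demo of simple vs. full class names as application names. */
public final class AppNameDemo {

  private AppNameDemo() { }

  /** Stand-in for a user's runnable; not a real Twill class. */
  public static final class NotificationRunnable implements Runnable {
    @Override
    public void run() { }
  }

  /** The short name, without package or enclosing class. */
  public static String simpleName(Class<?> type) {
    return type.getSimpleName();
  }

  /** The binary name, including package and any enclosing class. */
  public static String fullName(Class<?> type) {
    return type.getName();
  }

  public static void main(String[] args) {
    // Prints "NotificationRunnable": the string a second client would pass
    // to lookup if the application was not explicitly named.
    System.out.println(simpleName(NotificationRunnable.class));
    // Prints the longer binary name, which would not match in that case.
    System.out.println(fullName(NotificationRunnable.class));
  }
}
```

If the application was named explicitly when it was prepared, that explicit name is the one to pass to lookup instead.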

On 02/10/2017 02:02 PM, Martin Serrano wrote:

Caleb,

The TwillRunnerService handles the ZK registration of applications for 
you.  When TwillClient B starts the service, it can get the controllers 
for the application via the TwillRunnerService.lookup method.  If you 
did not explicitly name your application, then you are probably using 
the prepare method that names the application to the full classname of 
your runnable.


-Martin

On 02/09/2017 10:53 AM, Meier, Caleb wrote:

Hello,

Suppose that I start an instance of NotificationRunnable with 
TwillClient A.  NotificationRunnable continues to run after 
TwillClient A terminates (I no longer have a handle on the 
YarnTwillRunner that prepared it), but I want to send a command to 
NotificationRunnable using an instance of TwillController that I 
create in TwillClient B.  Is this possible?  It seems like I would 
still need access to the YarnTwillRunner that started the 
NotificationRunnable.  I’m basing this on the source code for 
YarnTwillRunner.lookup(…) – it seems like this method only returns 
controllers for runnables that the enclosing instance of 
YarnTwillRunner has prepared.  Am I mistaken about this?  If not, do 
I need to explicitly register my application with a 
ZkDiscoveryService and then look it up later through the same 
service?  Is there a better way to go about this?


Thanks,

Caleb A. Meier, Ph.D.
Software Engineer II ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
caleb.me...@parsons.com ♦ www.parsons.com









Re: twill dependency on logback

2017-02-10 Thread Martin Serrano

Terence,

Correct, I don't want to use logback on the client side.  slf4j is okay 
-- we ship with the log4j binding.  In the twill containers, the AM 
doesn't matter for me, but the runnables do.  So yes, flexibility 
for the runnables is important.


Thanks,
Martin

On 02/10/2017 01:56 AM, Terence Yim wrote:

Hi Martin,

If I understand correctly, your intention is not to use slf4j + logback on 
the client side? How about the twill containers (both AM and runnables)? Is it 
ok to use logback, or do you want twill to be more flexible about that?

I understand the failure on the AM that you mentioned; I am just wondering 
what your end goal looks like, to shape a better solution for this.

Terence

Sent from my iPhone


On Feb 9, 2017, at 11:33 AM, Martin Serrano <mar...@attivio.com> wrote:

Terence,

I'm familiar with the logback appender and Kafka code.  My point is this:

* the AppMaster depends on logback.
* when the YarnTwillPreparer class calls createTwillJar it is creating the 
runtime jar for the AppMaster from the current classpath (or more accurately 
from the classloader used by the current thread).
* this means the logback jar will not be within the twill jar unless it is 
currently on the classpath of the client.  The current dependency code ignores 
dependent classes which are not found in the classpath while walking the 
dependency graph.  This is what leads to the class not found exception when 
starting the appmaster.  This is why I filed TWILL-215.
* having the logback jar in the current classpath turns on logback within my 
twill client code since I use slf4j.

Does that make sense?

-Martin




On 02/09/2017 02:19 PM, Terence Yim wrote:
Hi Martin,

Twill has a logback Appender implementation for capturing logs emitted via
the slf4j api from runnables and publishing them to the embedded Kafka running
inside the AM process. If you are using log4j as the API for emitting logs,
what you can do is use the log4j-over-slf4j bridge to have logs emitted
via the log4j API bridged to slf4j.

I suspect you are seeing the class missing error most likely because
you have the slf4j to log4j bridge (the reverse of the one I mentioned
above, look for a jar with name containing "slf4j-log4j12" in the client
classpath) that comes earlier in the classpath than the logback jars.

Terence


On Thu, Feb 9, 2017 at 10:47 AM, Martin Serrano <mar...@attivio.com> wrote:

Henry,

I see this behavior deploying with YARN 2.7.1, HDP 2.3.  But I'm not sure
you understood my issue.

* The logback jar dependency is only picked up if it is on the classpath
when the bundle is created.
* With logback in my twill client classpath, the appmaster starts fine.
However without logback in my client classpath the appmaster will get a
ClassNotFoundException.
* We use log4j and with logback in my client classpath, it takes over the
slf4j bindings and I lose control of the client logging.

So my question was about whether this is expected or if there is a
well-known procedure for working around it.  It seems there should be a way
to tell the twill system where to find the appmaster dependencies
without having them in the classpath of the twill client.

Thanks!
-Martin



On 02/08/2017 08:09 PM, Henry Saputra wrote:

But the logback dependency should be included in the jar packaging that
YARN client sends for Twill ApplicationMaster.

Are you seeing this behavior in deploying Twill app in latest YARN?

- Henry

On Wed, Feb 8, 2017 at 12:30 PM, Martin Serrano <mar...@attivio.com>
wrote:

Hey Devs,

It seems like the twill project goes through some pain to try to insulate
itself from logging frameworks.  I see use of the slf4j API. However, the
appmaster code has a dependency on logback via the
org.apache.twill.internal.logging.Loggings class.   The appmaster will
not start up without this dependency present.  With the dependency code as
it is now, there is no way to include the logback jar in the generated
bundle without it being on the current classpath.  I've created a ticket
(TWILL-215) to make a missing dependency trigger an exception at bundle
generation time rather than appmaster execution time.

When the logback jar is on my classpath, my client code picks up logback
instead of our current logger (log4j).  Is this what is expected?  Is there
any known workaround?  It seems like there may be a case for specifying
dependencies of the appmaster that are located outside of the current jvm
classpath.

Thanks,
Martin







Re: twill dependency on logback

2017-02-08 Thread Martin Serrano
Any chance we could slack/chat on this tomorrow?

Sent from my Verizon Wireless 4G LTE DROID
On Feb 8, 2017 8:09 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
But the logback dependency should be included in the jar packaging that
YARN client sends for Twill ApplicationMaster.

Are you seeing this behavior in deploying Twill app in latest YARN?

- Henry

On Wed, Feb 8, 2017 at 12:30 PM, Martin Serrano <mar...@attivio.com> wrote:

> Hey Devs,
>
> It seems like the twill project goes through some pain to try to insulate
> itself from logging frameworks.  I see use of the slf4j API. However, the
> appmaster code has a dependency on logback via the
> org.apache.twill.internal.logging.Loggings class.   The appmaster will
> not start up without this dependency present.  With the dependency code as
> it is now, there is no way to include the logback jar in the generated
> bundle without it being on the current classpath.  I've created a ticket
> (TWILL-215) to make a missing dependency trigger an exception at bundle
> generation time rather than appmaster execution time.
>
> When the logback jar is on my classpath, my client code picks up logback
> instead of our current logger (log4j).  Is this what is expected?  Is there
> any known workaround?  It seems like there may be a case for specifying
> dependencies of the appmaster that are located outside of the current jvm
> classpath.
>
> Thanks,
> Martin
>
>


[jira] [Commented] (TWILL-215) Dependencies not on classpath lead to runtime startup error

2017-02-08 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858607#comment-15858607
 ] 

Martin Serrano commented on TWILL-215:
--

The details got pretty messy.  I still think a github diff/pr is the best way 
to discuss.  Should I just submit a PR?

> Dependencies not on classpath lead to runtime startup error
> ---
>
>                 Key: TWILL-215
>                 URL: https://issues.apache.org/jira/browse/TWILL-215
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0
>            Reporter: Martin Serrano
>            Priority: Critical
>             Fix For: 0.10.0
>
>
> We do not use logback in our environment but it is a dependency of 
> {{ApplicationMasterMain}}.  When {{YarnTwillPreparer.createTwillJar}} is 
> called in our environment, the logback jar is not on our classpath.   The 
> {{Dependencies.findClassDependencies}} method ignores any class that is not 
> on the classpath.  This leads to a runtime startup error when the app master 
> tries to start.
> This is easily fixed unless there is some use case for ignoring the dependency 
> when it is not on the classpath.  An exception should be thrown and no yarn 
> job should be submitted.





twill dependency on logback

2017-02-08 Thread Martin Serrano

Hey Devs,

It seems like the twill project goes through some pain to try to 
insulate itself from logging frameworks.  I see use of the slf4j API. 
However, the appmaster code has a dependency on logback via the 
org.apache.twill.internal.logging.Loggings class.   The appmaster will 
not start up without this dependency present.  With the dependency code 
as it is now, there is no way to include the logback jar in the 
generated bundle without it being on the current classpath.  I've 
created a ticket (TWILL-215) to make a missing dependency trigger an 
exception at bundle generation time rather than appmaster execution time.


When the logback jar is on my classpath, my client code picks up logback 
instead of our current logger (log4j).  Is this what is expected?  Is 
there any known workaround?  It seems like there may be a case for 
specifying dependencies of the appmaster that are located outside of the 
current jvm classpath.


Thanks,
Martin



[jira] [Commented] (TWILL-215) Dependencies not on classpath lead to runtime startup error

2017-02-08 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858268#comment-15858268
 ] 

Martin Serrano commented on TWILL-215:
--

You get a {{ClassNotFoundException}} trying to start the app master.  My view 
would be that a preventable error like this should be prevented since the twill 
system depends on the appmaster to be able to discover any startup errors.  I'm 
going to submit a PR for the proposed fix (throwing an exception when the 
bundle is being created) since I think the details will warrant discussion.
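
A minimal sketch of the kind of fail-fast check proposed here, for discussion only.  It is not Twill code: the DependencyCheck class and its method names are hypothetical, and it assumes a class can be located by asking a class loader for its .class resource.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Twill: verifies that every class that would
// go into the bundle can actually be located through the given class loader.
final class DependencyCheck {

  // Returns the class names whose .class resource cannot be found.
  static List<String> findMissing(ClassLoader loader, Iterable<String> classNames) {
    List<String> missing = new ArrayList<>();
    for (String name : classNames) {
      URL resource = loader.getResource(name.replace('.', '/') + ".class");
      if (resource == null) {
        missing.add(name);
      }
    }
    return missing;
  }

  // Fails fast, before anything would be submitted to the cluster.
  static void requireAllPresent(ClassLoader loader, Iterable<String> classNames) {
    List<String> missing = findMissing(loader, classNames);
    if (!missing.isEmpty()) {
      throw new IllegalStateException("Classes not found on classpath: " + missing);
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = ClassLoader.getSystemClassLoader();
    requireAllPresent(loader, Arrays.asList("java.lang.String", "java.util.List"));
    try {
      // The second name is a made-up class that should not exist anywhere.
      requireAllPresent(loader, Arrays.asList("java.lang.String", "com.example.missing.NoSuchClass"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is only where the failure surfaces: on the client, while the bundle is being built, instead of as a {{ClassNotFoundException}} in the app master.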

> Dependencies not on classpath lead to runtime startup error
> ---
>
> Key: TWILL-215
> URL: https://issues.apache.org/jira/browse/TWILL-215
> Project: Apache Twill
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0
>    Reporter: Martin Serrano
>Priority: Critical
> Fix For: 0.10.0
>
>
> We do not use logback in our environment but it is a dependency of 
> {{ApplicationMasterMain}}.  When {{YarnTwillPreparer.createTwillJar}} is 
> called in our environment, the logback jar is not on our classpath.   For a 
> class not in the classpath, the {{Dependencies.findClassDependencies}} method 
> ignores it.  This leads to a runtime startup error when the app master tries 
> to start.
> This is easily fixed unless there is some use case for ignoring the dependency 
> when it is not on the classpath.  An exception should be thrown and no yarn 
> job should be submitted.





[jira] [Commented] (TWILL-213) Increase of instances while starting up may lead to ignored retries and instance increases

2017-02-07 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856669#comment-15856669
 ] 

Martin Serrano commented on TWILL-213:
--

I've updated the description with what I think is happening.

> Increase of instances while starting up may lead to ignored retries and 
> instance increases
> --
>
> Key: TWILL-213
> URL: https://issues.apache.org/jira/browse/TWILL-213
> Project: Apache Twill
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 0.9.0
>    Reporter: Martin Serrano
>
> As seen in the test development for TWILL-181, if the number of instances for 
> a container is increased before the {{ApplicationMasterService}} has observed 
> the original request as being satisfied, the instance increase and any 
> subsequent retries will be blocked.  This is because in {{launchRunnable}}:
> {code}
>   TwillContainerLauncher launcher = new TwillContainerLauncher(
>     twillSpec.getRunnables().get(runnableName), processLauncher.getContainerInfo(), launchContext,
>     ZKClients.namespace(zkClient, getZKNamespace(runnableName)),
>     containerCount, jvmOpts, reservedMemory, getSecureStoreLocation());
>   runningContainers.start(runnableName, processLauncher.getContainerInfo(), launcher);
>   // Need to call complete to workaround bug in YARN AMRMClient
>   if (provisionRequest.containerAcquired()) {
>     amClient.completeContainerRequest(provisionRequest.getRequestId());
>   }
>   /*
>    * The provisionRequest will either contain a single container (ALLOCATE_ONE_INSTANCE_AT_A_TIME), or all the
>    * containers to satisfy the expectedContainers count. In the latter case, the provision request is complete once
>    * all the containers have run at which point we poll() to remove the provisioning request.
>    */
>   if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
>     provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
>     provisioning.poll();
>   }
> {code}
> There is a race condition.  The sequence:
> * *Thread A*: {{runningContainers.start}} is called and 2 instances are 
> started
> * *Thread B*: The runnable from {{createSetInstanceRunnable}} executes, sees 
> the 2 instances are started and updates the expected count to 3.
> * *Thread A*: Reaches the {{if}} check comparing {{expectedContainers}} (3) to 
> {{runningContainers.count}} (2).  Since this fails, {{poll}} is not called and 
> this provision request is not satisfied.
> Subsequent calls will try to provision the 3rd container because it seems 
> like the first provision request is not yet satisfied.
> The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can be used 
> to reproduce this case intermittently by changing the {{allRunning.await}} 
> check to something that does a countdown latch {{onRunning}} as 
> {{EchoServerTestRun}} does.
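
The interleaving described above can be replayed deterministically in a small single-threaded model.  This is a sketch under stated assumptions, not Twill code: ProvisionRaceDemo, ProvisionRequest and requestedCount are illustrative names, and comparing against the count captured in the request is only one possible direction, not necessarily the fix that was adopted.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the race; all names here are hypothetical, not Twill's.
final class ProvisionRaceDemo {

  static final class ProvisionRequest {
    final int requestedCount; // containers this request was created for

    ProvisionRequest(int requestedCount) {
      this.requestedCount = requestedCount;
    }
  }

  // Replays: thread A starts 2 containers, thread B raises the expected count
  // to 3, thread A then runs the completion check.
  // Returns true if the provision request was removed from the queue.
  static boolean replay(boolean compareAgainstLiveExpected) {
    Queue<ProvisionRequest> provisioning = new ArrayDeque<>();
    provisioning.add(new ProvisionRequest(2));
    int expected = 2;
    int running = 0;

    running += 2;  // Thread A: runningContainers.start(...) for 2 instances
    expected = 3;  // Thread B: instance count increased before A's check

    // Thread A: decide whether the head request is complete.
    int target = compareAgainstLiveExpected
        ? expected                              // moving target, as in the description
        : provisioning.peek().requestedCount;   // value fixed when the request was made
    if (running == target) {
      provisioning.poll();
    }
    return provisioning.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println("live expected count, request removed: " + replay(true));
    System.out.println("captured count, request removed: " + replay(false));
  }
}
```

With the live expected count the head request stays in the queue, which matches the blocked behaviour described in this ticket.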





Re: interest in bare-bones cluster/agent solution?

2017-02-06 Thread Martin Serrano

Devs,

I've got an initial prototype going on this project.  During this 
process I noticed that much of what I needed to do ended up duplicating 
code that is in the twill-yarn project.  I'm curious if the community 
would be open to a refactoring that moved some of this code so that it 
could be shared by different clustering control implementations.  For 
instance, much of the code in YarnTwillPreparer, YarnContainerMain, 
YarnServiceMain, etc is not really yarn specific.


Thoughts?

Thanks,
Martin

On 02/01/2017 04:13 PM, Martin Serrano wrote:

Hey Devs,

I've been evangelizing Twill here at my company.  We have for 
several years had a basic clustering solution for running our system 
and its components across multiple hosts.  As we've worked to migrate 
the platform to Hadoop and YARN, we have started to move our cluster 
control to be Yarn-based.  And my goal is to make it Twill-based.  
However as we do this work we anticipate the need to support customers 
that do not have and do not want to operate Hadoop infrastructure.


I'm working on an architecture which would use Twill for command and 
control for all of our services.  For customers that did not want the 
full enterprise capabilities that come with Hadoop we could continue 
to offer our basic clustering support (albeit with reduced 
capabilities) by plugging our clustering solution into Twill.  One of 
the aspects of Twill that interested me from the start was that the 
control API was abstracted from YARN.


A couple questions:

1) Do any plugins for other clustering backends already exist? Open 
source?  Commercial?


2) If we were to do this, would the Twill community be interested in a 
donation of this code to the Twill project?


I recall that Henry and I had a conversation at the last Apache Big 
Data that there had been talk of plugins for other clustering backends 
but I don't know if anything ever came of that.


Cheers,
Martin Serrano





[jira] [Updated] (TWILL-213) Increase of instances while starting up may lead to ignored retries and instance increases

2017-02-03 Thread Martin Serrano (JIRA)

 [ 
https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Serrano updated TWILL-213:
-
Description: 
As seen in the test development for TWILL-181, if the number of instances for a 
container is increased before the {{ApplicationMasterService}} has observed the 
original request as being satisfied, the instance increase and any subsequent 
retries will be blocked.  This is because in {{launchRunnable}}:

{code}
  if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
    provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
    provisioning.poll();
  }
{code}

we are comparing the expected containers to the running count to decide if 
{{provisioning.poll()}} should be called.   If a new instance request has been 
made, the expected containers will have been updated and the running count 
never will.  The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method 
can be used to reproduce this case intermittently by changing the 
{{allRunning.await}} check to something that does a countdown latch 
{{onRunning}} as {{EchoServerTestRun}} does.

  was:
As seen in the test development for TWILL-181, if the number of instances for a 
container is increased before the `ApplicationMasterService` has observed the 
original request as being satisfied, the instance increase and any subsequent 
retries will be blocked.  This is because in `launchRunnable`:

{code}
  if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
    provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
    provisioning.poll();
  }
{code}

we are comparing the expected containers to the running count to decide if 
`provisioning.poll()` should be called.   If a new instance request has been 
made, the expected containers will have been updated and the running count 
never will.  The `MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method 
can be used to reproduce this case intermittently by changing the 
`allRunning.await` check to something that does a countdown latch `onRunning` 
as `EchoServerTestRun` does.


> Increase of instances while starting up may lead to ignored retries and 
> instance increases
> --
>
> Key: TWILL-213
> URL: https://issues.apache.org/jira/browse/TWILL-213
> Project: Apache Twill
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 0.9.0
>    Reporter: Martin Serrano
>
> As seen in the test development for TWILL-181, if the number of instances for 
> a container is increased before the {{ApplicationMasterService}} has observed 
> the original request as being satisfied, the instance increase and any 
> subsequent retries will be blocked.  This is because in {{launchRunnable}}:
> {code}
>   if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
>     provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
>     provisioning.poll();
>   }
> {code}
> we are comparing the expected containers to the running count to decide if 
> {{provisioning.poll()}} should be called.   If a new instance request has 
> been made, the expected containers will have been updated and the running 
> count never will.  The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} 
> method can be used to reproduce this case intermittently by changing the 
> {{allRunning.await}} check to something that does a countdown latch 
> {{onRunning}} as {{EchoServerTestRun}} does.





[jira] [Created] (TWILL-213) Increase of instances while starting up may lead to ignored retries and instance increases

2017-02-03 Thread Martin Serrano (JIRA)
Martin Serrano created TWILL-213:


 Summary: Increase of instances while starting up may lead to 
ignored retries and instance increases
 Key: TWILL-213
 URL: https://issues.apache.org/jira/browse/TWILL-213
 Project: Apache Twill
  Issue Type: Bug
  Components: yarn
Affects Versions: 0.9.0
Reporter: Martin Serrano


As seen in the test development for TWILL-181, if the number of instances for a 
container is increased before the `ApplicationMasterService` has observed the 
original request as being satisfied, the instance increase and any subsequent 
retries will be blocked.  This is because in `launchRunnable`:

{code}
  if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
    provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
    provisioning.poll();
  }
{code}

we are comparing the expected containers to the running count to decide if 
`provisioning.poll()` should be called.   If a new instance request has been 
made, the expected containers will have been updated and the running count 
never will.  The `MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method 
can be used to reproduce this case intermittently by changing the 
`allRunning.await` check to something that does a countdown latch `onRunning` 
as `EchoServerTestRun` does.





interest in bare-bones cluster/agent solution?

2017-02-01 Thread Martin Serrano

Hey Devs,

I've been evangelizing Twill here at my company.  We have for 
several years had a basic clustering solution for running our system and 
its components across multiple hosts.  As we've worked to migrate the 
platform to Hadoop and YARN, we have started to move our cluster control 
to be Yarn-based.  And my goal is to make it Twill-based.  However as we 
do this work we anticipate the need to support customers that do not 
have and do not want to operate Hadoop infrastructure.


I'm working on an architecture which would use Twill for command and 
control for all of our services.  For customers that did not want the 
full enterprise capabilities that come with Hadoop we could continue to 
offer our basic clustering support (albeit with reduced capabilities) by 
plugging our clustering solution into Twill.  One of the aspects of 
Twill that interested me from the start was that the control API was 
abstracted from YARN.


A couple questions:

1) Do any plugins for other clustering backends already exist? Open 
source?  Commercial?


2) If we were to do this, would the Twill community be interested in a 
donation of this code to the Twill project?


I recall that Henry and I had a conversation at the last Apache Big Data 
that there had been talk of plugins for other clustering backends but I 
don't know if anything ever came of that.


Cheers,
Martin Serrano



Re: [DISCUSS] HasDependencies tagging interface

2017-01-18 Thread Martin Serrano
Yes, this was my intention.  In our case, the required additional 
classes can be determined dynamically based on configuration.  It makes 
sense for the code that does this determination to live with the 
runnable rather than the controller.


-Martin

On 01/17/2017 09:08 PM, Andreas Neumann wrote:

I guess the difference is decoupling of the preparer from the runnable.
Martin's approach makes it a property of the runnable itself, so the
preparer can derive this information. That is, I can modify my runnable
without having to modify my invocation of the preparer.

Thoughts?

-Andreas.

On Tue, Jan 17, 2017 at 5:56 PM, Terence Yim <cht...@gmail.com> wrote:


Hi Martin,

Is it not doable via the TwillPreparer.withDependencies method?

Terence

On Tue, Jan 17, 2017 at 2:56 PM, Martin Serrano <mar...@attivio.com>
wrote:


Team,

I have some untraceable dependencies for one of my runnables.  It occurs
to me that preparing and launching the runnable is not always the best
place to define these dependencies (using the withDependencies method).  The
runnable itself will always have these deps (there is static xml
configuration embedded in the lib).  What would folks think of the idea of
a tagging interface that TwillPreparer would check and insert the deps
itself?

Something like:

public interface HasDependencies {

   Iterable<Class> dependencies();

}

This interface could be added to any implementation of TwillRunnable.

Thoughts?

-Martin






[DISCUSS] HasDependencies tagging interface

2017-01-17 Thread Martin Serrano

Team,

I have some untraceable dependencies for one of my runnables.  It occurs 
to me that preparing and launching the runnable is not always the best 
place to define these dependencies (using the withDependencies method).  The 
runnable itself will always have these deps (there is static xml 
configuration embedded in the lib).  What would folks think of the idea 
of a tagging interface that TwillPreparer would check and insert the 
deps itself?


Something like:

public interface HasDependencies {

  Iterable<Class> dependencies();

}

This interface could be added to any implementation of TwillRunnable.
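
As a rough sketch of how the preparer side could consume such an interface: the code below is self-contained and does not use Twill types.  DependencyCollector, XmlBackedRunnable and the use of java.lang.Runnable as a stand-in for TwillRunnable are assumptions made for illustration, and the element type is written as Class<?> to avoid raw types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// The proposed tagging interface, with a wildcard element type.
interface HasDependencies {
  Iterable<Class<?>> dependencies();
}

// Hypothetical runnable that always needs some classes that cannot be traced.
final class XmlBackedRunnable implements Runnable, HasDependencies {
  @Override
  public void run() {
    // Application work would go here.
  }

  @Override
  public Iterable<Class<?>> dependencies() {
    // Placeholder classes standing in for the untraceable dependencies.
    return Arrays.<Class<?>>asList(StringBuilder.class, ArrayList.class);
  }
}

// Hypothetical preparer-side logic: merge explicitly supplied dependencies
// with whatever the runnable declares about itself.
final class DependencyCollector {
  static Set<Class<?>> collect(Object runnable, Iterable<Class<?>> explicit) {
    Set<Class<?>> result = new LinkedHashSet<>();
    for (Class<?> c : explicit) {
      result.add(c);
    }
    if (runnable instanceof HasDependencies) {
      for (Class<?> c : ((HasDependencies) runnable).dependencies()) {
        result.add(c);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<Class<?>> deps =
        collect(new XmlBackedRunnable(), Arrays.<Class<?>>asList(String.class));
    System.out.println(deps);
  }
}
```

A runnable that does not implement the interface is unaffected, so the change would be opt-in.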

Thoughts?

-Martin



Re: Desired behavior at shutdown

2017-01-14 Thread Martin Serrano

Created https://issues.apache.org/jira/browse/TWILL-204

On 01/14/2017 11:47 AM, Martin Serrano wrote:

Terence,

I'm going to make a ticket and move discussion to that.  Okay?

-Martin



Re: Desired behavior at shutdown

2017-01-14 Thread Martin Serrano

Terence,

I'm going to make a ticket and move discussion to that.  Okay?

-Martin

On 01/07/2017 02:29 PM, Terence Yim wrote:

Hi Martin,

One simpler way is that we could use the messaging mechanism in Twill to
send a message from controller to AM during shutdown, after the controller
sees a special log event emitted by AM after the AM shuts down all
containers. The AM will wait for the message from controller before
completely shutting down itself (with some reasonable timeout to avoid
infinite wait).

Terence

On Fri, Jan 6, 2017 at 4:46 PM, Martin Serrano <mar...@attivio.com> wrote:


Terence,

I see your point.  I've thought a bit about this and it seems the only
solution would be to coordinate via ZK between the controllers and the AM.
The solution would be something like this:

* Controller consumers register an ephemeral znode under the kafka
znode to indicate they are clients
* Controllers listen on the .../kafka/broker znode for a child named
shuttingDown
* When AM reaches the state where it wants to shut down, it creates the
.../kafka/broker/shuttingDown znode and waits for there to be no registered
controllers.  This wait would have an upper bound to prevent eternal
waiting.
* Once a controller consumer sees the shuttingDown node, if it receives an
empty message buffer it shuts itself down.
* Controller consumers remove their registration znode when they shut down
* The AM shuts down the broker once all controller consumers are gone or
it has reached its timeout

This solution avoids the checkpointing load and znode use scales with the
number of consumers which is presumably smallish.  There are no net z-ops
outside of consumer creation and shutdown time. However, I consider this a
complex kind of setup and this code tends to be harder to maintain since
the logic is spread amongst different application layers.  I don't see any
other way to ensure full reading of the kafka queues given the decoupled
nature of the broker and client.

Something else I thought of that would usually alleviate the issue but be
much simpler would be to have an extended timeout before broker shutdown if
the containers exit with a non-success error code (say 15s).  The shutdown
timeouts could also be made configurable. What do you think?

-Martin




On 01/05/2017 10:53 PM, Terence Yim wrote:


Hi Martin,

I do agree that the AM should only shut down the Embedded Kafka server
once all the controllers see all the logs. However, the difficulty is in
how the AM would know about it. The Twill controller is using the simple Kafka
API instead of the higher level one (as that one involves checkpointing to
ZK, and we don't want running many twill apps to put a heavy load on ZK). Do
you have any suggestions how to do that?

Thanks,
Terence

Sent from my iPhone

On Jan 5, 2017, at 2:42 PM, Martin Serrano <mar...@attivio.com> wrote:

Actually, after further investigation, I realize the server side has to
be dealt with because it is shutting down the Kafka broker before all the
messages are read from it.  I see that there is a 2 second delay for
clients to pull what they can first.  What would folks think about an
algorithm that checked the topic for unread messages and had a longer
timeout (say 30s) as long as there were messages to be received still?  Is
there an issue that the client may not be present on the other side and
that the delay of shutting down the AM would be undesirable?

-Martin

On 01/05/2017 12:32 PM, Martin Serrano wrote:

All,

I'm encountering a situation on a fast machine where the Kafka log
aggregation topic is not empty when the system shuts down. The scenario:

   *  log consumer consumes all messages
   * consumer sleeps (500ms) due to empty queue
   * containers exit, posting /final log messages/ about why
   * controller notices containers are down and terminates consumers.
   * consumer is interrupted from sleep but has been canceled so it
 does not get the rest of the messages.

This scenario can be really confusing during development because an
error may be missed (as in my case) if it falls into the /final log
messages/.  Before I file a ticket and fix this, I wanted to get some
feedback.  Looking at org.apache.twill.internal.kafka.client.SimpleKafkaConsumer
it seems this behavior could be intentional given this log message (line
384):

 LOG.debug("Unable to fetch messages on {}, kafka consumer
service shutdown is in progress.", topicPart);

My opinion is that final messages logged by a container are likely to
be critical in diagnosing errors and that twill should do whatever it can
to forward them before shutting things down. If there is agreement on this
I'll file a ticket and fix it.  My general approach would be to indicate to
the consumer that it is in a shuttingDown state which it would use to break
from the consume loop once the message set was empty.  If this makes sense
would we need to support a timeout for the maximum amount of time to be in
thi

Re: Desired behavior at shutdown

2017-01-06 Thread Martin Serrano

Terence,

I see your point.  I've thought a bit about this and it seems the only 
solution would be to coordinate via ZK between the controllers and the 
AM.  The solution would be something like this:


* Controller consumers register an ephemeral znode under the kafka 
znode to indicate they are clients
* Controllers listen on the .../kafka/broker znode for a child named 
shuttingDown
* When AM reaches the state where it wants to shut down, it creates the 
.../kafka/broker/shuttingDown znode and waits for there to be no 
registered controllers.  This wait would have an upper bound to prevent 
eternal waiting.
* Once a controller consumer sees the shuttingDown node, if it receives an 
empty message buffer it shuts itself down.

* Controller consumers remove their registration znode when they shut down
* The AM shuts down the broker once all controller consumers are gone or 
it has reached its timeout
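
The AM side of the steps above, the bounded wait for consumers to go away, can be sketched with an in-memory stand-in for the registration znodes.  Everything here is hypothetical: BrokerShutdownGate and its methods are made up for illustration, and a real implementation would presumably use ZooKeeper ephemeral nodes and watches rather than polling a set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// In-memory model of the registration znodes; names are illustrative only.
final class BrokerShutdownGate {
  private final Set<String> registeredConsumers = ConcurrentHashMap.newKeySet();

  // Controller consumer side: announce presence, and remove it on shutdown.
  void register(String consumerId) {
    registeredConsumers.add(consumerId);
  }

  void deregister(String consumerId) {
    registeredConsumers.remove(consumerId);
  }

  // AM side: wait until no consumers remain, but never longer than the timeout.
  // Returns true if all consumers left, false if the upper bound was hit.
  boolean awaitConsumersGone(long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!registeredConsumers.isEmpty()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up and let the broker shut down anyway
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return registeredConsumers.isEmpty();
      }
    }
    return true;
  }

  public static void main(String[] args) {
    BrokerShutdownGate gate = new BrokerShutdownGate();
    gate.register("controller-1");
    System.out.println("consumers gone: " + gate.awaitConsumersGone(50, 5));
    gate.deregister("controller-1");
    System.out.println("consumers gone: " + gate.awaitConsumersGone(50, 5));
  }
}
```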


This solution avoids the checkpointing load and znode use scales with 
the number of consumers which is presumably smallish.  There are no net 
z-ops outside of consumer creation and shutdown time. However, I 
consider this a complex kind of setup and this code tends to be harder 
to maintain since the logic is spread amongst different application 
layers.  I don't see any other way to ensure full reading of the kafka 
queues given the decoupled nature of the broker and client.


Something else I thought of that would usually alleviate the issue but 
be much simpler would be to have an extended timeout before broker 
shutdown if the containers exit with a non-success error code (say 
15s).  The shutdown timeouts could also be made configurable. What do 
you think?


-Martin



On 01/05/2017 10:53 PM, Terence Yim wrote:

Hi Martin,

I do agree that the AM should only shut down the Embedded Kafka server once all 
the controllers see all the logs. However, the difficulty is in how the 
AM would know about it. The Twill controller is using the simple Kafka API instead of 
the higher level one (as that one involves checkpointing to ZK, and we don't 
want running many twill apps to put a heavy load on ZK). Do you have any 
suggestions how to do that?

Thanks,
Terence

Sent from my iPhone


On Jan 5, 2017, at 2:42 PM, Martin Serrano <mar...@attivio.com> wrote:

Actually, after further investigation, I realize the server side has to be 
dealt with because it is shutting down the Kafka broker before all the messages 
are read from it.  I see that there is a 2 second delay for clients to pull 
what they can first.  What would folks think about an algorithm that checked 
the topic for unread messages and had a longer timeout (say 30s) as long as 
there were messages to be received still?  Is there an issue that the client 
may not be present on the other side and that the delay of shutting down the AM 
would be undesirable?

-Martin


On 01/05/2017 12:32 PM, Martin Serrano wrote:

All,

I'm encountering a situation on a fast machine where the Kafka log aggregation 
topic is not empty when the system shuts down. The scenario:

  *  log consumer consumes all messages
  * consumer sleeps (500ms) due to empty queue
  * containers exit, posting /final log messages/ about why
  * controller notices containers are down and terminates consumers.
  * consumer is interrupted from sleep but has been canceled so it
does not get the rest of the messages.

This scenario can be really confusing during development because an error may 
be missed (as in my case) if it falls into the /final log messages/.  Before I 
file a ticket and fix this, I wanted to get some feedback.  Looking at 
org.apache.twill.internal.kafka.client.SimpleKafkaConsumer it seems this 
behavior could be intentional given this log message (line 384):

LOG.debug("Unable to fetch messages on {}, kafka consumer service 
shutdown is in progress.", topicPart);

My opinion is that final messages logged by a container are likely to be 
critical in diagnosing errors and that twill should do whatever it can to 
forward them before shutting things down. If there is agreement on this I'll 
file a ticket and fix it.  My general approach would be to indicate to the 
consumer that it is in a shuttingDown state which it would use to break from 
the consume loop once the message set was empty.  If this makes sense would we 
need to support a timeout for the maximum amount of time to be in this state 
before punting on the rest of the messages?  My instinct is no, get them all, 
but given the way the code is set up now, perhaps there are good reasons to 
timeout.
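
A minimal sketch of the shuttingDown idea, including the optional upper bound on the drain.  It is not based on SimpleKafkaConsumer: DrainingConsumer is a hypothetical class and a BlockingQueue stands in for the Kafka topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical consumer: the queue is a stand-in for the log topic.
final class DrainingConsumer {
  private final BlockingQueue<String> topic;
  private volatile boolean shuttingDown;

  DrainingConsumer(BlockingQueue<String> topic) {
    this.topic = topic;
  }

  // Called by the controller instead of cancelling the consumer outright.
  void beginShutdown() {
    shuttingDown = true;
  }

  // Keeps consuming until shutdown was requested AND a poll came back empty,
  // or until maxDrainMillis has passed since shutdown was first observed.
  List<String> consume(long pollMillis, long maxDrainMillis) {
    List<String> received = new ArrayList<>();
    Long drainDeadline = null;
    try {
      while (true) {
        String message = topic.poll(pollMillis, TimeUnit.MILLISECONDS);
        if (message != null) {
          received.add(message);
        }
        if (shuttingDown) {
          if (drainDeadline == null) {
            drainDeadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxDrainMillis);
          }
          boolean timedOut = System.nanoTime() - drainDeadline >= 0;
          if (message == null || timedOut) {
            return received;
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return received;
    }
  }

  public static void main(String[] args) {
    BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    topic.add("container starting");
    topic.add("container exiting: final error message");
    DrainingConsumer consumer = new DrainingConsumer(topic);
    consumer.beginShutdown(); // shutdown requested while messages are still queued
    System.out.println(consumer.consume(10, 1000));
  }
}
```

The difference from the scenario above is that shutdown is signalled with a flag the loop checks after each poll, so messages already in the topic are still delivered.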

Thanks,

Martin Serrano





[jira] [Commented] (TWILL-181) Control the maximum number of retries for failed application starts

2016-08-23 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433574#comment-15433574
 ] 

Martin Serrano commented on TWILL-181:
--

I'm going to start taking a look at this.  

> Control the maximum number of retries for failed application starts
> ---
>
> Key: TWILL-181
> URL: https://issues.apache.org/jira/browse/TWILL-181
> Project: Apache Twill
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 0.7.0-incubating
>    Reporter: Martin Serrano
> Fix For: 0.8.0
>
>
> If an application consistently exits with a non-zero code,  twill will 
> attempt to restart indefinitely.  I ran into this issue and a list search 
> also reveals [others|  http://markmail.org/message/dehx7r6tpqgcmjh4].  
> There should be a mechanism to specify the maximum number of retries until 
> the application fails.  Ideally by default there would be a non-infinite 
> maximum.
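
A small sketch of what a bounded retry policy could look like.  RetryLimiter is a hypothetical class, not an existing Twill API, and counting failures per runnable name is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical retry bookkeeping; not an existing Twill class.
final class RetryLimiter {
  private final int maxRetries;
  private final Map<String, Integer> failures = new HashMap<>();

  RetryLimiter(int maxRetries) {
    this.maxRetries = maxRetries;
  }

  // Records an exit and reports whether another restart should be attempted.
  boolean shouldRestart(String runnableName, int exitCode) {
    if (exitCode == 0) {
      return false; // clean exit, nothing to retry
    }
    int count = failures.merge(runnableName, 1, Integer::sum);
    return count <= maxRetries;
  }

  public static void main(String[] args) {
    RetryLimiter limiter = new RetryLimiter(2);
    for (int attempt = 1; attempt <= 3; attempt++) {
      System.out.println("failure " + attempt + ", restart: " + limiter.shouldRestart("worker", 1));
    }
  }
}
```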



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: TwillRunnerService usage

2016-08-12 Thread Martin Serrano
Thanks!  Same here.

On 08/10/2016 05:48 PM, Henry Saputra wrote:
> Hi Martin,
>
> Glad to see you in the dev@ list =)
>
> Looking forward to working with you and your team.
>
> - Henry
>
>
> On Wed, Aug 10, 2016 at 1:39 PM, Martin Serrano <mar...@attivio.com> wrote:
>
>> On 08/09/2016 01:45 PM, Terence Yim wrote:
>>> Hi Martin,
>>>
>>> Currently there is no way of knowing when the TwillRunnerService finished
>>> the first sync up with the ZK. Would you mind opening a JIRA for that?
>> https://issues.apache.org/jira/browse/TWILL-183
>>
>>> For the YARN application state inside the YarnTwillController, currently
>> it
>>> is not surfaced. Potentially it can be added to the
>>> TwillController.getResourceReport() method.
>>>
>>> We are also looking for improving the state reporting through the
>>> TwillController after an app was submitted via surfacing more information
>>> about individual app state/resource and cluster resources. Would you mind
>>> filing JIRA(s) for them as well?
>> I will.  I need to research a bit to write a good ticket.
>>
>> -Martin
>>> Thanks,
>>> Terence
>>>
>>> On Mon, Aug 8, 2016 at 1:07 PM, Martin Serrano <mar...@attivio.com>
>> wrote:
>>>> Hi,
>>>>
>>>> I see from the source that the TwillRunnerService is locating existing
>>>> controllers by querying ZK in the background.  Since these run in the
>>>> background, there seems to be no way to know when the requests are
>>>> complete.  Thus, on starting up the service, I can't reliably determine
>>>> the complete set of controllers running from previous sessions.  Is
>>>> there a way to know?  Is there a listener I can register?
>>>>
>>>> Also, I see that YarnTwillController internally determines the current
>>>> application state (RUNNING, ACCEPTED, etc) via a YarnApplicationReport.
>>>> Is there any public API for this information?
>>>>
>>>> Basically I'm trying to work through the problem of submission of an
>>>> application, monitoring the current state, detecting somehow that it is
>>>> going to stay stuck in the ACCEPTED state, and then working out why.
>>>> Sometimes it is lack of memory resources, sometimes cpu.  Is there any
>>>> programmatic way?
>>>>
>>>> Thanks,
>>>> Martin Serrano
>>>>
>>



[jira] [Commented] (TWILL-182) ApplicationBundler will overwrite dependencies with identical names

2016-08-11 Thread Martin Serrano (JIRA)

[ 
https://issues.apache.org/jira/browse/TWILL-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418006#comment-15418006
 ] 

Martin Serrano commented on TWILL-182:
--

will do.

> ApplicationBundler will overwrite dependencies with identical names
> ---
>
> Key: TWILL-182
> URL: https://issues.apache.org/jira/browse/TWILL-182
> Project: Apache Twill
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.7.0-incubating
>    Reporter: Martin Serrano
> Fix For: 0.8.0
>
>
> If two jars obtained from *different* classpath locations have the same name 
> but different contents, one will overwrite the other.  The dependency code 
> correctly finds the jars (uses the full path in the HashSet which accumulates 
> the deps) but when the bundle is created the jars are written to {{/lib}} 
> under their name.  This results in one overwriting the other.
> While this is not a likely occurrence, it occurs for us in our development 
> environment because our published jar names are built up from their project 
> hierarchy.  For example the model project for our sdk is in {{.../sdk/model}} 
> and will be on the classpath as {{.../sdk/model.jar}} and published as 
> {{sdk-model.jar}}.  
> In practice however this could occur with any jar name and would be more 
> likely over time.
> The {{ApplicationBundler}} could detect this and re-write the name with some 
> part of the path or suffix to ensure the name is unique.
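
One way the suggested renaming could work, as a sketch only.  UniqueJarNames is a hypothetical helper rather than part of {{ApplicationBundler}}, and the numeric suffix scheme is just one of the options mentioned in the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: assigns each jar path a unique entry name for lib/.
final class UniqueJarNames {

  // Maps each full path to a name that no earlier path has already taken.
  static Map<String, String> assign(List<String> jarPaths) {
    Map<String, String> result = new LinkedHashMap<>();
    Set<String> used = new HashSet<>();
    for (String path : jarPaths) {
      String fileName = path.substring(path.lastIndexOf('/') + 1);
      int dot = fileName.lastIndexOf('.');
      String base = dot < 0 ? fileName : fileName.substring(0, dot);
      String ext = dot < 0 ? "" : fileName.substring(dot);
      String candidate = fileName;
      int suffix = 1;
      // On a collision, append -1, -2, ... until the name is free.
      while (!used.add(candidate)) {
        candidate = base + "-" + suffix++ + ext;
      }
      result.put(path, candidate);
    }
    return result;
  }

  public static void main(String[] args) {
    // Example paths are made up for the demonstration.
    System.out.println(assign(Arrays.asList(
        "/work/sdk/model.jar", "/work/core/model.jar", "/work/util.jar")));
  }
}
```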





Re: TwillRunnerService usage

2016-08-10 Thread Martin Serrano
On 08/09/2016 01:45 PM, Terence Yim wrote:
> Hi Martin,
>
> Currently there is no way of knowing when the TwillRunnerService finished
> the first sync up with the ZK. Would you mind opening a JIRA for that?

https://issues.apache.org/jira/browse/TWILL-183

> For the YARN application state inside the YarnTwillController, currently it
> is not surfaced. Potentially it can be added to the
> TwillController.getResourceReport() method.
>
> We are also looking to improve the state reporting through the
> TwillController after an app is submitted by surfacing more information
> about individual app state/resources and cluster resources. Would you mind
> filing JIRA(s) for those as well?

I will.  I need to research a bit to write a good ticket.

-Martin

> Thanks,
> Terence
>
> On Mon, Aug 8, 2016 at 1:07 PM, Martin Serrano <mar...@attivio.com> wrote:
>
>> Hi,
>>
>> I see from the source that the TwillRunnerService is locating existing
>> controllers by querying ZK in the background.  Since these queries run in
>> the background, there seems to be no way to know when they are
>> complete.  Thus, on starting up the service, I can't reliably determine
>> the complete set of controllers running from previous sessions.  Is
>> there a way to know?  Is there a listener I can register?
>>
>> Also, I see that YarnTwillController internally determines the current
>> application state (RUNNING, ACCEPTED, etc.) via a YarnApplicationReport.
>> Is there any public API for this information?
>>
>> Basically I'm trying to work through the problem of submission of an
>> application, monitoring the current state, detecting somehow that it is
>> going to stay stuck in the ACCEPTED state, and then working out why.
>> Sometimes it is lack of memory resources, sometimes cpu.  Is there any
>> programmatic way?
>>
>> Thanks,
>> Martin Serrano
>>
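
For the stuck-in-ACCEPTED case described above, a polling check might look roughly like this. It is a sketch under stated assumptions: as the reply notes, Twill does not currently surface the YARN application state, so the AppState enum and the Supplier that provides it are hypothetical stand-ins for whatever source of state is available, and AcceptedWatchdog is an invented name.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical sketch, not a Twill API: the state source is an assumption.
final class AcceptedWatchdog {

  // Stand-in for the application states named in the thread.
  enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

  private AcceptedWatchdog() { }

  /**
   * Polls the state source until the application leaves ACCEPTED or the
   * timeout elapses. Returns true if it was still ACCEPTED at the deadline.
   */
  static boolean isStuckInAccepted(Supplier<AppState> stateSource,
                                   Duration timeout,
                                   Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (stateSource.get() == AppState.ACCEPTED) {
      if (System.nanoTime() - deadline >= 0) {
        return true; // never left ACCEPTED within the timeout
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while polling", e);
      }
    }
    return false; // state moved on (RUNNING, FINISHED, ...)
  }
}
```

This only detects the symptom. Working out why the application is stuck (memory versus cpu) would still need resource information that this sketch does not have.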



[jira] [Created] (TWILL-183) TwillRunnerService should provide a way to determine that TwillRunner background load is complete

2016-08-10 Thread Martin Serrano (JIRA)

Martin Serrano created TWILL-183:

 Summary: TwillRunnerService should provide a way to determine that TwillRunner background load is complete
 Key: TWILL-183
 URL: https://issues.apache.org/jira/browse/TWILL-183
 Project: Apache Twill
 Issue Type: Improvement
 Components: core
 Affects Versions: 0.7.0-incubating
 Reporter: Martin Serrano


I see from the source that the TwillRunnerService is locating existing
controllers by querying ZK in the background.  Since these queries run in
the background, there seems to be no way to know when they are
complete.  Thus, on starting up the service, I can't reliably determine
the complete set of controllers running from previous sessions.  Is
there a way to know?  Is there a listener I can register?
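
One possible shape for such a completion signal is sketched below, purely as an illustration of the request. Nothing here is Twill code: BackgroundLoad, start, currentItems and initialLoadComplete are hypothetical names, and the Supplier stands in for the background ZK query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical sketch, not Twill code: a background load that exposes a
// future callers can wait on or attach a listener to.
final class BackgroundLoad<T> {

  private final List<T> items = new CopyOnWriteArrayList<>();
  private final CompletableFuture<List<T>> initialLoad = new CompletableFuture<>();

  /** Runs the query on the given executor and completes the future when done. */
  void start(Supplier<List<T>> query, Executor executor) {
    executor.execute(() -> {
      try {
        items.addAll(query.get());
        initialLoad.complete(snapshot());
      } catch (RuntimeException e) {
        initialLoad.completeExceptionally(e);
      }
    });
  }

  /** Items discovered so far; may be incomplete until the future is done. */
  List<T> currentItems() {
    return snapshot();
  }

  /** Completes with the full set once the first background load has finished. */
  CompletableFuture<List<T>> initialLoadComplete() {
    return initialLoad;
  }

  private List<T> snapshot() {
    return Collections.unmodifiableList(new ArrayList<>(items));
  }
}
```

A caller could block with initialLoadComplete().join() or register a callback with thenAccept, which covers both the "is there a way to know" and the "is there a listener" questions.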



