anyone attending Apache Big Data Miami?
Devs, I'm going to Apache Big Data in a couple weeks. Just wondering if anyone else on the list will be there. Perhaps we can get together, do a BOF or something. Cheers, Martin Serrano
[jira] [Comment Edited] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application
[ https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883217#comment-15883217 ]

Martin Serrano edited comment on TWILL-217 at 2/24/17 6:09 PM:

Yeah, I will do what you suggest. I considered that, but it seemed a little hacky, so I was trying to think of other options. Regarding Guava, I see your point, and there are things in it that are no longer needed now that Java has them (functions, predicates, etc.).

was (Author: mserrano):
Yeah, I can do what you suggest. I considered that, but it seemed a little hacky, so I was trying to think of other options. Regarding Guava, I see your point, and there are things in it that are no longer needed now that Java has them (functions, predicates, etc.).

> AppMaster launcher should include eventHandler dependencies and nothing else from application
>
> Key: TWILL-217
> URL: https://issues.apache.org/jira/browse/TWILL-217
> Project: Apache Twill
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 0.9.0
> Reporter: Martin Serrano
> Assignee: Martin Serrano
>
> Currently the launcher for the appmaster includes the application.jar libraries. This is to support user code that adds an EventHandler. The application may have many dependencies, and including them in the appmaster classpath can lead to incompatibilities that cannot otherwise be addressed.
> In my case, something in my application's large dependency graph was interfering with the Kafka server operation. I was not able to determine what it was, but tweaking the appmaster loader to not include my application jars fixed the issue.
> Instead, the bundler that creates the twill.jar should include the EventHandler extension (if any) as an explicit dependency. In this way, only the jars needed to support the event handler will be on the twill classpath.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application
[ https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883085#comment-15883085 ] Martin Serrano commented on TWILL-217: I know. I'm just suggesting the use of ClassPath to generate the list of classes that are the source of the dependency walk. Is upgrading Guava acceptable? If so, to what version?
[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application
[ https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882642#comment-15882642 ] Martin Serrano commented on TWILL-217: I was referring to the need to construct the jar that will be used to create the _Base Classloader_. It should only reference the api and its dependencies right?
[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application
[ https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881207#comment-15881207 ]

Martin Serrano commented on TWILL-217:

Seems then that we will need some way to get the list of all org.apache.twill.api.* classes off of the classpath in order to do this dependency walk:
* We could do this ourselves with some custom code.
* Guava 14 introduced such a utility: [ClassPath|https://github.com/google/guava/wiki/ReflectionExplained#classpath]. What is the history around using guava 13 versus later libraries?
* Could sort of cheat, and do something like what {{YarnTwillRunnerService}} does but just for the api package:
{code:language=java}
// Find all the classpaths for Twill classes. It is used for class filtering when building application jar
// in the YarnTwillPreparer
Dependencies.findClassDependencies(getClass().getClassLoader(), new ClassAcceptor() {
  @Override
  public boolean accept(String className, URL classUrl, URL classPathUrl) {
    if (!className.startsWith("org.apache.twill.")) {
      return false;
    }
    twillClassPaths.add(classPathUrl);
    return true;
  }
}, getClass().getName());
{code}
Thoughts?
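For the third option, the only change to the acceptor would be the prefix test. A minimal, self-contained sketch of that test follows; {{ApiClassFilter}} and its method names are hypothetical and not part of Twill, and the class names in {{main}} are just sample strings. The trailing dot in the prefix is deliberate so that a sibling package whose name merely begins with "api" is not picked up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Twill code): decides which class names belong to the API package.
final class ApiClassFilter {
  // Trailing dot keeps sibling packages such as "org.apache.twill.apix" out.
  static final String API_PREFIX = "org.apache.twill.api.";

  // True only for classes inside org.apache.twill.api or one of its sub-packages.
  static boolean isApiClass(String className) {
    return className != null && className.startsWith(API_PREFIX);
  }

  // Keeps the API classes from a list of visited class names, preserving order.
  static List<String> selectApiClasses(Iterable<String> classNames) {
    List<String> selected = new ArrayList<>();
    for (String name : classNames) {
      if (isApiClass(name)) {
        selected.add(name);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<String> visited = Arrays.asList(
        "org.apache.twill.api.TwillRunnable",
        "org.apache.twill.internal.appmaster.ApplicationMasterMain",
        "com.example.MyRunnable");
    System.out.println(selectApiClasses(visited));
  }
}
```

Inside an acceptor like the one quoted above, the body would presumably call {{isApiClass(className)}} in place of the broader {{org.apache.twill.}} check.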
[jira] [Comment Edited] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application
[ https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881207#comment-15881207 ]

Martin Serrano edited comment on TWILL-217 at 2/23/17 8:31 PM:

Seems then that we will need some way to get the list of all org.apache.twill.api.* classes off of the classpath in order to do this dependency walk:
* We could do this ourselves with some custom code.
* Guava 14 introduced such a utility: [ClassPath|https://github.com/google/guava/wiki/ReflectionExplained#classpath]. What is the history around using guava 13 versus later libraries?
* Could sort of cheat, and do something like what {{YarnTwillRunnerService}} does but restrict for the api package:
{code:language=java}
// Find all the classpaths for Twill classes. It is used for class filtering when building application jar
// in the YarnTwillPreparer
Dependencies.findClassDependencies(getClass().getClassLoader(), new ClassAcceptor() {
  @Override
  public boolean accept(String className, URL classUrl, URL classPathUrl) {
    if (!className.startsWith("org.apache.twill.")) {
      return false;
    }
    twillClassPaths.add(classPathUrl);
    return true;
  }
}, getClass().getName());
{code}
Thoughts?
Re: Release of Twill 0.10.0
Terence is correct. It is easily worked around by supplying the jar, so it shouldn't be considered a blocker.

Martin

On Feb 19, 2017 3:30 AM, Henry Saputra wrote:

sounds good to me, thanks

On Sat, Feb 18, 2017 at 11:40 PM, Terence Yim wrote:

> I believe TWILL-215 comes from the missing logback library from the client
> as discussed on another email thread. I am already preparing the 0.10.0
> release and we can always have a new release that includes the fix when
> deemed necessary.
>
> Terence
>
> On Sat, Feb 18, 2017 at 11:25 PM, Henry Saputra wrote:
>
> > The potential blocker issue is TWILL-215.
> >
> > @MartinSerrano, could you list steps to repro this issue?
> >
> > Want to figure out if this is a blocker for 0.10.0 release
> >
> > - Henry
> >
> > On Thu, Feb 16, 2017 at 9:38 AM, Terence Yim wrote:
> >
> > > Hi all,
> > >
> > > We've accumulated quite some enhancements and bug fixes and I think it's
> > > good time to have a new twill release. I am planning to send out a vote
> > > tomorrow (2/17). Please let me know if there is any concern.
> > >
> > > Terence
[jira] [Commented] (TWILL-217) AppMaster launcher should include eventHandler dependencies and nothing else from application
[ https://issues.apache.org/jira/browse/TWILL-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870007#comment-15870007 ] Martin Serrano commented on TWILL-217: That sounds reasonable. Thanks!
[jira] [Created] (TWILL-218) The implicit jopt-simple dependency should be made explicit
Martin Serrano created TWILL-218: Summary: The implicit jopt-simple dependency should be made explicit Key: TWILL-218 URL: https://issues.apache.org/jira/browse/TWILL-218 Project: Apache Twill Issue Type: Improvement Components: core Affects Versions: 0.9.0 Reporter: Martin Serrano Assignee: Martin Serrano Fix For: 0.10.0 Kafka has a dependency on jopt-simple. It seems that some of the scala code is opaque to twill dependency resolution because this dependency is not found. In environments that do not ship with jopt-simple, this shows up as a class not found exception which prevents the kafka service from coming up completely. While it would be better to have dependency resolution that discovered this on its own, explicitly adding the dependency solves the problem with minimal changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
classpath for the appmaster
Devs,

In our full deployment environment I could not get the Kafka forwarding of logs to work. I kept getting Kafka errors on the AM when it tried to look up the topic. Since I had been able to get this same runnable working in a unit test environment, I figured it had to do with the classpath. Looking deeper, I saw that the AM runs with the application.jar contents on its classpath. Why is that? It seems to me that the runnable classpaths should never be part of the AM.

I changed the TwillLauncher to not use the application.jar for the AM classpath and got a CNFE in the AM for joptsimple.OptionSpec. It seems this is an implicit dependency of Kafka that is not currently discovered by the dependency mechanism (presumably because Kafka is written in Scala). When I added jopt-simple-3.2.jar to my classpath and as a dependency class for the AM, everything worked! I was not getting the CNFE when application.jar/lib/* was on the classpath, so something in my application libs must have been picked up by Kafka initialization.

IMO, the AppMaster is internal Twill code and its dependencies should be fully provided by the distribution and self-contained. That may present some build challenges, but users should not ever run into this stuff. I'll file a ticket and submit a PR if there is agreement on this, but the application.jar/lib/* being on the AM classpath seems pretty intentional from looking at the code.

Cheers,
-Martin
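To illustrate the isolation argued for above, here is a minimal sketch of a class loader that sees only an explicit list of jars plus the JDK bootstrap classes. It is not how TwillLauncher is implemented; IsolatedLoaderSketch and its method names are hypothetical, and com.example.app.SomeAppClass is just a placeholder name.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch (not Twill code): a loader that ignores the application classpath.
final class IsolatedLoaderSketch {
  // Builds a loader that sees only the given URLs plus the JDK's bootstrap classes.
  static URLClassLoader isolated(URL... jars) {
    // A null parent means lookups are not delegated to the application class loader.
    return new URLClassLoader(jars, null);
  }

  // Reports whether the loader can resolve the class, without initializing it.
  static boolean canLoad(ClassLoader loader, String className) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    try (URLClassLoader loader = isolated()) {
      // Core JDK classes remain visible through the bootstrap loader.
      System.out.println(canLoad(loader, "java.lang.String"));
      // Application classes are not visible unless their jar was listed explicitly.
      System.out.println(canLoad(loader, "com.example.app.SomeAppClass"));
    }
  }
}
```

With a loader built this way, a missing implicit dependency such as the jopt-simple jar surfaces immediately as a class-not-found error instead of being masked by whatever the application happens to ship.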
Re: twill dependency on logback
Twill should not impose a logging framework on the runnables IMO. I think slf4j is reasonable, and since twill is handling the Kafka hookup, the runnables themselves need know nothing about logback, correct?

I think ultimately, how logging is handled should be pluggable. In our use case, the kafka tie-in is a nice idea, but for most cases it will not fare so well. For instance:

* The log handlers are only relevant in the process which starts the application. Once that jvm is gone, the kafka queue is never consumed again. Our yarn clients are typically established for control purposes and are ephemeral. I realize that the kafka queue could be consumed by a new process, but it doesn't seem like that is part of the API.
* A different strategy may be appropriate for some applications/runnables. For example, I may want to accumulate my logging in HDFS or just local to the AM.
* Right now our yarn applications dump logs locally and we use custom endpoints in the applications to pull logs for analysis. Otherwise they just accumulate and are then subject to the yarn default aggregation policy.
* We have several output log files to segregate different activity. I'd like to be able to replicate that in the twill environment.

To sum up, I'm not quite sure how to handle logging yet in the twill environment, but I'm pretty sure we'll need more flexibility and some type of pluggability.

Cheers,
Martin

On 02/10/2017 02:22 PM, Henry Saputra wrote:

Ah ok, thanks for clarifying your concern, Martin. Twill currently does need logback in the runnables running in YARN to be able to collect the logs and publish them to the embedded Kafka. So in your case, you want to use slf4j but backed by the log4j binding instead, for both client and runnables?

- Henry
Re: Gaining Control of Runnable after client terminates
Caleb,

Sorry, made a mistake there. In the simple case the application is named with the simple classname, not the full one.

-Martin

On 02/10/2017 02:02 PM, Martin Serrano wrote:

Caleb,

The TwillRunnerService handles the ZK registration of applications for you. When TwillClient B starts the service, it can get the controllers for the application via the TwillRunnerService.lookup method. If you did not explicitly name your application, then you are probably using the prepare method that names the application to the full classname of your runnable.

-Martin

On 02/09/2017 10:53 AM, Meier, Caleb wrote:

Hello,

Suppose that I start an instance of NotificationRunnable with TwillClient A. NotificationRunnable continues to run after TwillClient A terminates (I no longer have a handle on the YarnTwillRunner that prepared it), but I want to send a command to NotificationRunnable using an instance of TwillController that I create in TwillClient B. Is this possible? It seems like I would still need access to the YarnTwillRunner that started the NotificationRunnable. I’m basing this on the source code for YarnTwillRunner.lookup(…) – it seems like this method only returns controllers for runnables that the enclosing instance of YarnTwillRunner has prepared. Am I mistaken about this? If not, do I need to explicitly register my application with a ZkDiscoveryService and then look it up later through the same service? Is there a better way to go about this?

Thanks,
Caleb A. Meier, Ph.D.
Re: twill dependency on logback
Terence,

Correct, I don't want to use logback on the client side. slf4j is okay -- we ship with the log4j binding. In the twill containers, the AM doesn't matter for me, but the runnables do. So yes, flexibility for the runnables is important.

Thanks,
Martin

On 02/10/2017 01:56 AM, Terence Yim wrote:

Hi Martin,

If I understand correctly, your intention is to not use slf4j + logback on the client side? How about the twill containers (both AM and runnables)? Is it ok to use logback, or do you want twill to be more flexible about that? I understand the failure on the AM that you mentioned; I am just wondering what your end goal looks like, so we can shape a better solution for this.

Terence

On Feb 9, 2017, at 11:33 AM, Martin Serrano <mar...@attivio.com> wrote:

Terence,

I'm familiar with the logback appender and Kafka code. My point is this:

* the AppMaster depends on logback.
* when the YarnTwillPreparer class calls createTwillJar it is creating the runtime jar for the AppMaster from the current classpath (or more accurately from the classloader used by the current thread).
* this means the logback jar will not be within the twill jar unless it is currently on the classpath of the client. The current dependency code ignores dependent classes which are not found in the classpath while walking the dependency graph. This is what leads to the class not found exception when starting the appmaster. This is why I filed TWILL-215.
* having the logback jar in the current classpath turns on logback within my twill client code since I use slf4j.

Does that make sense?

-Martin

On 02/09/2017 02:19 PM, Terence Yim wrote:

Hi Martin,

Twill has a logback Appender implementation for capturing logs emitted via the slf4j API from runnables and publishing them to the embedded Kafka running inside the AM process. If you are using log4j as the API for emitting logs, what you can do is use the log4j-over-slf4j bridge to have logs emitted via the log4j API get bridged to slf4j. I suspect the reason you are seeing the class missing error is that you have the slf4j to log4j bridge (the reverse of the one I mentioned above; look for a jar with a name containing "slf4j-log4j12" in the client classpath) that comes earlier in the classpath than the logback jars.

Terence

On Thu, Feb 9, 2017 at 10:47 AM, Martin Serrano <mar...@attivio.com> wrote:

Henry,

I see this behavior deploying with YARN 2.7.1, HDP 2.3. But I'm not sure you understood my issue.

* The logback jar dependency is only picked up if it is on the classpath when the bundle is created.
* With logback in my twill client classpath, the appmaster starts fine. However, without logback in my client classpath the appmaster will get a ClassNotFoundException.
* We use log4j, and with logback in my client classpath, it takes over the slf4j bindings and I lose control of the client logging.

So my question was about whether this is expected or if there is a well-known procedure for working around it. It seems there should be a way to tell the twill system where to find the appmaster dependencies without having them in the classpath of the twill client.

Thanks!
-Martin

On 02/08/2017 08:09 PM, Henry Saputra wrote:

But the logback dependency should be included in the jar packaging that YARN client sends for Twill ApplicationMaster. Are you seeing this behavior in deploying Twill app in latest YARN?

- Henry

On Wed, Feb 8, 2017 at 12:30 PM, Martin Serrano <mar...@attivio.com> wrote:

Hey Devs,

It seems like the twill project goes through some pain to try to insulate itself from logging frameworks. I see use of the slf4j API. However, the appmaster code has a dependency on logback via the org.apache.twill.internal.logging.Loggings class. The appmaster will not start up without this dependency present. With the dependency code as it is now, there is no way to include the logback jar in the generated bundle without it being on the current classpath. I've created a ticket (TWILL-215) to make a missing dependency trigger an exception at bundle generation time rather than appmaster execution time.

When the logback jar is on my classpath, my client code picks up logback instead of our current logger (log4j). Is this what is expected? Is there any known workaround? It seems like there may be a case for specifying dependencies of the appmaster that are located outside of the current jvm classpath.

Thanks,
Martin
Re: twill dependency on logback
Any chance we could slack/chat on this tomorrow?

On Feb 8, 2017 8:09 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:

But the logback dependency should be included in the jar packaging that YARN client sends for Twill ApplicationMaster. Are you seeing this behavior in deploying Twill app in latest YARN?

- Henry
[jira] [Commented] (TWILL-215) Dependencies not on classpath lead to runtime startup error
[ https://issues.apache.org/jira/browse/TWILL-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858607#comment-15858607 ]

Martin Serrano commented on TWILL-215:

The details got pretty messy. I still think a github diff/pr is the best way to discuss. Should I just submit a PR?

> Dependencies not on classpath lead to runtime startup error
>
> Key: TWILL-215
> URL: https://issues.apache.org/jira/browse/TWILL-215
> Project: Apache Twill
> Issue Type: Bug
> Components: core
> Affects Versions: 0.9.0
> Reporter: Martin Serrano
> Priority: Critical
> Fix For: 0.10.0
>
> We do not use logback in our environment but it is a dependency of {{ApplicationMasterMain}}. When {{YarnTwillPreparer.createTwillJar}} is called in our environment, the logback jar is not on our classpath. For a class not in the classpath, the {{Dependencies.findClassDependencies}} method ignores it. This leads to a runtime startup error when the app master tries to start.
> This is easily fixed unless there is some use case for ignoring the dependency when it is not on the classpath. An exception should be thrown and no yarn job should be submitted.
twill dependency on logback
Hey Devs,

It seems like the twill project goes through some pain to try to insulate itself from logging frameworks. I see use of the slf4j API. However, the appmaster code has a dependency on logback via the org.apache.twill.internal.logging.Loggings class. The appmaster will not start up without this dependency present. With the dependency code as it is now, there is no way to include the logback jar in the generated bundle without it being on the current classpath. I've created a ticket (TWILL-215) to make a missing dependency trigger an exception at bundle generation time rather than appmaster execution time.

When the logback jar is on my classpath, my client code picks up logback instead of our current logger (log4j). Is this what is expected? Is there any known workaround? It seems like there may be a case for specifying dependencies of the appmaster that are located outside of the current jvm classpath.

Thanks,
Martin
[jira] [Commented] (TWILL-215) Dependencies not on classpath lead to runtime startup error
[ https://issues.apache.org/jira/browse/TWILL-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858268#comment-15858268 ] Martin Serrano commented on TWILL-215: You get a {{ClassNotFoundException}} trying to start the app master. My view would be that a preventable error like this should be prevented since the twill system depends on the appmaster to be able to discover any startup errors. I'm going to submit a PR for the proposed fix (throwing an exception when the bundle is being created) since I think the details will warrant discussion.
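The fail-fast behaviour proposed in that comment can be sketched independently of Twill. The following is only an illustration under assumptions: {{DependencyPreflight}}, {{findMissing}} and {{requireAll}} are hypothetical names, and the real change would live in the dependency walk rather than in a separate check. The idea is to resolve each required class as a resource on the client class loader and to refuse to build the bundle when any is absent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Twill code): verify required classes before building a bundle.
final class DependencyPreflight {
  // Returns the class names whose .class resource the loader cannot locate.
  static List<String> findMissing(ClassLoader loader, Iterable<String> classNames) {
    List<String> missing = new ArrayList<>();
    for (String name : classNames) {
      String resource = name.replace('.', '/') + ".class";
      if (loader.getResource(resource) == null) {
        missing.add(name);
      }
    }
    return missing;
  }

  // Throws before any job would be submitted if a required class is absent.
  static void requireAll(ClassLoader loader, Iterable<String> classNames) {
    List<String> missing = findMissing(loader, classNames);
    if (!missing.isEmpty()) {
      throw new IllegalStateException("Classes missing from the client classpath: " + missing);
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = ClassLoader.getSystemClassLoader();
    // Reports the logback class as missing unless logback happens to be on the classpath.
    System.out.println(findMissing(loader,
        Arrays.asList("java.lang.String", "ch.qos.logback.classic.Logger")));
  }
}
```

Calling {{requireAll}} at bundle creation would turn the app master's {{ClassNotFoundException}} into an error on the client, where the user can actually see it.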
[jira] [Commented] (TWILL-213) Increase of instances while starting up may lead to ignored retries and instance increases
[ https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856669#comment-15856669 ] Martin Serrano commented on TWILL-213: -- I've updated the description with what I think is happening. > Increase of instances while starting up may lead to ignored retries and > instance increases > -- > > Key: TWILL-213 > URL: https://issues.apache.org/jira/browse/TWILL-213 > Project: Apache Twill > Issue Type: Bug > Components: yarn >Affects Versions: 0.9.0 > Reporter: Martin Serrano > > As seen in the test development for TWILL-181, if the number of instances for > a container is increased before the {{ApplicationMasterService}} has observed > the original request as being satisfied, the instance increase and any > subsequent retries will be blocked. This is because in {{launchRunnable}}: > {code} > TwillContainerLauncher launcher = new TwillContainerLauncher( > twillSpec.getRunnables().get(runnableName), > processLauncher.getContainerInfo(), launchContext, > ZKClients.namespace(zkClient, getZKNamespace(runnableName)), > containerCount, jvmOpts, reservedMemory, getSecureStoreLocation()); > runningContainers.start(runnableName, > processLauncher.getContainerInfo(), launcher); > // Need to call complete to workaround bug in YARN AMRMClient > if (provisionRequest.containerAcquired()) { > amClient.completeContainerRequest(provisionRequest.getRequestId()); > } > /* >* The provisionRequest will either contain a single container > (ALLOCATE_ONE_INSTANCE_AT_A_TIME), or all the >* containers to satisfy the expectedContainers count. In the later > case, the provision request is complete once >* all the containers have run at which point we poll() to remove the > provisioning request. 
>*/ > if (expectedContainers.getExpected(runnableName) == > runningContainers.count(runnableName) || > > provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) > { > provisioning.poll(); > } > {code} > There is a race condition. The sequence: > * *Thread A*: {{runningContainers.start}} is called and 2 instances are > started > * *Thread B*: The runnable from {{createSetInstanceRunnable}} executes, sees > the 2 instances are started and updates the expected count to 3. > * *Thread A*: Gets to if check comparing expectedContainers (3) to > runningContainers.count (2). Since this fails, {{poll}} is not called and > this provision request is not satisfied. > Subsequent calls will try to provision the 3rd container because it seems > like the first provision request is not yet satisfied. > The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can be used > to reproduce this case intermittently by changing the {{allRunning.await}} > check to something that does a countdown latch {{onRunning}} as > {{EchoServerTestRun}} does. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
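The interleaving described in TWILL-213 can be replayed deterministically with a toy model. This is only a sketch under simplifying assumptions: plain ints stand in for expectedContainers and runningContainers, a queue of strings stands in for the provisioning queue, only the equality branch of the check is modelled, and the class ProvisionRaceReplay and its method names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Single-threaded replay of the race; every name here is illustrative, not Twill code. */
final class ProvisionRaceReplay {
  private int expected;                      // stands in for expectedContainers.getExpected(name)
  private int running;                       // stands in for runningContainers.count(name)
  private final Queue<String> provisioning = new ArrayDeque<>();

  ProvisionRaceReplay(int initialExpected) {
    this.expected = initialExpected;
    provisioning.add("request-for-" + initialExpected);
  }

  /** Thread A, first half: the containers for the original request are started. */
  void startContainers(int count) {
    running += count;
  }

  /** Thread B: the instance count is raised while thread A is between its two steps. */
  void setInstances(int newExpected) {
    expected = newExpected;
  }

  /** Thread A, second half: the equality check that decides whether to poll(). */
  boolean completeIfSatisfied() {
    if (expected == running) {
      provisioning.poll();
      return true;
    }
    return false;
  }

  int pendingRequests() {
    return provisioning.size();
  }
}
```

Calling startContainers(2), then setInstances(3), then completeIfSatisfied() leaves the original request in the queue, which is the stuck state described in the ticket. Without the setInstances call in the middle, the request is removed as intended.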
Re: interest in bare-bones cluster/agent solution?
Devs, I've got an initial prototype going on this project. During this process I noticed that much of what I needed to do ended up duplicating code that is in the twill-yarn project. I'm curious whether the community would be open to a refactoring that moves some of this code so that it could be shared by different clustering control implementations. For instance, much of the code in YarnTwillPreparer, YarnContainerMain, YarnServiceMain, etc. is not really YARN-specific. Thoughts? Thanks, Martin On 02/01/2017 04:13 PM, Martin Serrano wrote: Hey Devs, I've been evangelizing Twill here at my company. For several years we have had a basic clustering solution for running our system and its components across multiple hosts. As we've worked to migrate the platform to Hadoop and YARN, we have started to move our cluster control to be YARN-based. And my goal is to make it Twill-based. However, as we do this work we anticipate the need to support customers that do not have and do not want to operate Hadoop infrastructure. I'm working on an architecture which would use Twill for command and control for all of our services. For customers that did not want the full enterprise capabilities that come with Hadoop we could continue to offer our basic clustering support (albeit with reduced capabilities) by plugging our clustering solution into Twill. One of the aspects of Twill that interested me from the start was that the control API was abstracted from YARN. A couple of questions: 1) Do any plugins for other clustering backends already exist? Open source? Commercial? 2) If we were to do this, would the Twill community be interested in a donation of this code to the Twill project? I recall that Henry and I had a conversation at the last Apache Big Data that there had been talk of plugins for other clustering backends but I don't know if anything ever came of that. Cheers, Martin Serrano
[jira] [Updated] (TWILL-213) Increase of instances while starting up may lead to ignored retries and instance increases
[ https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Serrano updated TWILL-213: - Description: As seen in the test development for TWILL-181, if the number of instances for a container is increased before the {{ApplicationMasterService}} has observed the original request as being satisfied, the instance increase and any subsequent retries will be blocked. This is because in {{launchRunnable}}: {code} if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) { provisioning.poll(); } {code} we are comparing the expected containers to the running count to decide if {{provisioning.poll()}} should be called. If a new instance request has been made, the expected containers will have been updated and the running count never will. The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can be used to reproduce this case intermittently by changing the {{allRunning.await}} check to something that does a countdown latch {{onRunning}} as {{EchoServerTestRun}} does. was: As seen in the test development for TWILL-181, if the number of instances for a container is increased before the `ApplicationMasterService` has observed the original request as being satisfied, the instance increase and any subsequent retries will be blocked. This is because in `launchRunnable`: {code} if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) { provisioning.poll(); } {code} we are comparing the expected containers to the running count to decide if `provisioning.poll()` should be called. If a new instance request has been made, the expected containers will have been updated and the running count never will. 
The `MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method can be used to reproduce this case intermittently by changing the `allRunning.await` check to something that does a countdown latch `onRunning` as `EchoServerTestRun` does. > Increase of instances while starting up may lead to ignored retries and > instance increases > -- > > Key: TWILL-213 > URL: https://issues.apache.org/jira/browse/TWILL-213 > Project: Apache Twill > Issue Type: Bug > Components: yarn >Affects Versions: 0.9.0 > Reporter: Martin Serrano > > As seen in the test development for TWILL-181, if the number of instances for > a container is increased before the {{ApplicationMasterService}} has observed > the original request as being satisfied, the instance increase and any > subsequent retries will be blocked. This is because in {{launchRunnable}}: > {code} > if (expectedContainers.getExpected(runnableName) == > runningContainers.count(runnableName) || > > provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) > { > provisioning.poll(); > } > {code} > we are comparing the expected containers to the running count to decide if > {{provisioning.poll()}} should be called. If a new instance request has > been made, the expected containers will have been updated and the running > count never will. The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} > method can be used to reproduce this case intermittently by changing the > {{allRunning.await}} check to something that does a countdown latch > {{onRunning}} as {{EchoServerTestRun}} does. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (TWILL-213) Increase of instances while starting up may lead to ignored retries and instance increases
Martin Serrano created TWILL-213: Summary: Increase of instances while starting up may lead to ignored retries and instance increases Key: TWILL-213 URL: https://issues.apache.org/jira/browse/TWILL-213 Project: Apache Twill Issue Type: Bug Components: yarn Affects Versions: 0.9.0 Reporter: Martin Serrano As seen in the test development for TWILL-181, if the number of instances for a container is increased before the `ApplicationMasterService` has observed the original request as being satisfied, the instance increase and any subsequent retries will be blocked. This is because in `launchRunnable`: {code} if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) { provisioning.poll(); } {code} we are comparing the expected containers to the running count to decide if `provisioning.poll()` should be called. If a new instance request has been made, the expected containers will have been updated and the running count never will. The `MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method can be used to reproduce this case intermittently by changing the `allRunning.await` check to something that does a countdown latch `onRunning` as `EchoServerTestRun` does. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
interest in bare-bones cluster/agent solution?
Hey Devs, I've been evangelizing Twill here at my company. For several years we have had a basic clustering solution for running our system and its components across multiple hosts. As we've worked to migrate the platform to Hadoop and YARN, we have started to move our cluster control to be YARN-based. And my goal is to make it Twill-based. However, as we do this work we anticipate the need to support customers that do not have and do not want to operate Hadoop infrastructure. I'm working on an architecture which would use Twill for command and control for all of our services. For customers that did not want the full enterprise capabilities that come with Hadoop we could continue to offer our basic clustering support (albeit with reduced capabilities) by plugging our clustering solution into Twill. One of the aspects of Twill that interested me from the start was that the control API was abstracted from YARN. A couple of questions: 1) Do any plugins for other clustering backends already exist? Open source? Commercial? 2) If we were to do this, would the Twill community be interested in a donation of this code to the Twill project? I recall that Henry and I had a conversation at the last Apache Big Data that there had been talk of plugins for other clustering backends but I don't know if anything ever came of that. Cheers, Martin Serrano
Re: [DISCUSS] HasDependencies tagging interface
Yes, this was my intention. In our case, the required additional classes can be determined dynamically based on configuration. It makes sense for the code that does this determination to live with the runnable rather than the controller. -Martin On 01/17/2017 09:08 PM, Andreas Neumann wrote: I guess the difference is decoupling of the preparer from the runnable. Martin's approach makes it a property of the runnable itself, so the preparer can derive this information. That is, I can modify my runnable without having to modify my invocation of the preparer. Thoughts? -Andreas. On Tue, Jan 17, 2017 at 5:56 PM, Terence Yim <cht...@gmail.com> wrote: Hi Martin, Is it not doable via the TwillPreparer.withDependencies method? Terence On Tue, Jan 17, 2017 at 2:56 PM, Martin Serrano <mar...@attivio.com> wrote: Team, I have some untraceable dependencies for one of my runnables. It occurs to me that preparing and launching the runnable is not always the best place to define these dependencies (using the withDependencies method). The runnable itself will always have these deps (there is static xml configuration embedded in the lib). What would folks think of the idea of a tagging interface that TwillPreparer would check and insert the deps itself? Something like: public interface HasDependencies { Iterable<Class> dependencies(); } This interface could be added to any implementation of TwillRunnable. Thoughts? -Martin
[DISCUSS] HasDependencies tagging interface
Team, I have some untraceable dependencies for one of my runnables. It occurs to me that preparing and launching the runnable is not always the best place to define these dependencies (using the withDependencies method). The runnable itself will always have these deps (there is static xml configuration embedded in the lib). What would folks think of the idea of a tagging interface that TwillPreparer would check and insert the deps itself? Something like: public interface HasDependencies { Iterable<Class> dependencies(); } This interface could be added to any implementation of TwillRunnable. Thoughts? -Martin
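A rough sketch of how the tagging-interface idea might look on the preparer side. Everything beyond the HasDependencies interface itself is an assumption made for illustration: java.lang.Runnable stands in for TwillRunnable, DependencyCollector and XmlConfiguredRunnable are hypothetical names, and the interface uses Class<?> where the proposal wrote the raw Class type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** The proposed tagging interface (Class<?> instead of the raw Class type). */
interface HasDependencies {
  Iterable<Class<?>> dependencies();
}

/** Hypothetical runnable; java.lang.Runnable stands in for TwillRunnable here. */
final class XmlConfiguredRunnable implements Runnable, HasDependencies {
  @Override
  public void run() {
    // application logic would go here
  }

  @Override
  public Iterable<Class<?>> dependencies() {
    // Placeholder classes for deps that class tracing cannot discover on its own.
    return Arrays.<Class<?>>asList(StringBuilder.class, ArrayList.class);
  }
}

/** Hypothetical preparer-side helper that merges explicit and self-declared deps. */
final class DependencyCollector {

  private DependencyCollector() { }

  static List<Class<?>> collect(Object runnable, Iterable<Class<?>> explicit) {
    List<Class<?>> result = new ArrayList<>();
    for (Class<?> c : explicit) {
      result.add(c);
    }
    if (runnable instanceof HasDependencies) {
      for (Class<?> c : ((HasDependencies) runnable).dependencies()) {
        if (!result.contains(c)) {
          result.add(c);
        }
      }
    }
    return result;
  }
}
```

With this shape the code that launches the runnable does not change when the runnable's extra dependencies change, which is the decoupling discussed in the replies.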
Re: Desired behavior at shutdown
Created https://issues.apache.org/jira/browse/TWILL-204 On 01/14/2017 11:47 AM, Martin Serrano wrote: Terence, I'm going to make a ticket and move discussion to that. Okay? -Martin
Re: Desired behavior at shutdown
Terence, I'm going to make a ticket and move discussion to that. Okay? -Martin

On 01/07/2017 02:29 PM, Terence Yim wrote:

Hi Martin, One simpler way is that we could use the messaging mechanism in Twill to send a message from controller to AM during shutdown, after the controller sees a special log event emitted by AM after the AM shuts down all containers. The AM will wait for the message from controller before completely shutting down itself (with some reasonable timeout to avoid infinite wait). Terence

On Fri, Jan 6, 2017 at 4:46 PM, Martin Serrano <mar...@attivio.com> wrote:

Terence, I see your point. I've thought a bit about this and it seems the only solution would be to coordinate via ZK between the controllers and the AM. The solution would be something like this:

* Controller consumers register an ephemeral znode under the kafka znode to indicate they are clients
* Controllers listen on the .../kafka/broker znode for a child named shuttingDown
* When the AM reaches the state where it wants to shut down, it creates the .../kafka/broker/shuttingDown znode and waits for there to be no registered controllers. This wait would have an upper bound to prevent eternal waiting.
* Once a controller consumer sees the shuttingDown node, if it receives an empty messages buffer it shuts itself down.
* Controller consumers remove their registration znode when they shut down
* The AM shuts down the broker once all controller consumers are gone or it has reached its timeout

This solution avoids the checkpointing load and znode use scales with the number of consumers which is presumably smallish. There are no net z-ops outside of consumer creation and shutdown time. However, I consider this a complex kind of setup and this code tends to be harder to maintain since the logic is spread amongst different application layers. I don't see any other way to ensure full reading of the kafka queues given the decoupled nature of the broker and client.

Something else I thought of that would usually alleviate the issue but be much simpler would be to have an extended timeout before broker shutdown if the containers exit with a non-success error code (say 15s). The shutdown timeouts could also be made configurable. What do you think? -Martin

On 01/05/2017 10:53 PM, Terence Yim wrote:

Hi Martin, I do agree that the AM should only shut down the Embedded Kafka server once all the controllers see all the logs. However, the difficulty is in how the AM would know about it. The Twill controller is using the simple Kafka API instead of the higher level one (as that one involves checkpointing to ZK, and we don't want many running twill apps to put a heavy load on ZK). Do you have any suggestions how to do that? Thanks, Terence Sent from my iPhone

On Jan 5, 2017, at 2:42 PM, Martin Serrano <mar...@attivio.com> wrote:

Actually, after further investigation, I realize the server side has to be dealt with because it is shutting down the Kafka broker before all the messages are read from it. I see that there is a 2 second delay for clients to pull what they can first. What would folks think about an algorithm that checked the topic for unread messages and had a longer timeout (say 30s) as long as there were messages to be received still? Is there an issue that the client may not be present on the other side and that the delay of shutting down the AM would be undesirable? -Martin

On 01/05/2017 12:32 PM, Martin Serrano wrote:

All, I'm encountering a situation on a fast machine where the Kafka log aggregation topic is not empty when the system shuts down. The scenario:

* log consumer consumes all messages
* consumer sleeps (500ms) due to empty queue
* containers exit, posting /final log messages/ about why
* controller notices containers are down and terminates consumers.
* consumer is interrupted from sleep but has been canceled so it does not get the rest of the messages.

This scenario can be really confusing during development because an error may be missed (as in my case) if it falls into the /final log messages/. Before I file a ticket and fix this, I wanted to get some feedback. Looking at org.apache.twill.internal.kafka.client.SimpleKafkaConsumer it seems this behavior could be intentional given this log message (line 384): LOG.debug("Unable to fetch messages on {}, kafka consumer service shutdown is in progress.", topicPart); My opinion is that final messages logged by a container are likely to be critical in diagnosing errors and that twill should do whatever it can to forward them before shutting things down. If there is agreement on this I'll file a ticket and fix it. My general approach would be to indicate to the consumer that it is in a shuttingDown state which it would use to break from the consume loop once the message set was empty. If this makes sense would we need to support a timeout for the maximum amount of time to be in thi
Re: Desired behavior at shutdown
Terence, I see your point. I've thought a bit about this and it seems the only solution would be to coordinate via ZK between the controllers and the AM. The solution would be something like this:

* Controller consumers register an ephemeral znode under the kafka znode to indicate they are clients
* Controllers listen on the .../kafka/broker znode for a child named shuttingDown
* When the AM reaches the state where it wants to shut down, it creates the .../kafka/broker/shuttingDown znode and waits for there to be no registered controllers. This wait would have an upper bound to prevent eternal waiting.
* Once a controller consumer sees the shuttingDown node, if it receives an empty messages buffer it shuts itself down.
* Controller consumers remove their registration znode when they shut down
* The AM shuts down the broker once all controller consumers are gone or it has reached its timeout

This solution avoids the checkpointing load and znode use scales with the number of consumers which is presumably smallish. There are no net z-ops outside of consumer creation and shutdown time. However, I consider this a complex kind of setup and this code tends to be harder to maintain since the logic is spread amongst different application layers. I don't see any other way to ensure full reading of the kafka queues given the decoupled nature of the broker and client.

Something else I thought of that would usually alleviate the issue but be much simpler would be to have an extended timeout before broker shutdown if the containers exit with a non-success error code (say 15s). The shutdown timeouts could also be made configurable. What do you think? -Martin

On 01/05/2017 10:53 PM, Terence Yim wrote:

Hi Martin, I do agree that the AM should only shut down the Embedded Kafka server once all the controllers see all the logs. However, the difficulty is in how the AM would know about it. The Twill controller is using the simple Kafka API instead of the higher level one (as that one involves checkpointing to ZK, and we don't want many running twill apps to put a heavy load on ZK). Do you have any suggestions how to do that? Thanks, Terence Sent from my iPhone

On Jan 5, 2017, at 2:42 PM, Martin Serrano <mar...@attivio.com> wrote:

Actually, after further investigation, I realize the server side has to be dealt with because it is shutting down the Kafka broker before all the messages are read from it. I see that there is a 2 second delay for clients to pull what they can first. What would folks think about an algorithm that checked the topic for unread messages and had a longer timeout (say 30s) as long as there were messages to be received still? Is there an issue that the client may not be present on the other side and that the delay of shutting down the AM would be undesirable? -Martin

On 01/05/2017 12:32 PM, Martin Serrano wrote:

All, I'm encountering a situation on a fast machine where the Kafka log aggregation topic is not empty when the system shuts down. The scenario:

* log consumer consumes all messages
* consumer sleeps (500ms) due to empty queue
* containers exit, posting /final log messages/ about why
* controller notices containers are down and terminates consumers.
* consumer is interrupted from sleep but has been canceled so it does not get the rest of the messages.

This scenario can be really confusing during development because an error may be missed (as in my case) if it falls into the /final log messages/. Before I file a ticket and fix this, I wanted to get some feedback.
Looking at org.apache.twill.internal.kafka.client.SimpleKafkaConsumer it seems this behavior could be intentional given this log message (line 384): LOG.debug("Unable to fetch messages on {}, kafka consumer service shutdown is in progress.", topicPart); My opinion is that final messages logged by a container are likely to be critical in diagnosing errors and that twill should do whatever it can to forward them before shutting things down. If there is agreement on this I'll file a ticket and fix it. My general approach would be to indicate to the consumer that it is in a shuttingDown state which it would use to break from the consume loop once the message set was empty. If this makes sense would we need to support a timeout for the maximum amount of time to be in this state before punting on the rest of the messages? My instinct is no, get them all, but given the way the code is set up now, perhaps there are good reasons to timeout. Thanks, Martin Serrano
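The shuttingDown approach from the original message, combined with the maximum drain time it asks about, could be sketched roughly as follows. This is not SimpleKafkaConsumer: a BlockingQueue stands in for the Kafka topic, and the class DrainingConsumer, its constructor parameters, and its method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical consumer loop; a BlockingQueue stands in for the Kafka log topic. */
final class DrainingConsumer {
  private final BlockingQueue<String> topic;
  private final long pollMillis;
  private final long maxDrainMillis;
  private volatile long drainDeadlineNanos;
  private volatile boolean shuttingDown;

  DrainingConsumer(BlockingQueue<String> topic, long pollMillis, long maxDrainMillis) {
    this.topic = topic;
    this.pollMillis = pollMillis;
    this.maxDrainMillis = maxDrainMillis;
  }

  /** Signals shutdown without interrupting the loop, so pending messages can still be read. */
  void shutDown() {
    // The deadline is written before the flag so a reader that sees the flag sees the deadline.
    drainDeadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxDrainMillis);
    shuttingDown = true;
  }

  /** Consumes until shutdown is requested and the topic is empty, or the drain time runs out. */
  List<String> consume() {
    List<String> received = new ArrayList<>();
    while (true) {
      String message;
      try {
        message = topic.poll(pollMillis, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // preserve the interrupt and give up
        return received;
      }
      if (message != null) {
        received.add(message);
      }
      if (shuttingDown) {
        boolean timedOut = System.nanoTime() - drainDeadlineNanos >= 0;
        if (message == null || timedOut) {
          return received;
        }
      }
    }
  }
}
```

In this sketch the loop only exits on an empty poll once shutdown has been requested, so the final log messages are delivered unless the drain deadline passes first. Whether a deadline is wanted at all is the open question in the thread.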
[jira] [Commented] (TWILL-181) Control the maximum number of retries for failed application starts
[ https://issues.apache.org/jira/browse/TWILL-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433574#comment-15433574 ] Martin Serrano commented on TWILL-181: -- I'm going to start taking a look at this. > Control the maximum number of retries for failed application starts > --- > > Key: TWILL-181 > URL: https://issues.apache.org/jira/browse/TWILL-181 > Project: Apache Twill > Issue Type: Improvement > Components: yarn >Affects Versions: 0.7.0-incubating > Reporter: Martin Serrano > Fix For: 0.8.0 > > > If an application consistently exits with a non-zero code, twill will > attempt to restart indefinitely. I ran into this issue and a list search > also reveals [others| http://markmail.org/message/dehx7r6tpqgcmjh4]. > There should be a mechanism to specify the maximum number of retries until > the application fails. Ideally by default there would be a non-infinite > maximum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
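One way to picture the requested limit is a per-runnable failure counter that is consulted before each restart. This is a sketch only: RetryBudget and its method are hypothetical names, and it says nothing about how the limit would actually be configured in Twill.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-runnable retry budget; not the Twill implementation. */
final class RetryBudget {
  private final int maxRetries;
  private final Map<String, Integer> failures = new HashMap<>();

  RetryBudget(int maxRetries) {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }
    this.maxRetries = maxRetries;
  }

  /** Records one failed start and returns true if another restart is still allowed. */
  boolean recordFailureAndCheckRetry(String runnableName) {
    int count = failures.merge(runnableName, 1, Integer::sum);
    return count <= maxRetries;
  }
}
```

With a budget of 2, the first two failures of a runnable still allow a restart and the third does not, at which point the application would be failed instead of restarted again.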
Re: TwillRunnerService usage
Thanks! Same here. On 08/10/2016 05:48 PM, Henry Saputra wrote: > Hi Martin, > > Glad to see you in the dev@ list =) > > Looking forward to working with you and your team. > > - Henry > > > On Wed, Aug 10, 2016 at 1:39 PM, Martin Serrano <mar...@attivio.com> wrote: > >> On 08/09/2016 01:45 PM, Terence Yim wrote: >>> Hi Martin, >>> >>> Currently there is no way of knowing when the TwillRunnerService finished >>> the first sync up with the ZK. Would you mind opening a JIRA for that? >> https://issues.apache.org/jira/browse/TWILL-183 >> >>> For the YARN application state inside the YarnTwillController, currently it >>> is not surfaced. Potentially it can be added to the >>> TwillController.getResourceReport() method. >>> >>> We are also looking for improving the state reporting through the >>> TwillController after an app was submitted via surfacing more information >>> about individual app state/resource and cluster resources. Would you mind >>> filing JIRA(s) for them as well? >> I will. I need to research a bit to write a good ticket. >> >> -Martin >>> Thanks, >>> Terence >>> >>> On Mon, Aug 8, 2016 at 1:07 PM, Martin Serrano <mar...@attivio.com> wrote: >>>> Hi, >>>> >>>> I see from the source that the TwillRunnerService is locating existing >>>> controllers by querying ZK in the background. Since these run in the >>>> background, there seems to be no way to know when the requests are >>>> complete. Thus, on starting up the service, I can't reliably determine >>>> the complete set of controllers running from previous sessions. Is >>>> there a way to know? Is there a listener I can register? >>>> >>>> Also, I see that YarnTwillController internally determines the current >>>> application state (RUNNING, ACCEPTED, etc) via a YarnApplicationReport. >>>> Is there any public API for this information?
>>>> >>>> Basically I'm trying to work through the problem of submission of an >>>> application, monitoring the current state, detecting somehow that it is >>>> going to stay stuck in the ACCEPTED state, and then working out why. >>>> Sometimes it is lack of memory resources, sometimes cpu. Is there any >>>> programmatic way? >>>> >>>> Thanks, >>>> Martin Serrano >>>> >>
[jira] [Commented] (TWILL-182) ApplicationBundler will overwrite dependencies with identical names
[ https://issues.apache.org/jira/browse/TWILL-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418006#comment-15418006 ] Martin Serrano commented on TWILL-182: -- will do. > ApplicationBundler will overwrite dependencies with identical names > --- > > Key: TWILL-182 > URL: https://issues.apache.org/jira/browse/TWILL-182 > Project: Apache Twill > Issue Type: Bug > Components: core >Affects Versions: 0.7.0-incubating > Reporter: Martin Serrano > Fix For: 0.8.0 > > > If two jars obtained from *different* classpath locations have the same name > but different contents, one will overwrite the other. The dependency code > correctly finds the jars (uses the full path in the HashSet which accumulates > the deps) but when the bundle is created the jars are written to {{/lib}} > under their name. This results in one overwriting the other. > While this is not a likely occurrence, it occurs for us in our development > environment because our published jar names are built up from their project > hierarchy. For example the model project for our sdk is in {{.../sdk/model}} > and will be on the classpath as {{.../sdk/model.jar}} and published as > {{sdk-model.jar}}. > In practice however this could occur with any jar name and would be more > likely over time. > The {{ApplicationBundler}} could detect this and re-write the name with some > part of the path or suffix to ensure the name is unique. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
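The suffix idea from the last paragraph of the ticket could look something like the sketch below. UniqueJarNames and its assign method are hypothetical, paths are plain strings with '/' separators, and the numeric suffix is just one of the renaming schemes the ticket mentions.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: give every jar path a unique file name inside the bundle's lib dir. */
final class UniqueJarNames {

  private UniqueJarNames() { }

  /** Maps each full jar path to a bundle entry name, adding -1, -2, ... on collisions. */
  static Map<String, String> assign(Iterable<String> jarPaths) {
    Map<String, String> result = new LinkedHashMap<>();
    Set<String> used = new HashSet<>();
    for (String path : jarPaths) {
      if (result.containsKey(path)) {
        continue;   // the same jar listed twice needs only one entry
      }
      String fileName = path.substring(path.lastIndexOf('/') + 1);
      int dot = fileName.lastIndexOf('.');
      String base = dot < 0 ? fileName : fileName.substring(0, dot);
      String ext = dot < 0 ? "" : fileName.substring(dot);
      String candidate = fileName;
      // Set.add returns false when the name is already taken, so keep trying suffixes.
      for (int i = 1; !used.add(candidate); i++) {
        candidate = base + "-" + i + ext;
      }
      result.put(path, candidate);
    }
    return result;
  }
}
```

Two jars named model.jar from different directories would then be written as model.jar and model-1.jar instead of one overwriting the other.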
Re: TwillRunnerService usage
On 08/09/2016 01:45 PM, Terence Yim wrote: > Hi Martin, > > Currently there is no way of knowing when the TwillRunnerService finished > the first sync up with the ZK. Would you mind opening a JIRA for that? https://issues.apache.org/jira/browse/TWILL-183 > For the YARN application state inside the YarnTwillController, currently it > is not surfaced. Potentially it can be added to the > TwillController.getResourceReport() method. > > We are also looking for improving the state reporting through the > TwillController after an app was submitted via surfacing more information > about individual app state/resource and cluster resources. Would you mind > filing JIRA(s) for them as well? I will. I need to research a bit to write a good ticket. -Martin > Thanks, > Terence > > On Mon, Aug 8, 2016 at 1:07 PM, Martin Serrano <mar...@attivio.com> wrote: > >> Hi, >> >> I see from the source that the TwillRunnerService is locating existing >> controllers by querying ZK in the background. Since these run in the >> background, there seems to be no way to know when the requests are >> complete. Thus, on starting up the service, I can't reliably determine >> the complete set of controllers running from previous sessions. Is >> there a way to know? Is there a listener I can register? >> >> Also, I see that YarnTwillController internally determines the current >> application state (RUNNING, ACCEPTED, etc) via a YarnApplicationReport. >> Is there any public API for this information? >> >> Basically I'm trying to work through the problem of submission of an >> application, monitoring the current state, detecting somehow that it is >> going to stay stuck in the ACCEPTED state, and then working out why. >> Sometimes it is lack of memory resources, sometimes cpu. Is there any >> programmatic way? >> >> Thanks, >> Martin Serrano >>
[jira] [Created] (TWILL-183) TwillRunnerService should provide a way to determine that TwillRunner background load is complete
Martin Serrano created TWILL-183: Summary: TwillRunnerService should provide a way to determine that TwillRunner background load is complete Key: TWILL-183 URL: https://issues.apache.org/jira/browse/TWILL-183 Project: Apache Twill Issue Type: Improvement Components: core Affects Versions: 0.7.0-incubating Reporter: Martin Serrano I see from the source that the TwillRunnerService is locating existing controllers by querying ZK in the background. Since these run in the background, there seems to be no way to know when the requests are complete. Thus, on starting up the service, I can't reliably determine the complete set of controllers running from previous sessions. Is there a way to know? Is there a listener I can register? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
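As a sketch of what such a mechanism might look like, the background scan could complete a future once its first pass over ZK is done. None of this is existing Twill API: ControllerRegistry and its methods are hypothetical, controllers are represented by application names only, and CompletableFuture assumes Java 8.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that exposes completion of the first background scan. */
final class ControllerRegistry {
  private final Set<String> controllers = ConcurrentHashMap.newKeySet();
  private final CompletableFuture<Set<String>> initialSync = new CompletableFuture<>();

  /** Called by the background scan for every controller it finds. */
  void onControllerDiscovered(String applicationName) {
    controllers.add(applicationName);
  }

  /** Called by the background scan once its first full pass has finished. */
  void onInitialScanFinished() {
    initialSync.complete(Collections.unmodifiableSet(new HashSet<>(controllers)));
  }

  /** Completes with a snapshot of the controllers known after the first scan. */
  CompletableFuture<Set<String>> initialSync() {
    return initialSync;
  }
}
```

A caller could then block on initialSync().join() or register a callback with thenAccept, which covers both the "way to know" and the "listener" variants of the question.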