[jira] [Commented] (TWILL-261) Add supports to run TwillApplications against Kubernetes cluster
    [ https://issues.apache.org/jira/browse/TWILL-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584176#comment-16584176 ]

Yuliya Feldman commented on TWILL-261:
--------------------------------------

+1

> Add supports to run TwillApplications against Kubernetes cluster
> ----------------------------------------------------------------
>
>                 Key: TWILL-261
>                 URL: https://issues.apache.org/jira/browse/TWILL-261
>             Project: Apache Twill
>          Issue Type: Story
>            Reporter: Terence Yim
>            Assignee: Terence Yim
>            Priority: Major
>
> Top level story to brainstorm and gather tasks needed in order to bring
> TwillApplication to Kubernetes cluster

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: Subject: [VOTE] Release of Apache Twill-0.13.0 [rc1]
I would think it's better to remove it, since 0.12.1 is a regular release, unless it is not. Thanks, Yuliya On Thu, Jul 19, 2018 at 1:26 PM, Poorna Chandra wrote: > I added all the bugfixes since 0.12.0 into the changes file. I can remove > the bugs fixed in 0.12.1 from the file. > > Thanks, > Poorna. > > On Thu, Jul 19, 2018, 1:11 PM Yuliya Feldman wrote: > > > In this case we should remove mention of the bugs fixed in 0.12.1 > > Or we should keep incremental list that is updated with each new release. > > > > Otherwise it would be a confusion about which release those bugs were > fixed > > in: > > > > Bug > > [TWILL-61] - Fix to allow higher attempts to relaunch the app after > > the first attempt failed > > [TWILL-254] - Update to use ContainerId.fromString in Hadoop 2.6+ > > [TWILL-255] - Incorrect logging after memory was adjusted. Does not > > show memory before adjustment > > > > Thanks, > > Yuliya > > > > > > On Tue, Jul 17, 2018 at 7:34 PM, Poorna Chandra > wrote: > > > > > Yes, you are right. TWILL-248 is the only change between 0.12.1 and > > 0.13.0. > > > > > > Poorna > > > > > > On Tue, Jul 17, 2018, 5:00 PM Yuliya Feldman > wrote: > > > > > > > What's the difference between 0.12.1 and 0.13.0 ? > > > > > > > > Looks like only TWILL-248, is it? > > > > > > > > On Tue, Jul 17, 2018 at 4:49 PM, Poorna Chandra > > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > This is a call for a vote on releasing Apache Twill 0.13.0, release > > > > > candidate 1. This > > > > > is the 15th release of Twill. > > > > > > > > > > The source tarball, including signatures, digests, etc. 
can be > found > > > at: > > > > > https://dist.apache.org/repos/dist/dev/twill/0.13.0-rc1/src > > > > > > > > > > The tag to be voted upon is v0.13.0: > > > > > https://git-wip-us.apache.org/repos/asf?p=twill.git;a= > > > > > shortlog;h=refs/tags/v0.13.0 > > > > > > > > > > The release hash is 26c3c988d3358f1c56f3b9a3471b45c144375804: > > > > > https://git-wip-us.apache.org/repos/asf?p=twill.git;a=commit;h= > > > > > 26c3c988d3358f1c56f3b9a3471b45c144375804 > > > > > > > > > > The Nexus Staging URL: > > > > > > > https://repository.apache.org/content/repositories/orgapachetwill-1026 > > > > > > > > > > Release artifacts are signed with the following key: > > > > > http://people.apache.org/keys/committer/poorna > > > > > > > > > > KEYS file available: > > > > > https://dist.apache.org/repos/dist/dev/twill/KEYS > > > > > > > > > > For information about the contents of this release, see: > > > > > https://dist.apache.org/repos/dist/dev/twill/0.13.0-rc1/ > CHANGES.txt > > > > > > > > > > Please vote on releasing this package as Apache Twill 0.13.0 > > > > > > > > > > The vote will be open for 72 hours. > > > > > > > > > > [ ] +1 Release this package as Apache Twill 0.13.0 > > > > > [ ] +0 no opinion > > > > > [ ] -1 Do not release this package because ... > > > > > > > > > > +1 from myself. > > > > > > > > > > Thanks, > > > > > Poorna > > > > > > > > > > > > > > >
Re: Subject: [VOTE] Release of Apache Twill-0.13.0 [rc1]
In this case we should remove mention of the bugs fixed in 0.12.1
Or we should keep incremental list that is updated with each new release.

Otherwise it would be a confusion about which release those bugs were fixed
in:

Bug
    [TWILL-61] - Fix to allow higher attempts to relaunch the app after the first attempt failed
    [TWILL-254] - Update to use ContainerId.fromString in Hadoop 2.6+
    [TWILL-255] - Incorrect logging after memory was adjusted. Does not show memory before adjustment

Thanks,
Yuliya

On Tue, Jul 17, 2018 at 7:34 PM, Poorna Chandra wrote:

> Yes, you are right. TWILL-248 is the only change between 0.12.1 and 0.13.0.
>
> Poorna
>
> On Tue, Jul 17, 2018, 5:00 PM Yuliya Feldman wrote:
>
> > What's the difference between 0.12.1 and 0.13.0 ?
> >
> > Looks like only TWILL-248, is it?
> >
> > On Tue, Jul 17, 2018 at 4:49 PM, Poorna Chandra wrote:
> >
> > > Hi all,
> > >
> > > This is a call for a vote on releasing Apache Twill 0.13.0, release
> > > candidate 1. This is the 15th release of Twill.
> > >
> > > The source tarball, including signatures, digests, etc. can be found at:
> > > https://dist.apache.org/repos/dist/dev/twill/0.13.0-rc1/src
> > >
> > > The tag to be voted upon is v0.13.0:
> > > https://git-wip-us.apache.org/repos/asf?p=twill.git;a=shortlog;h=refs/tags/v0.13.0
> > >
> > > The release hash is 26c3c988d3358f1c56f3b9a3471b45c144375804:
> > > https://git-wip-us.apache.org/repos/asf?p=twill.git;a=commit;h=26c3c988d3358f1c56f3b9a3471b45c144375804
> > >
> > > The Nexus Staging URL:
> > > https://repository.apache.org/content/repositories/orgapachetwill-1026
> > >
> > > Release artifacts are signed with the following key:
> > > http://people.apache.org/keys/committer/poorna
> > >
> > > KEYS file available:
> > > https://dist.apache.org/repos/dist/dev/twill/KEYS
> > >
> > > For information about the contents of this release, see:
> > > https://dist.apache.org/repos/dist/dev/twill/0.13.0-rc1/CHANGES.txt
> > >
> > > Please vote on releasing this package as Apache Twill 0.13.0
> > >
> > > The vote will be open for 72 hours.
> > >
> > > [ ] +1 Release this package as Apache Twill 0.13.0
> > > [ ] +0 no opinion
> > > [ ] -1 Do not release this package because ...
> > >
> > > +1 from myself.
> > >
> > > Thanks,
> > > Poorna
Re: Subject: [VOTE] Release of Apache Twill-0.13.0 [rc1]
What's the difference between 0.12.1 and 0.13.0 ?

Looks like only TWILL-248, is it?

On Tue, Jul 17, 2018 at 4:49 PM, Poorna Chandra wrote:

> Hi all,
>
> This is a call for a vote on releasing Apache Twill 0.13.0, release
> candidate 1. This is the 15th release of Twill.
>
> The source tarball, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/twill/0.13.0-rc1/src
>
> The tag to be voted upon is v0.13.0:
> https://git-wip-us.apache.org/repos/asf?p=twill.git;a=shortlog;h=refs/tags/v0.13.0
>
> The release hash is 26c3c988d3358f1c56f3b9a3471b45c144375804:
> https://git-wip-us.apache.org/repos/asf?p=twill.git;a=commit;h=26c3c988d3358f1c56f3b9a3471b45c144375804
>
> The Nexus Staging URL:
> https://repository.apache.org/content/repositories/orgapachetwill-1026
>
> Release artifacts are signed with the following key:
> http://people.apache.org/keys/committer/poorna
>
> KEYS file available:
> https://dist.apache.org/repos/dist/dev/twill/KEYS
>
> For information about the contents of this release, see:
> https://dist.apache.org/repos/dist/dev/twill/0.13.0-rc1/CHANGES.txt
>
> Please vote on releasing this package as Apache Twill 0.13.0
>
> The vote will be open for 72 hours.
>
> [ ] +1 Release this package as Apache Twill 0.13.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because ...
>
> +1 from myself.
>
> Thanks,
> Poorna
[jira] [Created] (TWILL-260) Less invasive change to upgrade version of zkclient
Yuliya Feldman created TWILL-260:
------------------------------------

             Summary: Less invasive change to upgrade version of zkclient
                 Key: TWILL-260
                 URL: https://issues.apache.org/jira/browse/TWILL-260
             Project: Apache Twill
          Issue Type: Bug
          Components: yarn, zookeeper
            Reporter: Yuliya Feldman
            Assignee: Yuliya Feldman

This is related to [TWILL-249|https://issues.apache.org/jira/projects/TWILL/issues/TWILL-249].

The less invasive change is just to upgrade zkclient to the latest version. This should remove the need to work around the zkclient library bug that causes issues while processing connected and SASL events.

This bug is causing major issues with MapR, since MapR has SASL enabled even when security is not enabled.
[jira] [Created] (TWILL-255) incorrect logging after memory/cpu was adjusted
Yuliya Feldman created TWILL-255:
------------------------------------

             Summary: incorrect logging after memory/cpu was adjusted
                 Key: TWILL-255
                 URL: https://issues.apache.org/jira/browse/TWILL-255
             Project: Apache Twill
          Issue Type: Bug
          Components: yarn
            Reporter: Yuliya Feldman
            Assignee: Yuliya Feldman

When resources for containers are adjusted, the log message that reports the adjustment shows the values after adjustment, so it is not known what the value was adjusted from.

Affected are:
adjustCapability()
Hadoop20YarnAMClient
Hadoop21YarnAMClient
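The logging problem described above can be illustrated with a minimal sketch. It assumes a stand-in class, SketchResource, in place of the real Hadoop Resource class, and a hypothetical helper, adjustMemory; neither name comes from Twill. The only point is to read the requested value before the resource is mutated, so that the message can report both the old and the new value.

```java
/** Minimal stand-in for a YARN resource; NOT the real Hadoop Resource class. */
final class SketchResource {
    private int memoryMb;

    SketchResource(int memoryMb) {
        this.memoryMb = memoryMb;
    }

    int getMemory() {
        return memoryMb;
    }

    void setMemory(int memoryMb) {
        this.memoryMb = memoryMb;
    }
}

/** Hypothetical helper for illustration only; not Twill code. */
final class AdjustLoggingSketch {
    /**
     * Caps the memory of the resource at maxMemoryMb.
     * Returns the message that would be logged, or null if nothing changed.
     */
    static String adjustMemory(SketchResource resource, int maxMemoryMb) {
        // Read the requested value BEFORE mutating the resource, so the
        // message can report both the original and the adjusted value.
        int requestedMb = resource.getMemory();
        int updatedMb = Math.min(requestedMb, maxMemoryMb);
        if (requestedMb == updatedMb) {
            return null;
        }
        resource.setMemory(updatedMb);
        return "Adjust memory requirement from " + requestedMb + " to " + updatedMb + " MB.";
    }
}
```

If the getter were called again after setMemory, both placeholders in the message would show the adjusted value, which matches the symptom reported in this issue.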
[jira] [Commented] (TWILL-252) Not providing any feedback when size of the container requested can't be allocated
    [ https://issues.apache.org/jira/browse/TWILL-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295356#comment-16295356 ]

Yuliya Feldman commented on TWILL-252:
--------------------------------------

To do this before the YARN application is submitted, the app that submits the YARN app would need to use YARN APIs to get the maximum container size allowed, while Twill is doing that already, since it adjusts the size.

> Not providing any feedback when size of the container requested can't be allocated
> ----------------------------------------------------------------------------------
>
>                 Key: TWILL-252
>                 URL: https://issues.apache.org/jira/browse/TWILL-252
>             Project: Apache Twill
>          Issue Type: Bug
>            Reporter: Yuliya Feldman
>
> Looks like when YARN is configured with a max memory per container
> (yarn.scheduler.maximum-allocation-mb) less than the amount of memory the end user
> allocates for their application, and the container is allocated with just the
> yarn.scheduler.maximum-allocation-mb value, there is no way to know about it
> until the container is allocated.
> We try to divide memory into heap and off-heap and end up setting
> off-heap to a value higher than what was allocated for the container, as the application
> assumes it either gets what it asked for or the container is not allocated at all.
> Need either the ability to fail the application in this case or to not allocate
> a container with less memory than asked.
> Currently Twill adjusts memory and CPU with only INFO level messages in the
> AppMaster log:
> from Hadoop21YarnAMClient.java
> {code:java}
>   protected Resource adjustCapability(Resource resource) {
>     int cores = resource.getVirtualCores();
>     int updatedCores = Math.min(resource.getVirtualCores(), maxCapability.getVirtualCores());
>     if (cores != updatedCores) {
>       resource.setVirtualCores(updatedCores);
>       LOG.info("Adjust virtual cores requirement from {} to {}.", cores, updatedCores);
>     }
>     int updatedMemory = Math.min(resource.getMemory(), maxCapability.getMemory());
>     if (resource.getMemory() != updatedMemory) {
>       resource.setMemory(updatedMemory);
>       LOG.info("Adjust memory requirement from {} to {} MB.", resource.getMemory(), updatedMemory);
>     }
>     return resource;
>   }
> {code}
[jira] [Updated] (TWILL-252) Not providing any feedback when size of the container requested can't be allocated
[ https://issues.apache.org/jira/browse/TWILL-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated TWILL-252: - Description: Looks like when YARN is configured with max memory per container (yarn.scheduler.maximum-allocation-mb) less then amount of memory end user allocates for their application and container is allocated with just yarn.scheduler.maximum-allocation-mb value there is no way to know about it until container is allocated. We try to divide memory into heap and off-heap and end up with setting up off heap to the value higher then allocated for the container, as application assumes it gets what it asked for or container is not allocated at all. Need either ability to fail application in this case or not allocate container with memory less then asked. As currently Twill adjusts memory and cpu with only INFO level messages in AppMaster log: from Hadoop21YarnAMClient.java {code:java} protected Resource adjustCapability(Resource resource) { int cores = resource.getVirtualCores(); int updatedCores = Math.min(resource.getVirtualCores(), maxCapability.getVirtualCores()); if (cores != updatedCores) { resource.setVirtualCores(updatedCores); LOG.info("Adjust virtual cores requirement from {} to {}.", cores, updatedCores); } int updatedMemory = Math.min(resource.getMemory(), maxCapability.getMemory()); if (resource.getMemory() != updatedMemory) { resource.setMemory(updatedMemory); LOG.info("Adjust memory requirement from {} to {} MB.", resource.getMemory(), updatedMemory); } return resource; } {code} was: Looks like when YARN is configured with max memory per container (yarn.scheduler.maximum-allocation-mb) less then amount of memory end user allocates for their application and container is allocated with just yarn.scheduler.maximum-allocation-mb value there is no way to know about it until container is allocated. 
We try to divide memory into heap and off-heap and end up with setting up off heap to the value higher then allocated for the container, as application assumes it gets what it asked for or container is not allocated at all. Need either ability to fail application in this case or not allocate container with memory less then asked. > Not providing any feedback when size of the container requested can't be > allocated > -- > > Key: TWILL-252 > URL: https://issues.apache.org/jira/browse/TWILL-252 > Project: Apache Twill > Issue Type: Bug >Reporter: Yuliya Feldman > > Looks like when YARN is configured with max memory per container > (yarn.scheduler.maximum-allocation-mb) less then amount of memory end user > allocates for their application and container is allocated with just > yarn.scheduler.maximum-allocation-mb value there is no way to know about it > until container is allocated. > We try to divide memory into heap and off-heap and end up with setting up off > heap to the value higher then allocated for the container, as application > assumes it gets what it asked for or container is not allocated at all. > Need either ability to fail application in this case or not allocate > container with memory less then asked. 
> As currently Twill adjusts memory and cpu with only INFO level messages in > AppMaster log: > from Hadoop21YarnAMClient.java > {code:java} > protected Resource adjustCapability(Resource resource) { > int cores = resource.getVirtualCores(); > int updatedCores = Math.min(resource.getVirtualCores(), > maxCapability.getVirtualCores()); > if (cores != updatedCores) { > resource.setVirtualCores(updatedCores); > LOG.info("Adjust virtual cores requirement from {} to {}.", cores, > updatedCores); > } > int updatedMemory = Math.min(resource.getMemory(), > maxCapability.getMemory()); > if (resource.getMemory() != updatedMemory) { > resource.setMemory(updatedMemory); > LOG.info("Adjust memory requirement from {} to {} MB.", > resource.getMemory(), updatedMemory); > } > return resource; > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TWILL-252) Not providing any feedback when size of the container requested can't be allocated
Yuliya Feldman created TWILL-252:
------------------------------------

             Summary: Not providing any feedback when size of the container requested can't be allocated
                 Key: TWILL-252
                 URL: https://issues.apache.org/jira/browse/TWILL-252
             Project: Apache Twill
          Issue Type: Bug
            Reporter: Yuliya Feldman

Looks like when YARN is configured with a max memory per container (yarn.scheduler.maximum-allocation-mb) less than the amount of memory the end user allocates for their application, and the container is allocated with just the yarn.scheduler.maximum-allocation-mb value, there is no way to know about it until the container is allocated.

We try to divide memory into heap and off-heap and end up setting off-heap to a value higher than what was allocated for the container, as the application assumes it either gets what it asked for or the container is not allocated at all.

Need either the ability to fail the application in this case or to not allocate a container with less memory than asked.
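The two behaviours asked for above (fail, or silently cap as happens today) can be sketched as a single check. This is only an illustration under stated assumptions: the class ContainerSizeCheckSketch, the method resolveMemoryMb, the failIfCapped switch and the exception type are all hypothetical and are not part of Twill or YARN.

```java
/** Hypothetical illustration of a fail-fast container size check; not Twill code. */
final class ContainerSizeCheckSketch {
    /** Hypothetical exception type, used only in this sketch. */
    static final class ContainerTooLargeException extends RuntimeException {
        ContainerTooLargeException(String message) {
            super(message);
        }
    }

    /**
     * Returns the memory (in MB) to request for a container.
     * If the request exceeds the maximum allocation, either fails
     * (failIfCapped == true) or caps the request at the maximum.
     */
    static int resolveMemoryMb(int requestedMb, int maxAllocationMb, boolean failIfCapped) {
        if (requestedMb <= maxAllocationMb) {
            return requestedMb;
        }
        if (failIfCapped) {
            throw new ContainerTooLargeException(
                "Requested " + requestedMb + " MB exceeds the maximum allocation of "
                    + maxAllocationMb + " MB");
        }
        // Current behaviour described in the issue: silently cap the request.
        return maxAllocationMb;
    }
}
```

With a switch like this, an application that sizes heap and off-heap from the requested amount could opt into failing early instead of starting in a smaller container than it assumed.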
[jira] [Created] (TWILL-249) Upgrade Twill to use newer version of kafka_2.10
Yuliya Feldman created TWILL-249:
------------------------------------

             Summary: Upgrade Twill to use newer version of kafka_2.10
                 Key: TWILL-249
                 URL: https://issues.apache.org/jira/browse/TWILL-249
             Project: Apache Twill
          Issue Type: Improvement
            Reporter: Yuliya Feldman

Currently, to work around an issue with secure clusters and how zkclient handles them, we have https://issues.apache.org/jira/browse/TWILL-139

Since that time zkclient has been fixed (the fix is in version 0.7), but Twill still uses the old Kafka 0.8, which relies on zkclient 0.3.

zkclient 0.7 is included as a dependency starting with Kafka 0.9.0.0. That version of Kafka introduces API changes, though.

This JIRA is to upgrade to a later version of Kafka.
[jira] [Closed] (TWILL-203) irrespective of number of CPUs specified in App Config it is always 1
     [ https://issues.apache.org/jira/browse/TWILL-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuliya Feldman closed TWILL-203.
--------------------------------
    Resolution: Invalid

> irrespective of number of CPUs specified in App Config it is always 1
> ---------------------------------------------------------------------
>
>                 Key: TWILL-203
>                 URL: https://issues.apache.org/jira/browse/TWILL-203
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.8.0
>            Reporter: Yuliya Feldman
>         Attachments: Screen Shot 2017-01-06 at 10.17.24 AM.png, rm.log, twillclient.log
>
> When trying to deploy a Bundled Jar app and specifying a number of CPUs > 1, it
> still defaults to 1 when the application is starting.
> Version of YARN is 2.7.2, version of Twill is 0.8. Capacity scheduler.
> Looks like (from the logs) all the info is passed through Twill correctly and
> gets "lost" while getting to the RM.
> Please see the attached logs from Twill Client and RM, and the screenshot.
[jira] [Commented] (TWILL-203) irrespective of number of CPUs specified in App Config it is always 1
    [ https://issues.apache.org/jira/browse/TWILL-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208558#comment-16208558 ]

Yuliya Feldman commented on TWILL-203:
--------------------------------------

Looks like it depends on the YARN setup: whether it is set up to support CPU as a resource or not.

> irrespective of number of CPUs specified in App Config it is always 1
> ---------------------------------------------------------------------
>
>                 Key: TWILL-203
>                 URL: https://issues.apache.org/jira/browse/TWILL-203
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.8.0
>            Reporter: Yuliya Feldman
>         Attachments: Screen Shot 2017-01-06 at 10.17.24 AM.png, rm.log, twillclient.log
>
> When trying to deploy a Bundled Jar app and specifying a number of CPUs > 1, it
> still defaults to 1 when the application is starting.
> Version of YARN is 2.7.2, version of Twill is 0.8. Capacity scheduler.
> Looks like (from the logs) all the info is passed through Twill correctly and
> gets "lost" while getting to the RM.
> Please see the attached logs from Twill Client and RM, and the screenshot.
[jira] [Created] (TWILL-243) Failed BundleRunnable is stuck in exiting state
Yuliya Feldman created TWILL-243:
------------------------------------

             Summary: Failed BundleRunnable is stuck in exiting state
                 Key: TWILL-243
                 URL: https://issues.apache.org/jira/browse/TWILL-243
             Project: Apache Twill
          Issue Type: Bug
          Components: ext, yarn
            Reporter: Yuliya Feldman

I am using BundleRunnable, and so far my experience is that when my runnable fails, the container process never exits. My impression is that it is stuck executing shutdown hooks pretty much all the time (not even my application-specific ones).

Here is the thread dump for that thread:

"TwillContainerService" #32 prio=5 os_prio=0 tid=0x7f1590297800 nid=0x4de2 in Object.wait() [0x7f15805d9000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    - locked <0xff9c41d0> (a org.apache.twill.internal.ServiceMain$1)
    at java.lang.Thread.join(Thread.java:1323)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - locked <0xff6363e8> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Runtime.exit(Runtime.java:109)
    at java.lang.System.exit(System.java:971)
    at org.apache.twill.ext.BundledJarRunnable.run(BundledJarRunnable.java:59)
    at org.apache.twill.internal.container.TwillContainerService.doRun(TwillContainerService.java:130)
    at org.apache.twill.internal.AbstractTwillService.run(AbstractTwillService.java:181)
    at twill.com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
    at java.lang.Thread.run(Thread.java:745)

Just wondering whether anybody else has experienced something similar.
[jira] [Created] (TWILL-237) Twill is using hdfs HAUtil api that is nont-compatible with hadoop 2.8
Yuliya Feldman created TWILL-237:
------------------------------------

             Summary: Twill is using hdfs HAUtil api that is nont-compatible with hadoop 2.8
                 Key: TWILL-237
                 URL: https://issues.apache.org/jira/browse/TWILL-237
             Project: Apache Twill
          Issue Type: Bug
          Components: yarn
            Reporter: Yuliya Feldman
            Assignee: Yuliya Feldman

Twill is using hdfs.HAUtil APIs that are supposed to be HDFS-private, and subsequently the signature of isLogicalURI was changed (actually, the name was changed) in Hadoop version 2.8.

I will post a patch for now to support both the old and the new names, but I think references to private HDFS interfaces/classes should eventually be removed from Twill.
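One common way to support a method under both an old and a new name is a reflective lookup that tries each candidate name in turn. The sketch below is an assumption about how such a patch could look, not the actual Twill patch: MethodLookupSketch and findFirstMethod are hypothetical names, and the candidate method names passed in are left to the caller because the renamed method is not spelled out in this issue.

```java
/** Hypothetical reflection helper for illustration; not the actual Twill patch. */
final class MethodLookupSketch {
    /**
     * Returns the first public method on owner that matches one of the
     * candidate names with the given parameter types, trying the names in order.
     */
    static java.lang.reflect.Method findFirstMethod(
            Class<?> owner, Class<?>[] paramTypes, String... candidateNames) {
        for (String name : candidateNames) {
            try {
                return owner.getMethod(name, paramTypes);
            } catch (NoSuchMethodException e) {
                // This name does not exist in the library version on the
                // classpath; fall through and try the next candidate.
            }
        }
        throw new IllegalStateException(
            "None of " + java.util.Arrays.toString(candidateNames)
                + " found on " + owner.getName());
    }
}
```

The resolved Method can be cached once at startup and invoked with Method.invoke, so the name probing is not repeated on every call.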
Re: HelloWorld Struggle
Do those env vars resolve on the NodeManager nodes correctly? On Mon, Jun 12, 2017 at 2:05 PM, Chris Hebert < chris.hebert-...@digitalreasoning.com> wrote: > Other YARN apps work fine. For example, I just successfully ran the stock > MapReduce wordcount example (and of course, MapReduce is a YARN > application). > > I ran HelloWorld in debug mode earlier and found that yarnClasspath > contains the following: > $HADOOP_CONF_DIR > $HADOOP_COMMON_HOME/* > $HADOOP_COMMON_HOME/lib/* > $HADOOP_HDFS_HOME/* > $HADOOP_HDFS_HOME/lib/* > $HADOOP_MAPRED_HOME/* > $HADOOP_MAPRED_HOME/lib/* > $HADOOP_YARN_HOME/* > $HADOOP_YARN_HOME/lib/* > > I don't know whether these environment variables are supposed to be > resolved to path variables already or if that happens later. I also don't > know if I'm supposed to explicitly declare these environment variables > somewhere, and if so, I do not know where in the Hadoop configuration it is > ideal for me to declare them. I tried declaring them in hadoop_env.sh and > core-site.xml, but I'm not sure I did it right, and at least in my initial > efforts declaring these variables did not seem to prevent the error. If it > is the correct thing for me to set these variables appropriately somewhere, > then I will continue to try to do so. > > On Mon, Jun 12, 2017 at 3:46 PM, Yuliya Feldman wrote: > > > I don't think it is an issue with classpath on the client - since it gets > > to start AppMaster container > > > > Is any other YARN app runs OK? > > > > May be YarnConfiguration.YARN_APPLICATION_CLASSPATH is not producing > right > > jars > > > > Look at HelloWorld code for yarnClasspath > > > > > > > > On Mon, Jun 12, 2017 at 1:22 PM, Chris Hebert < > > chris.hebert-...@digitalreasoning.com> wrote: > > > > > I hate to ask this here, but it won't work, so whatever. 
> > > > > > I followed the HelloWorld section of the Getting Started guide < > > > http://twill.apache.org/GettingStarted.html> on my cluster with Hadoop > > and > > > Zookeeper set up and functioning properly. > > > > > > git clone https://github.com/apache/twill.git > > > cd twill > > > mvn clean install -DskipTests > > > > > > export > > > CP=twill-examples/yarn/target/twill-examples-yarn-0.12.0- > > > SNAPSHOT.jar:`hadoop > > > classpath` > > > java -cp $CP org.apache.twill.example.yarn.HelloWorld > > > my.zookeeper.domain:2181 > > > > > > Yes, `hadoop classpath` echoes all the relevant jar directories. > > > > > > The command runs well for a bit with multiple: > > > [ STARTING] DEBUG o.a.twill.yarn.YarnTwillController - Yarn > application > > > status for HelloWorldRunnable application_1_0001: ACCEPTED > > > > > > until: > > > [ STARTING] DEBUG o.a.hadoop.service.AbstractService - Service: > > > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state > > > STOPPED > > > [ STARTING] DEBUG org.apache.hadoop.ipc.Client - stopping client from > > > cache: org.apache.hadoop.ipc.Client@4d465b11 > > > [ STARTING] DEBUG o.a.twill.yarn.YarnTwillController - Yarn > application > > > status for HelloWorldRunnable application_1_0001: FAILED > > > ... > > > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > > Yarn > > > application completed with failure HelloWorldRunnable... > > > > > > The ResourceManager reveals: > > > Application application_1_0001 failed 2 times due to AM > > > Container for appattempt_1_0001_02 exited with > > exitCode: 1 > > > ... > > > Diagnostics: Exception from container-launch. 
> > > > > > The corresponding YARN logs for each DataNode reveal: > > > Exception in thread "main" java.lang.NoClassDefFoundError: > > > org/apache/hadoop/conf/Configuration > > > at java.lang.Class.getDeclaredMethods0(Native Method) > > > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > > > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > > > at java.lang.Class.getMethod0(Class.java:3018) > > > at java.lang.Class.getMethod(Class.java:1784) > > > at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:70) > > > Caused by: java.lang.ClassNotFoundException: > > > org.apache.hadoop.conf.Configuration > > > at java.net.URLClassLoader.findClass(URLClassLoader.
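Whether the `$HADOOP_*` placeholders listed in yarnClasspath above resolve on a given node can be checked with a small program run on that node. This is a minimal sketch under stated assumptions: ClasspathEnvSketch and unresolvedVars are hypothetical names, only the plain `$NAME` form seen in this thread is recognised, and the environment of the process running the check may differ from the environment YARN gives a container.

```java
/** Hypothetical diagnostic for illustration; not part of Twill or Hadoop. */
final class ClasspathEnvSketch {
    /**
     * Returns the names of the $VARIABLES referenced by the classpath entries
     * that are missing or empty in the given environment, in first-seen order.
     */
    static java.util.List<String> unresolvedVars(
            java.util.List<String> entries, java.util.Map<String, String> env) {
        java.util.regex.Pattern var =
            java.util.regex.Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");
        java.util.Set<String> missing = new java.util.LinkedHashSet<>();
        for (String entry : entries) {
            java.util.regex.Matcher m = var.matcher(entry);
            while (m.find()) {
                String name = m.group(1);
                String value = env.get(name);
                if (value == null || value.isEmpty()) {
                    missing.add(name);
                }
            }
        }
        return new java.util.ArrayList<>(missing);
    }

    public static void main(String[] args) {
        // Entries copied from the yarnClasspath listing in this thread.
        java.util.List<String> entries = java.util.Arrays.asList(
            "$HADOOP_CONF_DIR",
            "$HADOOP_COMMON_HOME/*", "$HADOOP_COMMON_HOME/lib/*",
            "$HADOOP_HDFS_HOME/*", "$HADOOP_HDFS_HOME/lib/*",
            "$HADOOP_MAPRED_HOME/*", "$HADOOP_MAPRED_HOME/lib/*",
            "$HADOOP_YARN_HOME/*", "$HADOOP_YARN_HOME/lib/*");
        System.out.println("Unresolved: " + unresolvedVars(entries, System.getenv()));
    }
}
```

Any variable reported as unresolved would expand to an empty prefix, which is consistent with a container classpath that contains no Hadoop jars.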
Re: HelloWorld Struggle
I don't think it is an issue with classpath on the client - since it gets
to start AppMaster container

Is any other YARN app runs OK?

May be YarnConfiguration.YARN_APPLICATION_CLASSPATH is not producing right
jars

Look at HelloWorld code for yarnClasspath

On Mon, Jun 12, 2017 at 1:22 PM, Chris Hebert <chris.hebert-...@digitalreasoning.com> wrote:

> I hate to ask this here, but it won't work, so whatever.
>
> I followed the HelloWorld section of the Getting Started guide
> <http://twill.apache.org/GettingStarted.html> on my cluster with Hadoop and
> Zookeeper set up and functioning properly.
>
> git clone https://github.com/apache/twill.git
> cd twill
> mvn clean install -DskipTests
>
> export CP=twill-examples/yarn/target/twill-examples-yarn-0.12.0-SNAPSHOT.jar:`hadoop classpath`
> java -cp $CP org.apache.twill.example.yarn.HelloWorld my.zookeeper.domain:2181
>
> Yes, `hadoop classpath` echoes all the relevant jar directories.
>
> The command runs well for a bit with multiple:
> [ STARTING] DEBUG o.a.twill.yarn.YarnTwillController - Yarn application status for HelloWorldRunnable application_1_0001: ACCEPTED
>
> until:
> [ STARTING] DEBUG o.a.hadoop.service.AbstractService - Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state STOPPED
> [ STARTING] DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@4d465b11
> [ STARTING] DEBUG o.a.twill.yarn.YarnTwillController - Yarn application status for HelloWorldRunnable application_1_0001: FAILED
> ...
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Yarn application completed with failure HelloWorldRunnable...
>
> The ResourceManager reveals:
> Application application_1_0001 failed 2 times due to AM Container for appattempt_1_0001_02 exited with exitCode: 1
> ...
> Diagnostics: Exception from container-launch.
>
> The corresponding YARN logs for each DataNode reveal:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at org.apache.twill.launcher.TwillLauncher.main(TwillLauncher.java:70)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 6 more
> Launch class (org.apache.twill.internal.appmaster.ApplicationMasterMain) using classloader java.net.URLClassLoader with classpath: [
> *A list of several classpaths like
> "file:/some/path/yarn/local/usercache/my.username/appcache/application_1_0001/container_e10_1_0001_02_01/application.jar/lib/twill-examples-yarn-0.12.0-SNAPSHOT.jar"
> But none of which are paths to any Hadoop jars of the sort that are
> referenced in $CP*
> ]
>
> What am I missing?
>
> I've spent an embarrassingly large amount of time on this fiddling with
> environment variables and Hadoop configuration. (I'm an intern learning
> this stuff the hard way, so it's not really embarrassing, just
> substantial.)
[jira] [Commented] (TWILL-233) Apache-Twill 0.11.0 install failure
    [ https://issues.apache.org/jira/browse/TWILL-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015868#comment-16015868 ]

Yuliya Feldman commented on TWILL-233:
--------------------------------------

[~narahari] can you point to exact error(s) - log is really big.

> Apache-Twill 0.11.0 install failure
> -----------------------------------
>
>                 Key: TWILL-233
>                 URL: https://issues.apache.org/jira/browse/TWILL-233
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.11.0
>         Environment: Redhat Linux 64 bit
>            Reporter: Narahari
>            Priority: Critical
>             Fix For: 0.11.0
>
>         Attachments: install_errors.txt
>
> Trying to install Apache-Twill 0.11.0 followed by link
> http://twill.apache.org/GettingStarted.html. Getting below errors. We are
> trying to install on MapR lab box. MapR version is 5.1.0.
Re: ENOENT error on upgrading to Twill 0.10.0
Code of your application you want to be running in YARN I believe :) On Mon, Mar 27, 2017 at 3:28 PM, Sam William wrote: > Yes. 22 bytes looks like an empty zip file. Any idea what should there in > the application jar file ? > > Sam > > On Mar 27, 2017, at 13:22, Yuliya Feldman wrote: > > > > File is very small - it may be nothing to do with file not found. Either > > permissions or something else > > > > On Mon, Mar 27, 2017 at 1:17 PM, Sam William > wrote: > > > >> I logged into the master host and looked at the nodemanager logs. It > fails > >> at localizing the application jar. The files are there in HDFS. I can > >> even see it is able to copy the other files just fine (for example the > >> launcher jar and runtime.config) > >> > >> -rw-r--r-- 3 sam supergroup 22 2017-03-27 12:47 > >> /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > >> 44a506886fc1/Build-shards-GRE-bd5d893b401041edceec38c78f1ece > >> c7-application.538b9590-d7f5-4121-824e-448a12a635c1.jar > >> -rw-r--r-- 3 sam supergroup5991970 2017-03-27 12:47 > >> /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > >> 44a506886fc1/buil.b0458483-23ca-4243-89f6-d1a40210110d. > >> -rw-r--r-- 3 sam supergroup 5725 2017-03-27 12:47 > >> /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > 44a506886fc1/launcher. > >> 4d7df397-5325-4a5f-8c95-ddcae99867f5.jar > >> -rw-r--r-- 3 sam supergroup 1038 2017-03-27 12:47 > >> /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > >> 44a506886fc1/localizeFiles.bbe5dc82-9fe9-4249-8964-df15212a1812.json > >> -rw-r--r-- 3 sam supergroup 2072 2017-03-27 12:47 > >> /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > >> 44a506886fc1/runtime.config.9dd1b585-c601-40b7-8831-25383013eb1e.jar > >> -rw-r--r-- 3 sam supergroup 48245414 2017-03-27 12:47 > >> /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > >> 44a506886fc1/twill.c765e4d8-958e-4811-b138-c4ef71e2a93e.jar > >> > >> > >> 2017-03-27 12:47:45,632 INFO org.apache.hadoop.yarn.server. 
> >> nodemanager.containermanager.localizer.LocalizedResource: Resource > >> hdfs://pv34-search-dev/user/sam/Build-shards-GRE/2f30b4ab- > >> d9e1-48bd-9384-44a506886fc1/runtime.config.9dd1b585-c601- > >> 40b7-8831-25383013eb1e.jar(->/data/8/yarn/nm/usercache/sam/ > >> appcache/application_1484158548936_11282/filecache/ > >> 11/runtime.config.9dd1b585-c601-40b7-8831-25383013eb1e.jar) > transitioned > >> from DOWNLOADING to LOCALIZED > >> 2017-03-27 12:47:45,645 INFO org.apache.hadoop.yarn.server. > >> nodemanager.containermanager.localizer.LocalizedResource: Resource > >> hdfs://pv34-search-dev/user/sam/Build-shards-GRE/2f30b4ab- > >> d9e1-48bd-9384-44a506886fc1/launcher.4d7df397-5325-4a5f- > >> 8c95-ddcae99867f5.jar(->/data/10/yarn/nm/usercache/sam/ > >> appcache/application_1484158548936_11282/filecache/ > >> 12/launcher.4d7df397-5325-4a5f-8c95-ddcae99867f5.jar) transitioned from > >> DOWNLOADING to LOCALIZED > >> 2017-03-27 12:47:45,651 WARN org.apache.hadoop.security. > UserGroupInformation: > >> PriviledgedActionException as:sam (auth:SIMPLE) cause:ENOENT: No such > file > >> or directory > >> 2017-03-27 12:47:45,655 WARN org.apache.hadoop.yarn.server. > >> nodemanager.containermanager.localizer.ResourceLocalizationService: { > >> hdfs://pv34-search-dev/user/sam/Build-shards-GRE/2f30b4ab- > >> d9e1-48bd-9384-44a506886fc1/Build-shards-GRE- > >> bd5d893b401041edceec38c78f1ecec7-application.538b9590-d7f5- > 4121-824e-448a12a635c1.jar, > >> 1490644063924, ARCHIVE, null } failed: No such file or directory > >> ENOENT: No such file or directory > >>at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native > >> Method) > >>at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO. 
> >> java:230) > >>at org.apache.hadoop.fs.RawLocalFileSystem.setPermission( > >> RawLocalFileSystem.java:660) > >>at org.apache.hadoop.fs.DelegateToFileSystem.setPermission( > >> DelegateToFileSystem.java:206) > >>at org.apache.hadoop.fs.FilterFs.setPermission(FilterFs.java: > 251) > >>at org.apache.hadoop.fs.FileContext$10.next( > FileContext.java:955) > >>at org.apache.hadoop.fs.FileContext$10.next( > FileContext.java:951) > >>at org.apache.hadoop.fs.FSLinkResolver.resolve( > >> FSLinkResolver.java:90) > &
Re: ENOENT error on upgrading to Twill 0.10.0
File is very small - it may be nothing to do with file not found. Either permissions or something else On Mon, Mar 27, 2017 at 1:17 PM, Sam William wrote: > I logged into the master host and looked at the nodemanager logs. It fails > at localizing the application jar. The files are there in HDFS. I can > even see it is able to copy the other files just fine (for example the > launcher jar and runtime.config) > > -rw-r--r-- 3 sam supergroup 22 2017-03-27 12:47 > /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > 44a506886fc1/Build-shards-GRE-bd5d893b401041edceec38c78f1ece > c7-application.538b9590-d7f5-4121-824e-448a12a635c1.jar > -rw-r--r-- 3 sam supergroup5991970 2017-03-27 12:47 > /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > 44a506886fc1/buil.b0458483-23ca-4243-89f6-d1a40210110d. > -rw-r--r-- 3 sam supergroup 5725 2017-03-27 12:47 > /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384-44a506886fc1/launcher. > 4d7df397-5325-4a5f-8c95-ddcae99867f5.jar > -rw-r--r-- 3 sam supergroup 1038 2017-03-27 12:47 > /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > 44a506886fc1/localizeFiles.bbe5dc82-9fe9-4249-8964-df15212a1812.json > -rw-r--r-- 3 sam supergroup 2072 2017-03-27 12:47 > /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > 44a506886fc1/runtime.config.9dd1b585-c601-40b7-8831-25383013eb1e.jar > -rw-r--r-- 3 sam supergroup 48245414 2017-03-27 12:47 > /user/sam/Build-shards-GRE/2f30b4ab-d9e1-48bd-9384- > 44a506886fc1/twill.c765e4d8-958e-4811-b138-c4ef71e2a93e.jar > > > 2017-03-27 12:47:45,632 INFO org.apache.hadoop.yarn.server. 
> nodemanager.containermanager.localizer.LocalizedResource: Resource > hdfs://pv34-search-dev/user/sam/Build-shards-GRE/2f30b4ab- > d9e1-48bd-9384-44a506886fc1/runtime.config.9dd1b585-c601- > 40b7-8831-25383013eb1e.jar(->/data/8/yarn/nm/usercache/sam/ > appcache/application_1484158548936_11282/filecache/ > 11/runtime.config.9dd1b585-c601-40b7-8831-25383013eb1e.jar) transitioned > from DOWNLOADING to LOCALIZED > 2017-03-27 12:47:45,645 INFO org.apache.hadoop.yarn.server. > nodemanager.containermanager.localizer.LocalizedResource: Resource > hdfs://pv34-search-dev/user/sam/Build-shards-GRE/2f30b4ab- > d9e1-48bd-9384-44a506886fc1/launcher.4d7df397-5325-4a5f- > 8c95-ddcae99867f5.jar(->/data/10/yarn/nm/usercache/sam/ > appcache/application_1484158548936_11282/filecache/ > 12/launcher.4d7df397-5325-4a5f-8c95-ddcae99867f5.jar) transitioned from > DOWNLOADING to LOCALIZED > 2017-03-27 12:47:45,651 WARN org.apache.hadoop.security.UserGroupInformation: > PriviledgedActionException as:sam (auth:SIMPLE) cause:ENOENT: No such file > or directory > 2017-03-27 12:47:45,655 WARN org.apache.hadoop.yarn.server. > nodemanager.containermanager.localizer.ResourceLocalizationService: { > hdfs://pv34-search-dev/user/sam/Build-shards-GRE/2f30b4ab- > d9e1-48bd-9384-44a506886fc1/Build-shards-GRE- > bd5d893b401041edceec38c78f1ecec7-application.538b9590-d7f5-4121-824e-448a12a635c1.jar, > 1490644063924, ARCHIVE, null } failed: No such file or directory > ENOENT: No such file or directory > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native > Method) > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO. 
> java:230) > at org.apache.hadoop.fs.RawLocalFileSystem.setPermission( > RawLocalFileSystem.java:660) > at org.apache.hadoop.fs.DelegateToFileSystem.setPermission( > DelegateToFileSystem.java:206) > at org.apache.hadoop.fs.FilterFs.setPermission(FilterFs.java:251) > at org.apache.hadoop.fs.FileContext$10.next(FileContext.java:955) > at org.apache.hadoop.fs.FileContext$10.next(FileContext.java:951) > at org.apache.hadoop.fs.FSLinkResolver.resolve( > FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.setPermission( > FileContext.java:951) > > > > On Mar 27, 2017, at 12:45, Sam William wrote: > > > > Hi Terence, > > Im not able to get logs for these jobs. “yarn logs” command does nt > return anything. > > Sam > >> On Mar 26, 2017, at 17:32, Terence Yim wrote: > >> > >> Hi Sam, > >> > >> I guess it might be related to the missing of the Hadoop conf directory > in the container classpath, such that the locationfactory constructed from > the container side is not correct. Do you have access to the containers > stdout file? It shows the classpath twill uses. > >> > >> Terence > >> > >> Sent from my iPhone > >> > >>> On Mar 26, 2017, at 3:16 PM, Sam William wrote: > >>> > >>> It works with Twill-0.9.0. So far I have been able to narrow it down > to one commit > >>> > >>> 5986553 (TWILL-63) Speed up application launch time > >>> > >>> Let me see if can nail down to a particular change. > >>> > >>> Sam > >>> > >>> > On Mar 25, 2017, at 13:34, Sam William wrote: > > HI Terence, > Our cloudera installation is CDH-5.7 and I use hadoop 2.3.0 packages > for my fat jars. > > SAm > > On Mar 25, 2017, at 12:31, Terence Yi
question regarding BundleRunnable behavior in case of failure
Hello there,

I am using BundledJarRunnable, and so far my experience is that when my runnable fails, the container process never exits. My impression is that it is stuck, pretty much every time, executing shutdown hooks (not even my application-specific ones).

Here is the thread dump for that thread:

"TwillContainerService" #32 prio=5 os_prio=0 tid=0x7f1590297800 nid=0x4de2 in Object.wait() [0x7f15805d9000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    - locked <0xff9c41d0> (a org.apache.twill.internal.ServiceMain$1)
    at java.lang.Thread.join(Thread.java:1323)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - locked <0xff6363e8> (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Runtime.exit(Runtime.java:109)
    at java.lang.System.exit(System.java:971)
    at org.apache.twill.ext.BundledJarRunnable.run(BundledJarRunnable.java:59)
    at org.apache.twill.internal.container.TwillContainerService.doRun(TwillContainerService.java:130)
    at org.apache.twill.internal.AbstractTwillService.run(AbstractTwillService.java:181)
    at twill.com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
    at java.lang.Thread.run(Thread.java:745)

I just wonder whether anybody else has experienced something similar.

Thanks
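One possible reading of the dump above (an assumption, not something confirmed in this thread) is a circular wait: the thread that calls System.exit() blocks until every shutdown hook has finished, while a hook may itself be waiting for that same thread to finish. The sketch below only simulates this pattern with plain threads and a timeout so that it can terminate; the class and method names are hypothetical and it does not use any Twill code.

```java
final class ShutdownJoinSketch {

    /**
     * Simulates an "exit caller" thread that waits for a hook, while the hook
     * waits for the exit caller. Returns true if the hook finished in time.
     */
    static boolean hookFinishesWithin(long timeoutMs) throws InterruptedException {
        final Thread[] exitCaller = new Thread[1];
        final boolean[] finished = new boolean[1];

        exitCaller[0] = new Thread(() -> {
            // Stand-in for a shutdown hook that waits for the service thread to end.
            Thread hook = new Thread(() -> {
                try {
                    exitCaller[0].join();
                } catch (InterruptedException e) {
                    // Interrupted by the sketch itself so that the demo can end.
                }
            }, "simulated-hook");
            hook.start();
            try {
                // The real System.exit() waits for hooks without any timeout.
                hook.join(timeoutMs);
                finished[0] = !hook.isAlive();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                hook.interrupt();
            }
        }, "simulated-exit-caller");

        exitCaller[0].start();
        exitCaller[0].join();
        return finished[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("hook finished in time: " + hookFinishesWithin(200));
    }
}
```

In this simulation the hook never finishes on its own, because each thread is waiting for the other. Without the timeout and the interrupt, both threads would wait forever, which matches the symptom of a container process that never exits.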
[jira] [Commented] (TWILL-225) Allow using different configurations per application submission
[ https://issues.apache.org/jira/browse/TWILL-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933443#comment-15933443 ] Yuliya Feldman commented on TWILL-225: -- [~chtyim] Great idea, otherwise it has to be pretty much TwillRunnerService per "run" with any modification to Configuration object > Allow using different configurations per application submission > --- > > Key: TWILL-225 > URL: https://issues.apache.org/jira/browse/TWILL-225 > Project: Apache Twill > Issue Type: Improvement >Reporter: Terence Yim >Assignee: Terence Yim > Fix For: 0.11.0 > > > Currently there are couple configurations that can be provided via the hadoop > {{Configuration}} object to the {{YarnTwillRunnerService}}. However, those > configurations are global (same for all app launched through the same > {{TwillRunnerService}}). It would be better if the {{TwillPreparer}} exposes > method to alter the configuration for a given app submission. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873375#comment-15873375 ] Yuliya Feldman commented on TWILL-216: -- Thank you guys for quick turnaround > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > Fix For: 0.10.0 > > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872884#comment-15872884 ] Yuliya Feldman commented on TWILL-216: -- [~hsaputra] What is your concern regarding double versus float? We already use double for that ratio. > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > Fix For: 0.10.0 > > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872281#comment-15872281 ] Yuliya Feldman commented on TWILL-216: -- Sorry, missed one style change. Will update PR in a minute > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872275#comment-15872275 ] Yuliya Feldman commented on TWILL-216: -- [~chtyim] Thank you for the reviews, I have updated PR with latest changes, also squashed commits > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871362#comment-15871362 ] Yuliya Feldman commented on TWILL-216: -- [~chtyim] Absolutely - I'll try to address your comments ASAP > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn >Reporter: Yuliya Feldman >Assignee: Yuliya Feldman > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871248#comment-15871248 ] Yuliya Feldman commented on TWILL-216: -- Sorry for not adding more details soon. Essentially at the moment Twill decides how much of the total requested memory to allocate to heap based on the hardcoded ratio of 0.7, meaning it will allocate at least 70% to heap. There could be applications that use direct memory quite a bit and they want to allocate more than 30% of total memory to be direct memory. This is the rationale behind this JIRA. > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
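To illustrate the comment above, here is a minimal sketch of how a minimum heap ratio limits the memory left for off-heap use. It is not Twill's actual code: the class name, the method name and the exact formula are assumptions made for illustration. The only facts taken from the comment are that the ratio is currently fixed at 0.7 and that at least that share of the container memory goes to the heap.

```java
final class HeapRatioSketch {

    /**
     * Hypothetical helper: heap size in MB for a container, given the memory the
     * application would like to keep off-heap and a minimum heap ratio.
     */
    static int heapMb(int containerMb, int reservedMb, double minHeapRatio) {
        int heapAfterReserve = containerMb - reservedMb;
        int heapFloor = (int) Math.ceil(containerMb * minHeapRatio);
        // The heap never drops below the configured share of the container.
        return Math.max(heapAfterReserve, heapFloor);
    }

    public static void main(String[] args) {
        // With the ratio fixed at 0.7, asking for 500 MB off-heap out of 1000 MB has no effect.
        System.out.println("ratio 0.7: heap = " + heapMb(1000, 500, 0.7) + " MB");
        // With a configurable ratio, the same request can be honoured.
        System.out.println("ratio 0.4: heap = " + heapMb(1000, 500, 0.4) + " MB");
    }
}
```

Under these assumptions, an application that needs more than 30% of its container as direct memory can only get it if the ratio can be lowered, which is what the issue asks for.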
[jira] [Comment Edited] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871248#comment-15871248 ] Yuliya Feldman edited comment on TWILL-216 at 2/17/17 6:24 AM: --- Sorry for not adding more details sooner. Essentially at the moment Twill decides how much of the total requested memory to allocate to heap based on the hardcoded ratio of 0.7, meaning it will allocate at least 70% to heap. There could be applications that use direct memory quite a bit and they want to allocate more than 30% of total memory to be direct memory. This is the rationale behind this JIRA. was (Author: yufeldman): Sorry for not adding more details soon. Essentially at the moment Twill decides how much of the total requested memory to allocate to heap based on the hardcoded ratio of 0.7, meaning it will allocate at least 70% to heap. There could be applications that use direct memory quite a bit and they want to allocate more than 30% of total memory to be direct memory. This is the rationale behind this JIRA. > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Re: Bundled jar ability to pick up local jars
Already figured Thanks, Yuliya On Wed, Feb 15, 2017 at 2:51 PM, Henry Saputra wrote: > Hi Yuliya, > > With bundled jar approach you need to include all your dependencies in that > app jar itself and the dependencies put in the "lib" directory specified by > setLibFolder. > > For example: > https://github.com/apache/twill/blob/2a8de333f4014fbcbcde826e507b9d > 8810554ffc/twill-examples/yarn/src/main/java/org/apache/ > twill/example/yarn/ > BundledJarExample.java > > Here is the link to source that explain the format of the bundled jar: > > https://github.com/apache/twill/blob/2a8de333f4014fbcbcde826e507b9d > 8810554ffc/twill-ext/src/main/java/org/apache/twill/ext/ > BundledJarRunner.java > > > - Henry > > On Tue, Feb 14, 2017 at 11:25 PM, Yuliya Feldman > wrote: > > > Let me even further rephrase the question > > > > What should be the structure of bundled jar > > > > It feels like it should be classes of the main jar + lib folder with > > additional jars - it can not be main jar + lib folder with jars, as in > this > > case main jar is not really loaded since it loads parent jar (one that is > > defined as "bundled") > > > > Thanks > > > > On Tue, Feb 14, 2017 at 6:54 PM, Yuliya Feldman > wrote: > > > > > Sorry, > > > > > > I probably was not clear. I understand that TwillContainer launcher > will > > > take classpath into consideration. > > > I was more wondering about bundledjar loading - we load it in a > separate > > > classloader, so everything has to be included into bundled jar, > otherwise > > > it does not seem to work, as it will be missing dependencies - nothing > is > > > loaded outside of the jar itself. > > > > > > Thanks > > > > > > On Tue, Feb 14, 2017 at 6:39 PM, Terence Yim wrote: > > > > > >> Hi, > > >> > > >> If a jar is already available on the node, you can use > > >> TwillPreparer.withClasspath to include those to the container > classpath. 
> > >> > > >> Terence > > >> > > >> Sent from my iPhone > > >> > > >> > On Feb 14, 2017, at 6:32 PM, Yuliya Feldman > > wrote: > > >> > > > >> > Hello there, > > >> > > > >> > I have a question regarding Bundled jar. > > >> > > > >> > Is there is anyway I could pick up some jar/config form the node > where > > >> it > > >> > is running so it is not prepackaged within bundled jar itself. > > >> > > > >> > Thanks > > >> > > > > > > > > >
Re: Bundled jar ability to pick up local jars
Let me rephrase the question even further: what should the structure of a bundled jar be?

It feels like it should be the classes of the main jar plus a lib folder with the additional jars. It cannot be the main jar plus a lib folder with jars, because in that case the main jar is not really loaded: only the parent jar (the one defined as "bundled") is loaded.

Thanks

On Tue, Feb 14, 2017 at 6:54 PM, Yuliya Feldman wrote:

> Sorry,
>
> I probably was not clear. I understand that TwillContainer launcher will
> take classpath into consideration.
> I was more wondering about bundledjar loading - we load it in a separate
> classloader, so everything has to be included into bundled jar, otherwise
> it does not seem to work, as it will be missing dependencies - nothing is
> loaded outside of the jar itself.
>
> Thanks
>
> On Tue, Feb 14, 2017 at 6:39 PM, Terence Yim wrote:
>
>> Hi,
>>
>> If a jar is already available on the node, you can use
>> TwillPreparer.withClasspath to include those to the container classpath.
>>
>> Terence
>>
>> Sent from my iPhone
>>
>> > On Feb 14, 2017, at 6:32 PM, Yuliya Feldman wrote:
>> >
>> > Hello there,
>> >
>> > I have a question regarding Bundled jar.
>> >
>> > Is there is anyway I could pick up some jar/config form the node where it
>> > is running so it is not prepackaged within bundled jar itself.
>> >
>> > Thanks
>> >
>
Re: Bundled jar ability to pick up local jars
Sorry, I probably was not clear. I understand that TwillContainer launcher will take classpath into consideration. I was more wondering about bundledjar loading - we load it in a separate classloader, so everything has to be included into bundled jar, otherwise it does not seem to work, as it will be missing dependencies - nothing is loaded outside of the jar itself. Thanks On Tue, Feb 14, 2017 at 6:39 PM, Terence Yim wrote: > Hi, > > If a jar is already available on the node, you can use > TwillPreparer.withClasspath to include those to the container classpath. > > Terence > > Sent from my iPhone > > > On Feb 14, 2017, at 6:32 PM, Yuliya Feldman wrote: > > > > Hello there, > > > > I have a question regarding Bundled jar. > > > > Is there is anyway I could pick up some jar/config form the node where it > > is running so it is not prepackaged within bundled jar itself. > > > > Thanks >
Bundled jar ability to pick up local jars
Hello there, I have a question regarding the bundled jar. Is there any way I could pick up some jar/config from the node where it is running, so that it does not have to be prepackaged within the bundled jar itself? Thanks
Re: java.lang.IncompatibleClassChangeError: Implementing class
This is a Guava version conflict: Twill uses a version of Guava that is different from the one Hadoop uses. I would highly recommend creating a shaded jar with the Twill libraries so that Guava is shaded.

On Fri, Feb 10, 2017 at 6:52 PM, Matteo Pelati wrote:

> 2.7.3
>
> And this is teh stacktrace:
>
> Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at org.apache.twill.internal.zookeeper.DefaultZKClientService.<init>(DefaultZKClientService.java:98)
> at org.apache.twill.zookeeper.ZKClientService$Builder.build(ZKClientService.java:101)
> at org.apache.twill.yarn.YarnTwillRunnerService.getZKClientService(YarnTwillRunnerService.java:450)
> at org.apache.twill.yarn.YarnTwillRunnerService.<init>(YarnTwillRunnerService.java:164)
> at org.apache.twill.yarn.YarnTwillRunnerService.<init>(YarnTwillRunnerService.java:150)
> at com.dataheaps.beanszoo.utils.BundledJarExample.main(BundledJarExample.java:72)
>
>
> Thanks
> Matteo
>
> On Sat, Feb 11, 2017 at 1:10 AM, Terence Yim wrote:
>
> > Hi,
> >
> > What is the Hadoop version you are using? And do you have the class name
> > of the incompatible class involved?
> > > > Terence > > > > Sent from my iPhone > > > > > On Feb 10, 2017, at 8:11 AM, Matteo Pelati > > wrote: > > > > > > Hello, > > > > > > I'm trying to use Twill in BeansZoo and I'm running into the following > > > excpetion when I try to run a basic application: > > > > > > java.lang.IncompatibleClassChangeError: Implementing class > > > > > > any hint ? > > > > > > Thanks > > > Matteo > > > > > > -- > > > Matteo Pelati > > > Phone: +65-91149676 > > > Skype: matteop1976 > > > > > > -- > Matteo Pelati > Phone: +65-91149676 > Skype: matteop1976 >
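When chasing a library conflict like the Guava one described in this thread, it can help to check which jar a class was actually loaded from. The following is a small diagnostic sketch using only the standard Java API; the class name ClassOriginSketch and the default argument are assumptions for illustration, and the sketch is not part of Twill or Hadoop.

```java
import java.security.CodeSource;

final class ClassOriginSketch {

    /** Returns the location (usually a jar URL) a class was loaded from, if known. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typical for classes that come from the JDK itself.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass e.g. com.google.common.base.Preconditions when Guava is on the classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + originOf(Class.forName(name)));
    }
}
```

Running it with a Guava class name on the same classpath as the failing application should show whether that class comes from the Hadoop distribution or from the application jar.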
[jira] [Updated] (TWILL-216) Make ratio between total memory and on-heap memory configurable
[ https://issues.apache.org/jira/browse/TWILL-216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated TWILL-216: - Summary: Make ratio between total memory and on-heap memory configurable (was: Make ration between total memory and onheap memory configurable) > Make ratio between total memory and on-heap memory configurable > --- > > Key: TWILL-216 > URL: https://issues.apache.org/jira/browse/TWILL-216 > Project: Apache Twill > Issue Type: Improvement > Components: yarn > Reporter: Yuliya Feldman > Assignee: Yuliya Feldman > > As of now ratio between on-heap memory and total memory provided to yarn > container is hardcoded to 0.7, so if app running in the container needs more > reserved memory than on-heap it is not possible to achieve. > Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (TWILL-216) Make ration between total memory and onheap memory configurable
Yuliya Feldman created TWILL-216: Summary: Make ration between total memory and onheap memory configurable Key: TWILL-216 URL: https://issues.apache.org/jira/browse/TWILL-216 Project: Apache Twill Issue Type: Improvement Components: yarn Reporter: Yuliya Feldman Assignee: Yuliya Feldman As of now ratio between on-heap memory and total memory provided to yarn container is hardcoded to 0.7, so if app running in the container needs more reserved memory than on-heap it is not possible to achieve. Suggestion is to make it configurable as well as amount of reserved memory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TWILL-210) ServiceMain does not handle well URI without authority
[ https://issues.apache.org/jira/browse/TWILL-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843928#comment-15843928 ] Yuliya Feldman commented on TWILL-210: -- [~chtyim] Thank you. It was really quick. > ServiceMain does not handle well URI without authority > -- > > Key: TWILL-210 > URL: https://issues.apache.org/jira/browse/TWILL-210 > Project: Apache Twill > Issue Type: Bug > Components: yarn >Affects Versions: 0.8.0, 0.9.0 > Reporter: Yuliya Feldman >Assignee: Yuliya Feldman > Fix For: 0.10.0 > > > When figuring out defaultFS from path ServiceMain does not handle correctly > FileSystems that do not provide URI authority > E.g. maprfs:/// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TWILL-210) ServiceMain does not handle well URI without authority
Yuliya Feldman created TWILL-210:

Summary: ServiceMain does not handle well URI without authority
Key: TWILL-210
URL: https://issues.apache.org/jira/browse/TWILL-210
Project: Apache Twill
Issue Type: Bug
Components: yarn
Affects Versions: 0.9.0, 0.8.0
Reporter: Yuliya Feldman

When determining defaultFS from a path, ServiceMain does not correctly handle file systems whose URIs have no authority, e.g. maprfs:///

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
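The case described in this issue can be reproduced with the standard java.net.URI class alone. The sketch below is not the ServiceMain code and not the fix that was committed; the class name, the method name and the example paths are hypothetical. It only shows that a URI such as maprfs:/// has a null authority and one way to rebuild the file system root without failing on it.

```java
import java.net.URI;

final class DefaultFsSketch {

    /** Rebuilds the "scheme://authority/" root of a path, tolerating a missing authority. */
    static String fileSystemRoot(URI path) {
        String authority = path.getAuthority(); // null for URIs such as maprfs:///
        return path.getScheme() + "://" + (authority == null ? "" : authority) + "/";
    }

    public static void main(String[] args) {
        System.out.println(fileSystemRoot(URI.create("hdfs://namenode:8020/user/sam/app.jar")));
        System.out.println(fileSystemRoot(URI.create("maprfs:///apps/twill/app.jar")));
    }
}
```

Code that concatenates the authority without this null check would produce a root like maprfs://null/, which is one way such a URI can be mishandled.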
[jira] [Updated] (TWILL-203) irrespective of number of CPUs specified in App Config it is always 1
[ https://issues.apache.org/jira/browse/TWILL-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated TWILL-203: - Attachment: rm.log updated rm.log snippet > irrespective of number of CPUs specified in App Config it is always 1 > - > > Key: TWILL-203 > URL: https://issues.apache.org/jira/browse/TWILL-203 > Project: Apache Twill > Issue Type: Bug > Components: yarn >Affects Versions: 0.8.0 > Reporter: Yuliya Feldman > Attachments: Screen Shot 2017-01-06 at 10.17.24 AM.png, rm.log, > twillclient.log > > > When trying to deploy Bundled Jar app and specifying number of CPUs > 1 it > still defaults to 1 when application is starting. > Version of YARN is: 2.7.2, Version of Twill is 0.8. Capacity scheduler. > Looks like (from the logs) all the info is passed through Twill correctly and > gets "lost" while getting to RM > Please see attached logs form Twill Client, RM and Screenshot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TWILL-203) irrespective of number of CPUs specified in App Config it is always 1
[ https://issues.apache.org/jira/browse/TWILL-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated TWILL-203: - Attachment: (was: rm.log) > irrespective of number of CPUs specified in App Config it is always 1 > - > > Key: TWILL-203 > URL: https://issues.apache.org/jira/browse/TWILL-203 > Project: Apache Twill > Issue Type: Bug > Components: yarn >Affects Versions: 0.8.0 > Reporter: Yuliya Feldman > Attachments: Screen Shot 2017-01-06 at 10.17.24 AM.png, rm.log, > twillclient.log > > > When trying to deploy Bundled Jar app and specifying number of CPUs > 1 it > still defaults to 1 when application is starting. > Version of YARN is: 2.7.2, Version of Twill is 0.8. Capacity scheduler. > Looks like (from the logs) all the info is passed through Twill correctly and > gets "lost" while getting to RM > Please see attached logs form Twill Client, RM and Screenshot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: CPU count in Resource Spec
Created: https://issues.apache.org/jira/browse/TWILL-203 Thanks On Tue, Jan 10, 2017 at 11:15 PM, Yuliya Feldman wrote: > sure > > will do > > Thanks
[jira] [Created] (TWILL-203) irrespective of number of CPUs specified in App Config it is always 1
Yuliya Feldman created TWILL-203: Summary: irrespective of number of CPUs specified in App Config it is always 1 Key: TWILL-203 URL: https://issues.apache.org/jira/browse/TWILL-203 Project: Apache Twill Issue Type: Bug Components: yarn Affects Versions: 0.8.0 Reporter: Yuliya Feldman Attachments: Screen Shot 2017-01-06 at 10.17.24 AM.png, rm.log, twillclient.log When trying to deploy a Bundled Jar app with the number of CPUs set to more than 1, it still defaults to 1 when the application is starting. Version of YARN: 2.7.2. Version of Twill: 0.8. Capacity scheduler. Looks like (from the logs) all the info is passed through Twill correctly and gets "lost" while getting to the RM. Please see the attached logs from the Twill Client and RM, and the screenshot.
Re: CPU count in Resource Spec
sure will do Thanks On Tue, Jan 10, 2017 at 11:11 PM, Terence Yim wrote: > Probably the apache mailing list does not allow attachments. Would you mind > creating a JIRA and attaching it to the ticket? > > https://issues.apache.org/jira/browse/TWILL > > Terence
Re: CPU count in Resource Spec
sure attached Hopefully attachments are allowed On Tue, Jan 10, 2017 at 11:07 PM, Terence Yim wrote: > Hi, > > I don’t see any attachment in your previous email. Would you mind > attaching it again? > > Terence
Re: CPU count in Resource Spec
Capacity scheduler On Tue, Jan 10, 2017 at 11:00 PM, Terence Yim wrote: > Hi, > > Do you know which resource scheduler YARN is running with? > > Terence
Re: How/does Twill can survive a restart of TwillClient
Yes, it would be great to have a deterministic way and not an eventually consistent one :) as it is now. I would imagine the first pull from ZK should be inline and updates can be applied as they come, using watchers. On Fri, Jan 6, 2017 at 5:16 PM, Martin Serrano wrote: > I believe that https://issues.apache.org/jira/browse/TWILL-183 should > address this issue, correct? > > > On 12/25/2016 01:38 AM, Terence Yim wrote: > >> 2. The TwillRunner is designed to survive process restart with the ability >> to rediscover all the running twill applications via ZooKeeper. However, >> due to the nature of async operations in ZK, you might need to call >> "lookup" a couple of times before all the necessary information is synced up >> from ZK after the process restarted. >
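The workaround quoted above (calling "lookup" more than once after a restart) can be sketched as a bounded polling loop. This is a minimal sketch that uses only the JDK: `LookupRetry`, `pollUntilFound`, and the `Supplier` that stands in for a `TwillRunner.lookup(...)` call are hypothetical names for illustration, not part of Twill.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: re-invokes a lookup until it returns a non-null result. */
final class LookupRetry {

  /**
   * Calls {@code lookup} up to {@code maxAttempts} times, sleeping
   * {@code delayMillis} between attempts, and returns the first non-null
   * result. Returns empty if every attempt returned null or the thread
   * was interrupted while sleeping.
   */
  static <T> Optional<T> pollUntilFound(Supplier<T> lookup, int maxAttempts, long delayMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      T result = lookup.get();
      if (result != null) {
        return Optional.of(result);
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
          // Preserve the interrupt flag and give up early.
          Thread.currentThread().interrupt();
          return Optional.empty();
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Stand-in for a lookup that only succeeds on the third call,
    // for example once the ZooKeeper state has been synced.
    int[] calls = {0};
    Supplier<String> fakeLookup = () -> ++calls[0] < 3 ? null : "controller";
    System.out.println(pollUntilFound(fakeLookup, 5, 10L));
  }
}
```

A bounded loop like this only hides the eventual consistency; it does not make the first lookup deterministic, which is what the reply above asks for.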
Re: CPU count in Resource Spec
Yes, I did See attached On Fri, Jan 6, 2017 at 10:15 AM, Terence Yim wrote: > Hi, > > We've never observed this before. The AM container itself always runs > with 1 vcore. Have you looked at the YARN container info in the RM UI for the container that > has that particular TwillRunnable running inside? > > Terence
Re: CPU count in Resource Spec
yes On Fri, Jan 6, 2017 at 9:40 AM, Terence Yim wrote: > Where did you verify the CPU cores? Was it from the YARN resource manager > UI? > > Terence
CPU count in Resource Spec
Sorry for so many questions - unfortunately there is not much information I can fish out in the wild :). I noticed that the CPU count set on the Application Specification is not reflected when the container starts - it is always 1 CPU. Has anybody experienced this? It really feels like it gets lost somewhere between Twill and YARN. I am using YARN 2.7.2. Thanks in advance
logging within an application
Hello there,

I have an issue with logging from my App that runs within a container - no logging comes out except what goes to System.out.

Here is what I have in the stderr of the container (not sure if the duplicate StaticLoggerBinder is an issue, but I will certainly try to fix it):

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/var/lib/hadoop/yarn/data/usercache/username/appcache/application_1483475903489_0036/container_1483475903489_0036_01_02/tmp/twill.launcher-1483721015382-0/lib/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
17/01/06 16:43:37 INFO utils.VerifiableProperties: Verifying properties
17/01/06 16:43:37 INFO utils.VerifiableProperties: Property metadata.broker.list is overridden to host:40040
17/01/06 16:43:37 INFO utils.VerifiableProperties: Property request.required.acks is overridden to 1
17/01/06 16:43:37 INFO utils.VerifiableProperties: Property partitioner.class is overridden to org.apache.twill.internal.kafka.client.IntegerPartitioner
17/01/06 16:43:37 INFO utils.VerifiableProperties: Property compression.codec is overridden to snappy
17/01/06 16:43:37 INFO utils.VerifiableProperties: Property key.serializer.class is overridden to org.apache.twill.internal.kafka.client.IntegerEncoder
17/01/06 16:43:37 INFO utils.VerifiableProperties: Property serializer.class is overridden to org.apache.twill.internal.kafka.client.ByteBufferEncoder
17/01/06 16:43:37 INFO client.ClientUtils$: Fetching metadata from broker id:0,host:host,port:40040 with correlation id 0 for 1 topic(s) Set(log)
17/01/06 16:43:37 INFO producer.SyncProducer: Connected to host:40040 for producing
17/01/06 16:43:37 INFO producer.SyncProducer: Disconnecting from host:40040
17/01/06 16:43:38 INFO producer.SyncProducer: Connected host:40040 for producing
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Jan 06, 2017 4:43:55 PM org.hibernate.validator.internal.util.Version
INFO: HV01: Hibernate Validator 5.2.4.Final
17/01/06 16:44:02 INFO producer.Producer: Shutting down producer
17/01/06 16:44:02 INFO producer.ProducerPool: Closing all sync producers
17/01/06 16:44:02 INFO producer.SyncProducer: Disconnecting from host:40040

Thanks
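One way to check which SLF4J bindings a given class loader can actually see is to list every copy of the binder class on its classpath; the resource path is the one named in the SLF4J warnings in the stderr above. This is a minimal sketch using only the JDK. `BindingScan` and `findAll` are hypothetical names, and the idea that the application code here runs under a different class loader than the Twill launcher (which would explain seeing both "multiple bindings" and "Failed to load class") is an assumption, not something the thread confirms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic: lists every classpath location of a resource. */
final class BindingScan {

  /** Resource path of the binder class that SLF4J 1.x looks for. */
  static final String SLF4J_BINDER = "org/slf4j/impl/StaticLoggerBinder.class";

  /** Returns all URLs under which {@code loader} can find {@code resourcePath}. */
  static List<URL> findAll(ClassLoader loader, String resourcePath) {
    try {
      return Collections.list(loader.getResources(resourcePath));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Use the class loader of the code whose logging is missing.
    ClassLoader loader = BindingScan.class.getClassLoader();
    List<URL> bindings = findAll(loader, SLF4J_BINDER);
    System.out.println("SLF4J bindings visible: " + bindings.size());
    for (URL url : bindings) {
      System.out.println("  " + url);
    }
  }
}
```

Running the same check from inside the runnable and from the launcher would show whether the two see different sets of bindings: zero results matches the NOP fallback, more than one matches the "multiple bindings" warning.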
Re: Retrieving failure when app terminates abnormally
Thank you Terence On Fri, Jan 6, 2017 at 12:37 AM, Terence Yim wrote: > Hi, > > It depends on when the failure happens. If the app was already submitted to > YARN successfully and failed afterwards, it should still get reflected > through the Future returned by the terminate() call. However, due to > https://issues.apache.org/jira/browse/TWILL-180, it is currently not > reflected correctly. > > Terence
Re: Retrieving failure when app terminates abnormally
Thank you What about retrieving the error if it fails to start? On Fri, Jan 6, 2017 at 12:12 AM, Terence Yim wrote: > Hi, > > Currently the error can be retrieved via `TwillController.terminate().get()` > call, as stated in the javadoc of the `terminate()` method. > > * Calling this method multiple times is allowed and a {@link Future} > representing the termination state > * will be returned. > > Terence
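The general pattern behind retrieving an error from `terminate().get()` is that `Future.get()` wraps the original failure in an `ExecutionException`. The following is a minimal sketch with a plain JDK `CompletableFuture` standing in for the Future returned by the controller; `TerminationCause` and `failureOf` are hypothetical names, and how Twill actually populates that Future is not shown here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper: extracts the failure carried by a termination future. */
final class TerminationCause {

  /**
   * Blocks until the future completes. Returns the original failure if the
   * future completed exceptionally, or null if it completed normally.
   */
  static Throwable failureOf(Future<?> termination) {
    try {
      termination.get();
      return null;
    } catch (ExecutionException e) {
      // get() wraps the real failure; the cause is what the caller wants.
      return e.getCause();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return e;
    }
  }

  public static void main(String[] args) {
    // Stand-in for a termination future that ended in failure.
    CompletableFuture<Void> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("container exited abnormally"));
    System.out.println(failureOf(failed));
  }
}
```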
Retrieving failure when app terminates abnormally
Hello there,

I am trying to use the async APIs to start/stop a Twill-managed YARN application.

I am using the onRunning() and onTerminated() APIs for this, but I don't see a way of retrieving the error in case of failure:

public void onTerminated(final Runnable runnable, Executor executor) {
  this.addListener(new ServiceListenerAdapter() {
    public void failed(State from, Throwable failure) {
      runnable.run();
    }

    public void terminated(State from) {
      runnable.run();
    }
  }, executor);
}

Is there a way of retrieving "Throwable failure"? Or am I using the wrong APIs?

Thanks
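The listener in the snippet above calls runnable.run() in both callbacks, so the Throwable never reaches the caller. One way to keep it, sketched here with JDK types only, is to have the listener complete a `CompletableFuture` instead of running a plain Runnable. `FailureCapture` and `TerminationListener` are hypothetical stand-ins for the listener adapter used in the snippet; this is not Twill's API.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: a listener that forwards the failure instead of dropping it. */
final class FailureCapture {

  /** Stand-in for the two callbacks used in the quoted snippet. */
  interface TerminationListener {
    void failed(Throwable failure);

    void terminated();
  }

  /** Returns a listener that completes {@code result} normally or with the failure. */
  static TerminationListener completing(CompletableFuture<Void> result) {
    return new TerminationListener() {
      @Override
      public void failed(Throwable failure) {
        result.completeExceptionally(failure);
      }

      @Override
      public void terminated() {
        result.complete(null);
      }
    };
  }

  public static void main(String[] args) {
    CompletableFuture<Void> result = new CompletableFuture<>();
    result.whenComplete((ignored, failure) ->
        System.out.println(failure == null ? "terminated" : "failed: " + failure));
    // Simulate the service reporting a failure to its listener.
    completing(result).failed(new RuntimeException("launch failed"));
  }
}
```

With a real service, the listener returned by `completing` would be registered where the snippet registers its adapter, and the caller would attach callbacks to the future.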
Re: How/does Twill can survive a restart of TwillClient
I have created https://issues.apache.org/jira/browse/TWILL-202 to limit the time spent waiting for resources from YARN, so that requests in the queue can be processed. Thanks, Yuliya On Sat, Dec 24, 2016 at 11:24 PM, Yuliya wrote: > Thank you for the replies > > Comments inline > > > On Dec 24, 2016, at 10:38 PM, Terence Yim wrote: > > > > Hi, > > > > 1. I see what you mean now. The reason why Twill currently waits for all the > > requested containers to be up and running before changing the number of > > containers again is mainly to provide a more deterministic state transition > > for the runnable lifecycle, in case the application logic is sensitive to the number > > of instances. However, I do agree that Twill can provide a more flexible way > > to let the application decide whether waiting is needed or not. Would > > you mind opening a JIRA for the improvement? > > I will open a JIRA - thank you > > > > 2. The TwillRunner is designed to survive process restart with the ability > > to rediscover all the running twill applications via ZooKeeper. However, > > due to the nature of async operations in ZK, you might need to call > > "lookup" a couple of times before all the necessary information is synced up > > from ZK after the process restarted. > Interesting - let me try - it does not look like this from the code, but I > may be missing something > > > > Terence
[jira] [Created] (TWILL-202) TwillAppMaster should not wait indefinitely for resources in case of changeInstance() request
Yuliya Feldman created TWILL-202: Summary: TwillAppMaster should not wait indefinitely for resources in case of changeInstance() request Key: TWILL-202 URL: https://issues.apache.org/jira/browse/TWILL-202 Project: Apache Twill Issue Type: Improvement Components: core, zookeeper Reporter: Yuliya Feldman At its core, changeInstances() creates a sequential znode that the TwillAppMaster processes in order; until the currently processed request is satisfied, the remaining changeInstances() requests sit unprocessed. The problem arises if the user makes an unreasonable changeInstances() request that can never be satisfied with the current YARN capacity, for example because of a typo. I suggest at least putting a limit on the time spent waiting for resources, so that if more requests are pending the current one can be abandoned and the next one processed.
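The behavior proposed in the ticket can be illustrated with a small simulation: requests are handled in FIFO order, and a request that is still unsatisfied after a bounded number of checks is abandoned so that the next one can run. This is only a sketch of the idea, not the TwillAppMaster code. `BoundedRequestQueue`, `drain`, and the `canSatisfy` predicate are hypothetical, a check count replaces a real time limit, and for brevity the sketch abandons an unsatisfied request even when nothing else is queued.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.IntPredicate;

/** Hypothetical simulation of bounded waiting on queued instance-count requests. */
final class BoundedRequestQueue {

  /**
   * Processes requests in FIFO order. Each request is checked up to
   * {@code maxChecks} times; if it is still unsatisfied it is abandoned.
   * Returns the last request that was satisfied, or -1 if none was.
   */
  static int drain(Deque<Integer> pending, IntPredicate canSatisfy, int maxChecks) {
    int lastSatisfied = -1;
    while (!pending.isEmpty()) {
      int request = pending.poll();
      boolean satisfied = false;
      for (int check = 0; check < maxChecks && !satisfied; check++) {
        satisfied = canSatisfy.test(request);
      }
      if (satisfied) {
        lastSatisfied = request;
      }
      // An unsatisfied request is dropped here so the next one can proceed.
    }
    return lastSatisfied;
  }

  public static void main(String[] args) {
    // The mistyped request for 100 instances is followed by the intended 10.
    Deque<Integer> pending = new ArrayDeque<>(Arrays.asList(100, 10));
    int capacity = 50;
    System.out.println(drain(pending, n -> n <= capacity, 3));
  }
}
```

In this run the request for 100 is given up after three checks and the request for 10 is then applied, which is the outcome the ticket asks for.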
Re: How/does Twill can survive a restart of TwillClient
Thank you very much for the reply Please see inline On Fri, Dec 23, 2016 at 11:10 AM, Terence Yim wrote: > Hi, > > 1. It really depends on how many resources your application needs. > Twill simply acts as a bridge between your app and YARN; however, the YARN > cluster itself needs to have enough resources (memory and vcores) to run > your application. > I definitely agree that YARN should have capacity. What I am trying to say is that if I change my mind and want to resize a 2nd time before the 1st request is satisfied, I cannot do it. What if I mistyped the number of requested containers - put 100 instead of 10 - and YARN will never have this capacity? If I change back to 10, it won't take effect until 100 is satisfied. > > 2. You should be able to do that through the TwillRunner.lookup method. Do > you mean you tried but it doesn't return anything? > TwillRunner.lookup works ONLY if the application that uses TwillRunner.lookup (the YARN/Twill client, in other words) has NEVER restarted - if it restarted, all the information is lost and I am not sure how to make TwillRunner obtain it again from the running cluster. > > Terence
How/does Twill can survive a restart of TwillClient
Hello there, I started using Twill recently and came across a couple of issues I wanted to check on: 1. If I resize the YARN cluster to more capacity than it can handle, I can't resize down, as the first request has not been satisfied. 2. If my application that spawns the Twill YARN cluster restarts (meaning I lose the YarnTwillRunnerService), I cannot get hold of the TwillController afterwards, even though I know the runId and whatnot. Could anybody advise/confirm/deny the issues I am seeing? Thanks in advance