Re: Question for Mesos gurus
Thank you, Adam.

When you say "but before you upgrade Mesos to 0.23, you should upgrade your scheduler (and executor) libmesos to 0.22.x", do you mean recompile? Does this sentence from the upgrade instructions you linked mean the same thing: "Rebuild and install any modules so that upgraded masters/slaves can use them"?

Thanks,
Yuliya

From: Adam Bordelon a...@mesosphere.io
To: dev@myriad.incubator.apache.org; yuliya Feldman yufeld...@yahoo.com
Sent: Tuesday, August 25, 2015 10:06 AM
Subject: Re: Question for Mesos gurus

Mesos guarantees forward and backward compatibility by one minor version. It is expected that you upgrade the entire cluster to one consecutive version before upgrading any component to the next. So, if your scheduler jar's libmesos is from 0.21.x, you can upgrade your Mesos master/agents to 0.22.x safely, but before you upgrade Mesos to 0.23, you should upgrade your scheduler (and executor) libmesos to 0.22.x. See http://mesos.apache.org/documentation/latest/upgrades/ for other special notes and recommended upgrade order.

Once we reach Mesos 1.0 (when the new HTTP API stabilizes), then we'll have stronger guarantees about version compatibility within a major version.

On Tue, Aug 25, 2015 at 8:33 AM, yuliya Feldman yufeld...@yahoo.com.invalid wrote:

Hello guys,

I wonder about compatibility of Mesos protobuf for Myriad usage. If I compiled Myriad with Mesos version 0.22.1/0.21.1 but the cluster runs Mesos 0.23, is it supposed to be compatible? Yesterday our guys came across an exception (see below). When switching jars to mesos-0.21.1, the issue went away.

Thanks,
Yuliya
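The one-minor-version rule quoted in this message can be written down as a small check. This is only a sketch: `MesosVersionCheck` and its methods are hypothetical names, not part of Mesos or Myriad, and it assumes version strings of the form `major.minor[.patch]`. It encodes the stated rule and nothing more; it says nothing about whether the jar on the classpath matches the one the scheduler was compiled against.

```java
// Sketch only: hypothetical helper, not part of Mesos or Myriad.
final class MesosVersionCheck {
    private MesosVersionCheck() {}

    /** Parses "major.minor[.patch]" and returns {major, minor}. */
    static int[] majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /**
     * Returns true when both versions share a major version and their
     * minor versions differ by at most one.
     */
    static boolean withinOneMinor(String libmesosVersion, String clusterVersion) {
        int[] lib = majorMinor(libmesosVersion);
        int[] cluster = majorMinor(clusterVersion);
        return lib[0] == cluster[0] && Math.abs(lib[1] - cluster[1]) <= 1;
    }

    public static void main(String[] args) {
        // libmesos 0.21.x against a 0.22.x cluster: inside the stated window.
        System.out.println(withinOneMinor("0.21.1", "0.22.1")); // prints true
        // libmesos 0.21.x against a 0.23 cluster: two minor versions apart.
        System.out.println(withinOneMinor("0.21.1", "0.23"));   // prints false
    }
}
```

A check like this could run at scheduler startup, provided the cluster version is available from somewhere; where that value comes from is left open here.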
Re: Question for Mesos gurus
Yes, you'll have to recompile your scheduler/executor against the latest libmesos. In the upgrade guide, this is mentioned as "Upgrade the schedulers by linking the latest native library / jar / egg (if necessary)". The modules instructions apply to C++ plugins for the Mesos master/slaves themselves.

On Tue, Aug 25, 2015 at 10:52 AM, yuliya Feldman yufeld...@yahoo.com.invalid wrote:

Thank you, Adam.

When you say "but before you upgrade Mesos to 0.23, you should upgrade your scheduler (and executor) libmesos to 0.22.x", do you mean recompile? Does this sentence from the upgrade instructions you linked mean the same thing: "Rebuild and install any modules so that upgraded masters/slaves can use them"?

Thanks,
Yuliya
Question for Mesos gurus
Hello guys,

I wonder about compatibility of Mesos protobuf for Myriad usage. If I compiled Myriad with Mesos version 0.22.1/0.21.1 but the cluster runs Mesos 0.23, is it supposed to be compatible? Yesterday our guys came across an exception (see below). When switching jars to mesos-0.21.1, the issue went away.

Thanks,
Yuliya

15/08/24 10:57:40 INFO scheduler.TaskFactory$NMTaskFactoryImpl: yarn.resourcemanager.hostname is set to rm.marathon.mesos via YARN_RESOURCEMANAGER_OPTS. Passing it into YARN_NODEMANAGER_OPTS.
Aug 24, 2015 10:57:40 AM com.lmax.disruptor.FatalExceptionHandler handleEventException
SEVERE: Exception processing: 1 com.ebay.myriad.scheduler.event.ResourceOffersEvent@74a1e0a5
java.lang.NoSuchMethodError: org.apache.mesos.Protos$TaskInfo$Builder.setData(Lcom/google/protobuf/ByteString;)Lorg/apache/mesos/Protos$TaskInfo$Builder;
        at com.ebay.myriad.scheduler.TaskFactory$NMTaskFactoryImpl.createTask(TaskFactory.java:310)
        at com.ebay.myriad.scheduler.event.handlers.ResourceOffersEventHandler.onEvent(ResourceOffersEventHandler.java:98)
        at com.ebay.myriad.scheduler.event.handlers.ResourceOffersEventHandler.onEvent(ResourceOffersEventHandler.java:55)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/08/24 10:57:40 ERROR yarn.YarnUncaughtExceptionHandler: Thread Thread[pool-2-thread-3,5,main] threw an Exception.
java.lang.RuntimeException: java.lang.NoSuchMethodError: org.apache.mesos.Protos$TaskInfo$Builder.setData(Lcom/google/protobuf/ByteString;)Lorg/apache/mesos/Protos$TaskInfo$Builder;
        at com.lmax.disruptor.FatalExceptionHandler.handleEventException(FatalExceptionHandler.java:45)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:147)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: org.apache.mesos.Protos$TaskInfo$Builder.setData(Lcom/google/protobuf/ByteString;)Lorg/apache/mesos/Protos$TaskInfo$Builder;
        at com.ebay.myriad.scheduler.TaskFactory$NMTaskFactoryImpl.createTask(TaskFactory.java:310)
        at com.ebay.myriad.scheduler.event.handlers.ResourceOffersEventHandler.onEvent(ResourceOffersEventHandler.java:98)
        at com.ebay.myriad.scheduler.event.handlers.ResourceOffersEventHandler.onEvent(ResourceOffersEventHandler.java:55)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        ... 3 more
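A `NoSuchMethodError` like the one in this trace generally means the class found on the runtime classpath is not the one the caller was compiled against. One way to narrow that down is to ask the running JVM where a class was loaded from and what a given method returns. The sketch below uses only standard reflection; `LinkageProbe` and its method names are hypothetical, and it is meant as a diagnostic aid, not as something Myriad ships.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.Optional;

// Sketch only: hypothetical diagnostic helper built on standard reflection.
final class LinkageProbe {
    private LinkageProbe() {}

    /**
     * Returns the return type name of a public method, or empty if the class,
     * a parameter type, or the method cannot be found on the classpath.
     */
    static Optional<String> returnType(String className, String methodName,
                                       String... parameterTypeNames) {
        try {
            Class<?> owner = Class.forName(className);
            Class<?>[] parameterTypes = new Class<?>[parameterTypeNames.length];
            for (int i = 0; i < parameterTypeNames.length; i++) {
                parameterTypes[i] = Class.forName(parameterTypeNames[i]);
            }
            Method method = owner.getMethod(methodName, parameterTypes);
            return Optional.of(method.getReturnType().getName());
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return Optional.empty();
        }
    }

    /** Returns the jar or directory a class was loaded from, if the JVM knows it. */
    static String loadedFrom(String className) throws ClassNotFoundException {
        CodeSource source = Class.forName(className).getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }
}
```

With the Mesos jar on the classpath, calling `LinkageProbe.returnType("org.apache.mesos.Protos$TaskInfo$Builder", "setData", "com.google.protobuf.ByteString")` and `LinkageProbe.loadedFrom("org.apache.mesos.Protos$TaskInfo$Builder")` from inside the affected process should show which jar is actually in use and whether the signature from the trace is present in it. The class and method names in that call are taken from the trace above.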
[jira] [Updated] (MYRIAD-7) Run MyriadMesosScheduler inside YARN's resource manager JVM.
[ https://issues.apache.org/jira/browse/MYRIAD-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santosh Marella updated MYRIAD-7:
    Issue Type: Improvement (was: Bug)

Run MyriadMesosScheduler inside YARN's resource manager JVM.

    Key: MYRIAD-7
    URL: https://issues.apache.org/jira/browse/MYRIAD-7
    Project: Myriad
    Issue Type: Improvement
    Reporter: Santosh Marella
    Assignee: Santosh Marella

The objective of this change is to run MyriadMesosScheduler inside YARN's resource manager JVM.

1. Added Myriad{Fair,Capacity,Fifo}Scheduler classes that extend YARN's {Fair,Capacity,Fifo}Scheduler classes respectively.
2. Added build dependencies on Hadoop.
3. Encountered slf4j multiple-binding conflicts because Dropwizard uses the logback implementation while YARN uses the slf4j-log4j12 implementation of the slf4j API. After discussions with Mohit, we've decided to remove the dependencies on Dropwizard.
4. Refactored the rest of the code to that effect. Retained non-conflicting deps like codahale metrics, jackson's dataformat/databind/annotations/yaml and hibernate validator.
5. Added build rules to package the myriad jar and its dependencies into the build/libs dir. These DO NOT INCLUDE hadoop jars and their dependencies, as the intent is to deploy the myriad jar and its deps into an existing YARN installation.

Build and Deployment guidelines:

1. From the myriad dir in the local git repo, run ./gradlew jar.
2. This should produce myriad-0.0.1.jar and other deps under the build/libs dir.
3. Copy build/libs/*.jar to HADOOP_HOME/share/hadoop/yarn.
4. Modify HADOOP_HOME/etc/hadoop/yarn-site.xml to have the following entry:
{code:xml}
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>com.ebay.myriad.scheduler.yarn.MyriadFairScheduler</value>
</property>
{code}
5. Restart the Resource Manager process.

CAVEATS:

- The REST API to myriad is currently broken as we eliminated the dependencies on Dropwizard. We need to bring up a web app for myriad and expose the REST API through that.
- Mesos also requires a native library (libmesos.so). Our build process currently does not add that to the myriad jar. We need to manually add that to YARN's native lib dir $HADOOP_HOME/lib/native.

I'll open tasks for both of the above and track them separately.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
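Since the second caveat depends on libmesos.so being copied by hand, a quick check from inside the JVM can confirm that the file is actually reachable. The following is a sketch under assumptions: `NativeLibLocator` is a hypothetical name, and it assumes the Resource Manager's `java.library.path` includes YARN's native lib dir, which depends on how Hadoop is launched in a given installation.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Sketch only: hypothetical helper for locating a native library file.
final class NativeLibLocator {
    private NativeLibLocator() {}

    /**
     * Walks a path-separator-delimited list of directories and returns the
     * first regular file with the given name, or empty if none is found.
     */
    static Optional<Path> find(String searchPath, String fileName) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // mapLibraryName("mesos") yields the platform file name, e.g. libmesos.so on Linux.
        String fileName = System.mapLibraryName("mesos");
        String searchPath = System.getProperty("java.library.path", "");
        System.out.println(fileName + " -> " + find(searchPath, fileName));
    }
}
```

An empty result from `main` suggests the library has not been copied to a directory the JVM searches.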
[jira] [Reopened] (MYRIAD-7) Run MyriadMesosScheduler inside YARN's resource manager JVM.
[ https://issues.apache.org/jira/browse/MYRIAD-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santosh Marella reopened MYRIAD-7:
    Assignee: Santosh Marella

Run MyriadMesosScheduler inside YARN's resource manager JVM.

    Key: MYRIAD-7
    URL: https://issues.apache.org/jira/browse/MYRIAD-7
    Project: Myriad
    Issue Type: Bug
    Reporter: Santosh Marella
    Assignee: Santosh Marella
[jira] [Updated] (MYRIAD-55) Add a API to destroy a Myriad/YARN cluster
[ https://issues.apache.org/jira/browse/MYRIAD-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santosh Marella updated MYRIAD-55:
    Issue Type: Improvement (was: Bug)

Add a API to destroy a Myriad/YARN cluster

    Key: MYRIAD-55
    URL: https://issues.apache.org/jira/browse/MYRIAD-55
    Project: Myriad
    Issue Type: Improvement
    Reporter: Santosh Marella

This is similar to the destroy option in Marathon. We need a way to distinguish between the accidental death of the ResourceManager and an explicit request from an admin to shut down the YARN cluster (both the RM and the NMs that were launched by the framework).

In the former case, Mesos needs to wait until a new instance of the framework connects back, and the framework's HA should kick in. In the latter case, the framework should tell Mesos that it wants to shut down, and it should shut down all the tasks (NMs) that it previously launched.
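The explicit-destroy path described in this issue could look roughly like the sketch below. Everything in it is an assumption for illustration: `DestroySketch` and its nested `Driver` interface are hypothetical stand-ins, loosely modeled on the `killTask` and `stop(boolean failover)` operations of the Mesos Java `SchedulerDriver`, and this is not how Myriad implements it. The accidental-death case needs no code here, because in that case the framework never gets the chance to call anything and Mesos simply waits for a new instance to reconnect.

```java
import java.util.List;

// Sketch only: hypothetical names, not Myriad's implementation.
final class DestroySketch {
    private DestroySketch() {}

    /** Hypothetical stand-in for the driver operations this sketch needs. */
    interface Driver {
        void killTask(String taskId);

        /** failover=false signals that the framework does not intend to reconnect. */
        void stop(boolean failover);
    }

    /**
     * Explicit admin request: kill every NM task the framework launched,
     * then stop the driver without asking for failover.
     */
    static void destroyCluster(Driver driver, List<String> launchedTaskIds) {
        for (String taskId : launchedTaskIds) {
            driver.killTask(taskId);
        }
        driver.stop(false);
    }
}
```

Keeping the destroy path as one explicit method makes the admin's intent visible in the code: nothing else in the scheduler should stop the driver with failover disabled.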