[jira] [Commented] (MYRIAD-96) Support network isolation between distinct YARN clusters using overlay networks

2016-10-08 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557947#comment-15557947
 ] 

John Omernik commented on MYRIAD-96:


All, I've implemented Calico Networks on my Mesos cluster. Now, I understand 
the concerns about performance in overlays; however, based on 
https://www.projectcalico.org/calico-dataplane-performance/ I think performance 
will be more than acceptable for multiple YARN clusters running together. In 
general, I think that with the CNI interface we should just allow Myriad to 
start node managers (in Docker or out of Docker) with a CNI-based setup (with 
Docker using the unified containerizer), so that people who want to use Calico 
or other overlay networks can.  Based on my experience with Calico and Marathon 
so far, I think this could be a wonderful match and could really help with the 
port-management aspects of multi-tenancy.  

> Support network isolation between distinct YARN clusters using overlay 
> networks
> ---
>
> Key: MYRIAD-96
> URL: https://issues.apache.org/jira/browse/MYRIAD-96
> Project: Myriad
>  Issue Type: Improvement
>Reporter: Swapnil Daingade
>Assignee: Swapnil Daingade
>
> * Enable creation of an overlay network per tenant YARN cluster using a 
> virtual switch (like Open vSwitch)
> * Connect the different Docker containers belonging to a cluster to the same 
> overlay network.





Re: Challenges after MapR 5.1 Upgrade.

2016-04-06 Thread John Omernik
Thanks Darin, and thank you to Yuliya as well. Darin, your fix worked, and
Yuliya helped me troubleshoot what appears to have been a failed MapR 5.1
upgrade.  Basically, many of the libs that MapR links into the hadoop
classpath didn't get linked. I reinstalled the hadoop components and all seems
well now.

Thanks!

John
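
P.S. For anyone searching the archives: a minimal yarn-site.xml sketch of the
zero-profile settings Darin describes below (illustrative only, not my actual
config) - both minimums have to be explicitly 0 for a zero-resource node
manager to register:

<!-- sketch: explicit zeros so a zero-profile NM satisfies minimum allocations -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>0</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>0</value>
</property>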

On Mon, Apr 4, 2016 at 7:57 PM, Darin Johnson 
wrote:

> Hey John,
>
> I noticed these lines in your yarn-site.xml:
>
> <property>
> <name>yarn.scheduler.minimum-allocation-mb</name>
> <value>512</value>
> </property>
>
> <property>
> <name>yarn.scheduler.minimum-allocation-vcores</name>
> <value>1</value>
> </property>
>
> If you're attempting to launch a zero-resource nodemanager for FGS, that will
> result in the first stack trace.  Both should be explicitly 0 for that
> feature to work (the defaults are 1024 and 1 respectively, which will fail).
> You do have them set to 0 further below; however, I'm uncertain which would
> take precedence.
> On Apr 4, 2016 5:19 PM, "John Omernik"  wrote:
>
> > This was an upgrade from 5.0.  I will post here; note: I have removed the
> > mapr_shuffle to get node managers to work. However, I am seeing other odd
> > things, so any help would be appreciated.
> >
> > <configuration>
> >
> > <property>
> > <name>yarn.nodemanager.aux-services</name>
> > <value>mapreduce_shuffle,myriad_executor</value>
> > </property>
> >
> > <property>
> > <name>yarn.resourcemanager.hostname</name>
> > <value>myriadprod.marathonprod.mesos</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
> > <value>org.apache.hadoop.mapred.ShuffleHandler</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
> > <value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
> > </property>
> >
> > <property>
> > <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
> > <value>2000</value>
> > </property>
> >
> > <property>
> > <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
> > <value>1</value>
> > </property>
> >
> > <property>
> > <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
> > <value>1000</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.resource.cpu-vcores</name>
> > <value>${nodemanager.resource.cpu-vcores}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.resource.memory-mb</name>
> > <value>${nodemanager.resource.memory-mb}</value>
> > </property>
> >
> > <property>
> > <name>yarn.scheduler.minimum-allocation-mb</name>
> > <value>512</value>
> > </property>
> >
> > <property>
> > <name>yarn.scheduler.minimum-allocation-vcores</name>
> > <value>1</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.address</name>
> > <value>${myriad.yarn.nodemanager.address}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.webapp.address</name>
> > <value>${myriad.yarn.nodemanager.webapp.address}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.webapp.https.address</name>
> > <value>${myriad.yarn.nodemanager.webapp.address}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.localizer.address</name>
> > <value>${myriad.yarn.nodemanager.localizer.address}</value>
> > </property>
> >
> > <property>
> > <name>yarn.resourcemanager.scheduler.class</name>
> > <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
> > </property>
> >
> > <property>
> > <name>yarn.scheduler.minimum-allocation-vcores</name>
> > <value>0</value>
> > </property>
> >
> > <property>
> > <name>yarn.scheduler.minimum-allocation-vcores</name>
> > <value>0</value>
> > </property>
> >
> > <property>
> > <description>Who will execute(launch) the containers.</description>
> > <name>yarn.nodemanager.container-executor.class</name>
> > <value>${yarn.nodemanager.container-executor.class}</value>
> > </property>
> >
> > <property>
> > <description>The class which should help the LCE handle resources.</description>
> > <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
> > <value>${yarn.nodemanager.linux-container-executor.resources-handler.class}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
> > <value>${yarn.nodemanager.linux-container-executor.cgroups.hierarchy}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
> > <value>${yarn.nodemanager.linux-container-executor.cgroups.mount}</value>
> > </property>
> >
> > <property>
> > <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
> > <value>${yarn.nodemanager.linux-container-executor.cgroups.mount-path}</value>
> > </property>
> >
> > yarn.nodemanage

[jira] [Commented] (MYRIAD-195) Node Managers randomly die on Mesos

2016-04-06 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228178#comment-15228178
 ] 

John Omernik commented on MYRIAD-195:
-

You may want to consider an upgrade to 5.1 with MapR.  MapR 5.1 has scripts 
that take Myriad into account for the nodemanager volumes (I believe).  I know 
I did manage to work around the issues on 5.0, but it was a challenge. 

> Node Managers randomly die on Mesos
> ---
>
> Key: MYRIAD-195
> URL: https://issues.apache.org/jira/browse/MYRIAD-195
> Project: Myriad
>  Issue Type: Bug
>  Components: Executor
>Affects Versions: Myriad 0.1.0
> Environment: Ubuntu 14.04; kernel 3.13.0-66-generic; MapR 5.0
>Reporter: Miguel Bernadin
>
> Hello, I have been noticing that the Node Managers randomly die on Mesos. 
> Attached are two logs below: the first is the Mesos log, and the second is 
> the node manager createNMVolume log. Looking to see if anyone else 
> is experiencing this. 
> Mesos logs: 
> 16/03/25 13:51:10 INFO nodemanager.NodeManager: STARTUP_MSG: 
> / STARTUP_MSG: 
> Starting NodeManager STARTUP_MSG: host = nodemanager/10.1.194.71 STARTUP_MSG: 
> args = [] STARTUP_MSG: version = 2.7.0-mapr-1506 STARTUP_MSG: classpath = 
> /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mapr-hbase-5.0.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-diagnostic-tools-5.0.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-jni-5.0.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/hadoop-annotations-2.7.0-mapr-1506.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mapr-hbase-5.0.0-mapr-tests.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/httpcore-4.2.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mysql-connector-java-5.1.25-bin.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/curator-framework-2.7.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/hadoop-auth-2.7.0-mapr-1506.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/curator-client-2.7.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jsch-0.1.42.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-collections-3.2.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/junit-4.11.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/stax-api-1.0-2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jettison-1.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-core-5.0.0-mapr-tests.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-httpclient-3.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/gson-2.2.4.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jersey-core-1.9.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/avro-1.7.4.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/hamcrest-core-1.3.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/paranamer-2.3.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/libprotodefs-5.0.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/commo

Re: Challenges after MapR 5.1 Upgrade.

2016-04-04 Thread John Omernik
This was an upgrade from 5.0.  I will post here; note: I have removed the
mapr_shuffle to get node managers to work. However, I am seeing other odd
things, so any help would be appreciated.

<configuration>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,myriad_executor</value>
</property>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>myriadprod.marathonprod.mesos</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.myriad_executor.class</name>
<value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
</property>

<property>
<name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
<value>2000</value>
</property>

<property>
<name>yarn.am.liveness-monitor.expiry-interval-ms</name>
<value>1</value>
</property>

<property>
<name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
<value>1000</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>${nodemanager.resource.cpu-vcores}</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>${nodemanager.resource.memory-mb}</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>

<property>
<name>yarn.nodemanager.address</name>
<value>${myriad.yarn.nodemanager.address}</value>
</property>

<property>
<name>yarn.nodemanager.webapp.address</name>
<value>${myriad.yarn.nodemanager.webapp.address}</value>
</property>

<property>
<name>yarn.nodemanager.webapp.https.address</name>
<value>${myriad.yarn.nodemanager.webapp.address}</value>
</property>

<property>
<name>yarn.nodemanager.localizer.address</name>
<value>${myriad.yarn.nodemanager.localizer.address}</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>0</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>0</value>
</property>

<property>
<description>Who will execute(launch) the containers.</description>
<name>yarn.nodemanager.container-executor.class</name>
<value>${yarn.nodemanager.container-executor.class}</value>
</property>

<property>
<description>The class which should help the LCE handle resources.</description>
<name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
<value>${yarn.nodemanager.linux-container-executor.resources-handler.class}</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
<value>${yarn.nodemanager.linux-container-executor.cgroups.hierarchy}</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
<value>${yarn.nodemanager.linux-container-executor.cgroups.mount}</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
<value>${yarn.nodemanager.linux-container-executor.cgroups.mount-path}</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>${yarn.nodemanager.linux-container-executor.group}</value>
</property>

<property>
<name>yarn.nodemanager.linux-container-executor.path</name>
<value>${yarn.home}/bin/container-executor</value>
</property>

<property>
<name>yarn.http.policy</name>
<value>HTTP_ONLY</value>
</property>

</configuration>


On Mon, Apr 4, 2016 at 3:53 PM, yuliya Feldman 
wrote:

> YarnDefaultProperties.java, which defines the class for mapr_direct_shuffle,
> should be there even in 5.0, so nothing new there even if the maprfs jar is
> outdated - could you also check that?
> Also, could you paste the content of your yarn-site.xml here?
> Thanks, Yuliya
>
>   From: yuliya Feldman 
>  To: "dev@myriad.incubator.apache.org" 
>  Sent: Monday, April 4, 2016 1:43 PM
>  Subject: Re: Challenges after MapR 5.1 Upgrade.
>
> Hello John,
> Did you upgrade to 5.1 or install a new one?
> It feels like the MapR default properties were not loaded - I need to poke
> around and then I will ask you for additional info.
> Thanks, Yuliya
>
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Monday, April 4, 2016 12:29 PM
>  Subject: Challenges after MapR 5.1 Upgrade.
>
> I had at one point Myriad working fine in MapR 5.0.  I updated to 5.1, and
> repackaged my hadoop tgz for remote distribution and now I have two
> problems occurring.
>
> 1. At first, when I had the mapr direct shuffle enabled per the
> yarn-site.xml in the Myriad documentation, node managers would not start,
> and would fail with the error below.
>
> 2. Once I removed the mapr shuffle from the yarn-site, I got node managers
> started; however, when I tried to launch a size-0 node manager, I got the
> other error below. Not sure what's happening here.
>
> Any thoughts would be appreciated. Like I said, this was working with 5.0,
> and now doesn't work in 5.1.
>
> Thanks!
>
> John
>
> Shuffle Error
>
> 16/04/04 13:46:34 INFO service.AbstractService: Service NodeManager failed
> in state INITED; cause: java.lang.RuntimeException: No class defined for
> mapr_direct_shuffle
> java.lang.RuntimeException: No class defined for mapr_direct_shuffle
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:139)
>
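
(For context on the first error: AuxServices throws "No class defined for X"
when yarn.nodemanager.aux-services lists a service X but no
yarn.nodemanager.aux-services.X.class property resolves to a class. A sketch
of the shape of the missing definition follows - the actual MapR class name
is whatever YarnDefaultProperties.java supplies, so the value below is just
a placeholder:)

<property>
<name>yarn.nodemanager.aux-services.mapr_direct_shuffle.class</name>
<value><!-- placeholder: class supplied by MapR's YarnDefaultProperties.java --></value>
</property>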

Re: Challenges after MapR 5.1 Upgrade.

2016-04-04 Thread John Omernik
So, prior to the upgrade, I had this code block commented out; however, it
doesn't work without the comments either:

   

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>0</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>0</value>
</property>

On Mon, Apr 4, 2016 at 3:04 PM, John Omernik  wrote:

> <property>
> <name>yarn.resourcemanager.scheduler.class</name>
> <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
> </property>
>
> On Mon, Apr 4, 2016 at 2:52 PM, Darin Johnson 
> wrote:
>
>> Hey John, I think the MapR guys will have some answers for you on the
>> first stack trace; it sounds like a missing jar on your classpath.  On the
>> second, I'm interested in knowing which MyriadScheduler you're attempting
>> to run - FAIR, CAPACITY, etc. (it will be in your yarn-site.xml).
>>
>> Darin
>>
>> On Mon, Apr 4, 2016 at 3:29 PM, John Omernik  wrote:
>>
>> > I had at one point Myriad working fine in MapR 5.0.  I updated to 5.1,
>> and
>> > repackaged my hadoop tgz for remote distribution and now I have two
>> > problems occurring.
>> >
>> > 1. At first, when I had the mapr direct shuffle enabled per the
>> > yarn-site.xml in the Myriad documentation, node managers would not start,
>> > and would fail with the error below.
>> >
>> > 2. Once I removed the mapr shuffle from the yarn-site, I got node managers
>> > started; however, when I tried to launch a size-0 node manager, I got the
>> > other error below. Not sure what's happening here.
>> >
>> > Any thoughts would be appreciated. Like I said, this was working with
>> 5.0,
>> > and now doesn't work in 5.1.
>> >
>> > Thanks!
>> >
>> > John
>> >
>> > Shuffle Error
>> >
>> > 16/04/04 13:46:34 INFO service.AbstractService: Service NodeManager
>> failed
>> > in state INITED; cause: java.lang.RuntimeException: No class defined for
>> > mapr_direct_shuffle
>> > java.lang.RuntimeException: No class defined for mapr_direct_shuffle
>> > at
>> >
>> >
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:139)
>> > at
>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> > at
>> >
>> >
>> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>> > at
>> >
>> >
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
>> > at
>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> > at
>> >
>> >
>> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>> > at
>> >
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
>> > at
>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> > at
>> >
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>> > at
>> >
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
>> > 16/04/04 13:46:34 INFO impl.MetricsSystemImpl: Stopping NodeManager
>> metrics
>> > system...
>> >
>> >
>> > Zero Sized Node Manager Error:
>> >
>> > 16/04/04 14:22:49 INFO service.AbstractService: Service
>> > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed
>> in
>> > state STARTED; cause:
>> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
>> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved
>> SHUTDOWN
>> > signal from Resourcemanager ,Registration of NodeManager failed, Message
>> > from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
>> > doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
>> > NodeManager.
>> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
>> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved
>> SHUTDOWN
>> > signal from Resourcemanager ,Registration of NodeManager failed, Message
>> > from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
>> > doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
>> > NodeManager.
>> > at
>> >
>> >
>> org.apache.hadoop.yarn.serve

Re: Challenges after MapR 5.1 Upgrade.

2016-04-04 Thread John Omernik


<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
</property>



On Mon, Apr 4, 2016 at 2:52 PM, Darin Johnson 
wrote:

> Hey John, I think the MapR guys will have some answers for you on the first
> stack trace; it sounds like a missing jar on your classpath.  On the second,
> I'm interested in knowing which MyriadScheduler you're attempting to run -
> FAIR, CAPACITY, etc. (it will be in your yarn-site.xml).
>
> Darin
>
> On Mon, Apr 4, 2016 at 3:29 PM, John Omernik  wrote:
>
> > I had at one point Myriad working fine in MapR 5.0.  I updated to 5.1,
> and
> > repackaged my hadoop tgz for remote distribution and now I have two
> > problems occurring.
> >
> > 1. At first, when I had the mapr direct shuffle enabled per the
> > yarn-site.xml in the Myriad documentation, node managers would not start,
> > and would fail with the error below.
> >
> > 2. Once I removed the mapr shuffle from the yarn-site, I got node managers
> > started; however, when I tried to launch a size-0 node manager, I got the
> > other error below. Not sure what's happening here.
> >
> > Any thoughts would be appreciated. Like I said, this was working with
> 5.0,
> > and now doesn't work in 5.1.
> >
> > Thanks!
> >
> > John
> >
> > Shuffle Error
> >
> > 16/04/04 13:46:34 INFO service.AbstractService: Service NodeManager
> failed
> > in state INITED; cause: java.lang.RuntimeException: No class defined for
> > mapr_direct_shuffle
> > java.lang.RuntimeException: No class defined for mapr_direct_shuffle
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:139)
> > at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> >
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> > at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> >
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
> > at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> > 16/04/04 13:46:34 INFO impl.MetricsSystemImpl: Stopping NodeManager
> metrics
> > system...
> >
> >
> > Zero Sized Node Manager Error:
> >
> > 16/04/04 14:22:49 INFO service.AbstractService: Service
> > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in
> > state STARTED; cause:
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN
> > signal from Resourcemanager ,Registration of NodeManager failed, Message
> > from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
> > doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
> > NodeManager.
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN
> > signal from Resourcemanager ,Registration of NodeManager failed, Message
> > from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
> > doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
> > NodeManager.
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:230)
> > at
> > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> > at
> >
> >
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:267)
> > at
> > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:477)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> > Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> Recieved
> > SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed,
> > Message from ResourceManager: NodeManager from
> > hadoopmapr4.brewingintel.com
> > doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
> > NodeManager.
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:298)
> > at
> >
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:224)
> > ... 6 more
> >
>


Challenges after MapR 5.1 Upgrade.

2016-04-04 Thread John Omernik
I had at one point Myriad working fine in MapR 5.0.  I updated to 5.1, and
repackaged my hadoop tgz for remote distribution and now I have two
problems occurring.

1. At first, when I had the mapr direct shuffle enabled per the
yarn-site.xml in the Myriad documentation, node managers would not start,
and would fail with the error below.

2. Once I removed the mapr shuffle from the yarn-site, I got node managers
started; however, when I tried to launch a size-0 node manager, I got the
other error below. Not sure what's happening here.

Any thoughts would be appreciated. Like I said, this was working with 5.0,
and now doesn't work in 5.1.

Thanks!

John

Shuffle Error

16/04/04 13:46:34 INFO service.AbstractService: Service NodeManager failed
in state INITED; cause: java.lang.RuntimeException: No class defined for
mapr_direct_shuffle
java.lang.RuntimeException: No class defined for mapr_direct_shuffle
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:139)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
16/04/04 13:46:34 INFO impl.MetricsSystemImpl: Stopping NodeManager metrics
system...


Zero Sized Node Manager Error:

16/04/04 14:22:49 INFO service.AbstractService: Service
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in
state STARTED; cause:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN
signal from Resourcemanager ,Registration of NodeManager failed, Message
from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
NodeManager.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN
signal from Resourcemanager ,Registration of NodeManager failed, Message
from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
NodeManager.
at
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:230)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:267)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:477)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved
SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed,
Message from ResourceManager: NodeManager from  hadoopmapr4.brewingintel.com
doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the
NodeManager.
at
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:298)
at
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:224)
... 6 more


[jira] [Commented] (MYRIAD-182) Ability to ignore certificate warnings on config download from SSL secured RM

2016-03-14 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193187#comment-15193187
 ] 

John Omernik commented on MYRIAD-182:
-

I am not sure that's the case, as I believe it's failing in the conf fetch, not 
in tests.  I.e., when Myriad tries to get the conf from the running Resource 
Manager, it can't connect due to an invalid certificate. 

> Ability to ignore certificate warnings on config download from SSL secured RM
> -
>
> Key: MYRIAD-182
> URL: https://issues.apache.org/jira/browse/MYRIAD-182
> Project: Myriad
>  Issue Type: Bug
>  Components: Executor
>Affects Versions: Myriad 0.1.0
>Reporter: John Omernik
>
> When SSL is enabled for the Resource Manager and the executor tries to 
> download the config from /conf, if the CA is not valid, a warning is thrown 
> and the download fails. There are many cases where the SSL certificate may 
> not be valid (especially in test, but maybe in production), thus we need the 
> ability to specify that certificate warnings should be ignored in that case.  
> The warning received is: 
> Failed to fetch 'https://myriadprod.marathonprod.mesos:8090/conf': Error 
> downloading resource: Peer certificate cannot be authenticated with given CA 
> certificates





Strata San Jose - Meetup?

2016-03-09 Thread John Omernik
Hey all,

I am attending Strata this year in San Jose, and was wondering as part of
community building if we should have a meetup, even if it's just to have a
few drinks and get to know folks on list.  I've not done one of these types
of conferences before, so I don't know what's normal, but thought, after
the dev sync up call, I'd toss this out there.


Anyone else attending? Interested in meeting?

John


[jira] [Commented] (MYRIAD-184) RM Ports are Hardcoded in NMExecutorCLGenImpl.java

2015-12-29 Thread John Omernik (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074202#comment-15074202
 ] 

John Omernik commented on MYRIAD-184:
-

Looking at the code referenced: 

  @Override
  public String getConfigurationUrl() {
    YarnConfiguration conf = new YarnConfiguration();
    String httpPolicy = conf.get(TaskFactory.YARN_HTTP_POLICY);
    if (httpPolicy != null && httpPolicy.equals(TaskFactory.YARN_HTTP_POLICY_HTTPS_ONLY)) {
      String address = conf.get(TaskFactory.YARN_RESOURCEMANAGER_WEBAPP_HTTPS_ADDRESS);
      if (address == null || address.isEmpty()) {
        address = conf.get(TaskFactory.YARN_RESOURCEMANAGER_HOSTNAME) + ":8090";
      }
      return "https://" + address + "/conf";
    } else {
      String address = conf.get(TaskFactory.YARN_RESOURCEMANAGER_WEBAPP_ADDRESS);
      if (address == null || address.isEmpty()) {
        address = conf.get(TaskFactory.YARN_RESOURCEMANAGER_HOSTNAME) + ":8088";
      }
      return "http://" + address + "/conf";
    }
  }


I don't believe there should be any code in Myriad that uses preset/hardcoded 
ports for YARN.  That needs to come from configs, not from the Myriad code.   
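
For illustration, a yarn-site.xml sketch (the hostname and port here are
made-up values, not from this issue): with only the hostname configured, the
code above falls back to the hardcoded 8088/8090 defaults, so an RM listening
on a non-default port needs its webapp address set explicitly for the /conf
download to succeed.

<property>
<name>yarn.resourcemanager.hostname</name>
<value>rm.example.com</value>
</property>

<!-- hypothetical custom port; without this, getConfigurationUrl() guesses rm.example.com:8090 -->
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>rm.example.com:9443</value>
</property>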

> RM Ports are Hardcoded in NMExecutorCLGenImpl.java
> --
>
> Key: MYRIAD-184
> URL: https://issues.apache.org/jira/browse/MYRIAD-184
> Project: Myriad
>  Issue Type: Bug
>  Components: Executor
>Affects Versions: Myriad 0.1.0
>Reporter: John Omernik
>  Labels: easyfix, newbie
>
> In NMExecutorCLGenImpl.java, the ports for the Resource Manager are derived 
> via the http.policy config setting. Instead, the ports should come from a 
> variable that actually corresponds to the running port. The hardcoded ports 
> are the default RM ports for HTTP and HTTPS (8088 and 8090), but if a user 
> changed the port, the config download would fail.  Thus, finding a better 
> variable here would help ensure operators are not limited to the default 
> ports in their environments. 
> (The hard coding is in the function public String getConfigurationUrl().)
> https://github.com/apache/incubator-myriad/blob/df7d05c8639b371b94a1e94406e2f2446d10eaaf/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/NMExecutorCLGenImpl.java





[jira] [Created] (MYRIAD-184) RM Ports are Hardcoded in NMExecutorCLGenImpl.java

2015-12-15 Thread John Omernik (JIRA)
John Omernik created MYRIAD-184:
---

 Summary: RM Ports are Hardcoded in NMExecutorCLGenImpl.java
 Key: MYRIAD-184
 URL: https://issues.apache.org/jira/browse/MYRIAD-184
 Project: Myriad
  Issue Type: Bug
  Components: Executor
Affects Versions: Myriad 0.1.0
Reporter: John Omernik


In NMExecutorCLGenImpl.java, the ports for the Resource Manager are derived via 
the http.policy config setting. Instead, the ports should come from a different 
variable that actually corresponds to the running port. The hardcoded ports are 
the default RM ports for HTTP and HTTPS (8088 and 8090), but if a user changed 
the port, the config download would fail.  Thus, finding a better variable here 
would help ensure operators are not limited to the default ports in their 
environments. 

(The hard coding is in the function public String getConfigurationUrl().)

https://github.com/apache/incubator-myriad/blob/df7d05c8639b371b94a1e94406e2f2446d10eaaf/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/NMExecutorCLGenImpl.java





[jira] [Created] (MYRIAD-183) Allow config download with authentication on RM enabled

2015-12-15 Thread John Omernik (JIRA)
John Omernik created MYRIAD-183:
---

 Summary: Allow config download with authentication on RM enabled
 Key: MYRIAD-183
 URL: https://issues.apache.org/jira/browse/MYRIAD-183
 Project: Myriad
  Issue Type: Bug
  Components: Executor
Affects Versions: Myriad 0.1.0
Reporter: John Omernik


It would be nice to be able to enable authentication on the Resource Manager 
page and still allow node managers to download the config.  Thus we need a way 
to specify authentication credentials when the node managers start.  This would 
be helpful for security-enabled clusters, and especially useful once we achieve 
true multi-tenancy, in that we can ensure clusters are not accidentally 
confused with other clusters. 





[jira] [Created] (MYRIAD-182) Ability to ignore certificate warnings on config download from SSL secured RM

2015-12-15 Thread John Omernik (JIRA)
John Omernik created MYRIAD-182:
---

 Summary: Ability to ignore certificate warnings on config download 
from SSL secured RM
 Key: MYRIAD-182
 URL: https://issues.apache.org/jira/browse/MYRIAD-182
 Project: Myriad
  Issue Type: Bug
  Components: Executor
Affects Versions: Myriad 0.1.0
Reporter: John Omernik


When SSL is enabled for the Resource Manager and the executor tries to download 
the config from /conf, if the CA is not valid, a warning is thrown and the 
download fails. There are many cases where the SSL certificate may not be valid 
(especially in test, but maybe in production), thus we need the ability to 
specify that certificate warnings should be ignored in that case.  The warning 
received is: 

Failed to fetch 'https://myriadprod.marathonprod.mesos:8090/conf': Error 
downloading resource: Peer certificate cannot be authenticated with given CA 
certificates







Re: A Myriad Story

2015-12-02 Thread John Omernik
So I am still confused by the FGS/minimum-allocation-vcores.

>> This also implicitly means that YARN cannot allocate a container unless
at least "min allocation vcores" are available on a NM. If a NM has less than
"min allocation vcores", that NM will not get any containers allocated to it,
unless existing containers finish (in case of plain YARN and with Myriad CGS)
or unless Mesos offers more resources (in case of FGS).

I actually found that I had the setting at 1, and I had 1 CGS NM running
and 4 FGS (zero-profile) NMs running, and when it allocated containers, the
zero-profile NMs still got containers allocated even though the setting was 1
and the NM had < 1 available vcores.   So I am still trying to figure out why
this worked in this use case, so I am clear on the setting.

John
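
P.S. For reference, the stock defaults in play here, as they appear in
yarn-default.xml for Hadoop 2.7 (the 1024 MB / 1 vcore values mentioned in
this thread):

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>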



On Mon, Nov 30, 2015 at 2:16 PM, Santosh Marella 
wrote:

> Thanks for trying out these experiments.
>
> >I thought this would have
> >broken FGS, but apparently it didn't (I started my nodes with min
> >allocation CPU = 1 and FGS still worked for me... not sure about that,
> >would love feedback there)
>
> I presume you are referring to "yarn.scheduler.minimum-allocation-vcores".
>
> The behavior of this variable is that when an app requests a container
> with less than the "min allocation vcores", the RM ends up allocating a
> container with "min allocation vcores". YARN's default value for this is
> 1 - i.e., if an app wants a container with < 1 CPU vcores, YARN allocates
> a container with 1 CPU vcore.
>
> This also implicitly means that YARN cannot allocate a container unless
> at least "min allocation vcores" are available on a NM. If a NM has less
> than "min allocation vcores", that NM will not get any containers allocated
> to it, unless existing containers finish (in case of plain YARN and with
> Myriad CGS) or unless Mesos offers more resources (in case of FGS).
>
> In that sense, FGS/CGS do not interfere with "min allocation vcores";
> rather, FGS and CGS just influence the NM capacities.
>
> Hope this helps.
>
> Thanks,
> Santosh
>
> On Tue, Nov 24, 2015 at 9:25 AM, John Omernik  wrote:
>
> > Since a vast majority of my posts are me struggling with something or
> > breaking something, I thought I'd take the time to craft a story of
> Myriad
> > success.
> >
> > Over the weekend, I took the time to run Elastic Search on Yarn using the
> > es-yarn package that elastic search has. This is a beta package, and
> > struggles with some components such as "Storage" for the data.
> >
> > With MapR I've been able to create a place and some scripts to manage the
> > data issue. This, combined with ES2's dynamic node allocation and
> > Myriad's fine-grained scaling, gives me a powerful way to elasticize
> > Elasticsearch!
> >
> > Basically, I did this through some simple steps.  The first was to take
> > the "include" file (esinc.sh) from the distribution, add some items to it,
> > and tell the es-yarn framework to use this instead of the included file.
> > This allowed me to set some parameters at start time with environment
> > variables.
> >
> > Simple steps.
> >
> > Download the es-yarn packages and the elasticsearch zip.
> >
> > (optional: I added https://github.com/royrusso/elasticsearch-HQ as a
> > plugin; basically I unzipped the ES2 zip, ran the plugin script, and then
> > rezipped the package)
> >
> > In MapR, I copied the esinc.sh out of the zip to a location (the root of,
> > say, my working directory, /mapr/mycluster/mesos/dev/es-yarn/) (see below).
> > Then I created a script that is how I start and scale up the clusters (see
> > below); I do have some notes on how it works in the script.
> >
> > For basics this was awesome. I didn't use FGS at first because of a bug
> > in es-yarn (https://github.com/elastic/elasticsearch-hadoop/issues/598):
> > if the min allocation cpu is 0 there is a divide by 0. I thought this would have
> > broken FGS, but apparently it didn't (I started my nodes with min
> > allocation CPU = 1 and FGS still worked for me... not sure about that,
> > would love feedback there)
> >
> > When I "start" a cluster, I initialize it and eventually my cluster is
> > running with the specified es nodes. If I want to scale up, I run the
> > script again with the same cluster name. Those nodes are added to the ES
> > cluster, and now I am running two yarn applications. I can only

[jira] [Created] (MYRIAD-179) Support Revocable resources in Mesos

2015-12-02 Thread John Omernik (JIRA)
John Omernik created MYRIAD-179:
---

 Summary: Support Revocable resources in Mesos
 Key: MYRIAD-179
 URL: https://issues.apache.org/jira/browse/MYRIAD-179
 Project: Myriad
  Issue Type: Improvement
  Components: Scheduler
Affects Versions: Myriad 0.1.0
Reporter: John Omernik


Mesos has introduced revocable resources.  Based on my reading, Myriad would be 
an awesome use case for oversubscription, especially when you combine it with 
Fine Grain Scaling (FGS).  

Based on what I've read on oversubscription, if Myriad were aware of it, Myriad 
could be smart about various YARN containers: some jobs that are production 
jobs could be tagged in such a way that they run on non-revocable resources, 
while other YARN jobs with certain users/flags, especially in FGS mode, could 
be submitted using revocable resources. This would be exceptionally powerful 
for big MapReduce jobs etc. 

These are the jobs that would be ad hoc in nature, and in addition to not using 
resources when no jobs are running, the node managers, when they ran certain 
jobs, would run them on revocable resources so they could be killed if 
needed.  

I am speaking now not from a dev perspective, so this may be a lot harder than 
it seems; I am just trying to outline use cases. 

Another use case (I think both are very valid and worth pursuing) would be, 
once we have multi-tenancy built in, to have one Myriad framework dedicated to 
ad hoc jobs and another Myriad framework dedicated to production jobs.  The ad 
hoc framework could be set up in such a way that all submissions run with 
revocable resources, thus being appropriate for dev work or other 
non-production jobs.  Obviously this hinges on being able to run two Myriad 
clusters on the same Mesos cluster. 

The other thing: if a whole framework were set to use revocable resources, we'd 
have to ensure the Resource Manager was running on non-revocable resources... 
while containers for YARN jobs can be killed, we don't want the whole framework 
to die.  

I see use cases for both; this just seems to add another layer of awesome 
flexibility as it pertains to jobs on the cluster. 

I'd be interested in fleshing this idea out more with the dev team. 





Oversubscription in Mesos

2015-12-02 Thread John Omernik
Hey all, just curious if there has been any discussion around supporting
oversubscription in Myriad.  Based on my reading, Myriad would be an awesome
use case for oversubscription, especially when you combine it with FGS.  If
Myriad were aware of oversubscription, it could be smart about various YARN
containers: jobs that are production jobs could run on non-revocable
resources, but could we have YARN jobs with certain users/flags, especially
in FGS mode, be submitted using revocable resources?  These are the jobs that
would be ad hoc in nature, and in addition to not using resources when no
jobs are running, the node managers, when they ran certain jobs, would run
them on the revocable resources.

I am speaking now not from a dev perspective, so this may be a lot harder
than it seems.

Another approach would be, once we have multi-tenancy built in, to have one
myriad framework dedicated to ad hoc jobs and another myriad framework
dedicated to production jobs.

I see use cases for both; this just seems to add another layer of awesome
flexibility as it pertains to jobs on the cluster.

I'd be interested in the group's thoughts here.

John


A Myriad Story

2015-11-24 Thread John Omernik
Since a vast majority of my posts are me struggling with something or
breaking something, I thought I'd take the time to craft a story of Myriad
success.

Over the weekend, I took the time to run Elasticsearch on YARN using the
es-yarn package that Elasticsearch has. This is a beta package, and it
struggles with some components such as "storage" for the data.

With MapR I've been able to create a place and some scripts to manage the
data issue. This, combined with ES2's dynamic node allocation and Myriad's
fine-grained scaling, gives me a powerful way to elasticize Elasticsearch!

Basically, I did this through some simple steps.  The first was to take the
"include" file (esinc.sh) from the distribution, add some items to it, and
tell the es-yarn framework to use this instead of the included file.
This allowed me to set some parameters at start time with environment
variables.

Simple steps.

Download the es-yarn packages and the elasticsearch zip.

(optional: I added https://github.com/royrusso/elasticsearch-HQ as a
plugin; basically I unzipped the ES2 zip, ran the plugin script, and then
rezipped the package)

In MapR, I copied the esinc.sh out of the zip to a location (the root of,
say, my working directory, /mapr/mycluster/mesos/dev/es-yarn/) (see below).
Then I created a script that is how I start and scale up the clusters (see
below); I do have some notes on how it works in the script.

For basics this was awesome. I didn't use FGS at first because of a bug in
es-yarn (https://github.com/elastic/elasticsearch-hadoop/issues/598): if the
min allocation cpu is 0 there is a divide by 0. I thought this would have
broken FGS, but apparently it didn't (I started my nodes with min
allocation CPU = 1 and FGS still worked for me... not sure about that,
would love feedback there).

When I "start" a cluster, I initialize it and eventually my cluster is
running with the specified ES nodes. If I want to scale up, I run the
script again with the same cluster name. Those nodes are added to the ES
cluster, and now I am running two YARN applications. I can only scale down
by application: if I want to scale down, I have to kill an application, and
if I started 3 ES nodes with that application, I'll scale down by 3 nodes.
Thus, there is an argument to always scale by one ES node, especially if
you are using larger nodes (I wonder what sort of application manager
overhead I'd get with that).

Either way, this worked really well.

The cool thing, though, was FGS. I had one node running in a "small" config
and 4 running with zero (even though I set the min allocation size to 1,
this still started and seemed to work).  When I submitted the request for
ES nodes, they got put into Mesos tasks for each container and it worked
great.  When I scaled the application down, it too worked great. This
provided me huge flexibility in scaling up and down without reserving
resources for Elasticsearch clusters.  Kudos to Myriad!

My only comment to the Myriad crew would be a wiki article explaining FGS a
little bit. I just "did it" and it worked, but a little bit more on how it
works, the challenges, gotchas, etc. would be outstanding.

Thanks to everyone for the hard work on Myriad; this project has lots of
power that it can give admins/users, and I just wanted to share a win here
after all of my "how do I ... " posts.

John




startcluster.sh

#!/bin/bash

# Perhaps these should be parameters, or at least check to see if they are
# set in the ENV, and if not use these?

ES_JAR="elasticsearch-yarn-2.2.0-beta1.jar"  # Jarfile of es-yarn to use
ES_NFSMOUNT="/mapr/mycluster"                # Root of NFS Mount in MapR
ES_BASELOC="/mesos/dev/es-yarn"              # Location of this script, the esinc.sh, and the basis for all things es-yarn
ES_DATALOC="/data"                           # The data location, which is $ES_BASELOC$ES_DATALOC; it creates directories under that for each clustername, then each node
ES_PROVISION_LOC="/tmp/esprovision/"         # Where the jar file for es-yarn and the elasticsearch zip file are

# These are your node names. It needs these to do its unicast discovery.
# This may need to be updated (perhaps I can curl the Mesos Master to get the node list)
ES_UNICAST_HOSTS="node1.mydomain.com,node2.mydomain.com,node3.mydomain.com"

# The elasticsearch version you are running (the es-yarn jar uses this to
# pick the right zip file; make sure you have not changed the name of the ES zip file)
ES_VER="2.0.0"

# Cluster settings. Name and ports
ES_CLUSTERNAME="MYESCLUSTER"
ES_TRANSPORT_PORT="59300-59400"
ES_HTTP_PORT="59200-59300"

# For this run, the number of nodes to add in "this" application (each
# submission is a yarn application) and the node size
NUM_NODES="3"
NODEMEM="2048"
NODECPU="2"

# Don't change anything else here:
ES_INCLUDE="${ES_NFSMOUNT}${ES_BASELOC}/esinc.sh"
ES_YARN_JAR="${ES_NFSMOUNT}${ES_BASELOC}/${ES_JAR}"

ES_ENV="env.ES_CLUSTERNAME=$ES_CLUSTE

Re: UI nit pik

2015-11-24 Thread John Omernik
It's Windows 7 Enterprise running Firefox ESR 38.4.0.   I am running on a
large screen (1920x1080)



On Tue, Nov 24, 2015 at 9:54 AM, Jim Klucar  wrote:

> John,
>
> What OS are you on and what version of Firefox? I don't doubt something is
> weird, that's the nature of CSS.
>
> I have a ticket out there to revamp the UI, so this may be OBE soonish.
>
> Jim
>
> On Tue, Nov 24, 2015 at 10:41 AM, John Omernik  wrote:
>
> > I am looking at the UI, and wanted to confirm others are seeing this too
> > before I JIRAed it. But in Firefox, on the "Tasks" page, the header bar at
> > the top (with the Local, menu items and config string) ends halfway through
> > the "Active Tasks" header.  It's fine, i.e., it's not a concern from a
> > usability standpoint; however, it looks funny.  Is anyone else seeing this
> > or is it just me?
> >
> > John
> >
>


UI nit pik

2015-11-24 Thread John Omernik
I am looking at the UI, and wanted to confirm others are seeing this too
before I JIRAed it. But in Firefox, on the "Tasks" page, the header bar at
the top (with the Local, menu items and config string) ends halfway through
the "Active Tasks" header.  It's fine, i.e., it's not a concern from a
usability standpoint; however, it looks funny.  Is anyone else seeing this
or is it just me?

John


Re: Nodemanager Startup Timeout

2015-11-19 Thread John Omernik
Thank you, that fixed the issue, it was exactly that.

On Thu, Nov 19, 2015 at 11:29 AM, Santosh Marella 
wrote:

> There is an executor timeout setting for mesos slave. Default is 1 min
> IIRC. Please increase that to, say, 5 minutes.
>
> MapR will fix the local volume creation script to be faster.
>
> --
> Sent from mobile
> On Nov 19, 2015 9:13 AM, "John Omernik"  wrote:
>
> > Is there a setting for Nodemanager startup timeout? Some of my nodes are
> > on really rough boxes, and the time for the MapR local volume script to
> > start is taking too long (I think), and the task is failing without an
> > exit code or anything. I am "thinking" it's a timeout thing, because
> > neither the mapr volume script nor yarn is returning any sort of error
> > codes.
> >
>


Nodemanager Startup Timeout

2015-11-19 Thread John Omernik
Is there a setting for Nodemanager startup timeout? Some of my nodes are on
really rough boxes, and the time for the MapR local volume script to start
is taking too long (I think), and the task is failing without an exit code
or anything. I am "thinking" it's a timeout thing, because neither the mapr
volume script nor yarn is returning any sort of error codes.


Re: Struggling with Permissions

2015-11-19 Thread John Omernik
Yes, but only in the yarn-site.xml; I have not set cgroups: true in the
myriad config, so I am not sure about the relationship there...



On Thu, Nov 19, 2015 at 10:33 AM, yuliya Feldman <
yufeld...@yahoo.com.invalid> wrote:

> You mean that if you enable cgroups in yarn, /tmp as a slave temp dir works
> just fine?
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org; yuliya Feldman 
>  Sent: Thursday, November 19, 2015 8:02 AM
>  Subject: Re: Struggling with Permissions
>
> Ok all, I now have a real answer and a solution to needing to change
> /tmp locations on Mesos slaves.
>
> Basically, it wasn't a difference between Redhat and Ubuntu; it was a
> mistake I made in my yarn-site.xml for the resource manager.  Basically, I
> copied the yarn-site from the wiki, but on one I had uncommented the cgroups
> section. My hypothesis: when running with cgroups information, Yarn doesn't
> see /tmp and its permissions; therefore creating the tgz with the
> script I posted earlier works.  The permissions are what Yarn requires.
> Once I set that on my Redhat cluster, all was well.  I realized that in
> comparing configs.  That addresses most of my concerns, and I recommend
> people use that where possible to avoid the Mesos slave changes.  We may
> also want to add some notes in the wiki around this, but I found it through
> hacking, and I'd welcome more "from a point of view of understanding"
> comments; I just made it work :)
>
> I'll send another email shortly on my last issue.
>
> On Wed, Nov 18, 2015 at 3:16 PM, John Omernik  wrote:
>
> > Baffling... so as I said, I fixed the mapr.host issue, which works well
> > (as long as I have one NM per physical node), and I got my Ubuntu-based
> > Mesos cluster to work with Myriad.  At this point I am baffled as to why
> > Myriad/Yarn would complain about permissions on Redhat but not Ubuntu;
> > below I've pulled files in /, /hadoop-2.7.0, /hadoop-2.7.0/bin and down
> > the etc tree so you can see how they are set up the same. I am running as
> > the same users in marathon etc., same myriad configs... but I am getting
> > the permissions issue on Redhat, and Ubuntu is working fine (and more
> > than just getting the NMs running, I flexed up 6 NMs and was running hive
> > queries with no issues!)
> >
> > The only differences I can see are that the java versions are different
> > (1.8.0_45 for Ubuntu and 1.8.0_65 for Redhat) and apparently the builds
> > of hadoop-2.7.0 from MapR are different. (They are different in size as
> > well, but that may be the difference between an RPM Redhat install and a
> > Debian Ubuntu install; I guess I didn't expect different hadoop builds
> > between platforms though...)
> >
> > If you can see anything else, please let me know, I'd love to have this
> > working.
> >
> > John
> >
> >
> > Here are the dumps:  Ubuntu... this is working:
> >
> > STARTUP_MSG:  build = g...@github.com:mapr/private-hadoop-common.git -r
> > fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on
> > 2015-08-19T20:02Z
> > STARTUP_MSG:  java = 1.8.0_45-internal
> >
> > No Error
> > Ubuntu
> >
> > marathon user: mapr
> > myriadframeuser: mapr
> > myriadsuperuser: mapr
> >
> > /
> > drwxr-xr-x 10 mapr mapr 4 KB Sep 10 14:40 hadoop-2.7.0
> > -rw-r--r-- 1 mapr mapr 77 KB Nov 18 12:48 conf
> > -rw-r--r-- 1 mapr mapr 282 MB Nov 18 12:48 hadoop-2.7.0.tgz
> > -rw-r--r-- 1 mapr mapr 156 KB Nov 18 14:04 stderr
> > -rw-r--r-- 1 mapr mapr 743 B Nov 18 12:48 stdout
> >
> > /hadoop-2.7.0/
> >
> > drwxrwxrwx 3 mapr root 4 KB Nov 18 12:48 logs
> > drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 bin
> > drwxr-xr-x 3 mapr root 4 KB Sep 10 14:40 etc
> > drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 include
> > drwxr-xr-x 3 mapr root 4 KB Sep 10 14:40 lib
> > drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 libexec
> > drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 sbin
> > drwxr-xr-x 4 mapr root 4 KB Sep 10 14:40 share
> > -rw-r--r-- 1 mapr root 15 KB Jul 09 04:36 LICENSE.txt
> > -rw-r--r-- 1 mapr root 101 B Jul 09 04:36 NOTICE.txt
> > -rw-r--r-- 1 mapr root 1 KB Jul 09 04:36
> >
> > /hadoop-2.7.0/bin
> >
> > -rwxr-xr-x 1 mapr root 9 KB Jul 09 04:36 hadoop
> > -rwxr-xr-x 1 mapr root 10 KB Jul 09 04:36 hadoop.cmd
> > -rwxr-xr-x 1 mapr root 12 KB Jul 09 04:36 hdfs
> > -rwxr-xr-x 1 mapr root 7 KB Jul 09 04:36 hdfs.cmd
> > -rwxr-xr-x 1 mapr root 7 KB Jul 09 04:36 mapred
> > -rwxr-xr-x 1 mapr root 6 KB Jul 09 04:36 mapred.cmd
> > -rwxr-xr-x 1 map

Last weird issue

2015-11-19 Thread John Omernik
So my last weird issue may be a MapR-specific issue, but I wanted to lay it
out here because it's odd. If you recall, I was talking about mapr.host not
being correct, and that was causing some "issues": when the nodemanager
tried to run, it would use the wrong hostname for the locality of the
shuffle volumes.  I addressed that specific issue by setting mapr.host to
the hostname using

yarnEnvironment:
  YARN_HOME: hadoop-2.7.0
  YARN_NODEMANAGER_OPTS: "-Dnodemanager.resource.io-spindles=4.0 -Dmapr.host=$(hostname -f)"

in the Myriad config. This runs the hostname -f command and sets mapr.host
correctly at run time.

The weird thing was that while yarn.resourcemanager.hostname is correct
from my yarn-site.xml:

yarn.resourcemanager.hostname = myriad.marathon.mesos - yarn-site.xml

there were a number of settings related to the resource manager that were
using the hostname of my box (I found through some testing that this came
from mapr.host: when I saw it using hostname -f, I ran it once with just
hostname, no FQDN in this setup, and these settings followed, therefore
they are using mapr.host).

Looking below, you can see that resourcemanager.address,
resourcemanager.scheduler.address, resourcemanager.admin.address,
resourcemanager.resource-tracker.address all seem to be being filled AFTER
I set the the mapr.host,


Yet, if you look at
https://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
it states that these should default to values built from
yarn.resourcemanager.hostname, e.g.

yarn.resourcemanager.admin.address = ${yarn.resourcemanager.hostname}:8033

(I believe all of these settings are like this), yet it's obviously using
mapr.host instead.  What component would be setting this and overwriting the
default?  mapr.host is weird in that it's MapR-only, but I thought I
addressed that, and you can see that the conf shows
yarn.resourcemanager.hostname to be correct, thus I am at a loss here.
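For reference, the other RM addresses on that page are defined the same way;
these are pulled straight from the 2.7.0 yarn-default.xml:

yarn.resourcemanager.address = ${yarn.resourcemanager.hostname}:8032
yarn.resourcemanager.scheduler.address = ${yarn.resourcemanager.hostname}:8030
yarn.resourcemanager.resource-tracker.address = ${yarn.resourcemanager.hostname}:8031
yarn.resourcemanager.admin.address = ${yarn.resourcemanager.hostname}:8033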

In addition, the other strange thing is my node managers are working, I am
not sure where these settings may hurt me, but I don't like that that they
are obviously RM settings using the NM hostname, and while I don't see
errors now, I am sure there will be errors at some point.

Any thoughts on this would be welcome.

Settings: (Note: UB stands for Ubuntu, it's which of my boxes this setting
came from)

The format is


name

UB: value - source


yarn.resourcemanager.address

UB: hadoopmapr5:8032 - programatically

yarn.resourcemanager.hostname

UB: myriad.marathon.mesos - yarn-site.xml

yarn.resourcemanager.scheduler.address

UB: hadoopmapr5:8030 - programatically

mapr.host

UB: hadoopmapr5.brewingintel.com -

yarn.resourcemanager.admin.address

UB: hadoopmapr5:8033 - programatically

yarn.resourcemanager.resource-tracker.address

UB: hadoopmapr5:8031 - programatically
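In case it's useful to anyone reproducing this: the value/source pairs above
can be read off a running daemon's config servlet (assuming the default RM
webapp port; on my build the dump includes the resource each property was
loaded from):

curl -s http://myriad.marathon.mesos:8088/conf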


Re: Struggling with Permissions

2015-11-19 Thread John Omernik
Ok all, I now have a real answer and a solution to needing to change the
/tmp locations on Mesos slaves.

It wasn't a difference between Redhat and Ubuntu; it was a mistake I made in
my yarn-site.xml for the resource manager. I copied the yarn-site from the
wiki, but on one cluster I had uncommented the cgroups section. My
hypothesis: when running with the cgroups settings, Yarn doesn't check /tmp
and its permissions, so creating the tgz with the script I posted earlier
works; the permissions inside it are what Yarn requires.
Once I set that on my Redhat cluster, all was well.  I realized it while
comparing configs.  That addresses most of my concerns, and I recommend
people use cgroups where possible to avoid the Mesos slave changes.  We may
also want to add some notes to the wiki around this, but I found it through
hacking, and I'd welcome more "from a point of view of understanding"
comments; I just made it work :)
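For reference, this is the shape of the cgroups block I had uncommented. The
property names are stock Hadoop 2.7, but the values are simply the ones I
used, so treat this as a sketch rather than a recommended config:

yarn.nodemanager.container-executor.class =
    org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class =
    org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
yarn.nodemanager.linux-container-executor.cgroups.hierarchy = /hadoop-yarn
yarn.nodemanager.linux-container-executor.group = yarn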

I'll send another email shortly on my last issue.

On Wed, Nov 18, 2015 at 3:16 PM, John Omernik  wrote:

> Baffling... so as I said, I fixed mapr.host, which works well (as long
> as I have one NM per physical node), and I got my Ubuntu based Mesos cluster
> to work with Myriad.  At this point I am baffled as to why Myriad/Yarn would
> complain about permissions on Redhat but not Ubuntu. Below I've pulled
> file listings for /, /hadoop-2.7.0, /hadoop-2.7.0/bin and down the etc tree
> so you can see how they are set up the same. I am running as the same users in
> marathon etc. Same myriad configs... but I am getting the permissions issue
> for Redhat, and Ubuntu is working fine (and more than just getting the NMs
> running, I flexed up 6 NMs and was running hive queries with no issues!)
>
> The only differences I can see are that the java versions are different
> (1.8.0_45 for Ubuntu and 1.8.0_65 for Redhat) and apparently the builds
> of hadoop-2.7.0 from MapR are different. (They are different in size as
> well, but that may be the difference between a RPM Redhat install and a
> Debian Ubuntu install, I guess I didn't expect different hadoop builds
> between platforms though...)
>
> If you can see anything else, please let me know, I'd love to have this
> working.
>
> John
>
>
> Here are the dumps:  Ubuntu... this is working:
>
> STARTUP_MSG:   build = g...@github.com:mapr/private-hadoop-common.git -r
> fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on
> 2015-08-19T20:02Z
> STARTUP_MSG:   java = 1.8.0_45-internal
>
> No Error
> Ubuntu
>
> marathon user: mapr
> myriadframeuser: mapr
> myriadsuperuser: mapr
>
> /
> drwxr-xr-x 10 mapr mapr 4 KB Sep 10 14:40 hadoop-2.7.0
> -rw-r--r-- 1 mapr mapr 77 KB Nov 18 12:48 conf
> -rw-r--r-- 1 mapr mapr 282 MB Nov 18 12:48 hadoop-2.7.0.tgz
> -rw-r--r-- 1 mapr mapr 156 KB Nov 18 14:04 stderr
> -rw-r--r-- 1 mapr mapr 743 B Nov 18 12:48 stdout
>
> /hadoop-2.7.0/
>
> drwxrwxrwx 3 mapr root 4 KB Nov 18 12:48 logs
> drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 bin
> drwxr-xr-x 3 mapr root 4 KB Sep 10 14:40 etc
> drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 include
> drwxr-xr-x 3 mapr root 4 KB Sep 10 14:40 lib
> drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 libexec
> drwxr-xr-x 2 mapr root 4 KB Sep 10 14:40 sbin
> drwxr-xr-x 4 mapr root 4 KB Sep 10 14:40 share
> -rw-r--r-- 1 mapr root 15 KB Jul 09 04:36 LICENSE.txt
> -rw-r--r-- 1 mapr root 101 B Jul 09 04:36 NOTICE.txt
> -rw-r--r-- 1 mapr root 1 KB Jul 09 04:36
>
> /hadoop-2.7.0/bin
>
> -rwxr-xr-x 1 mapr root 9 KB Jul 09 04:36 hadoop
> -rwxr-xr-x 1 mapr root 10 KB Jul 09 04:36 hadoop.cmd
> -rwxr-xr-x 1 mapr root 12 KB Jul 09 04:36 hdfs
> -rwxr-xr-x 1 mapr root 7 KB Jul 09 04:36 hdfs.cmd
> -rwxr-xr-x 1 mapr root 7 KB Jul 09 04:36 mapred
> -rwxr-xr-x 1 mapr root 6 KB Jul 09 04:36 mapred.cmd
> -rwxr-xr-x 1 mapr root 2 KB Jul 09 04:36 rcc
> -rwxr-xr-x 1 mapr root 172 KB Jul 09 04:36 test-container-executor
> -rwxr-xr-x 1 mapr root 14 KB Jul 09 04:36 yarn
> -rwxr-xr-x 1 mapr root 11 KB Jul 09 04:36 yarn.cmd
> r-x--- 1 root mapr 140 KB Jul 09 04:36 container-executor
>
> /hadoop-2.7.0/etc
>
> drwxr-xr-x 2 mapr root 4 KB Nov 18 12:48 hadoop
>
> /hadoop-2.7.0/etc/hadoop/
>
> rw-r--r-- 1 mapr root 4 KB Jul 09 04:36 capacity-scheduler.xml
> -rw-r--r-- 1 mapr root 1 KB Jul 09 04:36 configuration.xsl
> -rw-r--r-- 1 root root 168 B Oct 06 08:37 container-executor.cfg
> -rw-r--r-- 1 mapr root 775 B Oct 06 08:37 core-site.xml
> -rw-r--r-- 1 mapr root 631 B Jul 09 04:36 fair-scheduler.xml
> -rw-r--r-- 1 mapr root 4 KB Jul 09 04:36 hadoop-env.cmd
> -rw-r--r-- 1 mapr root 4 KB Jul 09 04:36 hadoop-env.sh
> -rw-r--r-- 1 mapr root 2 KB Jul 09 04:36 hadoop-metrics.properties
> -rw-r--r-- 1 mapr root 3 KB Jul 

Re: Struggling with Permissions

2015-11-18 Thread John Omernik
mapr root 620 B  Jul 09 05:38  httpfs-site.xml
-rw-r--r-- 1 mapr root 3 KB   Jul 09 05:38  kms-acls.xml
-rw-r--r-- 1 mapr root 1 KB   Jul 09 05:38  kms-env.sh
-rw-r--r-- 1 mapr root 2 KB   Jul 09 05:38  kms-log4j.properties
-rw-r--r-- 1 mapr root 5 KB   Jul 09 05:38  kms-site.xml
-rw-r--r-- 1 mapr root 11 KB  Jul 09 05:38  log4j.properties
-rw-r--r-- 1 mapr root 931 B  Jul 09 05:38  mapred-env.cmd
-rw-r--r-- 1 mapr root 1 KB   Jul 09 05:38  mapred-env.sh
-rw-r--r-- 1 mapr root 4 KB   Jul 09 05:38  mapred-queues.xml.template
-rw-r--r-- 1 mapr root 1 KB   Nov 02 18:22  mapred-site.xml
-rw-r--r-- 1 mapr root 1 KB   Jul 09 05:38  mapred-site.xml.template
-rw-r--r-- 1 mapr root 3 KB   Nov 18 15:57  myriad-config-default.yml
-rw-r--r-- 1 mapr root 10 B   Jul 09 05:38  slaves
-rw-r--r-- 1 mapr root 2 KB   Nov 02 18:22  ssl-client.xml
-rw-r--r-- 1 mapr root 2 KB   Jul 09 05:38  ssl-client.xml.example
-rw-r--r-- 1 mapr root 2 KB   Nov 02 18:22  ssl-server.xml
-rw-r--r-- 1 mapr root 2 KB   Jul 09 05:38  ssl-server.xml.example
-rw-r--r-- 1 mapr root 2 KB   Jul 09 05:38  yarn-env.cmd
-rw-r--r-- 1 mapr root 5 KB   Jul 09 05:38  yarn-env.sh
-rw-r--r-- 1 mapr root 2 KB   Oct 01 16:56  yarn-site-2015-10-01.20-56.xml
-rw-r--r-- 1 mapr root 2 KB   Oct 08 06:32  yarn-site-2015-10-08.10-32.xml
-rw-r--r-- 1 mapr root 2 KB   Oct 16 15:33  yarn-site-2015-10-16.19-33.xml
-rw-r--r-- 1 mapr root 2 KB   Nov 02 18:22  yarn-site-2015-11-02.23-22.xml
-rw-r--r-- 1 mapr mapr 76 KB  Nov 18 16:01  yarn-site.xml
-rw-r--r-- 1 mapr root 4 KB   Jul 09 05:38  yarn-site.xml.template

On Wed, Nov 18, 2015 at 12:41 PM, John Omernik  wrote:

> So there are two issues currently I am looking into. The first is the
> permissions of directories.  I'd still like to get the feelings of the
> group on that, because I've now managed to get Myriad/Yarn working on one
> cluster (based on Ubuntu 14.04) but can't get it to work on another cluster
> based on Red Hat 7.  It's strange: from what I can tell everything is the
> same, but the Redhat cluster complains about the permissions on
> /etc/hadoop not being owned by root (it's owned by mapr:root, but on the
> ubuntu cluster that works fine with the same ownership!)  I do notice that
> the build times reported by mapr are different.. but that may just be the
> build for Redhat vs the build for Ubuntu?  Still digging into that one...
>
> As to the hostname / mapr.host issue. I found a neat hack that may work
> for folks
>
> By setting this in my myriad config:
>
> yarnEnvironment:
>
>   YARN_HOME: hadoop-2.7.0
>
>   YARN_NODEMANAGER_OPTS: "-Dnodemanager.resource.io-spindles=4.0
> -Dmapr.host=$(hostname -f)"
>
>
> I am able to get mapr.host set back to the correct hostname where
> the nodemanager is running, which helps with a number of issues.  I thought
> about this, and realized it would be better if I could pass the hostname to
> the createTTVolume script but use a unique name for the mount point (what
> if I have multiple NMs on a single physical node?)
>
> So, I tried:
>
> YARN_NODEMANAGER_OPTS: "-Dnodemanager.resource.io-spindles=4.0
> -Dmapr.host=$(basename `pwd`)"
>
>
> My thinking was that if I could get the directory name of the "run" in my
> sandbox, I should be reasonably assured of uniqueness.  That seemed to work
> when the nodemanager kicked off the command:
>
> /opt/mapr/server/createTTVolume.sh hadoopmapr2.brewingintel.com
> /var/mapr/local/48833481-0c7a-4728-8f93-bcf9b545ad81/mapred
> /var/mapr/local/48833481-0c7a-4728-8f93-bcf9b545ad81/mapred/nodeManager yarn
>
>
> However, the script never returned, and the task failed.  So at this
> point, I think we can get good info passed to the MapR script, but I am 

Re: Struggling with Permissions

2015-11-18 Thread John Omernik
gz ${HADOOP_VER}/


# Copy to the URI location... note I am using MapR so I cp it directly to
# the MapR-FS location via an NFS share; it would probably be better to use
# a hadoop copy command for interoperability

echo " Copying to HDFS Location"

cp ${HADOOP_VER}.tgz ${URI_LOC}/


# I do this because it worked... not sure if I remo

#sudo chown mapr:mapr ${URI_LOC}/${HADOOP_VER}.tgz


#echo " Cleaning unpacked location"

sudo rm -rf ./${HADOOP_VER}

sudo rm ./${HADOOP_VER}.tgz
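Since the top of that script got cut off above, here is a sketch of the whole
flow end to end; HADOOP_VER and URI_LOC are stand-ins for the variables
defined in the truncated portion:

#!/bin/bash
HADOOP_VER="hadoop-2.7.0"
URI_LOC="/mapr/mycluster/dist"   # NFS-mounted MapR-FS path; an assumption

# Copy the installed tree, preserving permissions
sudo cp -rp /opt/mapr/hadoop/${HADOOP_VER} .

# -h dereferences symlinks so the linked MapR libs land in the tarball as
# real files; -p preserves permissions
sudo tar -zcfhp ${HADOOP_VER}.tgz ${HADOOP_VER}/

# Publish to the distribution location
cp ${HADOOP_VER}.tgz ${URI_LOC}/

# Clean up the working copies
sudo rm -rf ./${HADOOP_VER}
sudo rm ./${HADOOP_VER}.tgz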




On Wed, Nov 18, 2015 at 9:40 AM, yuliya Feldman  wrote:

> I would love to have that piece of code "configurable", but it is not at
> the moment.
> Will send you patch offline.
> Thanks,Yuliya
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org; yuliya Feldman 
>  Sent: Wednesday, November 18, 2015 6:02 AM
>  Subject: Re: Struggling with Permissions
>
> Yuliya, I would be interested in the patch for MapR, is that a patch for
> Myriad or a patch for Hadoop on MapR?  I wonder if there is a hadoop env
> file I could modify in my TGZ to help address the issue on my nodes as
> well. Can you describe what "mapr.host" is and if I can force overwrite
> that in my ENV file or will MapR clobber that at a later point in
> execution? I am thinking that with some simple sed, I could "fix" the conf
> file.
>
> Wait, I suppose there is no way for me to edit the command used to run the
> node manager... there's a thought. Could Myriad provide an ENV value or
> something that would allow us to edit the command or insert something into
> the command that is used to run the NM?  (below is the command on my
> cluster)  Basically, if there were a way to template that and alter it in
> the Myriad config, I could add commands to update the variables in the conf
> file before it's copied to yarn-site on every node... just spitballing
> ideas here...
>
>
>
> sudo tar -zxpf hadoop-2.7.0-NM.tar.gz && sudo chown mapr . && cp conf
> hadoop-2.7.0/etc/hadoop/yarn-site.xml; export YARN_HOME=hadoop-2.7.0; sudo
> -E -u mapr -H env YARN_HOME=hadoop-2.7.0
> YARN_NODEMANAGER_OPTS=-Dnodemanager.resource.io-spindles=4.0
> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos
>
> -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
> -Dnodemanager.resource.cpu-vcores=2 -Dnodemanager.resource.memory-mb=8192
> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31984
> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31233
> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31716
> -Dmyriad.mapreduce.shuffle.port=31786  /bin/yarn nodemanager
>
>
>
> On Tue, Nov 17, 2015 at 4:44 PM, yuliya Feldman
>  > wrote:
>
> > Hadoop (not Mapr) requires whole path starting from "/" be owned by root
> > and writable only by root
> > The second problem is exactly what I was talking about configuration
> being
> > taken from RM that overwrites local one
> > I can give you a patch to mitigate the issue for Mapr if you are building
> > from source.
> > Thanks,Yuliya
> >  From: John Omernik 
> >  To: dev@myriad.incubator.apache.org
> >  Sent: Tuesday, November 17, 2015 1:15 PM
> >  Subject: Re: Struggling with Permissions
> >
> > Well sure /tmp is world writeable but /tmp/mesos is not world writable
> thus
> > there is a sandbox to play in there... or am I missing something. Not to
> > mention my tmp is rwt which is world writable but only the creator or
> root
> > can modify (based on the googles).
> > Yuliya:
> >
> > I am seeing a weird behavior with MapR as it relates to (I believe) the
> > mapr_direct_shuffle.
> >
> > In the Node Manager logs, I see things starting and it saying "Checking
> for
> > local volume, if local volume is not present command will create and
> mount
> > it"
> >
> > Command invoked is : /opt/mapr/server/createTTVolume.sh
> > hadoopmapr7.brewingintel.com /var/mapr/local/
> > hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
> > hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
> >
> >
> > What is interesting here is hadoopmapr7 is the nodemanager it's trying to
> > start on, however the mount point it's trying to create is hadoopmapr2
> > which is the node the resource manager happened to fall on...  I was very
> > confused by that because in no place should hadoopmapr2 be "known" to the
> > nodemanager, because it thinks the resource manager hostname is
> > myriad.marathon.mesos.
> >
> > So why was it hard coding to the node the resource manager is running on?

Re: Struggling with Permissions

2015-11-18 Thread John Omernik
One other idea... could using or not using cgroups affect this? I.e., I
wonder if that is why it worked for me when I was messing with cgroups
settings: could cgroups potentially act as a chroot on the sandbox, so that
from the perspective of the executor it is running at the root with the
proper permissions?

Also, if we are bundling everything there, if CGROUPS isn't the answer,
could something like CHROOT help us to trick Yarn into thinking permissions
are fine all the way to root? (Once again brainstorming)



On Wed, Nov 18, 2015 at 7:56 AM, John Omernik  wrote:

> I understand that Hadoop (yarn) requires those permissions.  My concern
> still stands in that I had it running at one point without this issue. I
> can't reproduce it now, and I am trying to figure that out (so at some
> point, the permissions I had set seemed to allow it to run without
> changing the slaves); unless it was a mirage or some other quirk, it was
> running.  Can others confirm that they had to change their Mesos setup in
> order to run Myriad?  This seems very odd to me in that: A. the permissions
> are wrong to the point where we can't run the framework on Mesos running
> with default settings (writing to /tmp is the standard location for Mesos,
> and there have been no changes to /tmp permissions; the permissions are
> drwxrwxrwt root:root, just like a standard install of Ubuntu 14.04); B.
> everyone here who has run Myriad has thus changed their default settings on
> Mesos; and C. this change to Mesos isn't in the documentation.
>
> In my environment I believe that it "ran" at some point (something I did
> with executor tgz permissions helped it to work). That, combined with the
> fact that something so big (requiring a change on every Mesos slave in your
> cluster) and so critical to even getting Myriad to run (i.e. Yarn won't run
> in the default Mesos setup) isn't in the documentation, makes me question
> whether there really is no workaround to this.
>
> So let me ask the group
>
> 1. Did everyone here make the change in Mesos to get Myriad to run?  If
> Myriad is running in your environment (it doesn't matter if you are running
> MapR or not) and you have not changed your default /tmp location for Slave
> Sandboxes, please share your /tmp ownership and permissions and let us know
> if it runs. If it did require a change, please let us know what you changed
> it to, and if it affected any other frameworks.  (That is my largest
> concern in that changing permissions on this location is not a Myriad only
> change)
>
> 2. If everyone DID make this change, then we NEED to get this documented
> because this will be a huge speed bump in people trying it out and getting
> it running.
>
> Yuliya, I am not trying to be a pain, but as a user it is very strange
> to me that something so fundamental to the operation is not clear in the
> documentation. I just want to ensure we (probably mostly me) understand
> this completely before I go and make changes to every node in my mesos
> cluster.
>
> On Tue, Nov 17, 2015 at 4:44 PM, yuliya Feldman <
> yufeld...@yahoo.com.invalid> wrote:
>
>> Hadoop (not Mapr) requires whole path starting from "/" be owned by root
>> and writable only by root
>> The second problem is exactly what I was talking about configuration
>> being taken from RM that overwrites local one
>> I can give you a patch to mitigate the issue for Mapr if you are building
>> from source.
>> Thanks,Yuliya
>>   From: John Omernik 
>>  To: dev@myriad.incubator.apache.org
>>  Sent: Tuesday, November 17, 2015 1:15 PM
>>  Subject: Re: Struggling with Permissions
>>
>> Well sure /tmp is world writeable but /tmp/mesos is not world writable
>> thus
>> there is a sandbox to play in there... or am I missing something. Not to
>> mention my tmp is rwt which is world writable but only the creator or root
>> can modify (based on the googles).
>> Yuliya:
>>
>> I am seeing a weird behavior with MapR as it relates to (I believe) the
>> mapr_direct_shuffle.
>>
>> In the Node Manager logs, I see things starting and it saying "Checking
>> for
>> local volume, if local volume is not present command will create and mount
>> it"
>>
>> Command invoked is : /opt/mapr/server/createTTVolume.sh
>> hadoopmapr7.brewingintel.com /var/mapr/local/
>> hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
>> hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
>>
>>
>> What is interesting here is hadoopmapr7 is the nodemanager it's trying to
>> start on, however the mount point it's trying to create is hadoopmapr2
>>

Re: Struggling with Permissions

2015-11-18 Thread John Omernik
Yuliya, I would be interested in the patch for MapR, is that a patch for
Myriad or a patch for Hadoop on MapR?  I wonder if there is a hadoop env
file I could modify in my TGZ to help address the issue on my nodes as
well. Can you describe what "mapr.host" is and if I can force overwrite
that in my ENV file or will MapR clobber that at a later point in
execution? I am thinking that with some simple sed, I could "fix" the conf
file.

Wait, I suppose there is no way for me to edit the command used to run the
node manager... there's a thought. Could Myriad provide an ENV value or
something that would allow us to edit the command or insert something into
the command that is used to run the NM?  (below is the command on my
cluster)  Basically, if there were a way to template that and alter it in
the Myriad config, I could add commands to update the variables in the conf
file before it's copied to yarn-site on every node... just spitballing
ideas here...



sudo tar -zxpf hadoop-2.7.0-NM.tar.gz && sudo chown mapr . && cp conf
hadoop-2.7.0/etc/hadoop/yarn-site.xml; export YARN_HOME=hadoop-2.7.0; sudo
-E -u mapr -H env YARN_HOME=hadoop-2.7.0
YARN_NODEMANAGER_OPTS=-Dnodemanager.resource.io-spindles=4.0
-Dyarn.resourcemanager.hostname=myriad.marathon.mesos
-Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
-Dnodemanager.resource.cpu-vcores=2 -Dnodemanager.resource.memory-mb=8192
-Dmyriad.yarn.nodemanager.address=0.0.0.0:31984
-Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31233
-Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31716
-Dmyriad.mapreduce.shuffle.port=31786  /bin/yarn nodemanager
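To make the spitballing concrete: if that command were templated, a
hypothetical line like the following could be spliced in after the conf copy
and before the NM starts (the hostname being rewritten is just the example
from my cluster):

sed -i "s/hadoopmapr2.brewingintel.com/$(hostname -f)/g" hadoop-2.7.0/etc/hadoop/yarn-site.xml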

On Tue, Nov 17, 2015 at 4:44 PM, yuliya Feldman  wrote:

> Hadoop (not Mapr) requires whole path starting from "/" be owned by root
> and writable only by root
> The second problem is exactly what I was talking about configuration being
> taken from RM that overwrites local one
> I can give you a patch to mitigate the issue for Mapr if you are building
> from source.
> Thanks,Yuliya
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Tuesday, November 17, 2015 1:15 PM
>  Subject: Re: Struggling with Permissions
>
> Well sure /tmp is world writeable but /tmp/mesos is not world writable thus
> there is a sandbox to play in there... or am I missing something. Not to
> mention my tmp is rwt which is world writable but only the creator or root
> can modify (based on the googles).
> Yuliya:
>
> I am seeing a weird behavior with MapR as it relates to (I believe) the
> mapr_direct_shuffle.
>
> In the Node Manager logs, I see things starting and it saying "Checking for
> local volume, if local volume is not present command will create and mount
> it"
>
> Command invoked is : /opt/mapr/server/createTTVolume.sh
> hadoopmapr7.brewingintel.com /var/mapr/local/
> hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
> hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
>
>
> What is interesting here is hadoopmapr7 is the nodemanager it's trying to
> start on, however the mount point it's trying to create is hadoopmapr2
> which is the node the resource manager happened to fall on...  I was very
> confused by that because in no place should hadoopmapr2 be "known" to the
> nodemanager, because it thinks the resource manager hostname is
> myriad.marathon.mesos.
>
> So why was it hard coding to the node the resource manager is running on?
>
> Well if I look at the conf file in the sandbox (the file that gets copied
> to be yarn-site.xml for node managers), there ARE four references to
> hadoopmapr2. Three of the four say "source programatically" and one is just
> set... that's mapr.host.  Could there be some downstream hinkyness going
> on with how MapR is setting hostnames?  All of these variables seem "wrong"
> in that mapr.host (on the node manager) should be hadoopmapr7 in this case,
> and the resource manager ones should all be myriad.marathon.mesos.  I'd be
> interested in your thoughts here, because I am stumped at how these are
> getting set.
>
>
>
>
> yarn.resourcemanager.address = hadoopmapr2:8032 (programatically)
> mapr.host = hadoopmapr2.brewingintel.com
>
> yarn.resourcemanager.resource-tracker.address = hadoopmapr2:8031 (programatically)
>
> yarn.resourcemanager.admin.address = hadoopmapr2:8033 (programatically)
>
>
>
>
>
>
>
> On Tue, Nov 17, 2015 at 2:51 PM, Darin Johnson 
> wrote:
>
> > Yuliya: Are you referencing yarn.nodemanager.hostname or a mapr specific
> > option?
> >
> > I'm working right now on passing a
> > -Dyarn.nodemanager.hostname=offer.getHostName().  Useful if you've got
> >

Re: Struggling with Permissions

2015-11-18 Thread John Omernik
I understand that Hadoop (yarn) requires those permissions.  My concern
still stands in that I had it running at one point without this issue. I
can't reproduce it now, and I am trying to figure that out (so at some
point, the permissions I had set seemed to allow it to run without changing
the slaves); unless it was a mirage or some other quirk, it was running.
Can others confirm that they had to change their Mesos setup in order to run
Myriad?  This seems very odd to me in that: A. the permissions are wrong to
the point where we can't run the framework on Mesos running with default
settings (writing to /tmp is the standard location for Mesos, and there
have been no changes to /tmp permissions; the permissions are drwxrwxrwt
root:root, just like a standard install of Ubuntu 14.04); B. everyone here
who has run Myriad has thus changed their default settings on Mesos; and C.
this change to Mesos isn't in the documentation.

In my environment I believe that it "ran" at some point (something I did
with executor tgz permissions helped it to work). That, combined with the
fact that something so big (requiring a change on every Mesos slave in your
cluster) and so critical to even getting Myriad to run (i.e. Yarn won't run
in the default Mesos setup) isn't in the documentation, makes me question
whether there really is no workaround to this.

So let me ask the group

1. Did everyone here make the change in Mesos to get Myriad to run?  If
Myriad is running in your environment (it doesn't matter if you are running
MapR or not) and you have not changed your default /tmp location for Slave
Sandboxes, please share your /tmp ownership and permissions and let us know
if it runs. If it did require a change, please let us know what you changed
it to, and if it affected any other frameworks.  (That is my largest
concern in that changing permissions on this location is not a Myriad only
change)

2. If everyone DID make this change, then we NEED to get this documented
because this will be a huge speed bump in people trying it out and getting
it running.

Yuliya, I am not trying to be a pain, but as a user it is very strange
to me that something so fundamental to the operation is not clear in the
documentation. I just want to ensure we (probably mostly me) understand
this completely before I go and make changes to every node in my mesos
cluster.

On Tue, Nov 17, 2015 at 4:44 PM, yuliya Feldman  wrote:

> Hadoop (not Mapr) requires whole path starting from "/" be owned by root
> and writable only by root
> The second problem is exactly what I was talking about configuration being
> taken from RM that overwrites local one
> I can give you a patch to mitigate the issue for Mapr if you are building
> from source.
> Thanks,Yuliya
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Tuesday, November 17, 2015 1:15 PM
>  Subject: Re: Struggling with Permissions
>
> Well sure /tmp is world writeable but /tmp/mesos is not world writable thus
> there is a sandbox to play in there... or am I missing something. Not to
> mention my tmp is rwt which is world writable but only the creator or root
> can modify (based on the googles).
> Yuliya:
>
> I am seeing a weird behavior with MapR as it relates to (I believe) the
> mapr_direct_shuffle.
>
> In the Node Manager logs, I see things starting and it saying "Checking for
> local volume, if local volume is not present command will create and mount
> it"
>
> Command invoked is : /opt/mapr/server/createTTVolume.sh
> hadoopmapr7.brewingintel.com /var/mapr/local/
> hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
> hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
>
>
> What is interesting here is hadoopmapr7 is the nodemanager it's trying to
> start on, however the mount point it's trying to create is hadoopmapr2
> which is the node the resource manager happened to fall on...  I was very
> confused by that because in no place should hadoopmapr2 be "known" to the
> nodemanager, because it thinks the resource manager hostname is
> myriad.marathon.mesos.
>
> So why was it hard coding to the node the resource manager is running on?
>
> Well if I look at the conf file in the sandbox (the file that gets copied
> to be yarn-site.xml for node managers), there ARE four references to
> hadoopmapr2. Three of the four say "source programatically" and one is just
> set... that's mapr.host.  Could there be some downstream hinkyness going
> on with how MapR is setting hostnames?  All of these variables seem "wrong"
> in that mapr.host (on the node manager) should be hadoopmapr7 in this case,
> and the resource manager ones should all be myriad.marathon.mesos.  I'd be
> interested in your thought

Re: Struggling with Permissions

2015-11-17 Thread John Omernik
What's even stranger is I can't for the life of me find where "mapr.host"
gets set or used.  I did a grep -P -R "mapr\.host" ./* in /opt/mapr (which
included me pulling down the myriad code into
/opt/mapr/myriad/incubator-myriad) and found only one reference in
/opt/mapr/server/mapr_yarn_install.sh



echo "
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>\${mapr.host}</value>
  </property>
" | sudo tee -a ${YARN_CONF_FILE}


But I don't think that is being called at all by the resource manager...


(Note: when I create my tarball from the /opt/mapr/hadoop/hadoop-2.7.0
directory I am using tar -zcfhp to both preserve permissions and include
the files that are symlinked... not sure if that affects things here..)
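A quick way to sanity-check that the -h flag really dereferenced the MapR
symlinks, i.e. that no links survived into the tarball:

tar -ztvf hadoop-2.7.0.tgz | grep -c '^l'   # prints 0 if no symlinks remain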





On Tue, Nov 17, 2015 at 3:15 PM, John Omernik  wrote:

> Well sure /tmp is world writeable but /tmp/mesos is not world writable
> thus there is a sandbox to play in there... or am I missing something. Not
> to mention my tmp is rwt which is world writable but only the creator or
> root can modify (based on the googles).
> Yuliya:
>
> I am seeing a weird behavior with MapR as it relates to (I believe) the
> mapr_direct_shuffle.
>
> In the Node Manager logs, I see things starting and it saying "Checking
> for local volume, if local volume is not present command will create and
> mount it"
>
> Command invoked is : /opt/mapr/server/createTTVolume.sh
> hadoopmapr7.brewingintel.com /var/mapr/local/
> hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
> hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
>
>
> What is interesting here is hadoopmapr7 is the nodemanager it's trying to
> start on, however the mount point it's trying to create is hadoopmapr2
> which is the node the resource manager happened to fall on...  I was very
> confused by that because in no place should hadoopmapr2 be "known" to the
> nodemanager, because it thinks the resource manager hostname is
> myriad.marathon.mesos.
>
> So why was it hard coding to the node the resource manager is running on?
>
> Well if I look at the conf file in the sandbox (the file that gets copied
> to be yarn-site.xml for node managers), there ARE four references to
> hadoopmapr2. Three of the four say "source programatically" and one is just
> set... that's mapr.host.  Could there be some downstream hinkyness going
> on with how MapR is setting hostnames?  All of these variables seem "wrong"
> in that mapr.host (on the node manager) should be hadoopmapr7 in this case,
> and the resource manager ones should all be myriad.marathon.mesos.   I'd be
> interested in your thoughts here, because I am stumped at how these are
> getting set.
>
>
>
>
> yarn.resourcemanager.address = hadoopmapr2:8032 (programatically)
> mapr.host = hadoopmapr2.brewingintel.com
>
> yarn.resourcemanager.resource-tracker.address = hadoopmapr2:8031 (programatically)
>
> yarn.resourcemanager.admin.address = hadoopmapr2:8033 (programatically)
>
>
>
>
>
> On Tue, Nov 17, 2015 at 2:51 PM, Darin Johnson 
> wrote:
>
>> Yuliya: Are you referencing yarn.nodemanager.hostname or a mapr specific
>> option?
>>
>> I'm working right now on passing a
>> -Dyarn.nodemanager.hostname=offer.getHostName().  Useful if you've got
>> extra ip's for a san or management network.
>>
>> John: Yeah the permissions on the tarball are a pain to get right.  I'm
>> working on Docker Support and a build script for the tarball, which should
>> make things easier.  Also, to the point of using world writable
>> directories
>> it's a little scary from the security side of things to allow executables
>> to run there, especially things running as privileged users.  Many
>> distro's
>> of linux will mount /tmp noexec.
>>
>> Darin
>>
>> On Tue, Nov 17, 2015 at 2:53 PM, yuliya Feldman
>> > > wrote:
>>
>> > Please change workdir directory for mesos slave to one that is not /tmp
>> > and make sure that dir is owned by root.
>> > There is one more caveat with binary distro and MapR - in Myriad code
>> for
>> > binary distro configuration is copied from RM to NMs - it does not work
>> for
>> > MapR since we need hostname (yes for the sake of local volumes) to be
>> > unique.
>> > MapR will have Myriad release to handle this situation.
>> >   From: John Omernik 
>> >  To: dev@myriad.incubator.apache.org
>> >  Sent: Tuesday, November 17, 2015 11:37 AM
>> >  Subject: Re: Struggling with Permissions
>> >
>> > Oh hey, I found a post by me back on Sept 9.  I looked at the Jiras and
>> > followed the instructions with the same errors. At 

Re: Struggling with Permissions

2015-11-17 Thread John Omernik
Well sure, /tmp is world writable, but /tmp/mesos is not world writable, thus
there is a sandbox to play in there... or am I missing something? Not to
mention my /tmp is 1777 (rwt), which is world writable but only the creator
or root can modify (based on the googles).
Yuliya:

I am seeing a weird behavior with MapR as it relates to (I believe) the
mapr_direct_shuffle.

In the Node Manager logs, I see things starting and it saying "Checking for
local volume, if local volume is not present command will create and mount
it"

Command invoked is : /opt/mapr/server/createTTVolume.sh
hadoopmapr7.brewingintel.com /var/mapr/local/
hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
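(Reading that invocation positionally, and this is my inference rather than
anything from MapR docs, the arguments appear to be:

/opt/mapr/server/createTTVolume.sh <hostname> <volume-path> <mount-path> <mode>

which is why the RM's hostname showing up in the volume and mount paths
matters.)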


What is interesting here is that hadoopmapr7 is the node the nodemanager is
trying to start on; however, the mount point it's trying to create is for
hadoopmapr2, which is the node the resource manager happened to fall on...
I was very confused by that, because in no place should hadoopmapr2 be
"known" to the nodemanager, since it thinks the resource manager hostname is
myriad.marathon.mesos.

So why was it hard-coding to the node the resource manager is running on?

Well if I look at the conf file in the sandbox (the file that gets copied
to be yarn-site.xml for node managers), there ARE four references to
hadoopmapr2. Three of the four say "source programatically" and one is just
set... that's mapr.host.  Could there be some downstream hinkyness going
on with how MapR is setting hostnames?  All of these variables seem "wrong"
in that mapr.host (on the node manager) should be hadoopmapr7 in this case,
and the resource manager ones should all be myriad.marathon.mesos.   I'd be
interested in your thoughts here, because I am stumped at how these are
getting set.



yarn.resourcemanager.address = hadoopmapr2:8032 (programatically)
mapr.host = hadoopmapr2.brewingintel.com

yarn.resourcemanager.resource-tracker.address = hadoopmapr2:8031 (programatically)
yarn.resourcemanager.admin.address = hadoopmapr2:8033 (programatically)





On Tue, Nov 17, 2015 at 2:51 PM, Darin Johnson 
wrote:

> Yuliya: Are you referencing yarn.nodemanager.hostname or a mapr specific
> option?
>
> I'm working right now on passing a
> -Dyarn.nodemanager.hostname=offer.getHostName().  Useful if you've got
> extra ip's for a san or management network.
>
> John: Yeah the permissions on the tarball are a pain to get right.  I'm
> working on Docker Support and a build script for the tarball, which should
> make things easier.  Also, to the point of using world writable directories
> it's a little scary from the security side of things to allow executables
> to run there, especially things running as privileged users.  Many distro's
> of linux will mount /tmp noexec.
>
> Darin
>
> On Tue, Nov 17, 2015 at 2:53 PM, yuliya Feldman
>  > wrote:
>
> > Please change workdir directory for mesos slave to one that is not /tmp
> > and make sure that dir is owned by root.
> > There is one more caveat with binary distro and MapR - in Myriad code for
> > binary distro configuration is copied from RM to NMs - it does not work
> for
> > MapR since we need hostname (yes for the sake of local volumes) to be
> > unique.
> > MapR will have Myriad release to handle this situation.
> >   From: John Omernik 
> >  To: dev@myriad.incubator.apache.org
> >  Sent: Tuesday, November 17, 2015 11:37 AM
> >  Subject: Re: Struggling with Permissions
> >
> > Oh hey, I found a post by me back on Sept 9.  I looked at the Jiras and
> > followed the instructions with the same errors. At this point do I still
> > need to have a place where the entire path is owned by root? That seems
> > like an odd requirement (a change to each node to facilitate a
> > framework)
> >
> >
> >
> >
> >
> > On Tue, Nov 17, 2015 at 1:25 PM, John Omernik  wrote:
> >
> > > Hey all, I am struggling with permissions on myriad, trying to get the
> > > right permissions in the tgz as well as who to run as.  I am running in
> > > MapR, which means I need to run as mapr or root (otherwise my volume
> > > creation scripts will fail on MapR, MapR folks, we should talk more
> about
> > > those scripts)
> > >
> > > But back to the code, I've had lots of issues. When I run the
> Frameworkuser
> > > and Superuser as mapr, it unpacks everything as MapR and I get a
> > > "/bin/container-executor" must be owned by root but is owned by 700 (my
> > > mapr UID).
> > >
> > > So now I am running as root, and I am getting the error below as it
> > > relates to /tmp. I am not sure which /tmp this ref

Re: Struggling with Permissions

2015-11-17 Thread John Omernik
Is this change going to be required for all mesos installations that would
run Myriad with remote distribution?  I guess I'd like to respectfully
challenge the notion that to use remote distribution we need to make a
cluster-wide change to all of our slave nodes that would require a restart
of the mesos slaves.  That seems to me to be quite a large requirement.

What's strange to me is that at one point I had this working without that
requirement, but I can't reproduce how I created the tgz to make it work.
Where does the requirement that /tmp not be world writable come into
play? This seems like a strange requirement in that, as an executor, its
world should start at the root of its container.  I.e. how does it even
know that its parent directory has different permissions? Can we just
traverse frameworks upward? If I ran a command that said rm -rf ../../*
would that work?  Maybe I just didn't dig into what the sandbox was before
this, but my thought was that from the perspective of the executor, / was,
for example:

/tmp/mesos/slaves/20151007-102829-1660987584-5050-15078-S5/frameworks/d9aab75d-1a74-489d-976d-805ce55364ff-0011/executors/myriad_executord9aab75d-1a74-489d-976d-805ce55364ff-0011d9aab75d-1a74-489d-976d-805ce55364ff-O18052420151007-102829-1660987584-5050-15078-S5/runs/70c6ba08-5c7a-4399-bf5a-34ac328e66e1/


as an ls of that directory shows me the unpacked tgz etc.  How does the
nodemanager know that the parent directories, specifically /tmp, are world
writable, and why does it care?
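If I'm reading the container-executor source right (my inference, not
something I've confirmed), the setuid binary checks not just its own file
but every ancestor directory of its path up to /, and exit code 24 is what
that check returns; that would explain why a sandbox living under /tmp trips
it even though the sandbox itself is locked down. A quick way to walk the
permissions up a path and spot the offending ancestor:

d=$(pwd); while [ "$d" != "/" ]; do stat -c '%a %U:%G %n' "$d"; d=$(dirname "$d"); done; stat -c '%a %U:%G %n' /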

I am not trying to be belligerent; I just want to understand this without
blindly changing a cluster-wide setting and restarting mesos.



On Tue, Nov 17, 2015 at 1:53 PM, yuliya Feldman  wrote:

> Please change workdir directory for mesos slave to one that is not /tmp
> and make sure that dir is owned by root.
> There is one more caveat with binary distro and MapR - in Myriad code for
> binary distro configuration is copied from RM to NMs - it does not work for
> MapR since we need hostname (yes for the sake of local volumes) to be
> unique.
> MapR will have Myriad release to handle this situation.
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Tuesday, November 17, 2015 11:37 AM
>  Subject: Re: Struggling with Permissions
>
> Oh hey, I found a post by me back on Sept 9.  I looked at the Jiras and
> followed the instructions with the same errors. At this point do I still
> need to have a place where the entire path is owned by root? That seems
> like an odd requirement (a change to each node to facilitate a
> framework)
>
>
>
>
>
> On Tue, Nov 17, 2015 at 1:25 PM, John Omernik  wrote:
>
> > Hey all, I am struggling with permissions on myriad, trying to get the
> > right permissions in the tgz as well as who to run as.  I am running in
> > MapR, which means I need to run as mapr or root (otherwise my volume
> > creation scripts will fail on MapR, MapR folks, we should talk more about
> > those scripts)
> >
> > But back to the code, I've had lots of issues. When I run the Frameworkuser
> > and Superuser as mapr, it unpacks everything as MapR and I get a
> > "/bin/container-executor" must be owned by root but is owned by 700 (my
> > mapr UID).
> >
> > So now I am running as root, and I am getting the error below as it
> > relates to /tmp. I am not sure which /tmp this refers to. the /tmp that
> my
> > slave is executing in? (i.e. my local mesos agent /tmp directory) or my
> > MaprFS /tmp directory (both of which are world writable, as /tmp
> typically
> > is... or am I mistaken here?)
> >
> > Any thoughts on how to get this to resolve? This is when nodemanager is
> > trying to start running as root and root for both of my Myriad users.
> >
> > Thanks!
> >
> >
> > Caused by: ExitCodeException exitCode=24: File /tmp must not be world or
> group writable, but is 1777
> >
> >
> >
> >
>
>
>
>


Re: Struggling with Permissions

2015-11-17 Thread John Omernik
Oh hey, I found a post by me back on Sept 9.  I looked at the Jiras and
followed the instructions with the same errors. At this point do I still
need to have a place where the entire path is owned by root? That seems
like an odd requirement (a change to each node to facilitate a framework)



On Tue, Nov 17, 2015 at 1:25 PM, John Omernik  wrote:

> Hey all, I am struggling with permissions on myriad, trying to get the
> right permissions in the tgz as well as who to run as.  I am running in
> MapR, which means I need to run as mapr or root (otherwise my volume
> creation scripts will fail on MapR, MapR folks, we should talk more about
> those scripts)
>
> But back to the code, I've had lots of issues. When I run the Frameworkuser
> and Superuser as mapr, it unpacks everything as MapR and I get a
> "/bin/container-executor" must be owned by root but is owned by 700 (my
> mapr UID).
>
> So now I am running as root, and I am getting the error below as it
> relates to /tmp. I am not sure which /tmp this refers to. the /tmp that my
> slave is executing in? (i.e. my local mesos agent /tmp directory) or my
> MaprFS /tmp directory (both of which are world writable, as /tmp typically
> is... or am I mistaken here?)
>
> Any thoughts on how to get this to resolve? This is when nodemanager is
> trying to start running as root and root for both of my Myriad users.
>
> Thanks!
>
>
> Caused by: ExitCodeException exitCode=24: File /tmp must not be world or 
> group writable, but is 1777
>
>
>
>


Struggling with Permissions

2015-11-17 Thread John Omernik
Hey all, I am struggling with permissions on myriad, trying to get the
right permissions in the tgz as well as who to run as.  I am running in
MapR, which means I need to run as mapr or root (otherwise my volume
creation scripts will fail on MapR, MapR folks, we should talk more about
those scripts)

But back to the code, I've had lots of issues. When I run the Frameworkuser
and Superuser as mapr, it unpacks everything as mapr and I get a
"/bin/container-executor must be owned by root but is owned by 700" error
(700 being my mapr UID).

So now I am running as root, and I am getting the error below as it relates
to /tmp. I am not sure which /tmp this refers to: the /tmp that my slave is
executing in (i.e. my local mesos agent's /tmp directory), or my MapR-FS /tmp
directory (both of which are world writable, as /tmp typically is... or am
I mistaken here?)

Any thoughts on how to get this to resolve? This is when nodemanager is
trying to start running as root and root for both of my Myriad users.

Thanks!


Caused by: ExitCodeException exitCode=24: File /tmp must not be world
or group writable, but is 1777


Re: Yarn-Site

2015-10-05 Thread John Omernik
I see. That makes sense.  Thanks for the tip.

Is it safe to pull down a recent version at this point? Are we using the
official "master" or phase1?  (the lazy man in me is asking for a link to
the current repo so I don't have to read back over emails to see where I
should go :)



On Mon, Oct 5, 2015 at 3:25 PM, Darin Johnson 
wrote:

> Hey John,
>
> Are you trying to run the resource manager from the tar ball via marathon?
> It's doable, my suggested approach would be to use a json like this:
>
> {
>   "id": "resource-manager",
>   "uris": ["hdfs://namenode:port/dist/hadoop-2.7.0.tgz",
>  "hdfs://namenode:port/dist/conf/hadoop/yarn-site.xml",
>  "hdfs://namenode:port/dist/conf/hadoop/hdfs-site.xml",
>  "hdfs:///dist/conf/hadoop/core-site.xml",
>  "hdfs://namenode:port/dist/conf/hadoop/mapred-site.xml"],
>   "cmd": "cp *.xml hadoop-2.7.0/etc/hadoop && cd hadoop-2.7.0 && bin/yarn
> resourcemanager",
>   "mem": 16,
>   "cpu": 1
>   "instances" : 1,
>   "user": "yarn"
> }
>
> Basically it keeps you from redoing the tar ball every time you edit a
> config, instead you just upload the new yarn-site.xml.  The Node Manager
> gets it's config from the Resource Manager (I'm assuming this is all for
> remote distribution, otherwise creating the tar ball is optional).
>
> Darin
>
> On Mon, Oct 5, 2015 at 2:36 PM, John Omernik  wrote:
>
> > Hey all, I've been waiting until the chaos of the code move has died
> down.
> > I am looking to get this working on my MapR cluster now, and would like
> > some clarification on instructions here:
> >
> >
> >
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
> >
> > Basically, in the instructions below, it has the "remove the
> > yarn-site.xml.  Yet to run the resource manager with myriad, you need the
> > yarn-site to be packaged with things (unless I am reading that
> incorrectly)
> > Is the only option right now to created a tarball for nodemanagers, and
> > have this be different from the tarball for the resource manager?
> >
> > Step 5: Create the Tarball
> >
> > The tarball has all of the files needed for the Node Managers and
> Resource
> > Managers. The following shows how to create the tarball and place it in
> > HDFS:
> > cd ~
> > sudo cp -rp $YARN_HOME .
> > sudo rm $YARN_HOME/etc/hadoop/yarn-site.xml
> > sudo tar -zcpf ~/hadoop-2.7.1.tar.gz hadoop-2.7.1
> > hadoop fs -put ~/hadoop-2.7.1.tar.gz /dist
> >
>


Yarn-Site

2015-10-05 Thread John Omernik
Hey all, I've been waiting until the chaos of the code move has died down.
I am looking to get this working on my MapR cluster now, and would like
some clarification on instructions here:


https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators

Basically, in the instructions below, it has the "remove the
yarn-site.xml.  Yet to run the resource manager with myriad, you need the
yarn-site to be packaged with things (unless I am reading that incorrectly)
Is the only option right now to created a tarball for nodemanagers, and
have this be different from the tarball for the resource manager?

Step 5: Create the Tarball

The tarball has all of the files needed for the Node Managers and  Resource
Managers. The following shows how to create the tarball and place it in
HDFS:
cd ~
sudo cp -rp $YARN_HOME .
sudo rm $YARN_HOME/etc/hadoop/yarn-site.xml
sudo tar -zcpf ~/hadoop-2.7.1.tar.gz hadoop-2.7.1
hadoop fs -put ~/hadoop-2.7.1.tar.gz /dist
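For concreteness, the split I am asking about would look something like this
(the -RM/-NM names are just mine):

cd ~
sudo cp -rp $YARN_HOME .
sudo tar -zcpf ~/hadoop-2.7.1-RM.tar.gz hadoop-2.7.1   # configs kept, for the RM
sudo rm hadoop-2.7.1/etc/hadoop/yarn-site.xml
sudo tar -zcpf ~/hadoop-2.7.1-NM.tar.gz hadoop-2.7.1   # yarn-site stripped, for the NMs
hadoop fs -put ~/hadoop-2.7.1-*.tar.gz /dist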


Re: Cassandra Summit

2015-09-25 Thread John Omernik
"Why would you want to do that?"

As a potential user of Myriad, in the enterprise I see a number of reasons
I'd "want to do that"; they are:

- The ability to use Mesos' purpose-built and well-designed resource
management with Map Reduce. Right now Yarn is the only option to run Map
Reduce V2 applications, and while Yarn is far superior to the resource
management in Map Reduce V1, we still have an important application
that is intrinsically tied to its resource scheduler. Things that run on
resource schedulers should not be tied to them. Map Reduce V2 should not
have a specific resource scheduler as a requirement.

- Multi-tenancy: Right now, if you have a cluster of computers, you can run
one Yarn cluster on them.  With Myriad, the option exists to have smaller,
purpose-built clusters running on one set of hardware; think a Yarn
cluster for marketing, or one for HR.  This is a great option for better
utilizing your resources, as well as better scaling the growth and costs
associated with growth. Consider setting up separate clusters in Yarn
without Mesos: many services duplicated, VM or physical node management
issues, etc.

- To build on multi-tenancy, consider different versions of Yarn and Map
Reduce. Right now, a new feature or bug fix comes out in a version of Yarn,
and there is not a good way to put that into play with your data. You have
to go through a horrible testing process just to upgrade, and you have to
make sure ALL other jobs are not affected by the upgrade. With Myriad, keep
your production jobs at version X of Yarn, and then spin up a new Yarn
cluster at version X+1.  Now you can test your jobs slowly, and migrate
them one by one without impact to production processes.  Upgrading is now
not all or nothing, but a controlled process where you can "fail fast", i.e.
if the job doesn't work, roll it back to the older version of Yarn.

- The ability to have applications (think Docker containers) sitting right
next to the data (Hadoop data) they may be interacting with. Monitoring all
the jobs in one place rather than distinct clusters for containers and
others for data frameworks.

- Data frameworks!!  Like the multi-tenancy conversation, what happens when
you want to have Drill or Impala, plus Map Reduce V2 (multiple of these),
plus Spark, or Storm, or Kafka all working together?  With Yarn now,
you're much more locked in to a monolithic cluster, still with static
partitioning all over the place (think a Cloudera cluster with Yarn, Impala
and Hive... want to change something? You have to make sure all the pieces
change together).  With Mesos/Myriad, you have the flexibility to move and
try new things, with minimal impact to your production, without standing up
additional servers/clusters.  Myriad is the missing link here in that Yarn-
only applications (Map Reduce V2!!!) are now part of that vision for a
unified data center; you no longer have to make a choice between Mesos or
Yarn, now it's Mesos AND Yarn.

Those are the points that get me excited; ecosystem lock-in is a huge concern
for many enterprises.   I don't want to imply I am not excited about the
dynamic flex-up/flex-down or the HA components, obviously those are awesome
too, but for me those are cherries on top of the other components that let
me envision a data environment where options exist everywhere, where
innovation can happen faster, and I never have a situation where an idea is
left on the cutting room floor because "we don't support X".

Random thoughts from me...

John



On Fri, Sep 25, 2015 at 7:59 AM, Jim Klucar  wrote:

> Awesome. I assume it was good talk? I need to get better at answering the
> "Why would you want to do that?" question.
>
> On Thu, Sep 24, 2015 at 9:08 PM, Ken Sipe  wrote:
>
> > I just gave a talk at the cassandra summit.  It included details around
> > spark and analytics with cassandra in the cluster.  There were lots of
> > questions, etc.   I just wanted to let this group know that the 2nd
> largest
> > topic of conversation and questions was around myriad… there was a lot of
> > excitement for our project.
> >
> > Ken
>


Re: Question about the Wiki Instructions on yarn-site.xml

2015-09-09 Thread John Omernik
As to editing directly, I am at a new employer, and we are trying to hash
out whether I can sign the Apache committer agreement paperwork.  Thus my
thoughts, if the group wants them, will have to be in informal forum posts,
which I can't make any claim to from an IP perspective. I will work on
getting the committer's document approved, and then do more directly with
the Wiki; sorry for the roundaboutness.

On Wed, Sep 9, 2015 at 1:07 PM, Ruth Harris  wrote:

> Hi all,
>
> If you can also clarify for me what the original instructions for the
> Admin were trying to do and provide clearer information, I can update the
> wiki information and then update the .markdown file in github.
> Alternatively, the SME can update the wiki directly.
>
> I only walked through the config and build information associated with the
> Developer information.
>
> Thanks, Ruth
>
> On Wed, Sep 9, 2015 at 8:52 AM, Darin Johnson 
> wrote:
>
> > John,
> > Understood. I don't think making the tempdir be set up that way is ideal.
> > We've had issues with other frameworks in the past.
> > Darin
> > On Sep 9, 2015 11:48 AM, "John Omernik"  wrote:
> >
> > > Well at this point my biggest issue the root user stuff in the other
> > thread
> > > and figuring out how to get it to work without making my slave's mesos
> > temp
> > > only writable by root (is there a work around? And is this a best
> > practice
> > > anyhow? what are the down stream effects of this etc)
> > >
> > > On Wed, Sep 9, 2015 at 10:45 AM, Darin Johnson <
> dbjohnson1...@gmail.com>
> > > wrote:
> > >
> > > > Hey John I'm going to try to recreate issue using vanilla hadoop
> later
> > > > today.  Any other settings I should know about?
> > > > Darin
> > > > On Sep 9, 2015 9:42 AM, "John Omernik"  wrote:
> > > >
> > > > > This was another "slipped in" question in my other thread, I am
> > > breaking
> > > > > out for specific instructions.  Basically, I was struggling with
> with
> > > > some
> > > > > things in the wiki on this page:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
> > > > >
> > > > > In step 5:
> > > > > Step 5: Configure YARN to use Myriad
> > > > >
> > > > > Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as
> > > > instructed
> > > > > in Sample: myriad-config-default.yml
> > > > > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml
> > > > > >
> > > > > .
> > > > >
> > > > >
> > > > > Issue 1: It should link to the yarn-site.xml page, not the
> > > > > myriad-config.default.yml page
> > > > >
> > > > > Issue 2:
> > > > > It has us put that information in the yarn-site.xml This makes
> sense.
> > > > The
> > > > > resource manager needs to be aware of the myriad stuff.
> > > > >
> > > > > Then I go to create a tarball, (which I SHOULD be able to use for
> > both
> > > > > resource manager and nodemanager... right?) However, the
> instructions
> > > > state
> > > > > to remove the *.xml files.
> > > > >
> > > > > Step 6: Create the Tarball
> > > > >
> > > > > The tarball has all of the files needed for the Node Managers and
> > > > Resource
> > > > > Managers. The following shows how to create the tarball and place
> it
> > in
> > > > > HDFS:
> > > > > cd ~
> > > > > sudo cp -rp /opt/hadoop-2.7.0 .
> > > > > sudo rm hadoop-2.7.0/etc/hadoop/*.xml
> > > > > sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
> > > > > hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist
> > > > >
> > > > >
> > > > > What I ended up doing... since I am running the resourcemanager
> > > (myriad)
> > > > in
> > > > > marathon, is I created two tarballs. One is my
> hadoop-2.7.0-RM.tar.gz
> > > > which
> > > > > has the all the xml files still in the tar ball for shipping to
> > > marathon.
> > > > > Then other is hadoop-2.7.0-NM.tar.gz which per the instructions
> > removes
> > > > the
> > > > > *.xml files from the /etc/hadoop/ directory.
> > > > >
> > > > >
> > > > > I guess... my logic is that myriad creates the conf directory for
> the
> > > > > nodemanagers... but then I thought, am I overthinking something?
> Am
> > I
> > > > > missing something? Could that be factoring into what I am doing
> here?
> > > > >
> > > > >
> > > > > Obviously my first steps are to add the extra yarn-site.xml
> entries,
> > > but
> > > > in
> > > > > this current setup, they are only going into the resource manager
> > > > yarn-site
> > > > > as the the node-managers don't have a yarn-site in their
> directories.
> > > > Am I
> > > > > looking at this correctly?  Perhaps we could rethink the removal
> > > process
> > > > of
> > > > > the XML files in the tarball to allow this to work correctly with a
> > > > single
> > > > > tarball?
> > > > >
> > > > > If I am missing something here, please advise!
> > > > >
> > > > >
> > > > > John
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Ruth Harris
> Sr. Technical Writer, MapR
>


Re: Question about the Wiki Instructions on yarn-site.xml

2015-09-09 Thread John Omernik
Well at this point my biggest issue is the root user stuff in the other thread
and figuring out how to get it to work without making my slave's mesos temp
only writable by root (is there a workaround? And is this a best practice
anyhow? What are the downstream effects of this, etc.?)

On Wed, Sep 9, 2015 at 10:45 AM, Darin Johnson 
wrote:

> Hey John I'm going to try to recreate issue using vanilla hadoop later
> today.  Any other settings I should know about?
> Darin
> On Sep 9, 2015 9:42 AM, "John Omernik"  wrote:
>
> > This was another "slipped in" question in my other thread, I am breaking
> > out for specific instructions.  Basically, I was struggling with with
> some
> > things in the wiki on this page:
> >
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
> >
> > In step 5:
> > Step 5: Configure YARN to use Myriad
> >
> > Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as
> instructed
> > in Sample: myriad-config-default.yml
> > <
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml
> > >
> > .
> >
> >
> > Issue 1: It should link to the yarn-site.xml page, not hte
> > myriad-config.default.yml page
> >
> > Issue 2:
> > It has us put that information in the yarn-site.xml This makes sense.
> The
> > resource manager needs to be aware of the myriad stuff.
> >
> > Then I go to create a tarball, (which I SHOULD be able to use for both
> > resource manager and nodemanager... right?) However, the instructions
> state
> > to remove the *.xml files.
> >
> > Step 6: Create the Tarball
> >
> > The tarball has all of the files needed for the Node Managers and
> Resource
> > Managers. The following shows how to create the tarball and place it in
> > HDFS:
> > cd ~
> > sudo cp -rp /opt/hadoop-2.7.0 .
> > sudo rm hadoop-2.7.0/etc/hadoop/*.xml
> > sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
> > hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist
> >
> >
> > What I ended up doing... since I am running the resourcemanager (myriad)
> in
> > marathon, is I created two tarballs. One is my hadoop-2.7.0-RM.tar.gz
> which
> > has the all the xml files still in the tar ball for shipping to marathon.
> > Then other is hadoop-2.7.0-NM.tar.gz which per the instructions removes
> the
> > *.xml files from the /etc/hadoop/ directory.
> >
> >
> > I guess... my logic is that myriad creates the conf directory for the
> > nodemanagers... but then I thought, and I overthinking something? Am I
> > missing something? Could that be factoring into what I am doing here?
> >
> >
> > Obviously my first steps are to add the extra yarn-site.xml entries, but
> in
> > this current setup, they are only going into the resource manager
> yarn-site
> > as the the node-managers don't have a yarn-site in their directories.
> Am I
> > looking at this correctly?  Perhaps we could rethink the removal process
> of
> > the XML files in the tarball to allow this to work correctly with a
> single
> > tarball?
> >
> > If I am missing something here, please advise!
> >
> >
> > John
> >
>


Re: Getting Nodes to be "Running" in Mesos

2015-09-09 Thread John Omernik
So focusing on this issue, to run Myriad at this point we would need to

1. Run Myriad as root (i.e. in marathon "user":"root" must be added to
the json so it runs as root)
2. Have the frameworkUser be root
3. Have the frameworkSuperUser either be root or be someone who can
passwordlessly sudo to root.
4. Have the entire path of the slave work-dir be owned by root and only
writable by root up to where the container-executor.cfg exists.
(Points 1-3 are sketched below.)
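
A minimal sketch of points 1-3, reusing the marathon json and
myriad-config-default.yml fragments that appear elsewhere in these threads
(the app id is illustrative):

# marathon app definition (fragment)
{
  "id": "myriad",
  "user": "root"
}

# myriad-config-default.yml (fragment)
frameworkUser: root       # same user running the resource manager
frameworkSuperUser: root  # or a user with passwordless sudo to root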

On point 4: for me, I am running my slaves pointing to a work directory
that is /opt/mapr/mesos/tmp/slave, since I have some space issues on some
of my nodes' /.   Even if I pointed it at /tmp I would run into the same
problem. If I found a new place to put the work directory on every slave,
where it was root-owned and writable only by root from / down to the .cfg
file, then it would work. But would other frameworks fail? Or would their
chown process actually fix things so they could write?  This seems like a
huge workaround to get Myriad running.
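
For what it's worth, a hedged sketch of that work-dir change (the directory
name is just an example, and this assumes the stock mesos-slave --work_dir
flag):

sudo mkdir -p /var/lib/mesos-work
sudo chown root:root /var/lib/mesos-work
sudo chmod 755 /var/lib/mesos-work          # root-owned, not group/world writable
mesos-slave --work_dir=/var/lib/mesos-work  # plus the usual flags

Whether other frameworks keep working may come down to the chown step
mentioned above, since their executors write inside per-task sandboxes that
the slave chowns to the framework user.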

At this point, is there another way to get Myriad running, or is running
everything as root the only way? Just trying to get myriad back up and running here.



On Tue, Sep 8, 2015 at 9:30 PM, Darin Johnson 
wrote:

> Yuliya, the reason for the chown framework user . is that the executor
> (as frameworkUser) must write some files to the MESOS_DIRECTORY,
> specifically stderr, stdout and at the time the capsule dir (now
> obsolete).  I suppose we could touch these files and then give them the
> proper permissions.
>
> I was planning to remove a lot of the code once MESOS-1790 is resolved, Jim
> submitted a patch already.  In particular, there would no longer be a
> frameworkSuperUser (it's there so we can extract the tarball and preserve
> ownership/permissions for container-executor), and the frameworkUser would
> just run the yarn nodemanager.  If we continue to require the
> MESOS_DIRECTORY to be owned by root, we'll be required to continue to
> run it in a way similar to how it is currently.  I really don't like the idea
> of running frameworks as root or even with passwordless sudo if I can help
> it, but at the time it was the only workaround.
>
> So I guess the question is: is frameworkSuperUser something that we'd like
> to eventually deprecate, or is it here for good?  Also, I should comment on
> Mesos-1790 to see what's going on with the patch.
>
> Darin
>
>
>
> On Sep 8, 2015 7:12 PM, "yuliya Feldman" 
> wrote:
>
> > John,
> > It is a problem with permissions for container-executor.cfg - it requires
> > the whole path to it to be owned by root.
> > One step is to change the work-dir for mesos-slave to point to a different
> > directory (not tmp) that is writable only by root.
> > It still does not solve the full issue, since the binary distro is changing
> > permissions of the distro directory to a framework user.
> > If the framework user is root and myriad is running as root it can be
> > solved; otherwise we need changes to the binary distro code.
> > I was planning to do it, but got distracted by other stuff. Will try to
> > look at it this week.
> > Thanks, Yuliya
> >   From: John Omernik 
> >  To: dev@myriad.incubator.apache.org; yuliya Feldman <
> yufeld...@yahoo.com>
> >  Sent: Tuesday, September 8, 2015 1:31 PM
> >  Subject: Re: Getting Nodes to be "Running" in Mesos
> >
> > interesting... when I did root as the framework user then I got this:
> >
> > ExitCodeException exitCode=24: File /tmp must not be world or group
> > writable, but is 1777
> >
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> > at org.apache.hadoop.util.Shell.run(Shell.java:456)
> > at
> > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
> > at
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> > 15/09/08 15:30:38 INFO nodemanager.ContainerExecutor:
> > 15/09/08 15:30:38 INFO service.AbstractService: Service NodeManager
> > failed in state INITED; cause:
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> > initialize container executor
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> > initialize container execu

Question about the Wiki Instructions on yarn-site.xml

2015-09-09 Thread John Omernik
This was another "slipped in" question in my other thread, so I am breaking
it out for specific discussion.  Basically, I was struggling with some
things in the wiki on this page:
https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators

In step 5:
Step 5: Configure YARN to use Myriad

Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as instructed
in Sample: myriad-config-default.yml.


Issue 1: It should link to the yarn-site.xml page, not the
myriad-config-default.yml page

Issue 2:
It has us put that information in the yarn-site.xml. This makes sense.  The
resource manager needs to be aware of the myriad stuff.

Then I go to create a tarball (which I SHOULD be able to use for both
the resource manager and nodemanager... right?). However, the instructions
state to remove the *.xml files.

Step 6: Create the Tarball

The tarball has all of the files needed for the Node Managers and  Resource
Managers. The following shows how to create the tarball and place it in
HDFS:
cd ~
sudo cp -rp /opt/hadoop-2.7.0 .
sudo rm hadoop-2.7.0/etc/hadoop/*.xml
sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist


What I ended up doing, since I am running the resourcemanager (myriad) in
marathon, is I created two tarballs. One is my hadoop-2.7.0-RM.tar.gz, which
has all the xml files still in the tarball for shipping to marathon.
The other is hadoop-2.7.0-NM.tar.gz, which per the instructions removes the
*.xml files from the /etc/hadoop/ directory (sketched below).
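
For concreteness, a sketch of that two-tarball build, reusing the wiki's
Step 6 commands (the tarball names match the ones above):

cd ~
sudo cp -rp /opt/hadoop-2.7.0 .
# RM tarball: keep the *.xml config files for shipping to marathon
sudo tar -zcpf ~/hadoop-2.7.0-RM.tar.gz hadoop-2.7.0
# NM tarball: strip the *.xml files per the wiki instructions
sudo rm hadoop-2.7.0/etc/hadoop/*.xml
sudo tar -zcpf ~/hadoop-2.7.0-NM.tar.gz hadoop-2.7.0
hadoop fs -put ~/hadoop-2.7.0-RM.tar.gz ~/hadoop-2.7.0-NM.tar.gz /dist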


I guess... my logic is that myriad creates the conf directory for the
nodemanagers... but then I thought, am I overthinking something? Am I
missing something? Could that be factoring into what I am doing here?


Obviously my first steps are to add the extra yarn-site.xml entries, but in
this current setup, they are only going into the resource manager yarn-site,
as the node-managers don't have a yarn-site in their directories.  Am I
looking at this correctly?  Perhaps we could rethink the removal process of
the XML files in the tarball to allow this to work correctly with a single
tarball?

If I am missing something here, please advise!


John


Requirement for Active Profile at Startup - DNS Delay

2015-09-09 Thread John Omernik
I tossed a few small things into my larger thread about getting Myriad
running, so I am going to start separate threads to break them out.


When starting Myriad, it seems we now need to have at least one node
manager specified at startup (based on the config file as seen below)

nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero
profile.

  medium: 1 # 


This is going to lead to task failures with mesos dns because the name
won't be ready right away (potentially a 1 minute delay after kicking off
Myriad). Do we NEED to have a non-0 profile nodemanager start up with the
resource manager? Can't we start Myriad with no nodemanagers, and then have
automation (i.e. a startup procedure with a script, like the sketch below,
that flexes up after the myriad.marathon.mesos name begins to resolve)?
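
Something like the following would be the hedged version of that startup
procedure. The flexup endpoint, port, and payload here are assumptions
(Myriad's cluster REST API and the default restApiPort), so check them
against your build:

#!/bin/bash
# wait for mesos-dns to publish the resource manager's name
until host myriad.marathon.mesos > /dev/null 2>&1; do
  sleep 5
done
# then flex up the first node manager via Myriad's REST API
curl -X PUT -H "Content-Type: application/json" \
     -d '{"instances": 1, "profile": "medium"}' \
     http://myriad.marathon.mesos:8192/api/cluster/flexup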


Re: Getting Nodes to be "Running" in Mesos

2015-09-08 Thread John Omernik
Changing the path for the slave is hard because if we have other frameworks
running and they need to write, only writable by root isn't really an
option.  Is this something the executor can do prior to executing the
node manager (fix the permissions on the folders it's running with, and
then run as a non-privileged user)?

On Tue, Sep 8, 2015 at 6:12 PM, yuliya Feldman 
wrote:

> John,
> It is a problem with permissions for container-executor.cfg - it requires
> the whole path to it to be owned by root.
> One step is to change the work-dir for mesos-slave to point to a different
> directory (not tmp) that is writable only by root.
> It still does not solve the full issue, since the binary distro is changing
> permissions of the distro directory to a framework user.
> If the framework user is root and myriad is running as root it can be
> solved; otherwise we need changes to the binary distro code.
> I was planning to do it, but got distracted by other stuff. Will try to
> look at it this week.
> Thanks, Yuliya
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org; yuliya Feldman 
>  Sent: Tuesday, September 8, 2015 1:31 PM
>  Subject: Re: Getting Nodes to be "Running" in Mesos
>
> interesting... when I did root as the framework user then I got this:
>
> ExitCodeException exitCode=24: File /tmp must not be world or group
> writable, but is 1777
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> at org.apache.hadoop.util.Shell.run(Shell.java:456)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> 15/09/08 15:30:38 INFO nodemanager.ContainerExecutor:
> 15/09/08 15:30:38 INFO service.AbstractService: Service NodeManager
> failed in state INITED; cause:
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> Caused by: java.io.IOException: Linux container executor not
> configured properly (error=24)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
> ... 3 more
> Caused by: ExitCodeException exitCode=24: File /tmp must not be world
> or group writable, but is 1777
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> at org.apache.hadoop.util.Shell.run(Shell.java:456)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
> ... 4 more
> 15/09/08 15:30:38 WARN service.AbstractService: When stopping the
> service NodeManager : java.lang.NullPointerException
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274)
> at
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
> at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> 15/09/08 15:30:38 FATAL nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
> at

Re: Getting Nodes to be "Running" in Mesos

2015-09-08 Thread John Omernik
Interesting... when I used root as the framework user, I got this:

ExitCodeException exitCode=24: File /tmp must not be world or group
writable, but is 1777

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
15/09/08 15:30:38 INFO nodemanager.ContainerExecutor:
15/09/08 15:30:38 INFO service.AbstractService: Service NodeManager
failed in state INITED; cause:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
Caused by: java.io.IOException: Linux container executor not
configured properly (error=24)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
... 3 more
Caused by: ExitCodeException exitCode=24: File /tmp must not be world
or group writable, but is 1777

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
15/09/08 15:30:38 WARN service.AbstractService: When stopping the
service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
15/09/08 15:30:38 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
Caused by: java.io.IOException: Linux container executor not
configured properly (error=24)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
... 3 more
Caused by: ExitCodeException exitCode=24: File /tmp must not be world
or group writable, but is 1777

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
15/09/08 15:30:38 INFO nodemanager.NodeManager: SHUTDOWN_MSG:


On Tue, Sep 8, 2015 at 3:26 PM, John Omernik  wrote:

> So some progress: I am getting the error below complaining about ownership
> of files.  In marathon I have user:root on my task, in the myriad config, I
> have  mapr is user 700, so I am unsure on that, I will try with
> 

Re: Getting Nodes to be "Running" in Mesos

2015-09-08 Thread John Omernik
So some progress: I am getting the error below complaining about ownership
of files.  In marathon I have user:root on my task; in the myriad config, I
have the settings below.  mapr is user 700, so I am unsure on that; I will
try with frameworkUser being root and see if that works.

frameworkUser: mapr # Should be the same user running the resource manager.

frameworkSuperUser: darkness # Must be root or have passwordless sudo on
all nodes!



at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
... 3 more
Caused by: ExitCodeException exitCode=24: File
/tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8
must be owned by root, but is owned by 700

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
15/09/08 15:23:24 WARN service.AbstractService: When stopping the service
NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
15/09/08 15:23:24 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
Caused by: java.io.IOException: Linux container executor not configured
properly (error=24)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
... 3 more
Caused by: ExitCodeException exitCode=24: File
/tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8
must be owned by root, but is owned by 700

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
15/09/08 15:23:24 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NodeManager at
hadoopmapr2.brewingintel.com/192.168.0.99
/

On Tue, Sep 8, 2015 at 3:23 PM, John Omernik  wrote:

> Also a side note:  The Flexing up and now having to have at least one node
> manager specified at startup:
>
> nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero
> profile.
>
>   medium: 1 # 
>
>
> Is going to lead to task failures with mesos dns because the name won't be
> ready right away (1 minute delay after kicking off Myriad) do we NEED to
> have a non-0 profile nodemanager startup with the resource manager?
>
> On Tue, Sep 8, 2015 at 3:16 PM, John Omernik  wrote:
>
>> Cool.  Question about the yarn-site.xml in general.
>>
>> I was struggling with some things in the wiki on this page:
>> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
>>
>> Basically in step 5:
>> Step 5: Configure YARN to use Myriad
>>
>> Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as
>> instructed in Sample: myriad-config-default.yml
>> <https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+

Re: Getting Nodes to be "Running" in Mesos

2015-09-08 Thread John Omernik
Also a side note: the flexing up, and now having to have at least one node
manager specified at startup:

nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero
profile.

  medium: 1 # 


This is going to lead to task failures with mesos dns because the name won't be
ready right away (1 minute delay after kicking off Myriad). Do we NEED to
have a non-0 profile nodemanager start up with the resource manager?

On Tue, Sep 8, 2015 at 3:16 PM, John Omernik  wrote:

> Cool.  Question about the yarn-site.xml in general.
>
> I was struggling with some things in the wiki on this page:
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
>
> Basically in step 5:
> Step 5: Configure YARN to use Myriad
>
> Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as
> instructed in Sample: myriad-config-default.yml
> <https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml>
> .
>
>
> (It should not link to the yml, but to the yarn site, side issue) it has
> us put that information in the yarn-site.xml This makes sense.  The
> resource manager needs to be aware of the myriad stuff.
>
> Then I go to create a tarbal, (which I SHOULD be able to use for both
> resource manager and nodemanager... right?) However, the instructions state
> to remove the *.xml files.
>
> Step 6: Create the Tarball
>
> The tarball has all of the files needed for the Node Managers and
> Resource Managers. The following shows how to create the tarball and place
> it in HDFS:
> cd ~
> sudo cp -rp /opt/hadoop-2.7.0 .
> sudo rm hadoop-2.7.0/etc/hadoop/*.xml
> sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
> hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist
>
>
> What I ended up doing... since I am running the resourcemanager (myriad)
> in marathon, is I created two tarballs. One is my hadoop-2.7.0-RM.tar.gz
> which has the all the xml files still in the tar ball for shipping to
> marathon. Then other is hadoop-2.7.0-NM.tar.gz which per the instructions
> removes the *.xml files from the /etc/hadoop/ directory.
>
>
> I guess... my logic is that myriad creates the conf directory for the
> nodemanagers... but then I thought, and I overthinking something? Am I
> missing something? Could that be factoring into what I am doing here?
>
>
> Obviously my first steps are to add the extra yarn-site.xml entries, but
> in this current setup, they are only going into the resource manager
> yarn-site as the the node-managers don't have a yarn-site in their
> directories.
>
>
>
>
>
>
>
> On Tue, Sep 8, 2015 at 3:09 PM, yuliya Feldman <
> yufeld...@yahoo.com.invalid> wrote:
>
>> Take a look at :   https://github.com/mesos/myriad/pull/128
>> for yarn-site.xml updates
>>
>>   From: John Omernik 
>>  To: dev@myriad.incubator.apache.org
>>  Sent: Tuesday, September 8, 2015 12:38 PM
>>  Subject: Getting Nodes to be "Running" in Mesos
>>
>> So I am playing around with a recent build of Myriad, and I am using MapR
>> 5.0 (hadoop-2.7.0) I hate to use the dev list as a "help Myriad won't run"
>> forum, so please forgive me if I am using the list wrong.
>>
>> Basically, I seem to be able to get myriad running, and the things up, and
>> it tries to start a nodemanager.
>>
>> In mesos, the status of the nodemanager task never gets past staging, and
>> eventually, fails.  The logs for both the node manager and myriad, seem to
>> look healthy, and I am not sure where I should look next to troubleshoot
>> what is happening. Basically you can see the registration of the
>> nodemanager, and then it fails with no error in the logs... Any thoughts
>> would be appreciated on where I can look next for troubleshooting.
>>
>>
>> Node Manager Logs (complete)
>>
>> STARTUP_MSG:  build = g...@github.com:mapr/private-hadoop-common.git
>> -r fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on
>> 2015-08-19T20:02Z
>> STARTUP_MSG:  java = 1.8.0_45-internal
>> /
>> 15/09/08 14:35:23 INFO nodemanager.NodeManager: registered UNIX signal
>> handlers for [TERM, HUP, INT]
>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>>
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType
>> for class
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>>
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.applicati

Re: Getting Nodes to be "Running" in Mesos

2015-09-08 Thread John Omernik
Cool.  Question about the yarn-site.xml in general.

I was struggling with some things in the wiki on this page:
https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators

Basically in step 5:
Step 5: Configure YARN to use Myriad

Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as instructed
in Sample: myriad-config-default.yml
<https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml>
.


(It should not link to the yml, but to the yarn-site page; side issue.) It has
us put that information in the yarn-site.xml. This makes sense.  The resource
manager needs to be aware of the myriad stuff.

Then I go to create a tarball (which I SHOULD be able to use for both
the resource manager and nodemanager... right?). However, the instructions
state to remove the *.xml files.

Step 6: Create the Tarball

The tarball has all of the files needed for the Node Managers and  Resource
Managers. The following shows how to create the tarball and place it in
HDFS:
cd ~
sudo cp -rp /opt/hadoop-2.7.0 .
sudo rm hadoop-2.7.0/etc/hadoop/*.xml
sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist


What I ended up doing, since I am running the resourcemanager (myriad) in
marathon, is I created two tarballs. One is my hadoop-2.7.0-RM.tar.gz, which
has all the xml files still in the tarball for shipping to marathon.
The other is hadoop-2.7.0-NM.tar.gz, which per the instructions removes the
*.xml files from the /etc/hadoop/ directory.


I guess... my logic is that myriad creates the conf directory for the
nodemanagers... but then I thought, am I overthinking something? Am I
missing something? Could that be factoring into what I am doing here?


Obviously my first steps are to add the extra yarn-site.xml entries, but in
this current setup, they are only going into the resource manager yarn-site,
as the node-managers don't have a yarn-site in their directories.







On Tue, Sep 8, 2015 at 3:09 PM, yuliya Feldman 
wrote:

> Take a look at :   https://github.com/mesos/myriad/pull/128
> for yarn-site.xml updates
>
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Tuesday, September 8, 2015 12:38 PM
>  Subject: Getting Nodes to be "Running" in Mesos
>
> So I am playing around with a recent build of Myriad, and I am using MapR
> 5.0 (hadoop-2.7.0) I hate to use the dev list as a "help Myriad won't run"
> forum, so please forgive me if I am using the list wrong.
>
> Basically, I seem to be able to get myriad running, and the things up, and
> it tries to start a nodemanager.
>
> In mesos, the status of the nodemanager task never gets past staging, and
> eventually, fails.  The logs for both the node manager and myriad, seem to
> look healthy, and I am not sure where I should look next to troubleshoot
> what is happening. Basically you can see the registration of the
> nodemanager, and then it fails with no error in the logs... Any thoughts
> would be appreciated on where I can look next for troubleshooting.
>
>
> Node Manager Logs (complete)
>
> STARTUP_MSG:  build = g...@github.com:mapr/private-hadoop-common.git
> -r fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on
> 2015-08-19T20:02Z
> STARTUP_MSG:  java = 1.8.0_45-internal
> /
> 15/09/08 14:35:23 INFO nodemanager.NodeManager: registered UNIX signal
> handlers for [TERM, HUP, INT]
> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType
> for class
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType
> for class
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType
> for class
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType
> for class
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType
> for class
> org.apache.hadoop.yarn.server.nodemanager.contain

Getting Nodes to be "Running" in Mesos

2015-09-08 Thread John Omernik
So I am playing around with a recent build of Myriad, and I am using MapR
5.0 (hadoop-2.7.0). I hate to use the dev list as a "help Myriad won't run"
forum, so please forgive me if I am using the list wrong.

Basically, I seem to be able to get myriad up and running, and
it tries to start a nodemanager.

In mesos, the status of the nodemanager task never gets past staging, and
eventually fails.  The logs for both the node manager and myriad seem to
look healthy, and I am not sure where I should look next to troubleshoot
what is happening. Basically you can see the registration of the
nodemanager, and then it fails with no error in the logs... Any thoughts
would be appreciated on where I can look next for troubleshooting.
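
(For anyone retracing this, one hedged place to look: the executor sandbox
on the slave keeps a stdout/stderr pair even when the myriad and nodemanager
logs look clean. Using the /tmp/mesos work dir that shows up in the error
paths elsewhere in these threads:

SANDBOX=$(ls -dt /tmp/mesos/slaves/*/frameworks/*/executors/myriad_executor*/runs/*/ | head -1)
tail -n 50 "$SANDBOX/stdout" "$SANDBOX/stderr"
)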


Node Manager Logs (complete)

STARTUP_MSG:   build = g...@github.com:mapr/private-hadoop-common.git
-r fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on
2015-08-19T20:02Z
STARTUP_MSG:   java = 1.8.0_45-internal
/
15/09/08 14:35:23 INFO nodemanager.NodeManager: registered UNIX signal
handlers for [TERM, HUP, INT]
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType
for class org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for
class org.apache.hadoop.yarn.server.nodemanager.NodeManager
15/09/08 14:35:24 INFO impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
15/09/08 14:35:24 INFO impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
15/09/08 14:35:24 INFO impl.MetricsSystemImpl: NodeManager metrics
system started
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: per
directory file limit = 8192
15/09/08 14:35:24 INFO localizer.ResourceLocalizationService:
usercache path :
file:///tmp/hadoop-mapr/nm-local-dir/usercache_DEL_1441740924753
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType
for class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
15/09/08 14:35:24 WARN containermanager.AuxServices: The Auxilurary
Service named 'mapreduce_shuffle' in the configuration is for class
org.apache.hadoop.mapred.ShuffleHandler which has a name of
'httpshuffle'. Because these are not the same tools trying to send
ServiceData and read Service Meta Data may have issues unless the
refer to the name in the config.
15/09/08 14:35:24 INFO containermanager.AuxServices: Adding auxiliary
service httpshuffle, "mapreduce_shuffle"
15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl:  Using
ResourceCalculatorPlugin

Re: [jira] [Commented] (MYRIAD-129) Creating custom profiles requires configurations changes on all nodes.

2015-09-08 Thread John Omernik
Agreed!

On Tue, Sep 8, 2015 at 1:36 PM, Santosh Marella 
wrote:

> John - did you intend to comment on the JIRA? Hitting "reply-all" doesn't
> seem to update the JIRA.
>
> Also, to your note - with the recent HA feature in Myriad, the state is
> stored on the DFS. But it's trivial to implement a state store for ZK (just
> a new class with a few methods to override).
>
> I feel it's better to have all of Myriad's state in a single place (state
> store). Where you want to persist that state store should be left
> configurable (ZK vs DFS) by the admin.
>
> Santosh
>
> On Tue, Sep 8, 2015 at 11:27 AM, John Omernik  wrote:
>
> > Since the profile information is small, could this be done in Zookeeper?
> >
> > On Tue, Sep 8, 2015 at 12:59 PM, Yuliya Feldman (JIRA) 
> > wrote:
> >
> > >
> > > [
> > >
> >
> https://issues.apache.org/jira/browse/MYRIAD-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735288#comment-14735288
> > > ]
> > >
> > > Yuliya Feldman commented on MYRIAD-129:
> > > ---
> > >
> > > I think we should have API/UI to add/remove/modify profiles. Also keep
> > all
> > > the configuration updates in state store in case of failover
> > >
> > > > Creating custom profiles requires configurations changes on all
> nodes.
> > > >
> --
> > > >
> > > > Key: MYRIAD-129
> > > > URL:
> https://issues.apache.org/jira/browse/MYRIAD-129
> > > > Project: Myriad
> > > >  Issue Type: Bug
> > > >Reporter: Aashreya Ravi Shankar
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > This message was sent by Atlassian JIRA
> > > (v6.3.4#6332)
> > >
> >
>


Re: [jira] [Commented] (MYRIAD-129) Creating custom profiles requires configurations changes on all nodes.

2015-09-08 Thread John Omernik
Since the profile information is small, could this be done in Zookeeper?

On Tue, Sep 8, 2015 at 12:59 PM, Yuliya Feldman (JIRA) 
wrote:

>
> [
> https://issues.apache.org/jira/browse/MYRIAD-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735288#comment-14735288
> ]
>
> Yuliya Feldman commented on MYRIAD-129:
> ---
>
> I think we should have API/UI to add/remove/modify profiles. Also keep all
> the configuration updates in state store in case of failover
>
> > Creating custom profiles requires configurations changes on all nodes.
> > --
> >
> > Key: MYRIAD-129
> > URL: https://issues.apache.org/jira/browse/MYRIAD-129
> > Project: Myriad
> >  Issue Type: Bug
> >Reporter: Aashreya Ravi Shankar
> >
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


Re: Myriad Mesoscon talk live, now

2015-08-21 Thread John Omernik
awesome thanks for sharing!

On Fri, Aug 21, 2015 at 12:57 PM, Jim Klucar  wrote:

> If you want to watch live, its being broadcast on Periscope.
>
>  https://t.co/dm0i4CugfK
>


Re: myriad scheduler startup with HDP2.7

2015-08-19 Thread John Omernik
Fundamentally, I would imagine that the goal of Myriad, even using a
distribution's build of hadoop, is to have the classpath be entirely
contained.  I.e., there should be no need for any classpath on a node to run
the resource manager or node manager. This may pose challenges, in that I know
in MapR some libs are linked to the /opt/mapr/lib folder.  Perhaps when we
talk about building the tarball in the remote distribution, we should
explore this idea and use flags that include the files if they are
links (see the sketch below).
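
For the MapR symlink case specifically, a hedged sketch: GNU tar's -h
(--dereference) flag archives the files that links point at rather than the
links themselves, which is one way to make the tarball self-contained:

cd /opt
# -h follows symlinks, so jars linked into /opt/mapr/lib land in the
# archive as real files and the node needs no extra classpath
sudo tar -zchpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0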

John


On Wed, Aug 19, 2015 at 1:53 PM, yuliya Feldman  wrote:

> as you can imagine you need matching versions of jackson* jars otherwise
> you might get into issues of incompatibility
> Easiest for you at the moment to put myriad dependency jars on classpath
> before others
>   From: Bill Sparks 
>  To: "dev@myriad.incubator.apache.org" 
> Cc: yuliya Feldman 
>  Sent: Wednesday, August 19, 2015 10:48 AM
>  Subject: Re: myriad scheduler startup with HDP2.7
>
> Well thats the point, I do have 2.2.3 installed as that's the version
> shipped with HDP 2.3 and that gets loaded first in the classpath for YARN
> resourcemanager.
>
> I guess I have three alternatives.
>
> 1) build myriad using 2.2.3, thus matching the HDP installed jar's
> 2) replace the HDP version with 2.5.1, not sure what's that going to do
> for HDP compatibility
> 3) prepend a new classpath for yarn resourcemanager to pick up myriad
> versioned jars first.
>
> --
> Jonathan (Bill) Sparks
> Software Architecture
> Cray Inc.
>
>
>
>
>
>
>
> On 8/19/15 12:36 PM, "Adam Bordelon"  wrote:
>
> >Myriad should be using jackson 2.5.1
> >
> https://github.com/mesos/myriad/blob/d6d765736ba1c8f59aa967457527331e1dab6
> >743/myriad-scheduler/build.gradle#L13
> >Double-check your build.gradle, and make sure you don't have a jackson
> >2.2.3 preinstalled somewhere else on your system
> >
> >On Wed, Aug 19, 2015 at 8:20 AM, Bill Sparks  wrote:
> >
> >> Odd the class path reported in the yarn log contains jackson-core-2.2.3
> >> and not 2.5.1. Is there a way to build myriad to match the version
> >> supported by HDP - that being 2.2.3 ?
> >>
> >>
> >> --
> >> Jonathan (Bill) Sparks
> >> Software Architecture
> >> Cray Inc.
> >>
> >>
> >>
> >>
> >>
> >> On 8/19/15 10:11 AM, "Bill Sparks"  wrote:
> >>
> >> >Thanks I'll check..
> >> >
> >> >--
> >> >Jonathan (Bill) Sparks
> >> >Software Architecture
> >> >Cray Inc.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >On 8/19/15 10:09 AM, "yuliya Feldman" 
> >> wrote:
> >> >
> >> >>This method is part of JsonFactory class which is part of jackson-core
> >> >>jar
> >> >>See if you have some other jars on the classpath (different versions)
> >> >>that precede jackson-core-2.5.1.jar
> >> >>  From: Bill Sparks 
> >> >> To: "dev@myriad.incubator.apache.org"
> >>
> >> >> Sent: Wednesday, August 19, 2015 7:08 AM
> >> >> Subject: myriad scheduler startup with HDP2.7
> >> >>
> >> >>I'm sure this is been resolved, but I've been triaging why I'm getting
> >> >>the following error on resourcemanager startup. Everything on the
> >> >>configuration side looks correct, but I must have missed something.
> >> >>
> >> >>
> >> >>
> >> >>2015-08-19 08:53:04,718 FATAL
> >> >>org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error
> >> >>starting ResourceManager
> >> >>
> >> >>java.lang.NoSuchMethodError:
> >>
> com.fasterxml.jackson.dataformat.yaml.YAMLFactory._decorate(Ljava/io/In
> pu
> >> >>t
> >> >>Stream;Lcom/fasterxml/jackson/core/io/IOContext;)Ljava/io/InputStream;
> >> >>
> >> >>at
> >>
> com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFact
> or
> >> >>y
> >> >>.java:299)
> >> >>
> >> >>at
> >>
> com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFact
> or
> >> >>y
> >> >>.java:14)
> >> >>
> >> >>at
> >>
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java
> :2
> >> >>0
> >> >>11)
> >> >>
> >> >>at com.ebay.myriad.Main.initialize(Main.java:70)
> >> >>
> >> >>at
> >>
> com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationIntercep
> to
> >> >>r
> >> >>.init(MyriadInitializationInterceptor.java:32)
> >> >>
> >> >>at
> >>
> com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(Co
> mp
> >> >>o
> >> >>siteInterceptor.java:76)
> >> >>
> >> >>at
> >>
> com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFa
> ir
> >> >>S
> >> >>cheduler.java:50)
> >> >>
> >> >>at
> >>
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163
> )
> >> >>
> >> >>at
> >>
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService
> .j
> >> >>a
> >> >>va:107)
> >> >>
> >> >>at
> >>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveS
> er
> >> >>v
> >> >>ices.serviceInit(ResourceManager.java:572)
> >> >>
> >> >>at
> >>
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163
> )
> >> >>
>

Re: Documentation Comments

2015-08-19 Thread John Omernik
Happy to sign the ICLA.  Who do I send it to? Ruth, I defer to your writing
skills and suggestions on how to help.  Happy to help in the way that you'd
find easiest.


John

On Wed, Aug 19, 2015 at 1:48 PM, Ruth Harris  wrote:

> hi John,
>
> Thank you for your feedback. I'm the assigned technical writer for the
> Myriad project. I'll also be working on updating the information.
>
> I'm also ok with what Adam indicated: directly editing or adding a John's
> page. But please be aware that I'll also be working on the content. Last
> week I did some cleanup work on the original files in GitHub and then
> brought them into the Wiki, although, I still have some more work in terms
> of organizing and identifying holes.
>
> If you like, I can create a "John's comments" page and then work on
> incorporating some of the obvious things that you mentioned.
>
> Thanks, Ruth
>
> Ruth Harris
> Sr. Tech. Writer
> rhar...@mapr.com
>
> On Wed, Aug 19, 2015 at 11:37 AM, John Omernik  wrote:
>
> > Thanks Adam, I signed up with "mandoskippy".
> >
> > I am honored to help in this capacity, for updating etc, do we go through
> > some kind of review? Is it better to ask questions on the dev list then
> > update when consensus occurs? How about when I'd like to post a page and
> > then have someone review the work? If I have a comment on the page, is
> that
> > public or can I just send to author? Just curious on any guidelines I
> > should be following in that regard.
> >
> > John
> >
> >
> >
> > On Wed, Aug 19, 2015 at 1:29 PM, Adam Bordelon 
> wrote:
> >
> > > John, thanks a ton for your valuable feedback! We're glad to have your
> > > perspective as a user of the project, and I'm ready+willing to give you
> > > edit access to the wiki if you want to update it with your learnings,
> > > elaborate anything that's unclear, or add a new "John's tips" page.
> Just
> > > sign up for a wiki account, send me your accountId, and I'll grant you
> > edit
> > > access.
> > > (I'll let others answer your specific questions)
> > >
> > > On Wed, Aug 19, 2015 at 6:28 AM, John Omernik 
> wrote:
> > >
> > > > Today, I will be playing the role of the fool/jester trying to get
> > Myriad
> > > > running. Basically, since getting Myriad running with Santosh quite a
> > > while
> > > > ago, and now trying again with new versions of Hadoop, MapR, and
> > Myriad,
> > > I
> > > > wanted to hit up the wiki (
> > > > https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Home) and
> > > > outline
> > > > points that as a non-dev living the code, are unclear to someone
> trying
> > > to
> > > > utilize myriad or understand it's operation.
> > > >
> > > > Obviously, some of my points can be answered with "look here in the
> > code"
> > > > or look at this page, but I will try to outline my thought processes
> > as I
> > > > reviewed the current docs.  Sometimes the way I approached the
> problem
> > > led
> > > > me down a path of to a certain page, missing the answer in a
> different
> > > > page, and thus some cross linking could be helpful.
> > > >
> > > > Please do not let my points be taken as anything other than a desire
> to
> > > > improve how accessible Myriad is to the community, this is not a
> > critique
> > > > of the hard work everyone has done on the project.  I also understand
> > > that
> > > > given the work load and other issues, that fixing these issues in
> > > > documentation may not be a priority.  I am listing them out here, so
> > that
> > > > those folks who are SMEs on various points may be able to quickly add
> > > stuff
> > > > and we'll organize it later.
> > > >
> > > >
> > > > *Remote Distribution: *
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Remote+Distribution
> > > >
> > > > This whole section could use some work from a standpoint of what runs
> > > where
> > > > and where that component gets its files.  For example, I think it
> would
> > > > help people to understand that the whole tarball created in step 6
> has
> > > all
> > > > the files for node managers and resource manage

Re: Documentation Comments

2015-08-19 Thread John Omernik
Thanks Adam, I signed up with "mandoskippy".

I am honored to help in this capacity. For updating etc., do we go through
some kind of review? Is it better to ask questions on the dev list, then
update when consensus occurs? How about when I'd like to post a page and
then have someone review the work? If I have a comment on a page, is that
public, or can I just send it to the author? Just curious on any guidelines
I should be following in that regard.

John



On Wed, Aug 19, 2015 at 1:29 PM, Adam Bordelon  wrote:

> John, thanks a ton for your valuable feedback! We're glad to have your
> perspective as a user of the project, and I'm ready+willing to give you
> edit access to the wiki if you want to update it with your learnings,
> elaborate anything that's unclear, or add a new "John's tips" page. Just
> sign up for a wiki account, send me your accountId, and I'll grant you edit
> access.
> (I'll let others answer your specific questions)
>
> On Wed, Aug 19, 2015 at 6:28 AM, John Omernik  wrote:
>
> > Today, I will be playing the role of the fool/jester trying to get Myriad
> > running. Basically, since getting Myriad running with Santosh quite a
> while
> > ago, and now trying again with new versions of Hadoop, MapR, and Myriad,
> I
> > wanted to hit up the wiki (
> > https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Home) and
> > outline
> > points that as a non-dev living the code, are unclear to someone trying
> to
> > utilize myriad or understand it's operation.
> >
> > Obviously, some of my points can be answered with "look here in the code"
> > or look at this page, but I will try to outline my thought processes as I
> > reviewed the current docs.  Sometimes the way I approached the problem led
> > me down a path to a certain page, missing the answer in a different
> > page, and thus some cross linking could be helpful.
> >
> > Please do not let my points be taken as anything other than a desire to
> > improve how accessible Myriad is to the community, this is not a critique
> > of the hard work everyone has done on the project.  I also understand
> that
> > given the work load and other issues, that fixing these issues in
> > documentation may not be a priority.  I am listing them out here, so that
> > those folks who are SMEs on various points may be able to quickly add
> stuff
> > and we'll organize it later.
> >
> >
> > *Remote Distribution: *
> >
> >
> https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Remote+Distribution
> >
> > This whole section could use some work from a standpoint of what runs
> where
> > and where that component gets its files.  For example, I think it would
> > help people to understand that the whole tarball created in step 6 has
> all
> > the files for node managers and resource managers.  Basically, everything
> > runs from there. Here is a small example I am currently working with:
> >
> >
> > Starting Myriad:
> > Option 1: Use Marathon (provide example json, here is mine)
> > {
> > "cmd": "env && export
> >
> >
> YARN_RESOURCEMANAGER_OPTS=-Dyarn.resourcemanager.hostname=myriad.marathon.mesos
> > && hadoop-2.7.0/bin/yarn resourcemanager",
> > "uris": ["maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz"],
> > "cpus": 1.0,
> > "mem": 1024,
> > "id": "myriad",
> > "instances": 1,
> > "user": "mapr"
> > }
> >
> > In this case, Marathon grabs the hadoop tarball and pulls it down, this
> > tarball also has the Myriad yml file. When it executes the resource
> > manager, it is brought up in Myriad and ready to run node managers by
> > pulling the tarball to the slave nodes and executing the nodemanager.  (I
> > would imagine the work with history server etc would also use this
> > tarball?).
> >
> > From here it will us NMInstances to launch a node manager.  (Note, this
> is
> > different from when I originally set things up... before, I could run the
> > resource manager/myriad without a nodemanager, now it seems it's required
> > based on the config in the src... could we expound on this in the docs
> > somewhere?)
> >
> >
> > Option 2:  (Are there other ways to launch the resource manager?)
> >
> > Step 6: So something that is unclear to me is  the handling of the
> > hadoop/yarn config files.  In Step 6 on this page, there is "sudo rm
> > hadoop-
> > 2.5.0/etc/hadoop/*.xml&quo

Node manager issues now

2015-08-19 Thread John Omernik
So thanks everyone on the NMInstances bug.   I am now getting a different
issue with Myriad, in that I have a permissions error with the remote
tarball distribution.

In my old setup (Hadoop 2.5.0, MapR 4.1, some preincubator version of
Myriad)

I would run with the config having

nodemanager:

  jvmMaxMemoryMB: 1024  # Xmx for NM JVM process.

  user: mapr  # The user to run NM process as.

  cpus: 0.2 # CPU needed by NM process.

  cgroups: false# Whether NM should support CGroups. If set to
'true', myriad automatically

# configures yarn-site.xml to attach YARN's cgroups
under Me


So user: mapr.  Now, I realized that this no longer works in the version I
just cloned; the error message was clear to me that this was no longer an
acceptable item.


frameworkUser: mapr # Should be the same user running the resource manager.

frameworkSuperUser: darkness # Must be root or have passwordless sudo on
all nodes!


So these are the settings I use; I also run the resource manager in
Marathon with "user": "mapr".


So I see three different places to set users.  darkness is a superuser with
passwordless sudo, as requested; mapr is my cluster user, mapr worked
before, and I run the resource manager as that user in Marathon.  Myriad
spins up fine, but when it tries to kick off a node manager, I get the
error below.  Note, user 700 is the mapr user.
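
For reference, here is a minimal sketch of the three settings in play,
assuming the key names from the yml I cloned and the json I use in
marathon (exact keys may differ between builds):

# myriad-config-default.yml (assumed key names)
frameworkUser: mapr           # user the RM/NM processes run as
frameworkSuperUser: darkness  # root or passwordless-sudo user that launches them

# marathon json for the resource manager (separate file)
#   "user": "mapr"            # should match frameworkUser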


Any thoughts on who I should run this as would be appreciated!


STARTUP_MSG:   build = g...@github.com:mapr/private-hadoop-common.git
-r 5264b1d5c5c2a849ee0eb09cfcbbed19fb0bfb53; compiled by 'root' on
2015-07-02T23:46Z
STARTUP_MSG:   java = 1.8.0_45-internal
/
15/08/19 07:01:09 INFO nodemanager.NodeManager: registered UNIX signal
handlers for [TERM, HUP, INT]
15/08/19 07:01:10 WARN nodemanager.LinuxContainerExecutor: Exit code
from container executor initialization is : 24
ExitCodeException exitCode=24: File
/tmp/mesos/slaves/20150818-152209-1677764800-5050-22280-S2/frameworks/20150818-152209-1677764800-5050-22280-/executors/myriad_executor20150818-152209-1677764800-5050-22280-S2/runs/1d06f1f1-02a9-4413-80de-7393e9e0935e
must be owned by root, but is owned by 700

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
15/08/19 07:01:10 INFO nodemanager.ContainerExecutor:
15/08/19 07:01:10 INFO service.AbstractService: Service NodeManager
failed in state INITED; cause:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
Caused by: java.io.IOException: Linux container executor not
configured properly (error=24)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
... 3 more
Caused by: ExitCodeException exitCode=24: File
/tmp/mesos/slaves/20150818-152209-1677764800-5050-22280-S2/frameworks/20150818-152209-1677764800-5050-22280-/executors/myriad_executor20150818-152209-1677764800-5050-22280-S2/runs/1d06f1f1-02a9-4413-80de-7393e9e0935e
must be owned by root, but is owned by 700

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
... 4 more
15/08/19 07:01:10 WARN service.AbstractService: When stopping the
service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
at 
org.apache.hadoop.yarn.server.n

Documentation Comments

2015-08-19 Thread John Omernik
bove).

Frameworks and usernames.   I think the users that the framework runs as,
the actual node and resource managers, etc is confusing to a user (I am
very confused!)  When I first got Myriad up I set my user under the
executor to be mapr, and then it appeared to work with impersonation from
queries etc.  Now, I am trying the remote distribution and I have users set
in the config, potentially a user in my marathon json, and I am getting
errors on permissions of files when a node manager tries to start (a
separate issue I will post later). Basically, this is complex, and a page
describing what needs to run where, with which permissions, and how they
interact would be huge for people looking to put this into play.

*Example Yarn Site:*
https://cwiki.apache.org/confluence/display/MYRIAD/Example%3A+yarn-site.xml

This is helpful, but where does it go?  Remember, the remote distribution
had us delete the yarn-site in the hadoop etc folder.
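
For instance, here is a minimal sketch of the flow as I understand it,
assuming the tarball name and MapR-FS path from my own setup:

tar -xzf hadoop-2.7.0.tar.gz
cp yarn-site.xml hadoop-2.7.0/etc/hadoop/         # put the config back inside the tarball
tar -czf hadoop-2.7.0.tar.gz hadoop-2.7.0
hadoop fs -put hadoop-2.7.0.tar.gz /mesos/myriad/  # re-upload for remote distribution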

*Myriad Webapp *
 https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Webapp

This should be fleshed out a bit more.  Also, it's in
/myriad-scheduler/src/main/resources/webapp based on my git clone, but
that's not listed in the wiki.  I had to dig for it.

Some questions here: could the webapp be built during the Myriad build
process? Could it then be packaged as a tarball for execution, either
manually via Marathon or automatically in a container on Mesos?  I
understand this is a fresh piece of the puzzle; I am just thinking about
and verbalizing the "where" on this for the future.



Those are the items that come to mind thus far.  I hope the tone of my
email is correct, this is a great project, and I want others to try it as I
have.

John Omernik


Re: Odd Errors

2015-08-18 Thread John Omernik
This was 100% the fix, it's running now, thanks everyone. This is all
pretty awesome the way it's working. Thanks again!
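
For anyone else hitting the validateNMInstances NPE below, the section it
points at looks roughly like this (the profile name and count are just
what I used; check the current myriad-config-default.yml for exact keys):

nmInstances:   # NMs to bring up when the RM starts
  medium: 1    # <profile name>: <number of instances>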

On Tue, Aug 18, 2015 at 5:31 PM, Santosh Marella 
wrote:

> Okay. Reported the problem in MYRIAD-126.
>
> Santosh
>
> On Tue, Aug 18, 2015 at 3:27 PM, John Omernik  wrote:
>
> > That's the issue (the yml) I'll merge tomorrow and try it. Thanks Santosh
> > On Aug 18, 2015 5:19 PM, "Santosh Marella" 
> wrote:
> >
> > > I think your myriad-config-default.yml is missing a "nmInstances"
> > section.
> > > Are you running it with a older .yml file?
> > >
> > > Santosh
> > >
> > > On Tue, Aug 18, 2015 at 3:06 PM, John Omernik 
> wrote:
> > >
> > > > I am working to stumble through this as Santosh helped me get a pre
> > > > incubator version of Myriad running, and now I upgraded a bunch of
> > stuff
> > > > and wanted to try some of the more recent features. I setup the
> remote
> > > > distribution, created what I think would be a good a json for
> marathon
> > > and
> > > > then I am getting the dreaded Null Pointer Exception without much
> > help...
> > > >
> > > > Based on the logs, it appears to be pulling my URI down with the
> proper
> > > > pathing and trying to execute the resource manager from the tar ball,
> > > > perhaps this will get me kicked off the dev list but my dev foo is
> > weak,
> > > > thus I am not sure how to troubleshoot this. :) Any help would be
> > > > appreciated.
> > > >
> > > >
> > > > 15/08/18 17:00:28 INFO mortbay.log: Started
> > > > SelectChannelConnector@0.0.0.0:8192
> > > > 15/08/18 17:00:28 INFO myriad.Main: Initializing HealthChecks
> > > > 15/08/18 17:00:28 INFO myriad.Main: Initializing Profiles
> > > > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> tiny
> > > > with CPU: 1 and Memory: 4096
> > > > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> > > > small with CPU: 2 and Memory: 8192
> > > > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> > > > medium with CPU: 4 and Memory: 16384
> > > > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> > > > large with CPU: 8 and Memory: 32768
> > > > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> huge
> > > > with CPU: 12 and Memory: 49152
> > > > 15/08/18 17:00:28 INFO myriad.Main: Validating nmInstances..
> > > > 15/08/18 17:00:28 INFO service.AbstractService: Service
> > > >
> > >
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
> > > > failed in state INITED; cause: java.lang.RuntimeException: Failed to
> > > > initialize myriad
> > > > java.lang.RuntimeException: Failed to initialize myriad
> > > > at
> > > >
> > >
> >
> com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
> > > > at
> > > >
> > >
> >
> com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
> > > > at
> > > >
> > >
> >
> com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
> > > > at
> > > >
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:570)
> > > > at
> > > >
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:262)
> > > > at
> > > >
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>

Re: Odd Errors

2015-08-18 Thread John Omernik
That's the issue (the yml). I'll merge tomorrow and try it. Thanks Santosh
On Aug 18, 2015 5:19 PM, "Santosh Marella"  wrote:

> I think your myriad-config-default.yml is missing a "nmInstances" section.
> Are you running it with a older .yml file?
>
> Santosh
>
> On Tue, Aug 18, 2015 at 3:06 PM, John Omernik  wrote:
>
> > I am working to stumble through this as Santosh helped me get a pre
> > incubator version of Myriad running, and now I upgraded a bunch of stuff
> > and wanted to try some of the more recent features. I setup the remote
> > distribution, created what I think would be a good a json for marathon
> and
> > then I am getting the dreaded Null Pointer Exception without much help...
> >
> > Based on the logs, it appears to be pulling my URI down with the proper
> > pathing and trying to execute the resource manager from the tar ball,
> > perhaps this will get me kicked off the dev list but my dev foo is weak,
> > thus I am not sure how to troubleshoot this. :) Any help would be
> > appreciated.
> >
> >
> > 15/08/18 17:00:28 INFO mortbay.log: Started
> > SelectChannelConnector@0.0.0.0:8192
> > 15/08/18 17:00:28 INFO myriad.Main: Initializing HealthChecks
> > 15/08/18 17:00:28 INFO myriad.Main: Initializing Profiles
> > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile tiny
> > with CPU: 1 and Memory: 4096
> > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> > small with CPU: 2 and Memory: 8192
> > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> > medium with CPU: 4 and Memory: 16384
> > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
> > large with CPU: 8 and Memory: 32768
> > 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile huge
> > with CPU: 12 and Memory: 49152
> > 15/08/18 17:00:28 INFO myriad.Main: Validating nmInstances..
> > 15/08/18 17:00:28 INFO service.AbstractService: Service
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
> > failed in state INITED; cause: java.lang.RuntimeException: Failed to
> > initialize myriad
> > java.lang.RuntimeException: Failed to initialize myriad
> > at
> >
> com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
> > at
> >
> com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
> > at
> >
> com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
> > at
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> > at
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:570)
> > at
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
> > at
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:262)
> > at
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
> > Caused by: java.lang.NullPointerException
> > at com.ebay.myriad.Main.validateNMInstances(Main.java:166)
> > at com.ebay.myriad.Main.run(Main.java:98)
> > at com.ebay.myriad.Main.initialize(Main.java:80)
> > at
> >
> com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:32)
> > ... 10 more
> > 15/08/18 17:00:28 INFO service.AbstractService: Service
> > RMActiveServices failed in state INITED; cause:
> > java.lang.RuntimeException: Failed to initialize myriad
> > java.lang.RuntimeException: Failed to initialize myriad
> > at
> >
> com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
> > at
> >
> com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
> > at
> >
> com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
> > at
> > org.

Odd Errors

2015-08-18 Thread John Omernik
I am working to stumble through this as Santosh helped me get a pre
incubator version of Myriad running, and now I upgraded a bunch of stuff
and wanted to try some of the more recent features. I setup the remote
distribution, created what I think would be a good a json for marathon and
then I am getting the dreaded Null Pointer Exception without much help...

Based on the logs, it appears to be pulling my URI down with the proper
pathing and trying to execute the resource manager from the tar ball,
perhaps this will get me kicked off the dev list but my dev foo is weak,
thus I am not sure how to troubleshoot this. :) Any help would be
appreciated.


15/08/18 17:00:28 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8192
15/08/18 17:00:28 INFO myriad.Main: Initializing HealthChecks
15/08/18 17:00:28 INFO myriad.Main: Initializing Profiles
15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile tiny
with CPU: 1 and Memory: 4096
15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
small with CPU: 2 and Memory: 8192
15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
medium with CPU: 4 and Memory: 16384
15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
large with CPU: 8 and Memory: 32768
15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile huge
with CPU: 12 and Memory: 49152
15/08/18 17:00:28 INFO myriad.Main: Validating nmInstances..
15/08/18 17:00:28 INFO service.AbstractService: Service
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
failed in state INITED; cause: java.lang.RuntimeException: Failed to
initialize myriad
java.lang.RuntimeException: Failed to initialize myriad
at 
com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
at 
com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
at 
com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:570)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:262)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
Caused by: java.lang.NullPointerException
at com.ebay.myriad.Main.validateNMInstances(Main.java:166)
at com.ebay.myriad.Main.run(Main.java:98)
at com.ebay.myriad.Main.initialize(Main.java:80)
at 
com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:32)
... 10 more
15/08/18 17:00:28 INFO service.AbstractService: Service
RMActiveServices failed in state INITED; cause:
java.lang.RuntimeException: Failed to initialize myriad
java.lang.RuntimeException: Failed to initialize myriad
at 
com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
at 
com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
at 
com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:570)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:262)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
Caused by: java.lang.NullPointerException
at com.ebay.myriad.Main.validateNMInstances(Main.java:166)
at com.ebay.myriad.Main.run(Main.java:98)
at com.ebay.myriad.Main.initialize(Main.java:80)
at 
com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:32)
... 10 more

Re: Myriad 0.1 release scope

2015-08-18 Thread John Omernik
Ok, so I tried the remote distribution of Myriad per the docs. I guess it
could probably use some information on "how" to run the resource manager
if it's in the tar.gz, perhaps an example marathon json. I am playing with
it now to figure it out.

On Tue, Aug 18, 2015 at 3:48 PM, yuliya Feldman  wrote:

> mesos/myriad is the right one so far
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org; yuliya Feldman 
>  Sent: Tuesday, August 18, 2015 1:44 PM
>  Subject: Re: Myriad 0.1 release scope
>
> (So if I clone that repo, am I cloning the right one?)
>
>
>
> On Tue, Aug 18, 2015 at 3:43 PM, John Omernik  wrote:
>
> > Ok, I was going off
> > https://github.com/mesos/myriad/blob/phase1/docs/myriad-configuration.md
> >
> > I will try it.
> >
> > John
> >
> > On Tue, Aug 18, 2015 at 3:40 PM, yuliya Feldman <
> > yufeld...@yahoo.com.invalid> wrote:
> >
> >> You actually do not need to rebuild even today - just keep this file in
> >> hadoop config directory that is on the classpath: like .../etc/hadoop
> >>  From: John Omernik 
> >>  To: dev@myriad.incubator.apache.org
> >>  Sent: Tuesday, August 18, 2015 1:35 PM
> >>  Subject: Re: Myriad 0.1 release scope
> >>
> >> On the release scope, will having the myriad configuration file exist
> >> outside the jar (i.e. you can change configuration without rebuilding)
> be
> >> part of the .1 release scope?
> >>
> >>
> >>
> >> On Mon, Aug 10, 2015 at 10:01 PM, Santosh Marella <
> smare...@maprtech.com>
> >> wrote:
> >>
> >> > Hello All,
> >> >
> >> >  I've merged the FGS changes into phase1. Built and tested both coarse
> >> > grained scaling and fine grained scaling, UI on a 4 node cluster.
> >> >
> >> >  If anyone finds things are not working as expected, please let me
> know.
> >> >
> >> > Thanks,
> >> > Santosh
> >> >
> >> > On Fri, Aug 7, 2015 at 10:46 AM, Santosh Marella <
> smare...@maprtech.com
> >> >
> >> > wrote:
> >> >
> >> > > Hello guys,
> >> > >
> >> > > I propose merging FGS into phase1. As I said before, I think it's
> at a
> >> > > point where the functionality works reasonably well.
> >> > > Any future improvements/fixes/UI changes can be done via separate
> >> JIRAs.
> >> > >
> >> > > Unless there are any major concerns, I'd like to merge FGS into
> phase1
> >> > > *EOD Monday* (PDT).
> >> > >
> >> > > Thanks,
> >> > > Santosh
> >> > >
> >> > > On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella <
> >> smare...@maprtech.com>
> >> > > wrote:
> >> > >
> >> > >> I feel FGS is very close to making it into 0.1. PR 116 addresses
> >> moving
> >> > >> to hadoop 2.7 and making FGS and CGS coexist. This PR was recently
> >> > reviewed
> >> > >> by Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x
> and
> >> > 2.7.x
> >> > >> clusters and it seemed to have worked as expected. Unless there are
> >> more
> >> > >> reviews/feedback, it can be merged into issue_14. Once PR 116 is
> >> merged
> >> > >> into issue_14, issue_14 can be merged into phase1.
> >> > >>
> >> > >> Thanks,
> >> > >> Santosh
> >> > >>
> >> > >> On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon 
> >> > wrote:
> >> > >>
> >> > >>> We do have a JIRA 0.1.0 "fix version" field, but none of our
> issues
> >> use
> >> > >>> it
> >> > >>> yet.
> >> > >>> I think the goal was just to take what we have and make it work
> >> under
> >> > >>> Apache infrastructure, then vote on that for 0.1.0.
> >> > >>> Although other features like HA or FGS would be great, let's try
> to
> >> get
> >> > >>> our
> >> > >>> first Apache release out ASAP.
> >> > >>> We can create 0.1.1 or 0.2.0 fix versions for subsequent releases
> >> with
> >> > >>> other issues/features. Roadmap would be great.
> >> > >>> (I'm just summarizing what we discussed a month or two ago. Feel
> >> free
> >> > to
> >> > >>> correct me or disagree with this approach.)
> >> > >>>
> >> > >>> On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade <
> >> > >>> swapnil.daing...@gmail.com
> >> > >>> > wrote:
> >> > >>>
> >> > >>> > Hi all,
> >> > >>> >
> >> > >>> > Was wondering what would be the scope for the Myriad 0.1
> release.
> >> > >>> > It would be nice to have a roadmap page somewhere and target
> >> > >>> > features to releases (JIRA 'fix version' field perhaps)
> >> > >>> >
> >> > >>> > Regards
> >> > >>> > Swapnil
> >> > >>> >
> >> > >>>
> >> > >>
> >> > >>
> >> > >
> >> >
> >>
> >>
> >>
> >>
> >
> >
>
>
>
>


Re: Myriad 0.1 release scope

2015-08-18 Thread John Omernik
(So if I clone that repo, am I cloning the right one?)

On Tue, Aug 18, 2015 at 3:43 PM, John Omernik  wrote:

> Ok, I was going off
> https://github.com/mesos/myriad/blob/phase1/docs/myriad-configuration.md
>
> I will try it.
>
> John
>
> On Tue, Aug 18, 2015 at 3:40 PM, yuliya Feldman <
> yufeld...@yahoo.com.invalid> wrote:
>
>> You actually do not need to rebuild even today - just keep this file in
>> hadoop config directory that is on the classpath: like .../etc/hadoop
>>   From: John Omernik 
>>  To: dev@myriad.incubator.apache.org
>>  Sent: Tuesday, August 18, 2015 1:35 PM
>>  Subject: Re: Myriad 0.1 release scope
>>
>> On the release scope, will having the myriad configuration file exist
>> outside the jar (i.e. you can change configuration without rebuilding) be
>> part of the .1 release scope?
>>
>>
>>
>> On Mon, Aug 10, 2015 at 10:01 PM, Santosh Marella 
>> wrote:
>>
>> > Hello All,
>> >
>> >  I've merged the FGS changes into phase1. Built and tested both coarse
>> > grained scaling and fine grained scaling, UI on a 4 node cluster.
>> >
>> >  If anyone finds things are not working as expected, please let me know.
>> >
>> > Thanks,
>> > Santosh
>> >
>> > On Fri, Aug 7, 2015 at 10:46 AM, Santosh Marella > >
>> > wrote:
>> >
>> > > Hello guys,
>> > >
>> > > I propose merging FGS into phase1. As I said before, I think it's at a
>> > > point where the functionality works reasonably well.
>> > > Any future improvements/fixes/UI changes can be done via separate
>> JIRAs.
>> > >
>> > > Unless there are any major concerns, I'd like to merge FGS into phase1
>> > > *EOD Monday* (PDT).
>> > >
>> > > Thanks,
>> > > Santosh
>> > >
>> > > On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella <
>> smare...@maprtech.com>
>> > > wrote:
>> > >
>> > >> I feel FGS is very close to making it into 0.1. PR 116 addresses
>> moving
>> > >> to hadoop 2.7 and making FGS and CGS coexist. This PR was recently
>> > reviewed
>> > >> by Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x and
>> > 2.7.x
>> > >> clusters and it seemed to have worked as expected. Unless there are
>> more
>> > >> reviews/feedback, it can be merged into issue_14. Once PR 116 is
>> merged
>> > >> into issue_14, issue_14 can be merged into phase1.
>> > >>
>> > >> Thanks,
>> > >> Santosh
>> > >>
>> > >> On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon 
>> > wrote:
>> > >>
>> > >>> We do have a JIRA 0.1.0 "fix version" field, but none of our issues
>> use
>> > >>> it
>> > >>> yet.
>> > >>> I think the goal was just to take what we have and make it work
>> under
>> > >>> Apache infrastructure, then vote on that for 0.1.0.
>> > >>> Although other features like HA or FGS would be great, let's try to
>> get
>> > >>> our
>> > >>> first Apache release out ASAP.
>> > >>> We can create 0.1.1 or 0.2.0 fix versions for subsequent releases
>> with
>> > >>> other issues/features. Roadmap would be great.
>> > >>> (I'm just summarizing what we discussed a month or two ago. Feel
>> free
>> > to
>> > >>> correct me or disagree with this approach.)
>> > >>>
>> > >>> On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade <
>> > >>> swapnil.daing...@gmail.com
>> > >>> > wrote:
>> > >>>
>> > >>> > Hi all,
>> > >>> >
>> > >>> > Was wondering what would be the scope for the Myriad 0.1 release.
>> > >>> > It would be nice to have a roadmap page somewhere and target
>> > >>> > features to releases (JIRA 'fix version' field perhaps)
>> > >>> >
>> > >>> > Regards
>> > >>> > Swapnil
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>>
>>
>>
>
>


Re: Myriad 0.1 release scope

2015-08-18 Thread John Omernik
Ok, I was going off
https://github.com/mesos/myriad/blob/phase1/docs/myriad-configuration.md

I will try it.

John

On Tue, Aug 18, 2015 at 3:40 PM, yuliya Feldman  wrote:

> You actually do not need to rebuild even today - just keep this file in
> hadoop config directory that is on the classpath: like .../etc/hadoop
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Tuesday, August 18, 2015 1:35 PM
>  Subject: Re: Myriad 0.1 release scope
>
> On the release scope, will having the myriad configuration file exist
> outside the jar (i.e. you can change configuration without rebuilding) be
> part of the .1 release scope?
>
>
>
> On Mon, Aug 10, 2015 at 10:01 PM, Santosh Marella 
> wrote:
>
> > Hello All,
> >
> >  I've merged the FGS changes into phase1. Built and tested both coarse
> > grained scaling and fine grained scaling, UI on a 4 node cluster.
> >
> >  If anyone finds things are not working as expected, please let me know.
> >
> > Thanks,
> > Santosh
> >
> > On Fri, Aug 7, 2015 at 10:46 AM, Santosh Marella 
> > wrote:
> >
> > > Hello guys,
> > >
> > > I propose merging FGS into phase1. As I said before, I think it's at a
> > > point where the functionality works reasonably well.
> > > Any future improvements/fixes/UI changes can be done via separate
> JIRAs.
> > >
> > > Unless there are any major concerns, I'd like to merge FGS into phase1
> > > *EOD Monday* (PDT).
> > >
> > > Thanks,
> > > Santosh
> > >
> > > On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella  >
> > > wrote:
> > >
> > >> I feel FGS is very close to making it into 0.1. PR 116 addresses
> moving
> > >> to hadoop 2.7 and making FGS and CGS coexist. This PR was recently
> > reviewed
> > >> by Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x and
> > 2.7.x
> > >> clusters and it seemed to have worked as expected. Unless there are
> more
> > >> reviews/feedback, it can be merged into issue_14. Once PR 116 is
> merged
> > >> into issue_14, issue_14 can be merged into phase1.
> > >>
> > >> Thanks,
> > >> Santosh
> > >>
> > >> On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon 
> > wrote:
> > >>
> > >>> We do have a JIRA 0.1.0 "fix version" field, but none of our issues
> use
> > >>> it
> > >>> yet.
> > >>> I think the goal was just to take what we have and make it work under
> > >>> Apache infrastructure, then vote on that for 0.1.0.
> > >>> Although other features like HA or FGS would be great, let's try to
> get
> > >>> our
> > >>> first Apache release out ASAP.
> > >>> We can create 0.1.1 or 0.2.0 fix versions for subsequent releases
> with
> > >>> other issues/features. Roadmap would be great.
> > >>> (I'm just summarizing what we discussed a month or two ago. Feel free
> > to
> > >>> correct me or disagree with this approach.)
> > >>>
> > >>> On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade <
> > >>> swapnil.daing...@gmail.com
> > >>> > wrote:
> > >>>
> > >>> > Hi all,
> > >>> >
> > >>> > Was wondering what would be the scope for the Myriad 0.1 release.
> > >>> > It would be nice to have a roadmap page somewhere and target
> > >>> > features to releases (JIRA 'fix version' field perhaps)
> > >>> >
> > >>> > Regards
> > >>> > Swapnil
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
>


Re: Myriad 0.1 release scope

2015-08-18 Thread John Omernik
On the release scope, will having the myriad configuration file exist
outside the jar (i.e. you can change configuration without rebuilding) be
part of the 0.1 release scope?

On Mon, Aug 10, 2015 at 10:01 PM, Santosh Marella 
wrote:

> Hello All,
>
>   I've merged the FGS changes into phase1. Built and tested both coarse
> grained scaling and fine grained scaling, UI on a 4 node cluster.
>
>   If anyone finds things are not working as expected, please let me know.
>
> Thanks,
> Santosh
>
> On Fri, Aug 7, 2015 at 10:46 AM, Santosh Marella 
> wrote:
>
> > Hello guys,
> >
> > I propose merging FGS into phase1. As I said before, I think it's at a
> > point where the functionality works reasonably well.
> > Any future improvements/fixes/UI changes can be done via separate JIRAs.
> >
> > Unless there are any major concerns, I'd like to merge FGS into phase1
> > *EOD Monday* (PDT).
> >
> > Thanks,
> > Santosh
> >
> > On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella 
> > wrote:
> >
> >> I feel FGS is very close to making it into 0.1. PR 116 addresses moving
> >> to hadoop 2.7 and making FGS and CGS coexist. This PR was recently
> reviewed
> >> by Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x and
> 2.7.x
> >> clusters and it seemed to have worked as expected. Unless there are more
> >> reviews/feedback, it can be merged into issue_14. Once PR 116 is merged
> >> into issue_14, issue_14 can be merged into phase1.
> >>
> >> Thanks,
> >> Santosh
> >>
> >> On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon 
> wrote:
> >>
> >>> We do have a JIRA 0.1.0 "fix version" field, but none of our issues use
> >>> it
> >>> yet.
> >>> I think the goal was just to take what we have and make it work under
> >>> Apache infrastructure, then vote on that for 0.1.0.
> >>> Although other features like HA or FGS would be great, let's try to get
> >>> our
> >>> first Apache release out ASAP.
> >>> We can create 0.1.1 or 0.2.0 fix versions for subsequent releases with
> >>> other issues/features. Roadmap would be great.
> >>> (I'm just summarizing what we discussed a month or two ago. Feel free
> to
> >>> correct me or disagree with this approach.)
> >>>
> >>> On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade <
> >>> swapnil.daing...@gmail.com
> >>> > wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > Was wondering what would be the scope for the Myriad 0.1 release.
> >>> > It would be nice to have a roadmap page somewhere and target
> >>> > features to releases (JIRA 'fix version' field perhaps)
> >>> >
> >>> > Regards
> >>> > Swapnil
> >>> >
> >>>
> >>
> >>
> >
>


Re: Establish versions to target for first incubator release?

2015-07-14 Thread John Omernik
Would it be made so older versions of Yarn wouldn't work with the incubator
release, or just allow a way to run on older versions, but gracefully not
allow FGS on older versions of Yarn?

On Tuesday, July 14, 2015, yuliya Feldman 
wrote:

> +1 on 2.7
>   From: Jim Klucar >
>  To: dev@myriad.incubator.apache.org 
>  Sent: Tuesday, July 14, 2015 5:36 PM
>  Subject: Establish versions to target for first incubator release?
>
> The FGS discussion made me wonder if we've put a line in the sand about
> what versions of YARN and Mesos we're going to target for the first Myriad
> incubator release. Might be nice to start getting some kind of mileage on
> specific versions. Perhaps 0.23 and 2.7? Is this premature?
>
>
>



-- 
Sent from my iThing


Re: Logo!

2015-05-25 Thread John Omernik
What are the rules on "alluding" to other project logos? Permission? How do
other projects feel about that?  Showing the Mesos triangles, then having
a few elephants, perhaps in color, distributed around, and then having
other triangles, perhaps with allusions to other logos like Storm, Spark,
Docker, Chronos, Marathon, Aurora, and others?  To show what we are
focusing on, the other logos (or allusions to logos) could be in grey while
the elephants could be colored (in a traditional blue?).   I wouldn't like
showing the elephant as "bigger", just highlighted, as that's what we are doing.

Just a thought.

On Sunday, May 24, 2015, Jim Klucar  wrote:

> I like how Flink used the Apache feather colors in their squirrel logo.
> https://flink.apache.org/
>
> On Sat, May 23, 2015 at 11:08 AM, Ken Sipe  > wrote:
>
> > I like that... How about a couple of elephants sitting on top of a mesos
> > platform.
> >
> > Sent from my iPhone
> >
> > > On May 22, 2015, at 9:19 PM, Brandon Gulla  >
> > wrote:
> > >
> > > I too like the triangle idea.
> > >
> > > Another idea: the hadoop elephant wearing a cape displaying the Mesos
> > logo.
> > > Curled in its trunk is a ball of yarn dangling.
> > >> On May 22, 2015 6:19 PM, "Darin Johnson"  >
> > wrote:
> > >>
> > >> I like the idea of several pixelated(with mesos M's) possibly
> > interesting.
> > >> On May 22, 2015 7:07 PM, "yuliya Feldman"  >
> > >> wrote:
> > >>
> > >>> Yeah - sorry - that is what I meant elephant per triangle, but to be
> > >>> modest - not in all triangles, since people run other stuff - besides
> > >> hadoop
> > >>>  From: John Omernik >
> > >>> To: dev@myriad.incubator.apache.org ; yuliya Feldman <
> > >> yufeld...@yahoo.com >
> > >>> Sent: Friday, May 22, 2015 3:42 PM
> > >>> Subject: Re: Logo!
> > >>>
> > >>> Ooo, elephantS  :) (you can have a herd of elephants now!)
> > >>>
> > >>> :)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, May 22, 2015 at 4:59 PM, yuliya Feldman
> > >>>  > >>>> wrote:
> > >>>
> > >>>> How about small elephant in some of the little "mesos" triangles.
> > >>>> From: Will Ochandarena  >
> > >>>> To: dev@myriad.incubator.apache.org 
> > >>>> Cc: Nitin Bandugula >
> > >>>> Sent: Friday, May 22, 2015 2:15 PM
> > >>>> Subject: Logo!
> > >>>>
> > >>>> Dev team - now that the name has been locked down it's time we
> started
> > >>>> building our identity by getting a logo.
> > >>>>
> > >>>> The MapR marketing team has agreed to fund and arrange for a logo,
> but
> > >> in
> > >>>> true Apache style we want to make this a community process.  We will
> > >> soon
> > >>>> reach out to an agency who will crowd-source the design, allowing
> > >>> everyone
> > >>>> to vote on your favorite.  As a first step we need to come up with a
> > >>>> general idea of what the logo should be - a creative direction if
> you
> > >>>> will.  Below are a couple of ideas, please chime in with your own.
> > >>>>
> > >>>> *Please reply by 5/27.*
> > >>>>
> > >>>> 1. Represent joining of two worlds/communities
> > >>>> -- May incorporate elements of Hadoop
> > >>>> <http://hadoop.apache.org/images/hadoop-logo.jpg> and Mesos
> > >>>> <
> https://tctechcrunch2011.files.wordpress.com/2013/09/mesos_logo.png>
> > >>>> logos
> > >>>>
> > >>>> 2. Expand on the word Myriad (countless or great number) - maybe
> write
> > >>> out
> > >>>> myriad with a *myriad* of tiny dots
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>
> >
>


-- 
Sent from my iThing


Re: Logo!

2015-05-22 Thread John Omernik
Ooo, elephantS  :) (you can have a herd of elephants now!)

:)


On Fri, May 22, 2015 at 4:59 PM, yuliya Feldman  wrote:

> How about small elephant in some of the little "mesos" triangles.
>   From: Will Ochandarena 
>  To: dev@myriad.incubator.apache.org
> Cc: Nitin Bandugula 
>  Sent: Friday, May 22, 2015 2:15 PM
>  Subject: Logo!
>
> Dev team - now that the name has been locked down it's time we started
> building our identity by getting a logo.
>
> The MapR marketing team has agreed to fund and arrange for a logo, but in
> true Apache style we want to make this a community process.  We will soon
> reach out to an agency who will crowd-source the design, allowing everyone
> to vote on your favorite.  As a first step we need to come up with a
> general idea of what the logo should be - a creative direction if you
> will.  Below are a couple of ideas, please chime in with your own.
>
> *Please reply by 5/27.*
>
> 1. Represent joining of two worlds/communities
> -- May incorporate elements of Hadoop
>  and Mesos
> 
> logos
>
> 2. Expand on the word Myriad (countless or great number) - maybe write out
> myriad with a *myriad* of tiny dots
>
>
>
>


Re: Recommending or requiring mesos dns?

2015-05-21 Thread John Omernik
I don't want to be negative; in concept, the idea has merit. That said, I
am extremely concerned about performance. If there is an x% performance
hit with this, and there is another method that may take more work but not
have the performance hit, I think we should focus on that.  I understand
that there may be "smallish" applications that may work for this; however,
I think there's a danger of scale, in that while it may work on a small
scale in dev/testing, someone who tries that approach and then TRIES to
scale may be severely disappointed.

On Wed, May 20, 2015 at 8:49 PM, Swapnil Daingade <
swapnil.daing...@gmail.com> wrote:

> Trying to send image again. This time as attachment.
>
> Regards
> Swapnil
>
>
> On Wed, May 20, 2015 at 5:43 PM, Swapnil Daingade <
> swapnil.daing...@gmail.com> wrote:
>
>> Hi John,
>>
>> Are you suggesting something like this ?
>>
>> In issue 96 we are proposing something that will not require port mapping.
>> Can you take a look and give your thoughts
>> https://github.com/mesos/myriad/issues/96
>>
>> Regards
>> Swapnil
>>
>> ​
>>
>> On Fri, May 15, 2015 at 6:44 AM, John Omernik  wrote:
>>
>>> This is true. In this setup thought, we wouldn't be using the "random
>>> ports" We'd be assigning the ports that will be used by the RM (the 5)
>>> per
>>> cluster (with config changes) a head of time.  That is what the RM would
>>> know as its ports.  At this point, when marathon spins up a RM, HA proxy
>>> would take the service ports (which would be the same ports the RM
>>> "thinks"
>>> is running on) and forward them to the ports that mesos has proxied (in
>>> the
>>> available ports list). I've done this in Docker, but not on native
>>> marathon
>>> run processes. I need to look into that more.
>>>
>>> One concern I have with the HAProxy is long running TCP connections (I am
>>> not sure if this applies to Yarn/RM)  Basically on one particular use
>>> case:
>>> Running a Hive Thrift (hiveserver2) service in docker on the mesos
>>> cluster
>>> with HAProxy. I found if I submitted a query that was long, that the
>>> query
>>> would be submitted, and HAProxy would not seen connections for a while
>>> and
>>> kill the proxy to the backend. This was annoying to say the least.
>>>  Would
>>> this occur with HAProxy? I really think that if the haproxy-marathon
>>> bridge
>>> would be used we'd have to be certain that condition wouldn't occur, even
>>> hidden. (I would hate for something to happen where that condition
>>> occurs,
>>> however, Yarn is able to "reset" without error, adding a bit of latency
>>> to
>>> the process, and have that go unaddressed).
>>>
>>> So other than the HAProxy weirdness I saw, that approach could work, and
>>> then mesos-dns is just a nice component for administrators and users.
>>> What
>>> do I mean by that?
>>>
>>> Well, let's say you have a cluster of node1, node2, node3, and node4.
>>>
>>> You assign the 5 yarn ports (and service ports) for that cluster to be
>>> 15000, 15001, 15002, 15003, 15004.
>>>
>>> Myriad starts a node manager. It sets in the RM config (and all NM
>>>  configs) the ports based on the 5 above
>>>
> >> Mesos grabs 5 random ports in its allowed range (default 30000 to 31000)
>>>
>>> When Mesos starts the RM process, lets say it starts it on node2.
>>>
> >> Node2 now has ports 30000, 30001, 30002, 30003, and 30004 listening and is
>>> forwarding those to 15000,15001,15002,15003, and 15004 on the listening
>>> process.  (Note, I know this is doable with Docker contained processes,
>>> can
>>> Marathon do it outside of docker?)
>>>
>>> Now haproxy's config is updated. on EVERY node, the ports 15000-15004 are
> >> listening and are forwarding to Node2 on ports 30000-30004.
>>>
>>> To your point on "needing" mesos-dns. Technically no, we don't need it.
>>> we
>>> can tell our NMs to connect to any node on ports 15000-15004. This will
>>> work. But it's we may get added latency (rack to rack forwarding etc
>>> extra
>>> hops).
>>>
>>> Instead, if we set the NMs to connect to myriad-dev-1.marathon.mesos  It
>>> could return an IP that is THE node it's running on.  That way we get the
>>&

Re: Recommending or requiring mesos dns?

2015-05-15 Thread John Omernik
This is true. In this setup, though, we wouldn't be using the "random
ports": we'd be assigning the ports that will be used by the RM (the 5) per
cluster (with config changes) ahead of time.  That is what the RM would
know as its ports.  At this point, when Marathon spins up an RM, HAProxy
would take the service ports (which would be the same ports the RM "thinks"
it is running on) and forward them to the ports that Mesos has proxied (in
the available ports list). I've done this in Docker, but not on native
Marathon-run processes. I need to look into that more.

One concern I have with HAProxy is long-running TCP connections (I am not
sure if this applies to Yarn/RM). Basically, in one particular use case,
running a Hive Thrift (hiveserver2) service in Docker on the Mesos cluster
with HAProxy, I found that if I submitted a query that was long, the query
would be submitted, HAProxy would not see connections for a while, and it
would kill the proxy to the backend. This was annoying to say the least.
Would this occur here with HAProxy? I really think that if the
haproxy-marathon bridge is used, we'd have to be certain that condition
won't occur, even hidden. (I would hate for that condition to occur but
have Yarn "reset" without error, adding a bit of latency to the process,
with the whole thing going unaddressed.)
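
(For what it's worth, the usual mitigation I have seen for that is raising
HAProxy's idle timeouts for tcp-mode traffic; the values here are purely
illustrative, not a recommendation:

defaults
  mode tcp
  timeout client 1h
  timeout server 1h

Whether that fully covers Yarn's RPC behavior is exactly the kind of thing
we would need to verify.)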

So other than the HAProxy weirdness I saw, that approach could work, and
then mesos-dns is just a nice component for administrators and users. What
do I mean by that?

Well, let's say you have a cluster of node1, node2, node3, and node4.

You assign the 5 yarn ports (and service ports) for that cluster to be
15000, 15001, 15002, 15003, 15004.

Myriad starts a node manager. It sets in the RM config (and all NM
configs) the ports based on the 5 above.

Mesos grabs 5 random ports in its allowed range (default 30000 to 31000)

When Mesos starts the RM process, let's say it starts it on node2.

Node2 now has ports 30000, 30001, 30002, 30003, and 30004 listening and is
forwarding those to 15000, 15001, 15002, 15003, and 15004 on the listening
process.  (Note, I know this is doable with Docker-contained processes;
can Marathon do it outside of Docker?)

Now HAProxy's config is updated: on EVERY node, the ports 15000-15004 are
listening and forwarding to Node2 on ports 30000-30004.

To your point on "needing" mesos-dns: technically no, we don't need it. We
can tell our NMs to connect to any node on ports 15000-15004. This will
work, but we may get added latency (rack-to-rack forwarding, extra hops).

Instead, if we set the NMs to connect to myriad-dev-1.marathon.mesos, it
could return the IP of THE node it's running on.  That way we get the
advantage of having the NMs connect to the box with the process.  HAProxy
takes the requests and sends them to the mesos ports (30000-30004), which
Mesos then sends to the process on ports 15000-15004.
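
In HAProxy terms, the generated entry for just one of the five service
ports might look something like this (the names and target node are
illustrative, assuming the RM landed on node2):

listen myriad_rm_15000
  bind *:15000
  mode tcp
  server node2 node2:30000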

So without mesos-dns, you just connect to any node on the service ports and
it "works", but when it comes to self-documentation, connecting to
myriad-dev-1.marathon.mesos seems more descriptive than saying the NM is on
node2.yourdomain.  Especially when it's not... potential for administrative
confusion.

With mesos-dns, you connect to the descriptive name, and it works. But
then, given my concerns with HAProxy, do we even NEED it? All HAProxy is
doing at that point is opening a port on a node and sending to another
mesos-approved port, only to send it to the same port the process is
listening on. Are we adding complexity?
This is a great discussion as it speaks to some intrinsic challenges that
exist in data center OSes :)




On Thu, May 14, 2015 at 1:50 PM, Santosh Marella 
wrote:

> I might be missing something, but I didn't understand why mesos-dns would
> be required in addition to HAProxy. If we configure RM to bind to random
> ports, but have RM reachable via HAProxy on RM's service ports, won't all
> the clients (such as NMs/HiveServer2 etc) just use HAProxy to reach to RM?
> If yes, why is mesos-dns needed?
>
> I have very limited knowledge about HAProxy configuration in a mesos
> cluster. I just read through this doc:
> https://docs.mesosphere.com/getting-started/service-discovery/ and what I
> inferred is that a HAProxy instance runs on every slave node and if NM
> running on a slave node has to reach to RM, it would simply use a RM's
> address that looks like "localhost:90000" (where 90000 is an
> admin-identified RPC service port for RM).
> Since HAProxy on NM's localhost listens on 90000, it just forwards the
> traffic to RM's IP:RandomPort. Am I understanding this correctly?
>
> Thanks,
> Santosh
>
> On Tue, May 12, 2015 at 5:41 AM, John Omernik  wrote:
>
> > The challenge I think is the ports. So we have 

Re: Recommending or requiring mesos dns?

2015-05-12 Thread John Omernik
The challenge I think is the ports. So we have 5 ports that are needed for
an RM; do we predefine those? I think Yuliya is saying yes, we should.  An
interesting compromise... rather than truly random ports, when we define a
Yarn cluster, we have the responsibility to define our 5 "service" ports
using the Marathon/HAProxy service ports. (This now requires HAProxy as
well as mesos-dns.  I'd recommend some work being done on documenting
HAProxy for use with the haproxy script; I know that I stumbled a bit
trying to get HAProxy set up, but that just may be my own lack of knowledge
on the subject.) These ports will have to be available across the cluster,
and will map to whichever ports Mesos assigns to the RM.

This makes sense to me. A "Yarn cluster creation" event on a Mesos cluster
is something we want to be flexible, but it's not something that will
likely be "self service", i.e. we won't have users just creating Yarn
clusters at will. It will likely be something that, when requested, the
admin can identify 5 available service ports for, and lock those into that
cluster... that way, when the Yarn RM spins up, it has its service ports
defined (and thus the node managers always know which ports to connect to).
Combined with Mesos-DNS, this could actually work out very well, as the
name of the RM can be hard-coded, and the ports will just work no matter
which node it spins up on.
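
A sketch of what locking those ports in might look like in the marathon
app definition, assuming the "ports" field is what the HAProxy bridge
treats as the service ports (the id and resources are illustrative):

{
  "id": "yarnprod-rm",
  "cmd": "hadoop-2.7.0/bin/yarn resourcemanager",
  "cpus": 2,
  "mem": 4096,
  "instances": 1,
  "ports": [15000, 15001, 15002, 15003, 15004]
}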

From an HA perspective, the only advantage at this point to preallocating
the failover RM is speed of recovery (and a guarantee of resources being
available if failover occurs).  Perhaps we could offer this as an option
for those who need fast or guaranteed recovery, but not make it a
requirement?

The service port method will not work, however, for the node manager ports.
That said, I "believe" that as Myriad spins up a node manager, it can
dynamically allocate the ports and thus report those to the resource
manager on registration. Someone may need to help me out on that one, as I
am not sure.  Also, since the node manager is host-specific, mesos-dns is
not required; it can register to the resource manager with whatever ports
are allocated and the hostname it's running on.  I guess the question here
is: when Myriad requests the resources and Mesos allocates the ports, can
Myriad, prior to actually starting the node manager, update the configs
with the allocated ports?  Or is this even needed?
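
If it can, here is a sketch of what injecting the offered ports at NM
launch time could look like, using the standard NM address properties
(whether Myriad actually does this is exactly the open question; the port
values stand in for whatever the offer contained):

export YARN_NODEMANAGER_OPTS="-Dyarn.nodemanager.address=0.0.0.0:31003 \
 -Dyarn.nodemanager.localizer.address=0.0.0.0:31004 \
 -Dyarn.nodemanager.webapp.address=0.0.0.0:31005"
yarn nodemanager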

This is a great discussion.

On Mon, May 11, 2015 at 9:58 PM, yuliya Feldman  wrote:

> As far as I understand in this case Apache YARN RM HA will kick in - which
> means all the ids, hosts, ports for all RMs will need to be defined
> somewhere and I wonder how it will be defined in this situation since those
> either need to be in yarn-site.xml or using "-D".
> In case of Mesos-DNS usage no need to setup RM HA at all and no warm
> standby needed. Marathon will start RM somewhere in case of failure and
> clients will rediscover it based on the same hostname.
> Am I missing anything?
>   From: Adam Bordelon 
>  To: dev@myriad.incubator.apache.org
>  Sent: Monday, May 11, 2015 7:26 PM
>  Subject: Re: Recommending or requiring mesos dns?
>
> I'm a +1 for random ports. You can also use Marathon's servicePort field to
> let HAProxy redirect from the servicePort to the actual hostPort for the
> service on each node. Mesos-DNS will similarly direct you to the correct
> host:port given the appropriate task name.
>
> Is there a reason we can't just have Marathon launch two RM tasks for the
> same YARN cluster? One would be the leader, and the other would redirect to
> it until failover. Once one fails over, the other will start taking
> traffic, and Marathon will try to launch a new backup RM when the resources
> are available. If the YARN RM cannot provide us this functionality on its
> own, perhaps we can write a simple wrapper script for it.
>
>
>
> On Fri, May 8, 2015 at 11:57 AM, John Omernik  wrote:
>
> > I would advocate random ports  because there should not be a limitation
> of
> > running only one RM per node.  If we want true portability, there should
> be
> > the ability to have RM for the cluster YarnProd to run to run on node1
> and
> > also have RM for the cluster YarnDev running on Node1. (if it so happens
> to
> > land this way).  That way the number of clusters isn't limited by the
> > number of physical nodes.
> >
> > On Fri, May 8, 2015 at 1:33 PM, Santosh Marella 
> > wrote:
> >
> > > RM can store its data either in HDFS or in ZooKeeper. The data store is
> > > configurable. There is a config property in YARN
> > > (yarn.resourcemanager.recovery.enabled) that tells RM whether it should
> > try
> > > to recover the

Re: Recommending or requiring mesos dns?

2015-05-08 Thread John Omernik
I would advocate random ports, because there should not be a limitation of
running only one RM per node.  If we want true portability, there should be
the ability to have the RM for the cluster YarnProd run on node1 and also
have the RM for the cluster YarnDev running on node1 (if it so happens to
land this way).  That way the number of clusters isn't limited by the
number of physical nodes.

On Fri, May 8, 2015 at 1:33 PM, Santosh Marella 
wrote:

> RM can store its data either in HDFS or in ZooKeeper. The data store is
> configurable. There is a config property in YARN
> (yarn.resourcemanager.recovery.enabled) that tells RM whether it should try
> to recover the metadata about the previously submitted apps, the containers
> allocated to them etc from the state store.
>
> Pre allocation of a backup rm is a great idea. Thinking about it a bit
> more, I felt it might be better to have such an option available in
> Marathon rather than building it in Myriad (and in all frameworks/services
> that wants HA/failover).
>
>  Let's say we launch a service X via marathon that requires some resources
> (cpus/mem/ports) and we want 1 instance of that service to be always
> available. Marathon promises restart of the service if it goes down. But,
> as far as I understand, marathon can restart the service on another node
> only if the resources required by service X are available on that node
> *after* the service goes down. In other words, Marathon doesn't proactively
> "reserve" these resources on another node as a backup for failover.
>
> Again, not all services launched via Marathon requires this, but perhaps
> there should be an config option to specify if a service desires to have
> marathon keep a backup node ready-to-go in the event of failure.
>
>
> On Thu, May 7, 2015 at 4:12 PM, John Omernik  wrote:
>
> > So I may be lookng at this wrong, but where is the data for the rm stored
> > if it does fail over? How will it know to pick up where it left off? This
>
> is just one area I am low in understanding on.
> >
> >
>
> >  That said, what about pre allocating a second failover rm some where on
> > the cluster.  (I am just tossing an idea here, in that there are probably
> > many reasons not to do this) but here is how I could see it happening.
> >
> 1. Myriad starts a rm asking for 5 random available ports.  Mesos replies
> > starting the rm and reports to myriad the 5 ports used for the services
> you
> > listed below.
> >
> > 2. Myriad then checks a config value of number of "hot spares" lets say
> we
> > specify 1. Myriad then puts in a resource request to mesos for CPU and
> > memory required for the rm, but specifically asks for the same 5 ports
> > allocated to the first. Basically it reserves a spot on another node with
> > the same ports available. It may tak a bit, but there should be that
> > availability. Until this request is met, the yarn cluster is in a ha
> > compromised position.
> >
>
>This is exactly what I think we should do, but why use random ports
> instead of standard RM ports? If you have 10 slave nodes in your mesos
> cluster, then there are 10 potential spots for RM to be launched on.
> However, if you choose to launch multiple RMs (multiple YARN clusters),
> then you can probably launch at most 5 (with remaining 5 nodes available
>
> >
> > 3. At this point the perhaps we start another instance of rm right away
> > (depends on my first question on where the rm stores into about
> > jobs/applications) or the frame work just holds the spot, waiting for a
> > lack of heart beat (failover condition) on the primay resource manager.
> >
> > 4. If we can run the spare with no issues, it's a simple update of the
> dns
> > record and node managers connect to the new rm ( and another rm is
> > preallocated for redundancy). If we can't actually execute the secondary
> rm
> > until failover conditions, we can now execute the new rm, and the ports
> > will be the same.
> >
> > This may seem kludgey at first, but done correctly, it may actually limit
> > the length of failover time as the rm is preallocated.  Rms are not huge
> > from a resource perspective thus it may be a small cost for those who
> want
> > failover and multiple clusters (thus having dynamic ports)
> >
> > I will keep thinking this through, and would welcome feedback.
> >
> > On Thursday, May 7, 2015, Santosh Marella  wrote:
> >
> > > Hi John,
> > >
> > >   Great views about extending mesos dns for rm's discovery. Some
> > thoughts:
> > >1. There are 5 primary interface

Re: Recommending or requiring mesos dns?

2015-05-07 Thread John Omernik
So I may be looking at this wrong, but where is the data for the rm stored
if it does fail over? How will it know to pick up where it left off?  This
is just one area where my understanding is low.

 That said, what about pre-allocating a second, failover rm somewhere on
the cluster?  (I am just tossing an idea out here; there are probably
many reasons not to do this.) Here is how I could see it happening.

1. Myriad starts a rm asking for 5 random available ports.  Mesos replies
starting the rm and reports to myriad the 5 ports used for the services you
listed below.

2. Myriad then checks a config value for the number of "hot spares"; let's
say we specify 1. Myriad then puts in a resource request to mesos for the
CPU and memory required for the rm, but specifically asks for the same 5
ports allocated to the first. Basically it reserves a spot on another node
with the same ports available. It may take a bit, but there should be that
availability. Until this request is met, the yarn cluster is in an
HA-compromised position.

3. At this point, perhaps we start another instance of the rm right away
(depends on my first question about where the rm stores info about
jobs/applications), or the framework just holds the spot, waiting for a
lack of heartbeat (failover condition) on the primary resource manager.

4. If we can run the spare with no issues, it's a simple update of the dns
record and node managers connect to the new rm (and another rm is
preallocated for redundancy). If we can't actually execute the secondary rm
until failover conditions, we can now execute the new rm, and the ports
will be the same.
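
To make the port-reservation idea in step 2 concrete, here is a rough,
untested sketch using the Mesos Java protos (illustrative only, not actual
Myriad code; the rmPorts array stands in for whatever 5 ports the first rm
was given):

import org.apache.mesos.Protos.Resource;
import org.apache.mesos.Protos.Value;

public class SparePortsSketch {
  /** Builds a "ports" resource pinning the exact ports the primary rm uses,
   *  so offers for the hot spare can be matched against them. */
  static Resource samePorts(int[] rmPorts) {
    Value.Ranges.Builder ranges = Value.Ranges.newBuilder();
    for (int port : rmPorts) {
      // one single-port range per rm port
      ranges.addRange(Value.Range.newBuilder().setBegin(port).setEnd(port));
    }
    return Resource.newBuilder()
        .setName("ports")
        .setType(Value.Type.RANGES)
        .setRanges(ranges)
        .build();
  }
}

An offer would only satisfy the spare's request if its port ranges contain
every one of those 5 ports, which is exactly the "reserved spot" idea above.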

This may seem kludgey at first, but done correctly, it may actually limit
the length of failover time, as the rm is preallocated.  Rms are not huge
from a resource perspective, so it may be a small cost for those who want
failover and multiple clusters (thus having dynamic ports).

I will keep thinking this through, and would welcome feedback.

On Thursday, May 7, 2015, Santosh Marella  wrote:

> Hi John,
>
>   Great views about extending mesos dns for rm's discovery. Some thoughts:
>1. There are 5 primary interfaces RM exposes that are bound to standard
> ports.
> a. RPC interface for clients that want to submit applications to
> YARN (port 8032).
> b. RPC interface for NMs to connect back/HB to RM (port 8031).
> c. RPC interface for App Masters to connect back/HB to RM (port
> 8030).
> d. RPC interface for admin to interact with RM via CLI (port 8033).
> e. Web Interface for RM's UI (port 8088).
>2. When we launch RM using Marathon, it's probably better to mention in
> marathon's config that RM will use the above ports. This is because, if RM
> listens on random ports (as opposed to the above listed standard ports),
> then when RM fails over, the new RM gets ports that might be different
> from the ones used by the old RM. This makes the RM's discovery hard,
> especially post-failover.
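>
> (For illustration, pinning them in yarn-site.xml is just the stock YARN
> properties; the values below are the defaults listed above. Sketch:)
>
> <property>
>   <name>yarn.resourcemanager.address</name>
>   <value>${yarn.resourcemanager.hostname}:8032</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.resource-tracker.address</name>
>   <value>${yarn.resourcemanager.hostname}:8031</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.address</name>
>   <value>${yarn.resourcemanager.hostname}:8030</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.admin.address</name>
>   <value>${yarn.resourcemanager.hostname}:8033</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.webapp.address</name>
>   <value>${yarn.resourcemanager.hostname}:8088</value>
> </property>
>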
>3. It looks like what you are proposing is a way to update mesos-dns as
> to what ports RM's services are listening on. And when RM fails over, these
> ports would get updated in mesos-dns. Is my understanding correct? If yes,
> one challenge I see is that the clients that want to connect to the above
> listed RM interfaces also need to pull the changes to RM's port numbers
> from mesos-dns dynamically. Not sure how that might be possible.
>
>   Regarding your question about NM ports
>   1. NM has the following ports:
>   a. RPC port for app masters to launch containers (this is a random
> port).
>   b. RPC port for localization service. (port 8040)
>   c. Web port for NM's UI (port 8042).
>2. Ports (a) and (c) are relayed to RM when NM registers with RM. Port
> (b) is passed to a local container executor process via command line args.
>3. As you rightly reckon, we need a mechanism at launch of NM to pass
> the mesos-allocated ports to NM for the above interfaces. We can try to use
> the variable expansion mechanism hadoop has
> (http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html)
> to achieve this.
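>
> (For example -- just a sketch of the idea, and the myriad.* property names
> below are made up for illustration -- yarn-site.xml could reference
> properties that the NM launcher sets from the mesos-allocated ports:)
>
> <property>
>   <name>yarn.nodemanager.address</name>
>   <value>${myriad.yarn.nodemanager.address}</value>
> </property>
> <property>
>   <name>yarn.nodemanager.webapp.address</name>
>   <value>${myriad.yarn.nodemanager.webapp.address}</value>
> </property>
> <property>
>   <name>yarn.nodemanager.localizer.address</name>
>   <value>${myriad.yarn.nodemanager.localizer.address}</value>
> </property>
>
> The launcher would then pass, e.g.,
> -Dmyriad.yarn.nodemanager.address=host:port on the NM's command line, and
> hadoop's Configuration would expand the ${...} references from those
> system properties.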
>
> Thanks,
> Santosh
>
> On Thu, May 7, 2015 at 3:51 AM, John Omernik  wrote:
>
> > I've implemented mesos-dns and use marathon to launch my myriad framework.
> > It shows up as myriad.marathon.mesos and makes it easy to find what node
> > the framework launched the resource manager on.
> >
> >  What if we made myriad mesos-dns aware, and prior to launching the yarn
> > rm, it could register in mesos dns. This would mean both the ip addresses
> > and the ports (we need to figure out multiple ports in mesos-dns). Then it
> > could write out ports a

Re: Recommending or requiring mesos dns?

2015-05-07 Thread John Omernik
On point three, I have been running the rm with marathon, which does put it
in mesos-dns. I'd highly recommend this approach (thanks Santosh).
On May 7, 2015 8:37 AM, "Ken Sipe"  wrote:

> John,
>
> 1. +1 to mesos-dns aware
> 2. all tasks deployed by mesos are already in mesos-dns, so all the nms
> are there (we should make sure they have good names).
> 3. the RM is not usually started with mesos… if it were, it would also be
> listed in mesos-dns; however, a process started outside mesos is not
> currently added to mesos-dns.  At some point mesos-dns will allow for
> out-of-band server registration… but it isn't there today.
> 4. I would like to see multi-yarn clusters on mesos supported with
> multi-myriad.  Each myriad would manage its own cluster and would register
> with a unique framework id.
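>
> (Concretely -- just a sketch, with clusterName standing in for however the
> name gets configured -- each instance would register with its own
> FrameworkInfo:)
>
> import org.apache.mesos.Protos.FrameworkInfo;
>
> public class PerClusterFramework {
>   static FrameworkInfo frameworkInfo(String clusterName) {
>     return FrameworkInfo.newBuilder()
>         .setUser("")                        // let mesos fill in the user
>         .setName("myriad-" + clusterName)   // unique name per YARN cluster
>         .setCheckpoint(true)                // survive slave restarts
>         .build();
>   }
> }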
>
> ken
>
> > On May 7, 2015, at 5:51 AM, John Omernik  wrote:
> >
> > I've implemented mesos-dns and use marathon to launch my myriad framework.
> > It shows up as myriad.marathon.mesos and makes it easy to find what node
> > the framework launched the resource manager on.
> >
> > What if we made myriad mesos-dns aware, and prior to launching the yarn
> > rm, it could register in mesos dns. This would mean both the ip addresses
> > and the ports (we need to figure out multiple ports in mesos-dns). Then
> > it could write out ports and host names in the nm configs by checking
> > mesos dns for which ports the resource manager is using.
> >
> > Side question:  when a node manager registers with the resource manager,
> > are the ports the nm is running on completely up to the nm? I.e., I can
> > run my nm web server on any port, and Yarn just explains that to the rm
> > on registration? Because then we need a mechanism at launch of the nm
> > task to understand which ports mesos has allocated to the nm and update
> > the yarn-site for that nm before launch. Perhaps mesos-dns as a
> > requirement isn't needed, but I am trying to walk through options that
> > get us closer to multiple yarn clusters on a mesos cluster.
> >
> > John
> >
> >
> > --
> > Sent from my iThing
>
>


Recommending or requiring mesos dns?

2015-05-07 Thread John Omernik
I've implemented mesos-dns and use marathon to launch my myriad framework.
It shows up as myriad.marathon.mesos and makes it easy to find what node the
framework launched the resource manager on.

 What if we made myriad mesos-dns aware, and prior to launching the yarn
rm, it could register in mesos dns. This would mean both the ip addresses
and the ports (we need to figure out multiple ports in mesos-dns). Then it
could write out ports and host names in the nm configs by checking mesos
dns for which ports the resource manager is using.
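
As a rough sketch of what the nm-config side could do (untested, and it
assumes mesos-dns publishes an SRV record named _myriad._tcp.marathon.mesos,
which is a guess on my part about the naming):

import java.util.Hashtable;
import javax.naming.NamingEnumeration;
import javax.naming.directory.Attribute;
import javax.naming.directory.InitialDirContext;

public class RmPortLookup {
  public static void main(String[] args) throws Exception {
    // Point JNDI at DNS so we can query mesos-dns for SRV records.
    Hashtable<String, String> env = new Hashtable<>();
    env.put("java.naming.factory.initial",
            "com.sun.jndi.dns.DnsContextFactory");
    InitialDirContext ctx = new InitialDirContext(env);

    // SRV name assumed: _<task>._<protocol>.<framework>.<domain>
    Attribute srv = ctx.getAttributes("_myriad._tcp.marathon.mesos",
        new String[] {"SRV"}).get("SRV");
    NamingEnumeration<?> records = srv.getAll();
    while (records.hasMore()) {
      // Each SRV record reads "priority weight port target".
      String[] fields = records.next().toString().split(" ");
      System.out.println("rm host=" + fields[3] + " port=" + fields[2]);
    }
  }
}

Something like that could feed the rm's ports into the nm's yarn-site
before launch.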

Side question:  when a node manager registers with the resource manager,
are the ports the nm is running on completely up to the nm? I.e., I can run
my nm web server on any port, and Yarn just explains that to the rm on
registration? Because then we need a mechanism at launch of the nm task to
understand which ports mesos has allocated to the nm and update the
yarn-site for that nm before launch. Perhaps mesos-dns as a requirement
isn't needed, but I am trying to walk through options that get us closer to
multiple yarn clusters on a mesos cluster.

John


-- 
Sent from my iThing


Controlling Logs

2015-04-06 Thread John Omernik
Is there a way to change, or even control some of the logs in Myriad?

Specifically, with the logs below, IMHO there is not a lot of value added in
being so chatty about the offers received if there are no pending tasks.
Perhaps make this a debug option for when things are really wrong with
your cluster, but by default, if there are no pending tasks, then don't
print about offers... I think that could help keep log volume down, and
help with signal-to-noise when troubleshooting other issues.

Thoughts?





15/04/06 08:15:52 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:15:52 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:15:55 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:15:55 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:15:56 INFO handlers.ResourceOffersEventHandler: Received offers 1
15/04/06 08:15:56 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:15:57 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:15:57 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:16:00 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:16:00 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:16:01 INFO handlers.ResourceOffersEventHandler: Received offers 1
15/04/06 08:16:01 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:16:02 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:16:02 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:16:05 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:16:05 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:16:06 INFO handlers.ResourceOffersEventHandler: Received offers 1
15/04/06 08:16:06 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
15/04/06 08:16:07 INFO handlers.ResourceOffersEventHandler: Received offers 2
15/04/06 08:16:07 INFO handlers.ResourceOffersEventHandler: No pending
tasks, declining all offers
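
In the meantime, a deploy-side stopgap might be a log4j override for just
that handler. Something like the line below, though the package prefix here
is a guess on my part and would need to match Myriad's actual layout:

log4j.logger.com.ebay.myriad.scheduler.event.handlers.ResourceOffersEventHandler=WARN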


Some successes

2015-04-03 Thread John Omernik
Hey all, recently joined and wanted to share some success I am having with
Myriad on my test cluster. Obviously some of the issues that have been
talked about here and on the git issues I've run into, but all in all it's
been a great experience. (I had help).

A few notes:

My cluster is a Mesos-based cluster running on top of a MapR filesystem
(4.0.2).  It's working pretty well for things like Spark and Docker; MRv1
is a hacked setup that I wouldn't recommend to anyone, but it was sorta
working.  I do multiple things with this cluster, but one is a crude packet
capture process that really works well from an "edge case" point of view
due to the use of a hive transform and other crazy stuff.

1. Hive is working great.  No issues there, tweaked some mapreduce
settings, added some profiles that fit my cluster and things seems to be
humming along well.

2. The API was confusing until it was explained to me. Basically, coming
from a marathon world, I saw the instances setting as the "number" of
instances I want running, rather than a request to go up or down by x
instances.  I see why the API is set up like this, but perhaps some
consideration could go into making it more intuitive?  Like an option to
specify the total you want running, in addition to the flex up and flex
down.   Also, on the flex down, is there an option to specify which
instances you want to flex down? On flex up, I can set up 1 large, then run
2 medium, and then have 2 small running on the cluster, but on the way
down, it appears it's only the number of instances I want flexed.
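
For reference, the flex calls I've been making look roughly like this
(endpoint paths and field names from memory, so double-check them against
your build):

PUT /api/cluster/flexup
{"instances": 2, "profile": "medium"}

PUT /api/cluster/flexdown
{"instances": 1}

The flexdown body only taking a count is exactly where the ambiguity on the
way down comes from.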

3. If I shut down the resource manager (on purpose), there should be a way
to have that auto-kill the nodemanagers, right? As of now, if I want to
reset things, I need to scale down in marathon, then run a script on each
node that kills the processes.

4.  The myriad-config-default.yml needs to be moved outside the bundled jar
so we can update our clusters without rebuilding. I know this is alpha and
it's probably on a list, but I figured I'd mention it. (Perhaps check the
executor's location first, then the classpath, etc.)

5. I'd be happy to run through any tests or check any bugs people may want a
confirmation on with my cluster.  It's not "production" but it is doing
work so I have some flexibility in changing things up. I wish I could do
more on the coding side, but I am more of a hacker/scripter than a java
dev, and would hate for any of my bad code to make it into a project like
this with so much potential.


All in all, I am quite impressed. It seems more stable than my MRv1 on
Mesos/MapR, so that's nice.  Still playing with settings and other things,
and wanted to share some successes instead of just issues, thanks for all
the hard work here.

John Omernik