Re: Mesos API - How to send argument to task?

2016-07-10 Thread Shuai Lin
For the xml files part, I think you can set uris in CommandInfo with
file:// scheme, e.g. file:///etc/path/to/foo.xml. Then in your executor you
can access the files in /mnt/mesos/sandbox/foo.xml.
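
A minimal sketch of the arguments part, assuming the two strings are sent as JSON in the task's opaque data field and the xml files are fetched into the sandbox under their base names. The helper names pack_task_args and unpack_task_args are made up for illustration and are not part of the Mesos API; the sandbox path is the one mentioned above.

```python
import json
import posixpath

SANDBOX = "/mnt/mesos/sandbox"  # sandbox path as seen by the executor (assumed from the reply)


def pack_task_args(arg1, arg2, xml_uris):
    """Scheduler side: serialize per-task arguments into bytes for the task's data field.

    xml_uris are the file:// URIs that would also be listed in CommandInfo uris.
    """
    payload = {
        "args": [arg1, arg2],
        # Assumption: each fetched file lands in the sandbox under its base name.
        "xml_files": [posixpath.basename(uri) for uri in xml_uris],
    }
    return json.dumps(payload).encode("utf-8")


def unpack_task_args(data, sandbox=SANDBOX):
    """Executor side: recover the two strings and the local paths of the xml files."""
    payload = json.loads(data.decode("utf-8"))
    paths = [posixpath.join(sandbox, name) for name in payload["xml_files"]]
    return payload["args"], paths


data = pack_task_args("alpha", "beta", ["file:///etc/path/to/foo.xml"])
print(unpack_task_args(data))
```

The scheduler would assign the packed bytes to the task before launching it, and the executor would call unpack_task_args on the bytes it receives with the task.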

On Tue, Jul 5, 2016 at 2:11 PM, Bryan Fok  wrote:

> Hi all
>
> I am writing a python framework which has a custom executor. Each task I
> submit needs to pass 2 strings as arguments, and each task also needs 2
> unique xml configuration files on the slave for the custom executor. So how
> do I pass arguments around for each task, as well as files as arguments?
>


Re: MesosCon North America videos and Europe CFP!

2016-06-18 Thread Shuai Lin
It works for me. Maybe you can just visit the youtube videos list directly:
https://www.youtube.com/playlist?list=PLGeM09tlguZQVL7ZsfNMffX9h1rGNVqnC .

On Sat, Jun 18, 2016 at 7:44 PM, Barry Kaplan  wrote:

> I get redirected to https://pi.pardot.com/mesoscon-north-america-2016-videos,
> which wants a password.
>
> On Tue, Jun 14, 2016 at 8:40 PM, Chris Schaefer  wrote:
>
>> Hi All,
>>
>> Over 60 recordings of keynotes, sessions and lightning talks from
>> MesosCon North America 2016 (Denver) are available at:
>>
>> * http://go.linuxfoundation.org/mesoscon-north-america-2016-videos
>>
>> Lots of great content here to catch up on if you were not able to attend
>> or would like to review your favorite sessions!
>>
>>
>> The CFP for MesosCon Europe 2016 (Amsterdam) closes this Friday June the
>> 17th. Please help spread the word with these handy Click-to-Tweet links:
>>
>> * http://ctt.ec/GvBrm - The CFP is open for #MesosCon Europe until June
>> 17! Submit your proposal NOW - bit.ly/1PNAKKn #microservices #containers
>>
>> * http://ctt.ec/QUfkF - Speak at #MesosCon in Amsterdam! Submit your
>> proposal today, the CFP closes on June 17 - bit.ly/1PNAKKn #DevOps
>> #microservices
>>
>>
>> Thanks!
>>
>> The MesosCon Program Committee
>> (Kiersten Gaffney, David Greenberg, Dave Lester, Chris Aniszczyk & Chris
>> Schaefer)
>>
>>
>>
>


Re: Redirect mesos logs out of /var/log/messages

2016-05-25 Thread Shuai Lin
@haosdent :thumbsup:

On Thu, May 26, 2016 at 12:39 AM, haosdent <haosd...@gmail.com> wrote:

> Hi, @June. In rsyslog, "& stop" is used to discard the message from further
> processing so that it is not duplicated in /var/log/messages.
>
> On Wed, May 25, 2016 at 11:32 PM, June Taylor <j...@umn.edu> wrote:
>
>> Shuai,
>>
>> Thank you for your quick reply. Just confirming: What is the & stop
>> portion for?
>>
>>
>> Thanks,
>> June Taylor
>> System Administrator, Minnesota Population Center
>> University of Minnesota
>>
>> On Wed, May 25, 2016 at 10:30 AM, Shuai Lin <linshuai2...@gmail.com>
>> wrote:
>>
>>> Here's what we use to redirect slave logs to their own file on Ubuntu
>>> 14.04:
>>>
>>> $ cat /etc/rsyslog.d/30-mesos.conf
>>>> :programname, contains, "mesos-slave" /var/log/mesos-slave.log
>>>> & stop
>>>
>>>
>>> The master part is basically the same.
>>>
>>> On Wed, May 25, 2016 at 11:02 PM, June Taylor <j...@umn.edu> wrote:
>>>
>>>> Has anyone successfully taken the Mesos-related log messages that
>>>> appear in /var/log/messages and moved them to another file? Not just
>>>> duplicated them, but removed them.
>>>>
>>>> Thanks,
>>>> June Taylor
>>>> System Administrator, Minnesota Population Center
>>>> University of Minnesota
>>>>
>>>
>>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Redirect mesos logs out of /var/log/messages

2016-05-25 Thread Shuai Lin
Here's what we use to redirect slave logs to their own file on Ubuntu 14.04:

$ cat /etc/rsyslog.d/30-mesos.conf
:programname, contains, "mesos-slave" /var/log/mesos-slave.log
& stop


The master part is basically the same.
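
For illustration, the slave rule and its master counterpart can be generated with a small script. This is only a sketch: the program name "mesos-master" and the /var/log/mesos-master.log target are assumptions that mirror the slave rule above, and rsyslog_rule is a made-up helper.

```python
def rsyslog_rule(program, logfile):
    """Return a property-based rsyslog filter that routes one program to its own file."""
    return (
        ':programname, contains, "{0}" {1}\n'
        "& stop\n"  # discard the message so it is not also written to /var/log/messages
    ).format(program, logfile)


def mesos_rsyslog_conf():
    # "mesos-master" and its log path are assumed by analogy with the slave rule.
    rules = [
        rsyslog_rule("mesos-slave", "/var/log/mesos-slave.log"),
        rsyslog_rule("mesos-master", "/var/log/mesos-master.log"),
    ]
    return "".join(rules)


if __name__ == "__main__":
    print(mesos_rsyslog_conf(), end="")
```

The output would go into a file such as /etc/rsyslog.d/30-mesos.conf, followed by a restart of rsyslog.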

On Wed, May 25, 2016 at 11:02 PM, June Taylor  wrote:

> Has anyone successfully taken the Mesos-related log messages that appear
> in /var/log/messages and moved them to another file? Not just duplicated
> them, but removed them.
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>


Re: Framework Scheduling on Slave Question

2016-05-25 Thread Shuai Lin
>
>  Does mesos-slave know how to pass scheduler requests back to a Mesos
> master node?  Does one have to have mesos-master running on slave nodes to
> do this?  Am I smoking bad stuff?


No problem at all. AFAIK it's a very common practice to have marathon
running other frameworks.

On Wed, May 25, 2016 at 9:52 AM, haosdent  wrote:

> Hi, @Kent.
>
> > I’m trying to run a framework scheduler on a Mesos slave node.
> > Does one have to have mesos-master running on slave nodes to do this?
> If you mean run it manually, you could start your scheduler on any
> machine, just make sure the network connection works between the framework
> and the Mesos master.
>
> > Does mesos-slave know how to pass scheduler requests back to a Mesos
> master node?
> A framework can only communicate with the Mesos master; it cannot send
> messages to a Mesos slave directly unless they are forwarded via the master.
>
>
> On Wed, May 25, 2016 at 3:14 AM, Kent Harris  wrote:
>
>> Pardon me if this is a newbie question.  I’m trying to port an existing
>> simulation system that is comprised of many processes.  The master process
>> is executed by a user (on a Mesos master node) and it has a custom
>> Framework that simply launches a process, call it the “root” process, via
>> a CommandInfo executor on a slave node.
>>
>> The root process represents a simulation system and it needs to further
>> schedule many processes that represent various parts of the simulation
>> (communicating via ZMQ primarily).  To do this I have a second custom
>> framework invoked by the root process that also uses  CommandInfo executor
>> semantics to launch a number of tasks I call agents.  As an aside, all
>> agents have a common base structure and a custom plugin structure where
>> plugins represent different types of simulated hardware.
>>
>> Thus I’m trying to run a framework scheduler on a Mesos slave node.  Is
>> that possible?  Does mesos-slave know how to pass scheduler requests back
>> to a Mesos master node?  Does one have to have mesos-master running on
>> slave nodes to do this?  Am I smoking bad stuff?
>>
>> Advice appreciated.
>>
>> - Kent
>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Change the role of a framework

2016-04-28 Thread Shuai Lin
Hi list,

For some reason I need to change the role of an existing framework
(marathon) from the default role "*" to a specific role, say "services". I
couldn't find any existing documentation on this, so here are the steps that
I took on a staging cluster:

- stop all HA marathon instances except one

- set the marathon role (/etc/marathon/conf/mesos_role), and restart
marathon
  - at this moment marathon is still using "*" role because master won't
update the role of a framework when it re-registers
  - for that to happen we need to do a mesos master fail over

- stop the current active mesos-master, so marathon would use the new role
after the master failover

- now: marathon is using "services" role, which means it would accept
resources from both slaves with default '*' role and slaves with "services"
role

- for each slave:
  - stop the slave
  - change the role (/etc/mesos-slave/default_role) to "services"
  - remove /tmp/mesos/meta/slaves
  - restart docker (otherwise the old running executors/tasks won't be
killed)
  - restart the slave

During the process all running tasks are killed and restarted, but that's
acceptable to me.

Now all slaves are running with role "services" and marathon is running with
role "services".  So far the cluster seems to be working fine, but I'm not
sure if the steps I took have any unnoticed impacts, since this is a
somewhat undocumented procedure.
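
One way to double-check the result is to read the roles back from the master's state output. A rough sketch, assuming the state JSON contains a "frameworks" list whose entries carry "name" and "role" fields; the helper name and the sample data are made up.

```python
import json


def framework_roles(state):
    """Map framework name -> role from a parsed master state document."""
    return {fw["name"]: fw.get("role", "*") for fw in state.get("frameworks", [])}


# Hand-written sample standing in for the master's real state output.
sample_state = json.loads("""
{"frameworks": [{"name": "marathon", "role": "services"},
                {"name": "chronos"}]}
""")

roles = framework_roles(sample_state)
# Frameworks that are not (yet) registered with the expected role.
wrong = sorted(name for name, role in roles.items() if role != "services")
print(roles["marathon"], wrong)
```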

Any comments?

Regards,
Shuai


Re: Marathon Docker Application Deployment Issue

2016-04-22 Thread Shuai Lin
IIRC there is an option that can force pull an image when launching a task,
even if the image is already there. Can you paste the request you sent to
marathon?
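
If memory serves, the option lives in the docker section of the app definition as forcePullImage. A sketch of building such a request body follows; the app id, the resource numbers and the build_app helper are placeholders, and the field name should be checked against your Marathon version.

```python
import json


def build_app(app_id, image, force_pull=False):
    """Build a Marathon app definition that runs a Docker image."""
    return {
        "id": app_id,
        "cpus": 0.1,
        "mem": 64,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                # True asks for a fresh pull even when the image is already on the slave.
                "forcePullImage": force_pull,
            },
        },
    }


app = build_app("/hello-world", "hello-world", force_pull=False)
print(json.dumps(app, indent=2))
```

With an image that only exists locally on the slave, a request that sets this flag to true would be one possible explanation for the failed pulls.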

On Fri, Apr 22, 2016 at 11:19 AM, <aishwarya.adyanth...@accenture.com>
wrote:

> Hi,
>
>
>
> I have pulled the docker image onto my slave machine and haven’t used
> Docker Hub.
>
>
>
> *From:* Shuai Lin [mailto:linshuai2...@gmail.com]
> *Sent:* 21 April 2016 18:34
>
> *To:* user <user@mesos.apache.org>
> *Subject:* Re: Marathon Docker Application Deployment Issue
>
>
>
> As you can see from the slave log, the docker daemon on your mesos slave
> failed to  pull the image, because it can't connect to docker hub.
>
>
>
> On Thu, Apr 21, 2016 at 7:30 PM, <aishwarya.adyanth...@accenture.com>
> wrote:
>
> Hi,
>
>
>
> This is what is being seen in the file mesos-slave.ERROR:
>
>
>
> E0421 03:01:38.740025  2797 slave.cpp:3703] Container
> 'de0c5a4b-0041-4684-9f63-659094046a9e' for executor
> 'python-app.4883b6ab-076d-11e6-8f06-02423a6b81f2' of framework
> 687c0248-cfc6-4710-80c3-fd302da3a6ba- failed to start: Failed to
> 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited
> with status 1 stderr = Network timed out while trying to connect to
> https://index.docker.io/v1/repositories/library/python/images. You may
> want to check your internet connection or if you are behind a proxy.
>
> E0421 03:02:18.786658  2797 slave.cpp:3703] Container
> '3b6d9d6c-9771-4905-802e-a380cb14724c' for executor
> 'python-app.60696b7e-076d-11e6-8f06-02423a6b81f2' of framework
> 687c0248-cfc6-4710-80c3-fd302da3a6ba- failed to start: Failed to
> 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited
> with status 1 stderr = Network timed out while trying to connect to
> https://index.docker.io/v1/repositories/library/python/images. You may
> want to check your internet connection or if you are behind a proxy.
>
> E0421 03:02:18.789577  2799 slave.cpp:3703] Container
> 'e8594886-9e64-4a9c-a0c2-58f73303da3d' for executor
> 'python-app.6068f64d-076d-11e6-8f06-02423a6b81f2' of framework
> 687c0248-cfc6-4710-80c3-fd302da3a6ba- failed to start: Failed to
> 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited
> with status 1 stderr = Network timed out while trying to connect to
> https://index.docker.io/v1/repositories/library/python/images. You may
> want to check your internet connection or if you are behind a proxy.
>
> E0421 03:02:21.453006  2804 process.cpp:1966] Failed to shutdown socket
> with fd 9: Transport endpoint is not connected
>
> E0421 03:02:22.856468  2804 process.cpp:1966] Failed to shutdown socket
> with fd 9: Transport endpoint is not connected
>
>
>
> Thank you.
>
>
>
> *From:* Shuai Lin [mailto:linshuai2...@gmail.com]
> *Sent:* 21 April 2016 15:54
> *To:* user <user@mesos.apache.org>
> *Subject:* Re: Marathon Docker Application Deployment Issue
>
>
>
> There should be detailed error messages somewhere. Check your mesos slave
> logs, marathon logs, and docker daemon logs.
>
>
>
> On Thu, Apr 21, 2016 at 6:18 PM, <aishwarya.adyanth...@accenture.com>
> wrote:
>
> Hi ,
>
>
>
> I have set up a single mesos master and a single slave machine in my
> environment, and I am trying to run a Docker container on the mesos slave.
>
> I am able to run the application from the CLI but unable to deploy it using
> the Marathon GUI. I am trying to deploy the hello-world image.
>
>
>
> It keeps showing the status as waiting under deployment.
>
> I don’t know why it is taking so much time; hello-world is a small
> image, so it should not take long.
>
>
>
> Can you please help me with this.
>
>
>
> Thank you
>
> --
>
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
>
> __
>
> www.accenture.com
>
>
>
>
>


Re: Marathon Docker Application Deployment Issue

2016-04-21 Thread Shuai Lin
As you can see from the slave log, the docker daemon on your mesos slave
failed to pull the image, because it can't connect to docker hub.

On Thu, Apr 21, 2016 at 7:30 PM, <aishwarya.adyanth...@accenture.com> wrote:

> Hi,
>
>
>
> This is what is being seen in the file mesos-slave.ERROR:
>
>
>
> E0421 03:01:38.740025  2797 slave.cpp:3703] Container
> 'de0c5a4b-0041-4684-9f63-659094046a9e' for executor
> 'python-app.4883b6ab-076d-11e6-8f06-02423a6b81f2' of framework
> 687c0248-cfc6-4710-80c3-fd302da3a6ba- failed to start: Failed to
> 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited
> with status 1 stderr = Network timed out while trying to connect to
> https://index.docker.io/v1/repositories/library/python/images. You may
> want to check your internet connection or if you are behind a proxy.
>
> E0421 03:02:18.786658  2797 slave.cpp:3703] Container
> '3b6d9d6c-9771-4905-802e-a380cb14724c' for executor
> 'python-app.60696b7e-076d-11e6-8f06-02423a6b81f2' of framework
> 687c0248-cfc6-4710-80c3-fd302da3a6ba- failed to start: Failed to
> 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited
> with status 1 stderr = Network timed out while trying to connect to
> https://index.docker.io/v1/repositories/library/python/images. You may
> want to check your internet connection or if you are behind a proxy.
>
> E0421 03:02:18.789577  2799 slave.cpp:3703] Container
> 'e8594886-9e64-4a9c-a0c2-58f73303da3d' for executor
> 'python-app.6068f64d-076d-11e6-8f06-02423a6b81f2' of framework
> 687c0248-cfc6-4710-80c3-fd302da3a6ba- failed to start: Failed to
> 'docker -H unix:///var/run/docker.sock pull python:3': exit status = exited
> with status 1 stderr = Network timed out while trying to connect to
> https://index.docker.io/v1/repositories/library/python/images. You may
> want to check your internet connection or if you are behind a proxy.
>
> E0421 03:02:21.453006  2804 process.cpp:1966] Failed to shutdown socket
> with fd 9: Transport endpoint is not connected
>
> E0421 03:02:22.856468  2804 process.cpp:1966] Failed to shutdown socket
> with fd 9: Transport endpoint is not connected
>
>
>
> Thank you.
>
>
>
> *From:* Shuai Lin [mailto:linshuai2...@gmail.com]
> *Sent:* 21 April 2016 15:54
> *To:* user <user@mesos.apache.org>
> *Subject:* Re: Marathon Docker Application Deployment Issue
>
>
>
> There should be detailed error messages somewhere. Check your mesos slave
> logs, marathon logs, and docker daemon logs.
>
>
>
> On Thu, Apr 21, 2016 at 6:18 PM, <aishwarya.adyanth...@accenture.com>
> wrote:
>
> Hi ,
>
>
>
> I have set up a single mesos master and a single slave machine in my
> environment, and I am trying to run a Docker container on the mesos slave.
>
> I am able to run the application from the CLI but unable to deploy it using
> the Marathon GUI. I am trying to deploy the hello-world image.
>
>
>
> It keeps showing the status as waiting under deployment.
>
> I don’t know why it is taking so much time; hello-world is a small
> image, so it should not take long.
>
>
>
> Can you please help me with this.
>
>
>
> Thank you


Re: Marathon Docker Application Deployment Issue

2016-04-21 Thread Shuai Lin
There should be detailed error messages somewhere. Check your mesos slave
logs, marathon logs, and docker daemon logs.

On Thu, Apr 21, 2016 at 6:18 PM,  wrote:

> Hi ,
>
>
>
> I have set up a single mesos master and a single slave machine in my
> environment, and I am trying to run a Docker container on the mesos slave.
>
> I am able to run the application from the CLI but unable to deploy it using
> the Marathon GUI. I am trying to deploy the hello-world image.
>
>
>
> It keeps showing the status as waiting under deployment.
>
> I don’t know why it is taking so much time; hello-world is a small
> image, so it should not take long.
>
>
>
> Can you please help me with this.
>
>
>
> Thank you


Re: Docker image on mesos-slave

2016-04-20 Thread Shuai Lin
Take a look at https://github.com/spotify/docker-gc , and as Abhishek
mentioned, you need to set up a cron job to do this.

On Wed, Apr 20, 2016 at 2:32 PM, Dhiraj Thakur 
wrote:

> Hi Folks,
>
> I have a few questions related to the handling of docker images.
>
> Does mesos ever delete older docker images from the slave?
>
> We push an incremental docker image on a daily basis (e.g. image:build_1),
> and because of that images are piling up on the mesos slave.
>
> Does mesos offer any option to delete older images?
>
> Or do we need to create cron job to delete older images?
>
>
> -Dhiraj
>


Re: libmesos on alpine linux?

2016-04-16 Thread Shuai Lin
Take a look at
http://stackoverflow.com/questions/35614923/errors-compiling-mesos-on-alpine-linux
, this guy has successfully patched an older version of mesos to build
on alpine linux.

On Sun, Apr 17, 2016 at 3:19 AM, Dick Davies  wrote:

> Has anyone been able to build libmesos (0.28.x ideally) on Alpine Linux
> yet?
>
> I'm trying to get a smaller spark docker image and though that was
> straightforward, the docs say I need libmesos in the image to be able
> to use it (which I find a bit surprising, but it seems to be correct).
>


Re: Pyspark Cluster Mode

2016-04-14 Thread Shuai Lin
To run the dispatcher in marathon I would recommend using a docker image
like mesosphere/spark https://hub.docker.com/r/mesosphere/spark/tags/

One problem is how to access the dispatcher, since it may be launched on any
one of the slaves. You can set up a service discovery mechanism like
marathon-lb or mesos-dns for this purpose, but it may be a little overkill
if you don't need them except here.

One simple approach is to specify --net=host in the marathon task for the
dispatcher, and run a haproxy on your master server that tries all
the slaves:

listen mesos-spark-dispatcher 0.0.0.0:7077
    server node1 10.0.1.1:7077 check
    server node2 10.0.1.2:7077 check
    server node3 10.0.1.3:7077 check


Then use "--master=mesos://yourmaster:7077" in your spark-submit command.
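
If the slave list changes often, the listen block can be generated rather than edited by hand. A sketch, assuming the slave addresses are known up front; the haproxy_listen helper and the node naming are made up, and the addresses are the example ones above.

```python
def haproxy_listen(name, port, backends):
    """Render a haproxy 'listen' section that health-checks every backend."""
    lines = ["listen {0} 0.0.0.0:{1}".format(name, port)]
    for i, host in enumerate(backends, start=1):
        # 'check' lets haproxy skip slaves where the dispatcher is not listening.
        lines.append("    server node{0} {1}:{2} check".format(i, host, port))
    return "\n".join(lines) + "\n"


slaves = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]
print(haproxy_listen("mesos-spark-dispatcher", 7077, slaves), end="")
```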



On Thu, Apr 14, 2016 at 10:03 PM, June Taylor  wrote:

> Pradeep,
>
> Thank you for your reply. I have read that documentation, but it leaves
> out a lot of key pieces. Have you actually run MesosClusterDispatcher on
> Marathon? If so, can you please share your JSON configuration for the
> application?
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Wed, Apr 13, 2016 at 11:32 AM, Pradeep Chhetri <
> pradeep.chhetr...@gmail.com> wrote:
>
>> In cluster mode, you need to first run *MesosClusterDispatcher*
>> application on marathon (Read more about that here:
>> http://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode)
>>
>> In both client and cluster mode, you need to specify --master flag while
>> submitting job, the only difference is that you will specifying the value
>> as the URL of dispatcher in cluster mode
>> (mesos://:) while in client mode, you
>> will be specifying URL of mesos-master
>> (mesos://:)
>>
>> On Wed, Apr 13, 2016 at 3:24 PM, June Taylor  wrote:
>>
>>> I'm interested in what the "best practice" is for running pyspark jobs
>>> against a mesos cluster.
>>>
>>> Right now, we're simply passing the --master mesos://host:5050 flag,
>>> which appears to register a framework properly.
>>>
>>> However, I was told this isn't "cluster mode" - and I'm a bit confused.
>>> What is the recommended method of doing this?
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>
>>
>>
>> --
>> Regards,
>> Pradeep Chhetri
>>
>
>


Re: Mesos master not joining cluster

2016-04-14 Thread Shuai Lin
Hi Shakeel, what do you mean by "one of the master was not participating in
the quorum"? Can you paste related lines from the logs of that master?

On Thu, Apr 14, 2016 at 8:44 PM, shakeel 
wrote:

> Hi,
>
> I have three mesos master configured. They have all been working
> properly for a while and today I noticed one of the master was not
> participating in the quorum.
>
> A reboot did not resolve the problem.
>
> All three of the masters are configured with a quorum of 2.
>
> Has anyone experienced this problem before and what is the easy way of
> getting the master server back in the quorum?
>
> Thanks
>
>
> Kind Regards
> Shakeel Suffee
>
> --
> The information contained in this message is for the intended addressee
> only and may contain confidential and/or privileged information. If you are
> not the intended addressee, please delete this message and notify the
> sender; do not copy or distribute this message or disclose its contents to
> anyone. Any views or opinions expressed in this message are those of the
> author and do not necessarily represent those of Motortrak Limited or of
> any of its associated companies. No reliance may be placed on this message
> without written confirmation from an authorised representative of the
> company.
>
> Registered in England 3098391 V.A.T. Registered No. 667463890
>


Re: Problems with scheduling tasks in mesos and spark

2016-04-13 Thread Shuai Lin
Have you tried setting "spark.cores.max" in the SparkConf? Check
http://spark.apache.org/docs/1.6.1/running-on-mesos.html :

> You can cap the maximum number of cores using conf.set("spark.cores.max",
> "10") (for example).
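
The same cap can be passed on the command line, so that the first job leaves cores free for the second one. A sketch that assembles a spark-submit invocation; the master URL, the script name and the spark_submit_cmd helper are placeholders.

```python
def spark_submit_cmd(master, script, max_cores):
    """Build a spark-submit argv that caps the cores one job may take."""
    return [
        "spark-submit",
        "--master", master,
        # Without a cap, a coarse-grained job may hold every core it is offered.
        "--conf", "spark.cores.max={0}".format(max_cores),
        script,
    ]


cmd = spark_submit_cmd("mesos://host:5050", "job.py", 10)
print(" ".join(cmd))
```

Choosing the cap so that two jobs fit into the cluster at once lets the second job start while the first is still running.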


On Thu, Apr 14, 2016 at 12:53 AM, Andreas Tsarida <
andreas.tsar...@teralytics.ch> wrote:

>
> Hello,
>
> I’m trying to figure out a solution for dynamic resource allocation in
> mesos within the same framework ( spark ).
>
> Scenario :
> 1 - run a spark job in coarse mode
> 2 - run second job in coarse mode
>
> Second job will not start unless first job finishes which is not something
> that I would want. The problem is small when the job running doesn’t take
> too long but when it does nobody can work on the cluster.
>
> Best scenario would be to have mesos revoke resources from the first job
> and try to allocate resources to the second job.
>
> Is there anybody else who has solved this issue in another way?
>
> Thanks
>


Re: Vote TODAY MesosCon voting closes today, Friday March 25

2016-03-25 Thread Shuai Lin
Maybe it's just me, but on the voting page the scale of 1 to 10, with 1 for
"accept" and 10 for "reject", seems quite counter-intuitive.

On Sat, Mar 26, 2016 at 1:32 AM, Kiersten Gaffney 
wrote:

> If you haven't already, please take a few minutes the next few days and
> review what members of the community have submitted!
>
> Voting forms close TODAY, Friday, March 25, 2016, 11:55 PST
>
> A total of 154 proposals were submitted in time for #MesosCon review, up
> significantly from 63 submitted for last year’s conference. Similar to last
> year, the MesosCon program committee is opening these proposals up for
> community review/feedback to better-inform our decisions about what should
> be included in the program.
>
> In order to make it easier to review a subset of the proposals, we’ve
> segmented them based upon two loose themes: Developer and Users.
>
> Developers: http://bit.ly/1RpZPvj
>
> Talks on how frameworks can be used, developed, and integrate with Mesos.
>
> Users: http://bit.ly/1Mspaxp
>
> A combination of talks that are use cases (how company x uses Mesos), and
> operations-focused (how we deploy x, use Docker, etc).
>
> The forms above also include an opportunity to indicate which sessions you
> didn't see proposed but would like to attend.
>
> Thanks in advance for your participation!
>
> Kiersten, Dave, and David (Program Committee)
>


Re: [VOTE] Release Apache Mesos 0.28.0 (rc1)

2016-03-07 Thread Shuai Lin
Maybe also https://issues.apache.org/jira/browse/MESOS-4877 and
https://issues.apache.org/jira/browse/MESOS-4878 ?

On Tue, Mar 8, 2016 at 9:13 AM, Jie Yu  wrote:

> I'd like to fix https://issues.apache.org/jira/browse/MESOS-4888 as well
> if you guys plan to cut another RC
>
> On Mon, Mar 7, 2016 at 10:16 AM, Daniel Osborne <
> daniel.osbo...@metaswitch.com> wrote:
>
>> -1
>>
>> If it doesn’t cause too much pain, I'm hoping we can squeeze a relatively
>> small patch which restores Mesos' ability to extract Docker assigned IPs.
>> This has been broken with Docker 1.10's release over a month ago, and
>> prevents service discovery and DNS from working.
>>
>> Mesos-4370: https://issues.apache.org/jira/browse/MESOS-4370
>> RB# 43093: https://reviews.apache.org/r/43093/
>>
>> I've built 0.28.0-rc1 with this patch and can confirm that it fixes it as
>> expected.
>>
>> Apologies for not bringing this to attention earlier.
>>
>> Thanks all,
>> Dan
>>
>> -----Original Message-----
>> From: Vinod Kone [mailto:vinodk...@apache.org]
>> Sent: Thursday, March 3, 2016 5:44 PM
>> To: dev ; user 
>> Subject: [VOTE] Release Apache Mesos 0.28.0 (rc1)
>>
>> Hi all,
>>
>>
>> Please vote on releasing the following candidate as Apache Mesos 0.28.0.
>>
>>
>> 0.28.0 includes the following:
>>
>>
>> 
>>
>>   * [MESOS-4343] - A new cgroups isolator for enabling the net_cls
>>     subsystem in Linux. The cgroups/net_cls isolator allows operators to
>>     provide network performance isolation and network segmentation for
>>     containers within a Mesos cluster. To enable the cgroups/net_cls
>>     isolator, append `cgroups/net_cls` to the `--isolation` flag when
>>     starting the slave. Please refer to docs/mesos-containerizer.md for
>>     more details.
>>
>>   * [MESOS-4687] - The implementation of scalar resource values (e.g.,
>>     "2.5 CPUs") has changed. Mesos now reliably supports resources with up
>>     to three decimal digits of precision (e.g., "2.501 CPUs"); resources
>>     with more than three decimal digits of precision will be rounded.
>>     Internally, resource math is now done using a fixed-point format that
>>     supports three decimal digits of precision, and then converted to/from
>>     floating point for input and output, respectively. Frameworks that do
>>     their own resource math and manipulate fractional resources may observe
>>     differences in roundoff error and numerical precision.
>>
>>   * [MESOS-4479] - Reserved resources can now optionally include "labels".
>>     Labels are a set of key-value pairs that can be used to associate
>>     metadata with a reserved resource. For example, frameworks can use this
>>     feature to distinguish between two reservations for the same role at
>>     the same agent that are intended for different purposes.
>>
>>   * [MESOS-2840] - **Experimental** support for container images in Mesos
>>     containerizer (a.k.a. Unified Containerizer). This allows frameworks to
>>     launch Docker/Appc containers using Mesos containerizer without relying
>>     on docker daemon (engine) or rkt. The isolation of the containers is
>>     done using isolators. Please refer to docs/container-image.md for
>>     currently supported features and limitations.
>>
>>   * [MESOS-4793] - **Experimental** support for v1 Executor HTTP API. This
>>     allows executors to send HTTP requests to the /api/v1/executor agent
>>     endpoint without the need for an executor driver. Please refer to
>>     docs/executor-http-api.md for more details.
>>
>>
>> Additional API Changes:
>>
>>
>>   * [MESOS-4066] - Agent should not return partial state when a request
>>     is made to /state endpoint during recovery.
>>   * [MESOS-4547] - Introduce TASK_KILLING state.
>>   * [MESOS-4712] - Remove 'force' field from the Subscribe Call in v1
>>     Scheduler API.
>>   * [MESOS-4591] - Change the object of ReserveResources and CreateVolume
>>     ACLs to `roles`.
>>   * [MESOS-4712] - Remove 'force' field from the Subscribe Call in v1
>>     Scheduler API.
>>   * [MESOS-4591] - Change the object of ReserveResources and CreateVolume
>>     ACLs to `roles`.
>>   * [MESOS-3583] - Add stream IDs for HTTP schedulers.
>>
>>
>> The CHANGELOG for the release is available at:
>>
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.0-rc1
>>
>>
>> 
>>
>>
>> The candidate for Mesos 0.28.0 release is available at:
>>
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.28.0-rc1/mesos-0.28.0.tar.gz
>>
>>
>> The tag to be voted on is 

Re: [VOTE] Release Apache Mesos 0.28.0 (rc1)

2016-03-04 Thread Shuai Lin
>
>   * [MESOS-4712] - Remove 'force' field from the Subscribe Call in v1
> Scheduler API.
>   * [MESOS-4591] - Change the object of ReserveResources and CreateVolume
> ACLs to `roles`.
>   * [MESOS-4712] - Remove 'force' field from the Subscribe Call in v1
> Scheduler API.


MESOS-4712 is included twice.

On Fri, Mar 4, 2016 at 1:25 PM, Vinod Kone  wrote:

> On Thu, Mar 3, 2016 at 5:43 PM, Vinod Kone  wrote:
>
> > Tue Mar  10 17:00:00 PST 2016
>
>
> Sorry. This should be Mar 8th not 10th.
>


Re: Downloading s3 uris

2016-02-26 Thread Shuai Lin
If you don't want to configure hadoop on your mesos slaves, the only
workaround I see is to write a "hadoop" script and put it in your PATH. It
needs to support the following usage patterns:

- hadoop version
- hadoop fs -copyToLocal s3n://path /target/directory/
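
A rough sketch of such a shim, written in Python. It assumes the AWS CLI ("aws s3 cp") is installed on the slave and has credentials; that choice, the version text and the helper name translate are my own assumptions, not something the fetcher requires.

```python
#!/usr/bin/env python
"""Minimal stand-in for the 'hadoop' client, covering only the two calls above."""
import subprocess
import sys


def translate(argv):
    """Map hadoop-style arguments to a command to run, or None for 'version'."""
    if argv == ["version"]:
        return None
    if len(argv) == 4 and argv[:2] == ["fs", "-copyToLocal"]:
        src, dest = argv[2], argv[3]
        # The AWS CLI only understands s3://, so rewrite the s3n:// scheme.
        if src.startswith("s3n://"):
            src = "s3://" + src[len("s3n://"):]
        return ["aws", "s3", "cp", src, dest]
    raise ValueError("unsupported hadoop invocation: %r" % (argv,))


def main(argv):
    cmd = translate(argv)
    if cmd is None:
        # Any successful exit is enough to signal that a client is present.
        print("Hadoop shim (not a real Hadoop installation)")
        return 0
    return subprocess.call(cmd)


# Only dispatch when the script is actually called with arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as an executable file named hadoop somewhere in the slave's PATH, this would handle both usage patterns listed above and reject anything else.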

On Sat, Feb 27, 2016 at 12:31 AM, Aaron Carey  wrote:

> I was trying to avoid generating urls for everything as this will
> complicate things a lot.
>
> Is there a straightforward way to get the fetcher to do it directly?
>
> --
> *From:* haosdent [haosd...@gmail.com]
> *Sent:* 26 February 2016 16:27
> *To:* user
> *Subject:* Re: Downloading s3 uris
>
> I think you could still pass AWSAccessKeyId if it is private?
> http://www.bucketexplorer.com/documentation/amazon-s3--how-to-generate-url-for-amazon-s3-files.html
>
> On Sat, Feb 27, 2016 at 12:25 AM, Abhishek Amralkar <
> abhishek.amral...@talentica.com> wrote:
>
>> In that case do we need to keep bucket/files public?
>>
>> -Abhishek
>>
>> From: Zhitao Li 
>> Reply-To: "user@mesos.apache.org" 
>> Date: Friday, 26 February 2016 at 8:23 AM
>> To: "user@mesos.apache.org" 
>> Subject: Re: Downloading s3 uris
>>
>> Haven't directly used s3 download, but I think a workaround (if you don't
>> care about ACLs on the files) is to use an http url instead.
>>
>> On Feb 26, 2016, at 8:17 AM, Aaron Carey  wrote:
>>
>> I'm attempting to fetch files from s3 uris in mesos, but we're not using
>> hdfs in our cluster... however I believe I need the client installed.
>>
>> Is it possible to just have the client running without a full hdfs setup?
>>
>> I haven't been able to find much information in the docs, could someone
>> point me in the right direction?
>>
>> Thanks!
>>
>> Aaron
>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: Mesos fetcher in dockerized slave

2016-02-24 Thread Shuai Lin
ping @Tim, I think this bug also affects
https://issues.apache.org/jira/browse/MESOS-4743 .



On Wed, Jan 20, 2016 at 10:20 PM, Shuai Lin <linshuai2...@gmail.com> wrote:

> Testing this case requires building a docker image for mesos-slave, so it
> does not seem practical to add a test case for it to the mesos tests.
>
> Anyway, here are the scripts I use for testing this issue:
> https://gist.github.com/lins05/14455e92f37e91fd46ff
>
> On Wed, Jan 20, 2016 at 10:30 AM, Shuai Lin <linshuai2...@gmail.com>
> wrote:
>
>> Hi Tim,
>>
>> The review is here: https://reviews.apache.org/r/42390/ , would you
>> please take a look?
>>
>> Regards,
>> Shuai
>>
>> On Sat, Jan 9, 2016 at 9:42 AM, Shuai Lin <linshuai2...@gmail.com> wrote:
>>
>>> Hi Marica and Tim,
>>>
>>> I'm setting up a test case for this scenario that would fail, after
>>> which I'll begin on fixing it.
>>>
>>> Is it feasible to include the fixing in the new release?
>>>
>>>
>>> I'm not sure, does 0.27 have an estimated release date now?
>>>
>>>
>>> Regards,
>>> Shuai
>>>
>>> On Sat, Jan 9, 2016 at 1:11 AM, Timothy Chen <t...@mesosphere.io> wrote:
>>>
>>>> I can shepherd no problem.
>>>>
>>>> Tim
>>>>
>>>> On Dec 25, 2015, at 4:32 PM, Shuai Lin <linshuai2...@gmail.com> wrote:
>>>>
>>>> I'll work on it. @Tim could you shepherd it?
>>>>
>>>> On Sat, Dec 26, 2015 at 2:49 AM, Marica Antonacci <
>>>> marica.antona...@ba.infn.it> wrote:
>>>>
>>>>> Hi Tim and Shuai,
>>>>>
>>>>> thank you very much for your reply. I have opened a JIRA issue on
>>>>> this: https://issues.apache.org/jira/browse/MESOS-4249
>>>>> I hope it will be patched soon :)
>>>>>
>>>>> Best regards,
>>>>> Marica
>>>>>
>>>>>
>>>>> Il giorno 24/dic/2015, alle ore 17:54, Tim Chen <t...@mesosphere.io>
>>>>> ha scritto:
>>>>>
>>>>> Hi Marica/Shuai,
>>>>>
>>>>> Sorry haven't been able to spend the time to repro, but looks like
>>>>> Shuai confirmed it.
>>>>>
>>>>> Can one of you file a JIRA?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Tim
>>>>>
>>>>> On Thu, Dec 24, 2015 at 6:16 AM, Shuai Lin <linshuai2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Marica,
>>>>>>
>>>>>> I can reproduce the problem exactly as you described in the first
>>>>>> email of this thread. Without `MESOS_DOCKER_MESOS_IMAGE` environment
>>>>>> variable set, the fetcher works just fine; with it, the fetcher steps
>>>>>> seem to be skipped. This looks like a bug to me.
>>>>>>
>>>>>> Regards,
>>>>>> Shuai
>>>>>>
>>>>>> On Tue, Dec 22, 2015 at 7:41 PM, Marica Antonacci <
>>>>>> marica.antona...@ba.infn.it> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I have not solved this issue yet. Please, can anyone run the same
>>>>>>> test and let me know if the fetcher is correctly invoked?
>>>>>>> The test is really simple, just try to start a dockerized app (see
>>>>>>> json definition file below) through marathon on a mesos slave running 
>>>>>>> in a
>>>>>>> docker container started with the option —docker_mesos_image=>>>>>> slave
>>>>>>> image>.
>>>>>>> I would appreciate very much any feedback.
>>>>>>>
>>>>>>> Sample Marathon app:
>>>>>>> {
>>>>>>>  "id": "test-app",
>>>>>>>  "container": {
>>>>>>>"type": "DOCKER",
>>>>>>>"docker": {
>>>>>>>  "image": "libmesos/ubuntu"
>>>>>>>}
>>>>>>>  },
>>>>>>>  "cpus": 1,
>>>>>>>  "mem": 512,
>>>>>>>  "uris": [ "
>

Re: Can Marathon ensure single instance of a service at any give time?

2016-02-23 Thread Shuai Lin
>
> If I would like to allow it to restart on any node in a cluster can I use
> Marathon to simplify the implementation or it warrants more involved
> implementation using Zoo. Does Mesos provide any other helpers to simplify
> this use case?


Marathon can do that, but be aware that in some edge cases, such as network
partitions, two instances could be running at the same time.

For example, if the mesos slave that runs your cache-updating service can't
connect to the mesos master due to network problems, after a while the mesos
master will consider the slave dead and tell marathon that the task is lost.
Marathon would then launch another instance of your cache-updating service,
and the result is two instances running at the same time.

To avoid this, you can pin the service to a specific slave by using the
constraints provided by marathon, as @klaus suggested above, but this would
lose the flexibility of running it inside a mesos cluster. Otherwise you
have to use a distributed consensus solution like zookeeper.
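
For the pinning option, here is a rough sketch of such an app definition,
built in Python and printed as JSON. The app id, command, resources and
hostname are made up for illustration; the `["hostname", "CLUSTER", ...]`
form is the one described in the marathon constraints documentation.

```python
import json


def pinned_app(app_id, cmd, hostname):
    """Build a marathon app definition that may only run on one slave."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.5,
        "mem": 256,
        "instances": 1,
        # CLUSTER on the hostname field restricts the task to that slave.
        "constraints": [["hostname", "CLUSTER", hostname]],
    }


# Hypothetical values, not taken from the original question.
app = pinned_app("/cache-updater", "./update-cache.sh", "slave-1.example.com")
print(json.dumps(app, indent=2, sort_keys=True))
```

The printed JSON is what you would submit to marathon as the app
definition.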

On Tue, Feb 23, 2016 at 6:06 PM, Petr Novak  wrote:

> Hello,
> if I need to run single stateless instance or only a single leader doing a
> work at any given time. Something I would typically implement using Zoo
> Curator LeaderSelector. Can I use Marathon to ensure this without having to
> implement mutual exclusion myself? Let's assume that other parts of the
> architecture aren't designed well to support more running workers at a time.
>
> Currently we have a service which updates cache, it runs on one node and
> when it fails it is restarted and PID file is used to ensure single
> instance. Pretty naive implementation, possibly doesn't work in all edge
> cases.
>
> If I would like to allow it to restart on any node in a cluster can I use
> Marathon to simplify the implementation or it warrants more involved
> implementation using Zoo. Does Mesos provide any other helpers to simplify
> this use case?
>
> Or is Marathon designed only to run stateless services which can possibly
> run in multiple instances?
>
> Many thanks,
> Petr
>
>
>


Re: Zookeeper & Paxos: Why?

2016-02-14 Thread Shuai Lin
Hi,

As far as I have read, Paxos is not related to mesos master election. It is
used to implement the "replicated logs", as the storage backend of the
registry where information like slaves, quota, and maintenance schedules
are persisted (check
https://github.com/apache/mesos/blob/0.27.0/src/master/registry.proto#L27 ).

I think https://issues.apache.org/jira/browse/MESOS-1471 would make this
more clear.

For more on the replicated logs, check
https://issues.apache.org/jira/browse/MESOS-1471?jql=project%20%3D%20MESOS%20AND%20text%20~%20%22replicated%20log%22

Regards,
Shuai


On Mon, Feb 15, 2016 at 4:15 AM, Elias Levy 
wrote:

> Good day,
>
> Apologies if this question has been answered elsewhere, but I've not come
> across an answer to it.  Mesos masters use Zookeeper to master election.
> Mesos also appears to make use of Paxos, although I am less clear on its
> intended purpose.
>
> Why the use of two distinct consensus systems?
>
> The two should be largely equivalent.  I would imagine selecting a single
> one would be preferable to lower the complexity of the system and to avoid
> mismatched states (e.g. Zookeeper and Paxos disagreeing about the
> visibility of Mesos masters if ZK members are not colocated with them).
>
> Also, what is Paxos used for within Mesos?
>
>
>


Re: memory limit exceeded ==> KILL instead of TERM (first)

2016-02-12 Thread Shuai Lin
I'm not familiar with why SIGKILL is sent directly without SIGTERM, but is
it possible to have your consul registrations cleaned up when a task is
killed by adding consul health checks?
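
To illustrate the health-check idea, here is a sketch of a service
registration body for the consul agent HTTP API
(`PUT /v1/agent/service/register`). The service name, address, port and
health path are invented, and the `DeregisterCriticalServiceAfter` field is
an assumption about the consul version in use: where it is supported, consul
removes the service by itself once the check has been critical for that
long, so a task that was SIGKILLed does not have to clean up.

```python
import json


def registration(name, address, port, health_path="/health"):
    """Build the JSON body for registering a service with an HTTP check."""
    return {
        "ID": "%s-%s-%d" % (name, address, port),
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": "http://%s:%d%s" % (address, port, health_path),
            "Interval": "10s",
            # Assumption: requires a consul release that knows this field.
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


# Hypothetical service name and address.
body = registration("my-java-app", "10.0.0.5", 31000)
print(json.dumps(body, indent=2, sort_keys=True))
```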

On Fri, Feb 12, 2016 at 6:12 PM, Harry Metske 
wrote:

> Hi,
>
> we have a Mesos (0.27) cluster running with (here relevant) slave options:
> --cgroups_enable_cfs=true
> --cgroups_limit_swap=true
> --isolation=cgroups/cpu,cgroups/mem
>
> What we see happening is that people are running Tasks (Java applications)
> and specify a memory resource limit that is too low, which cause these
> tasks to be terminated, see logs below.
> That's all fine, after all you should specify reasonable memory limits.
> It looks like the slave sends a KILL signal when the limit is reached, so
> the application has no chance to do recovery termination, which (in our
> case) results in consul registrations not being cleaned up.
> Is there a specific reason why the slave does not first send a TERM
> signal, and if that does not help after a certain timeout, send a KILL
> signal?
> That would give us a chance to cleanup consul registrations (and other
> cleanup).
>
> kind regards,
> Harry
>
>
> I0212 09:27:49.238371 11062 containerizer.cpp:1460] Container
> bed2585a-c361-4c66-afd9-69e70e748ae2 has reached its limit for resource
> mem(*):160 and will be terminated
>
> I0212 09:27:49.238418 11062 containerizer.cpp:1227] Destroying container
> 'bed2585a-c361-4c66-afd9-69e70e748ae2'
>
> I0212 09:27:49.240932 11062 cgroups.cpp:2427] Freezing cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>
> I0212 09:27:49.345171 11062 cgroups.cpp:1409] Successfully froze cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
> 104.21376ms
>
> I0212 09:27:49.347303 11062 cgroups.cpp:2445] Thawing cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2
>
> I0212 09:27:49.349453 11062 cgroups.cpp:1438] Successfullly thawed cgroup
> /sys/fs/cgroup/freezer/mesos/bed2585a-c361-4c66-afd9-69e70e748ae2 after
> 2.123008ms
>
> I0212 09:27:49.359627 11062 slave.cpp:3481] executor(1)@
> 10.239.204.142:43950 exited
>
> I0212 09:27:49.381942 11062 containerizer.cpp:1443] Executor for container
> 'bed2585a-c361-4c66-afd9-69e70e748ae2' has exited
>
> I0212 09:27:49.389766 11062 provisioner.cpp:306] Ignoring destroy request
> for unknown container bed2585a-c361-4c66-afd9-69e70e748ae2
>
> I0212 09:27:49.389853 11062 slave.cpp:3816] Executor
> 'fulltest02.6cd29bd8-d162-11e5-a4df-005056aa67df' of framework
> 7baec9af-018f-4a4c-822a-117d61187471-0001 terminated with signal Killed
>


Re: Using Virtual Hosts

2016-02-11 Thread Shuai Lin
Since you already have haproxy running, why not use it as a reverse proxy?
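
As a rough sketch of what that could look like, the Python snippet below
renders haproxy frontend/backend sections that route on the Host header. The
host name and the slave addresses are placeholders; in practice the backend
list would have to be regenerated whenever the tasks move to other
slaves or ports.

```python
def render_vhosts(vhosts):
    """Render haproxy sections that route requests by Host header.

    `vhosts` maps a virtual host name to a list of "ip:port" strings.
    """
    names = sorted(vhosts)
    lines = ["frontend http-in", "    bind *:80", "    mode http"]
    for index, host in enumerate(names):
        lines.append("    acl host_%d hdr(host) -i %s" % (index, host))
        lines.append("    use_backend backend_%d if host_%d" % (index, index))
    for index, host in enumerate(names):
        lines.append("")
        lines.append("backend backend_%d" % index)
        lines.append("    mode http")
        for number, address in enumerate(vhosts[host]):
            lines.append("    server app%d %s check" % (number, address))
    return "\n".join(lines) + "\n"


# Hypothetical addresses of two slaves running the same webapp.
config = render_vhosts({"myapp.com": ["10.0.0.11:31000", "10.0.0.12:31000"]})
print(config)
```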

On Fri, Feb 12, 2016 at 3:31 AM, Alfredo Carneiro <
alfr...@simbioseventures.com> wrote:

> Hi guys,
>
> I have been searching for the past few weeks about Mesos and VHosts,
> sadly, I have not found anything useful.
>
> I have a mesos cluster running some webapps. So, I have assigned specific
> ports to these apps, so I access these apps using
> *http://:*. How could I use Virtual Hosts to
> access these apps? *http://myapp.com *?
>
> 1x Mesos Master with HAProxy and Chronos
> 9x Mesos Slave with Docker
>
> Thanks,
>
> --
> Alfredo Miranda
>


Re: mesos 0.23, long term quering state.json data.

2016-02-03 Thread Shuai Lin
I would suggest first checking whether it's a problem with the vm/docker
networking, e.g. run a web server in docker in the vm and try to download
some files from it, and vice versa.

On Thu, Feb 4, 2016 at 2:17 AM, tommy xiao  wrote:

> hi, All
>
> I came across another case. the user is use vmware vsphere to setup a vm.
> then start three ubuntu servers. install mesos-slave.  the mesos-slave
> state.json always take about some minutes to result json. i have already
> check the env, we can't found any different settings.
>
> because the mesos-slave is in docker, when i use docker restart
> mesos-slave. the http://xxx.xx.xx:5051/state.json will quickly response.
> this is reproduce case in mesos 0.23 version. but i testing this env in my
> local vsphere, the cluster can setup correctly. it let me very confuse.
> anyone can give some advise on it?
>
>
>
>
> 2016-02-03 15:56 GMT+08:00 tommy xiao :
>
>> the final result, this is caused by ksoftirqd eats 100% of one CPU core.
>> there is not mesos-slave's issue, even i installed the optimized binary.
>>
>> 2016-02-03 12:13 GMT+08:00 tommy xiao :
>>
>>> today, i have testing the issued cluster,  found the ksoftirqd eats 100%
>>> of one CPU core, then the mesos-slave state.json will get timeout.  i will
>>> later testing the O2 optimized mesos-slave binary again. then i will report
>>> the result later.
>>>
>>> 2016-02-02 23:41 GMT+08:00 tommy xiao :
>>>
>>>> haosdent, tomorrow  i will testing the rebuild package on the exists
>>>> cluster. please hold on.
>>>>
>>>> 2016-02-02 23:27 GMT+08:00 haosdent :
>>>>
>>>>> does the performance problem still exists?
>>>>> On Feb 2, 2016 11:17 PM, "tommy xiao"  wrote:
>>>>>
>>>>>> with test result, when i remove CFLAGS='-g -O2 -w' . it works now.
>>>>>>
>>>>>> make -j2  CXXFLAGS='-g -O2 -w -std=c++11'
>>>>>>
>>>>>> 2016-02-02 22:38 GMT+08:00 tommy xiao :
>>>>>>
>>>>>>> just make sure the libleveldb-dev is not relative with mesos build.
>>>>>>> even without libleveldb-dev, i also can't build successful. if i
>>>>>>> remove O2, just default O0, the make build will get successful. it
>>>>>>> is weird.
>>>>>>>
>>>>>>> 2016-02-02 19:24 GMT+08:00 haosdent :
>>>>>>>
>>>>>>>> Why you execute `sudo apt-get install libleveldb-dev` before
>>>>>>>> compile? Mesos have a bundle leveldb package.
>>>>>>>>
>>>>>>>> On Tue, Feb 2, 2016 at 7:21 PM, tommy xiao  wrote:
>>>>>>>>
>>>>>>>>> David,
>>>>>>>>>
>>>>>>>>> i am follow your suggest, and build with O2, came cross leveldb
>>>>>>>>> error.
>>>>>>>>> this is ubuntu server, i use below instruction:
>>>>>>>>> sudo apt-get install libleveldb-dev
>>>>>>>>> git clone https://git-wip-us.apache.org/repos/asf/mesos.git
>>>>>>>>> cd mesos
>>>>>>>>> ./bootstrap
>>>>>>>>> mkdir build && cd build && ../configure
>>>>>>>>> make -j2  CFLAGS='-g -O2 -w' CXXFLAGS='-g -O2 -w -std=c++11'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> cd leveldb && \
>>>>>>>>>
>>>>>>>>>   make  CC="gcc" CXX="g++" OPT="-g -O2 -w -std=c++11 -fPIC"
>>>>>>>>>
>>>>>>>>> make[4]: Entering directory
>>>>>>>>> `/home/dsxiao/mesos/build/3rdparty/leveldb'
>>>>>>>>>
>>>>>>>>> g++ -pthread -shared -Wl,-soname
>>>>>>>>> -Wl,/home/dsxiao/mesos/build/3rdparty/leveldb/libleveldb.so.1 -g -O2 -w
>>>>>>>>> -std=c++11 -fPIC db/builder.cc db/c.cc db/dbformat.cc db/db_impl.cc
>>>>>>>>> db/db_iter.cc db/filename.cc db/log_reader.cc db/log_writer.cc
>>>>>>>>> db/memtable.cc db/repair.cc db/table_cache.cc db/version_edit.cc
>>>>>>>>> db/version_set.cc db/write_batch.cc table/block_builder.cc table/block.cc
>>>>>>>>> table/filter_block.cc table/format.cc table/iterator.cc table/merger.cc
>>>>>>>>> table/table_builder.cc table/table.cc table/two_level_iterator.cc
>>>>>>>>> util/arena.cc util/bloom.cc util/cache.cc util/coding.cc util/comparator.cc
>>>>>>>>> util/crc32c.cc util/env.cc util/env_posix.cc util/filter_policy.cc
>>>>>>>>> util/hash.cc util/histogram.cc util/logging.cc util/options.cc
>>>>>>>>> util/status.cc  port/port_posix.cc -o libleveldb.so.1.4
>>>>>>>>>
>>>>>>>>> g++ -g -O2 -w -std=c++11 -c db/builder.cc -o db/builder.o
>>>>>>>>>
>>>>>>>>> db/builder.cc:5:24: fatal error: db/builder.h: No such file or
>>>>>>>>> directory
>>>>>>>>>
>>>>>>>>>  #include "db/builder.h"
>>>>>>>>>
>>>>>>>>> ^
>>>>>>>>>
>>>>>>>>> compilation terminated.
>>>>>>>>>
>>>>>>>>> db/builder.cc:5:24: fatal error: db/builder.h: No such file or
>>>>>>>>> directory
>>>>>>>>>
>>>>>>>>>  #include "db/builder.h"
>>>>>>>>>
>>>>>>>>> ^
>>>>>>>>>
>>>>>>>>> compilation terminated.
>>>>>>>>>
>>>>>>>>> make[4]: *** [db/builder.o] Error 1
>>>>>>>>>
>>>>>>>>> make[4]: *** Waiting for unfinished jobs
>>>>>>>>>
>>>>>>>>> db/dbformat.cc:6:25: fatal error: db/dbformat.h: No such file or
>>>>>>>>> directory

Re: Unable to receive offers / long delays when starting or restarting.

2016-02-02 Thread Shuai Lin
Is there any warning/error message in the marathon logs when it takes a long
time to deploy/redeploy your micro service? It is also worth taking a look at
the mesos slave logs.

On Tue, Feb 2, 2016 at 6:55 AM, Rodrick Brown 
wrote:

> My cluster consist of 9 slaves server split in 1/2 for two primary
> applications (Spark | Scala Microservices)
>
>- Spark - (server 1,2,3,4,8)  attributes: "rack:spark"
>- Long running Microservices (server 5,6,7,9) attributes "rack:ms"
>
>
> The spark jobs run in coarse mode and the majority of them are short lived
> they run for about  ~10-15 minutes via Chronos and shutdown. They start
> every 15 minutes about ~45 jobs.
>
> We do lots of deploys daily mostly to the "rack:ms" nodes where these jobs
> are started via Marathon and run until we need to deploy a new release of
> code.
>
> Recently I started noticing jobs are taking forever to restart or startup
> like they're not receiving valid offers.
> The cluster resources consists of the following resources I always have
> more than enough idle resources available to bring up/down new services yet
> I've seen one scenario where a service took almost 10 minutes to restart.
>
>
>          CPUs    Mem
> Total    120     456.8 GB
> Used     53.6    140.5 GB
> Offered  0       0 B
> Idle     66.4    316.3 GB
> How can I combat this delay? I'm not using roles could this be the
> problem?
> Chronos jobs always seem to run fine but they require much less resource
> than my long running Scala services.
> Here is a sample job definition for in Marathon.
>
> {
>"id": "production/index-service",
>"cmd": "env && /opt/orchard/production/index-server/bin/run_jar.sh",
>"cpus": 1.0,
>"mem": 4096,
>"disk": 1000,
>"user": "orchard",
>"instances": 2,
>"constraints": [
>  [
>"hostname","UNIQUE"
>  ],
>  [
>"rack", "LIKE", "ms"
>  ]
>],
>"requirePorts": true,
>"labels": {
>  "ENV": "production",
>  "HAPROXY_GROUP": "microservice"
>},
>  "ports": [
>  31703,
>  31803,
>  31903
>],
>"maxLaunchDelaySeconds": 3,
>"backoffFactor": 1.20,
>"healthChecks": [
>  {
>"gracePeriodSeconds": 3,
>"intervalSeconds": 5,
>"maxConsecutiveFailures": 3,
>"protocol": "TCP",
>"portIndex": 1,
>"timeoutSeconds": 5
>  }
>],
> "upgradeStrategy": {
>"minimumHealthCapacity": 0.5,
>"maximumOverCapacity": 0.2
>}
> }
>
> Any advice appreciated thanks.
>
>


Re: Issues on Zk configuration in Marathon

2016-02-01 Thread Shuai Lin
I think you need to either pin the tasks to some of the slaves (e.g. using
the marathon "CLUSTER" constraint) so that you can have a static
configuration for your zk instances, or use some type of service discovery.
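
A sketch of the static variant, assuming three known slave hostnames: each
zk node becomes its own marathon app pinned to one slave, and all of them
share the same `server.N=` list. The hostnames and the `ZK_MYID` /
`ZK_SERVERS` environment variable names are hypothetical; how the list ends
up in zoo.cfg depends on the image you run.

```python
def zk_server_lines(hosts):
    """Return the static `server.N=` lines shared by every zk node."""
    return ["server.%d=%s:2888:3888" % (n, host)
            for n, host in enumerate(hosts, 1)]


def zk_apps(hosts):
    """Build one marathon app per zk node, each pinned to its own slave."""
    servers = ",".join(zk_server_lines(hosts))
    apps = []
    for n, host in enumerate(hosts, 1):
        apps.append({
            "id": "/zookeeper/zk%d" % n,
            "instances": 1,
            # Pinning keeps the node's address stable across restarts.
            "constraints": [["hostname", "CLUSTER", host]],
            # Hypothetical variable names, to be read by the container.
            "env": {"ZK_MYID": str(n), "ZK_SERVERS": servers},
        })
    return apps


hosts = ["slave-1.example.com", "slave-2.example.com", "slave-3.example.com"]
apps = zk_apps(hosts)
for app in apps:
    print(app["id"], app["constraints"], app["env"]["ZK_MYID"])
```

Because a restarted instance comes back on the same host, the server list
never has to change.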

On Mon, Feb 1, 2016 at 9:09 PM, Sam  wrote:

> Hello guys
> One quick question in Marathon with Mesos,
> We are trying to deploy Zk with Marathon to make sure that Zk is always
> available no matter one of nodes crashed. For example : we got Zk1,Zk2 and
> Zk3, Zk1 need to have IP address of Zk2 and Zk3; Zk2 need to have IP
> address of Zk1 and Zk3 , same to Zk3 .  The issue is when one of them
> crashed , and Marathon spin up new Zk, how to have old  IP address
> configuration set into new instance ? I think this is issue to all App
> cluster that need to have each other configuration respectively.
> Looking forward to having solution to get it done . Appreciate
>
> Regards,
> Sam
>
> Sent from my iPhone


Re: Unable to build 2.6 on OS X

2016-01-28 Thread Shuai Lin
Googling "configure: error: invalid variable name" leads me to
http://askubuntu.com/a/590679 . The reason: the first dash in your
'--with-apr' is not typed correctly (it is an en dash, not an ASCII hyphen).
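
A quick way to spot this kind of problem is to look for non-ASCII characters
in the argument before passing it to configure. A small sketch (the function
name is made up):

```python
def find_non_ascii(argument):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(argument) if ord(ch) > 127]


# The first string starts with U+2013 (en dash) followed by a hyphen,
# the second one with two plain ASCII hyphens.
pasted = u"\u2013-with-apr=/usr/local/Cellar/apr/1.5.2/libexec/"
typed = u"--with-apr=/usr/local/Cellar/apr/1.5.2/libexec/"

print(find_non_ascii(pasted))
print(find_non_ascii(typed))
```

An empty list means the option contains only ASCII characters.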

btw I would suggest similar questions go to the dev list instead of the user list.

On Fri, Jan 29, 2016 at 1:02 PM, Rinaldo Digiorgio 
wrote:

> I am trying to build 0.26 on OS/X  10.11.2 and failing in the configure
> step.
> I think all the required libraries are installed
>
> configure: error: cannot find libapr-1 headers
> ---
> libapr-1 is required for mesos to build.
> —
>
>
> ../configure  –-with-apr=/usr/local/Cellar/apr/1.5.2/libexec/
> configure: error: invalid variable name: `–-with-apr’
>
>
> configure —help shows the following so the option should be accepted
>
>   --with-apr=[=DIR]   specify where to locate the apr-1 library
>
>
> Rinaldo
>
>
>


Re: Mesos fetcher in dockerized slave

2016-01-20 Thread Shuai Lin
Testing this case requires building a docker image for mesos-slave, so it
does not seem practical to add a test case for it to the mesos tests.

Anyway, here are the scripts I use for testing this issue:
https://gist.github.com/lins05/14455e92f37e91fd46ff

On Wed, Jan 20, 2016 at 10:30 AM, Shuai Lin <linshuai2...@gmail.com> wrote:

> Hi Tim,
>
> The review is here: https://reviews.apache.org/r/42390/ , would you
> please take a look?
>
> Regards,
> Shuai
>
> On Sat, Jan 9, 2016 at 9:42 AM, Shuai Lin <linshuai2...@gmail.com> wrote:
>
>> Hi Marica and Tim,
>>
>> I'm setting up a test case for this scenario that would fail, after which
>> I'll begin on fixing it.
>>
>> Is it feasible to include the fixing in the new release?
>>
>>
>> I'm not sure, does 0.27 have an estimated release date now?
>>
>>
>> Regards,
>> Shuai
>>
>> On Sat, Jan 9, 2016 at 1:11 AM, Timothy Chen <t...@mesosphere.io> wrote:
>>
>>> I can shepherd no problem.
>>>
>>> Tim
>>>
>>> On Dec 25, 2015, at 4:32 PM, Shuai Lin <linshuai2...@gmail.com> wrote:
>>>
>>> I'll work on it. @Tim could you shepherd it?
>>>
>>> On Sat, Dec 26, 2015 at 2:49 AM, Marica Antonacci <
>>> marica.antona...@ba.infn.it> wrote:
>>>
>>>> Hi Tim and Shuai,
>>>>
>>>> thank you very much for your reply. I have opened a JIRA issue on this:
>>>> https://issues.apache.org/jira/browse/MESOS-4249
>>>> I hope it will be patched soon :)
>>>>
>>>> Best regards,
>>>> Marica
>>>>
>>>>
>>>> Il giorno 24/dic/2015, alle ore 17:54, Tim Chen <t...@mesosphere.io> ha
>>>> scritto:
>>>>
>>>> Hi Marica/Shuai,
>>>>
>>>> Sorry haven't been able to spend the time to repro, but looks like
>>>> Shuai confirmed it.
>>>>
>>>> Can one of you file a JIRA?
>>>>
>>>> Thanks!
>>>>
>>>> Tim
>>>>
>>>> On Thu, Dec 24, 2015 at 6:16 AM, Shuai Lin <linshuai2...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Marica,
>>>>>
>>>>> I can reproduce the problem exactly as you described in the first
>>>>> email of this thread. Without `MESOS_DOCKER_MESOS_IMAGE` environment
>>>>> variable set, the fetcher works just fine; with it, the fetcher steps
>>>>> seem to be skipped. This looks like a bug to me.
>>>>>
>>>>> Regards,
>>>>> Shuai
>>>>>
>>>>> On Tue, Dec 22, 2015 at 7:41 PM, Marica Antonacci <
>>>>> marica.antona...@ba.infn.it> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I have not solved this issue yet. Please, can anyone run the same
>>>>>> test and let me know if the fetcher is correctly invoked?
>>>>>> The test is really simple, just try to start a dockerized app (see
>>>>>> json definition file below) through marathon on a mesos slave running in 
>>>>>> a
>>>>>> docker container started with the option —docker_mesos_image=>>>>> image>.
>>>>>> I would appreciate very much any feedback.
>>>>>>
>>>>>> Sample Marathon app:
>>>>>> {
>>>>>>  "id": "test-app",
>>>>>>  "container": {
>>>>>>"type": "DOCKER",
>>>>>>"docker": {
>>>>>>  "image": "libmesos/ubuntu"
>>>>>>}
>>>>>>  },
>>>>>>  "cpus": 1,
>>>>>>  "mem": 512,
>>>>>>  "uris": [ "
>>>>>> http://www.stat.cmu.edu/~cshalizi/402/lectures/16-glm-practicals/snoqualmie.csv;
>>>>>> ],
>>>>>>  "cmd": "cd $MESOS_SANDBOX; ls -latr; while sleep 10; do date -u +%T;
>>>>>> done"
>>>>>> }
>>>>>>
>>>>>> Docker run command to start dockerized mesos slave:
>>>>>>
>>>>>> # docker run -d MESOS_HOSTNAME= -e MESOS_IP= -e
>>>>>> MESOS_MASTER=zk://:2181,:2181,:2181/mesos -e
>>>>>> MESOS_CONTAINERIZERS=docker,mesos
>>>>>> -e MES

Re: Mesos fetcher in dockerized slave

2016-01-19 Thread Shuai Lin
Hi Tim,

The review is here: https://reviews.apache.org/r/42390/ , would you please
take a look?

Regards,
Shuai

On Sat, Jan 9, 2016 at 9:42 AM, Shuai Lin <linshuai2...@gmail.com> wrote:

> Hi Marica and Tim,
>
> I'm setting up a test case for this scenario that would fail, after which
> I'll begin on fixing it.
>
> Is it feasible to include the fixing in the new release?
>
>
> I'm not sure, does 0.27 have an estimated release date now?
>
>
> Regards,
> Shuai
>
> On Sat, Jan 9, 2016 at 1:11 AM, Timothy Chen <t...@mesosphere.io> wrote:
>
>> I can shepherd no problem.
>>
>> Tim
>>
>> On Dec 25, 2015, at 4:32 PM, Shuai Lin <linshuai2...@gmail.com> wrote:
>>
>> I'll work on it. @Tim could you shepherd it?
>>
>> On Sat, Dec 26, 2015 at 2:49 AM, Marica Antonacci <
>> marica.antona...@ba.infn.it> wrote:
>>
>>> Hi Tim and Shuai,
>>>
>>> thank you very much for your reply. I have opened a JIRA issue on this:
>>> https://issues.apache.org/jira/browse/MESOS-4249
>>> I hope it will be patched soon :)
>>>
>>> Best regards,
>>> Marica
>>>
>>>
>>> Il giorno 24/dic/2015, alle ore 17:54, Tim Chen <t...@mesosphere.io> ha
>>> scritto:
>>>
>>> Hi Marica/Shuai,
>>>
>>> Sorry haven't been able to spend the time to repro, but looks like Shuai
>>> confirmed it.
>>>
>>> Can one of you file a JIRA?
>>>
>>> Thanks!
>>>
>>> Tim
>>>
>>> On Thu, Dec 24, 2015 at 6:16 AM, Shuai Lin <linshuai2...@gmail.com>
>>> wrote:
>>>
>>>> Hi Marica,
>>>>
>>>> I can reproduce the problem exactly as you described in the first email
>>>> of this thread. Without `MESOS_DOCKER_MESOS_IMAGE` environment variable
>>>> set, the fetcher works just fine; with it, the fetcher steps seem to be
>>>> skipped. This looks like a bug to me.
>>>>
>>>> Regards,
>>>> Shuai
>>>>
>>>> On Tue, Dec 22, 2015 at 7:41 PM, Marica Antonacci <
>>>> marica.antona...@ba.infn.it> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I have not solved this issue yet. Please, can anyone run the same test
>>>>> and let me know if the fetcher is correctly invoked?
>>>>> The test is really simple, just try to start a dockerized app (see
>>>>> json definition file below) through marathon on a mesos slave running in a
>>>>> docker container started with the option —docker_mesos_image=>>>> image>.
>>>>> I would appreciate very much any feedback.
>>>>>
>>>>> Sample Marathon app:
>>>>> {
>>>>>  "id": "test-app",
>>>>>  "container": {
>>>>>"type": "DOCKER",
>>>>>"docker": {
>>>>>  "image": "libmesos/ubuntu"
>>>>>}
>>>>>  },
>>>>>  "cpus": 1,
>>>>>  "mem": 512,
>>>>>  "uris": [ "
>>>>> http://www.stat.cmu.edu/~cshalizi/402/lectures/16-glm-practicals/snoqualmie.csv;
>>>>> ],
>>>>>  "cmd": "cd $MESOS_SANDBOX; ls -latr; while sleep 10; do date -u +%T;
>>>>> done"
>>>>> }
>>>>>
>>>>> Docker run command to start dockerized mesos slave:
>>>>>
>>>>> # docker run -d MESOS_HOSTNAME= -e MESOS_IP= -e
>>>>> MESOS_MASTER=zk://:2181,:2181,:2181/mesos -e
>>>>> MESOS_CONTAINERIZERS=docker,mesos
>>>>> -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins -e MESOS_LOG_DIR=/var/log -e
>>>>> MESOS_docker_mesos_image=mesos-slave -v /sys/fs/cgroup:/sys/fs/cgroup -v
>>>>> /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos
>>>>> --name slave --net host --privileged --pid host mesos-slave
>>>>>
>>>>> Thank you very much in advance!
>>>>> Best regards,
>>>>> Marica
>>>>>
>>>>> Il giorno 19/dic/2015, alle ore 19:32, Marica Antonacci <
>>>>> marica.antona...@ba.infn.it> ha scritto:
>>>>>
>>>>> Dear Tim,
>>>>>
>>>>> I have collected some information from my test environment, starting
>>>>> the slave container wi

Re: Reply: can mesos run in SUSE Linux 11?

2016-01-15 Thread Shuai Lin
1. For kernel < 3.10, process isolation would have problems. See the
discussion in https://issues.apache.org/jira/browse/MESOS-3974 for details.

2. According to http://mesos.apache.org/gettingstarted/ , GCC 4.8.1+ or
clang 3.5+ is required to compile the source, so g++ 4.7 is too old.
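
Both requirements can be checked up front. A small sketch that compares
dotted version strings against the minimums above; the helper names are made
up and the kernel release string is only an example value, not taken from
your machine:

```python
import re


def parse_version(text):
    """Turn the leading dotted number of a version string into int parts."""
    match = re.match(r"\d+(\.\d+)*", text)
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(actual, minimum):
    """True if `actual` is at least `minimum`, compared part by part."""
    return parse_version(actual) >= parse_version(minimum)


print(meets_minimum("4.7", "4.8.1"))                # g++ 4.7 vs GCC 4.8.1+
print(meets_minimum("3.0.76-0.11-default", "3.10"))  # example kernel release
print(meets_minimum("4.8.1", "4.8.1"))
```

On a real host the inputs would come from `gcc -dumpversion` and
`platform.release()`.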



On Sat, Jan 16, 2016 at 3:36 PM, Linyuxin <linyu...@huawei.com> wrote:

> Thanks for the reply.
>
>
>
> I still have two questions:
>
> 1.   Is there any pitfall if the linux kernel version less than 3.10
> which is recommended in the document?
>
> 2.   I compiled the source in SUSE 11 SP3 with g++4.7, but I
> encountered a error:
>
> configure: error: *** A compiler with support for C++11 language features
> is required.
>
> Any suggestion?
>
>
>
> *From:* Shuai Lin [mailto:linshuai2...@gmail.com]
> *Sent:* 16 January 2016 15:13
> *To:* user@mesos.apache.org
> *Subject:* Re: can mesos run in SUSE Linux 11?
>
>
>
> There is no official package for SUSE on the downloads page of mesosphere:
> https://open.mesosphere.com/downloads/mesos/#apache-mesos-0.26.0 . So I
> guess you have to either compile from source, or run mesos master/slave in
> docker containers.
>
>
>
> On Sat, Jan 16, 2016 at 1:47 PM, Linyuxin <linyu...@huawei.com> wrote:
>
> Hi All,
>
>
>
>  I want to know if mesos can run in SUSE Linux 11.
>
> I can not find any information from the document reference.
>
>
>
> Thanks.
>
>
>


Re: can mesos run in SUSE Linux 11?

2016-01-15 Thread Shuai Lin
There is no official package for SUSE on the downloads page of mesosphere:
https://open.mesosphere.com/downloads/mesos/#apache-mesos-0.26.0 . So I
guess you have to either compile from source, or run mesos master/slave in
docker containers.

On Sat, Jan 16, 2016 at 1:47 PM, Linyuxin  wrote:

> Hi All,
>
>
>
>  I want to know if mesos can run in SUSE Linux 11.
>
> I can not find any information from the document reference.
>
>
>
> Thanks.
>


Re: slave nodes are living in two cluster and can not remove correctly.

2016-01-14 Thread Shuai Lin
Based on your description, you have two clusters:

- old cluster B, with mesos 0.25, and the master ip is 10.88.169.195
- new cluster A, with mesos 0.22, and the master ip is 10.90.12.29

Also you have a slave S, 10.90.5.19, which was originally in cluster B, and
you have reconfigured it to join cluster A, but forgot to clean up the slave
work dir.

From the logs, S is now registered with cluster A (which is what you
intended), but S is still shown in the slaves list of cluster B (which is
confusing), and the master of cluster B is still sending messages to S:

```
W0105 19:05:38.207882 6450 slave.cpp:1973] Ignoring shutdown framework
message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from
master@10.90.12.29:5050 because it is not from the registered master (
master@10.88.169.195:5050)
```

What's in the master logs of cluster A and B?  That could help others
understand the problem.



On Fri, Jan 15, 2016 at 12:27 PM, X Brick  wrote:

> sorry for the wrong api response of cluster A
>
> {
>>   "active": true,
>>   "attributes": {
>> "apps": "logstash",
>> "colo": "cn5",
>> "type": "prod"
>>   },
>>   "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
>>   "id": "20151230-034049-3282655242-5050-1802-S7",
>>   "pid": "slave(1)@10.90.5.19:5051",
>>   "registered_time": 1452094227.39161,
>>   "reregistered_time": 1452831994.32924,
>>   "resources": {
>> "cpus": 32,
>> "disk": 2728919,
>> "mem": 128126,
>> "ports": "[8100-1, 31000-32000]"
>>   }
>> }
>>
>
> 2016-01-15 12:22 GMT+08:00 X Brick :
>
>> Hi folks,
>>
>> I meet a very strange issue when I migrated two nodes from one cluster to
>> another about one week ago.
>>
>> Two nodes:
>>
>> l-bu128g3-10k10.ops.cn2
>> l-bu128g5-10k10.ops.cn2
>>
>> I did not clean the mesos data dir before they join the another cluster,
>> then I found the nodes live in two cluster at the same time.
>>
>> Cluster A (Mesos 0.22):
>>
>>
>> Cluster B (Mesos 0.25):
>>
>>
>> I thought maybe the old data make these happened, so I clear up these two
>> nodes data dir and rejoin the cluster A. But nothing changed, they still
>> come back to the old cluster(Cluster B).
>>
>>
>> Here is the "/master/slaves" response:
>>
>> Cluster A:
>>
>> {
>>>   "slaves": [
>>> {
>>>   "active": true,
>>>   "attributes": {
>>> "apps": "logstash",
>>> "colo": "cn5",
>>> "type": "prod"
>>>   },
>>>   "hostname": "l-bu128g9-10k10.ops.cn2.qunar.com",
>>>   "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S5",
>>>   "pid": "slave(1)@10.90.5.23:5051",
>>>   "registered_time": 1451990379.49813,
>>>   "reregistered_time": 1452093251.39516,
>>>   "resources": {
>>> "cpus": 32,
>>> "disk": 2728919,
>>> "mem": 128126,
>>> "ports": "[8100-1, 31000-32000]"
>>>   }
>>> },
>>>
>>>
>> Cluster B:
>>
>> {
>>>   "slaves": [
>>> {
>>>   "active": false,
>>>   "attributes": {
>>> "apps": "logstash",
>>> "colo": "cn5",
>>> "type": "prod"
>>>   },
>>>   "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
>>>   "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
>>>   "offered_resources": {
>>> "cpus": 0,
>>> "disk": 0,
>>> "mem": 0
>>>   },
>>>   "pid": "slave(1)@10.90.5.19:5051",
>>>   "registered_time": 1451988622.66323,
>>>   "reserved_resources": {},
>>>   "resources": {
>>> "cpus": 32.0,
>>> "disk": 2728919.0,
>>> "mem": 128126.0,
>>> "ports": "[8100-1, 31000-32000]"
>>>   },
>>>   "unreserved_resources": {
>>> "cpus": 32.0,
>>> "disk": 2728919.0,
>>> "mem": 128126.0,
>>> "ports": "[8100-1, 31000-32000]"
>>>   },
>>>   "used_resources": {
>>> "cpus": 0,
>>> "disk": 0,
>>> "mem": 0
>>>   }
>>> },
>>> .
>>>
>>>
>>
>> I found some useful logs:
>>
>>
>>> I0105 18:36:22.683724 6452 slave.cpp:2248] Updated checkpointed
>>> resources from to
>>> I0105 18:37:09.900497 6459 slave.cpp:3926] Current disk usage 0.06%. Max
>>> allowed age: 1.798706758587755days
>>> I0105 18:37:22.678374 6453 slave.cpp:3146] Master marked the slave as
>>> disconnected but the slave considers itself registered! Forcing
>>> re-registration.
>>> I0105 18:37:22.678699 6453 slave.cpp:694] Re-detecting master
>>> I0105 18:37:22.678715 6471 status_update_manager.cpp:176] Pausing
>>> sending status updates
>>> I0105 18:37:22.678753 6453 slave.cpp:741] Detecting new master
>>> I0105 18:37:22.678977 6456 status_update_manager.cpp:176] Pausing
>>> sending status updates
>>> I0105 18:37:22.679047 6455 slave.cpp:705] New master detected at
>>> master@10.88.169.195:5050
>>> I0105 18:37:22.679108 6455 slave.cpp:768] Authenticating with master
>>> master@10.88.169.195:5050
>>> I0105 18:37:22.679136 6455 slave.cpp:773] Using default CRAM-MD5
>>> authenticatee
>>> I0105 
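
For anyone debugging a similar double registration, here is a minimal sketch (not from the thread) of comparing two "/master/slaves" responses by agent pid. The helper names are hypothetical, and the sample entries are trimmed copies of the responses quoted above; in practice you would load the full JSON from each master.

```python
def slaves_by_pid(response):
    """Map agent pid -> slave entry for one /master/slaves response."""
    return {s["pid"]: s for s in response.get("slaves", [])}


def registered_in_both(resp_a, resp_b):
    """Return one record per agent pid that appears in both responses."""
    a = slaves_by_pid(resp_a)
    b = slaves_by_pid(resp_b)
    return [
        {
            "pid": pid,
            "id_in_a": a[pid]["id"],
            "active_in_a": a[pid]["active"],
            "id_in_b": b[pid]["id"],
            "active_in_b": b[pid]["active"],
        }
        for pid in sorted(set(a) & set(b))
    ]


# Trimmed sample data taken from the responses quoted in this thread.
cluster_a = {"slaves": [{
    "pid": "slave(1)@10.90.5.19:5051",
    "id": "20151230-034049-3282655242-5050-1802-S7",
    "active": True,
}]}
cluster_b = {"slaves": [{
    "pid": "slave(1)@10.90.5.19:5051",
    "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
    "active": False,
}]}

dupes = registered_in_both(cluster_a, cluster_b)
for d in dupes:
    print(d)
```

An agent that shows up with different slave ids on the two masters, as here, is the situation described in the original report.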

Re: Mesos fetcher in dockerized slave

2016-01-08 Thread Shuai Lin
Hi Marica and Tim,

I'm setting up a test case for this scenario that would fail, after which
I'll begin fixing it.

Is it feasible to include the fix in the new release?


I'm not sure, does 0.27 have an estimated release date now?


Regards,
Shuai

On Sat, Jan 9, 2016 at 1:11 AM, Timothy Chen <t...@mesosphere.io> wrote:

> I can shepherd no problem.
>
> Tim
>
> On Dec 25, 2015, at 4:32 PM, Shuai Lin <linshuai2...@gmail.com> wrote:
>
> I'll work on it. @Tim could you shepherd it?
>
> On Sat, Dec 26, 2015 at 2:49 AM, Marica Antonacci <
> marica.antona...@ba.infn.it> wrote:
>
>> Hi Tim and Shuai,
>>
>> thank you very much for your reply. I have opened a JIRA issue on this:
>> https://issues.apache.org/jira/browse/MESOS-4249
>> I hope it will be patched soon :)
>>
>> Best regards,
>> Marica
>>
>>
>> Il giorno 24/dic/2015, alle ore 17:54, Tim Chen <t...@mesosphere.io> ha
>> scritto:
>>
>> Hi Marica/Shuai,
>>
>> Sorry haven't been able to spend the time to repro, but looks like Shuai
>> confirmed it.
>>
>> Can one of you file a JIRA?
>>
>> Thanks!
>>
>> Tim
>>
>> On Thu, Dec 24, 2015 at 6:16 AM, Shuai Lin <linshuai2...@gmail.com>
>> wrote:
>>
>>> Hi Marica,
>>>
>>> I can reproduce the problem exactly as you described in the first email
>>> of this thread. Without `MESOS_DOCKER_MESOS_IMAGE` environment variable
>>> set, the fetcher works just fine; with it, the fetcher step seems to be skipped.
>>> This looks like a bug to me.
>>>
>>> Regards,
>>> Shuai
>>>
>>> On Tue, Dec 22, 2015 at 7:41 PM, Marica Antonacci <
>>> marica.antona...@ba.infn.it> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have not solved this issue yet. Please, can anyone run the same test
>>>> and let me know if the fetcher is correctly invoked?
>>>> The test is really simple, just try to start a dockerized app (see json
>>>> definition file below) through marathon on a mesos slave running in a
>>>> docker container started with the option --docker_mesos_image=<image>.
>>>> I would appreciate very much any feedback.
>>>>
>>>> Sample Marathon app:
>>>> {
>>>>  "id": "test-app",
>>>>  "container": {
>>>>"type": "DOCKER",
>>>>"docker": {
>>>>  "image": "libmesos/ubuntu"
>>>>}
>>>>  },
>>>>  "cpus": 1,
>>>>  "mem": 512,
>>>>  "uris": [
>>>>    "http://www.stat.cmu.edu/~cshalizi/402/lectures/16-glm-practicals/snoqualmie.csv"
>>>>  ],
>>>>  "cmd": "cd $MESOS_SANDBOX; ls -latr; while sleep 10; do date -u +%T;
>>>> done"
>>>> }
>>>>
>>>> Docker run command to start dockerized mesos slave:
>>>>
>>>> # docker run -d -e MESOS_HOSTNAME= -e MESOS_IP= -e
>>>> MESOS_MASTER=zk://:2181,:2181,:2181/mesos -e
>>>> MESOS_CONTAINERIZERS=docker,mesos
>>>> -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins -e MESOS_LOG_DIR=/var/log -e
>>>> MESOS_docker_mesos_image=mesos-slave -v /sys/fs/cgroup:/sys/fs/cgroup -v
>>>> /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos
>>>> --name slave --net host --privileged --pid host mesos-slave
>>>>
>>>> Thank you very much in advance!
>>>> Best regards,
>>>> Marica
>>>>
>>>> Il giorno 19/dic/2015, alle ore 19:32, Marica Antonacci <
>>>> marica.antona...@ba.infn.it> ha scritto:
>>>>
>>>> Dear Tim,
>>>>
>>>> I have collected some information from my test environment, starting
>>>> the slave container with and without the --docker_mesos_image startup flag.
>>>> Please let me know if you need further input. Thank you very much for your
>>>> support!
>>>>
>>>> Using the flag --docker_mesos_image:
>>>>
>>>> root@mesos-slave:~# docker ps
>>>> CONTAINER ID   IMAGE             COMMAND                  CREATED         STATUS         PORTS   NAMES
>>>> b30cea22a07c   libmesos/ubuntu   "/bin/sh -c 'cd $MESO"   2 minutes ago   Up 2 minutes           mesos-db70e09f-f39d-491c-8480-73d9858c140b-S0.

Re: mesos, big data and service discovery

2015-12-30 Thread Shuai Lin
What about specifying all non-local instances as "backup" in haproxy.cfg?
This way haproxy would only direct traffic to the local instance as long as
the local instance is alive.

For example, if you plan to use the haproxy-marathon-bridge script, you can
modify this line to achieve that:
https://github.com/mesosphere/marathon/blob/8b3ce8844dcc53055345914ef11019789dd843cf/bin/haproxy-marathon-bridge#L162
.
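
To make the idea concrete, here is a minimal sketch of generating the per-node "server" lines, with every non-local instance marked as "backup". This is not the bridge script itself: the function name, the server naming scheme and the sample addresses are all hypothetical, and it assumes each node knows its own address.

```python
def backend_server_lines(instances, local_host):
    """Build haproxy 'server' lines; non-local instances get 'backup'.

    instances  -- list of (host, port) tuples for one service
    local_host -- address of the node this haproxy.cfg is generated for
    """
    lines = []
    for index, (host, port) in enumerate(instances):
        line = "  server app-%d %s:%d check" % (index, host, port)
        if host != local_host:
            # haproxy only sends traffic to backup servers once all
            # non-backup servers are down.
            line += " backup"
        lines.append(line)
    return lines


# Hypothetical instances of one service, rendered for node 10.0.0.1.
demo_lines = backend_server_lines(
    [("10.0.0.1", 31001), ("10.0.0.2", 31005), ("10.0.0.3", 31002)],
    local_host="10.0.0.1",
)
print("\n".join(demo_lines))
```

Note that, by default, haproxy uses only the first available backup server when the local one is down; "option allbackups" in the backend spreads the load over all of them.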


On Thu, Dec 31, 2015 at 1:56 AM, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:

> I am currently using mesos as a big data backend for spark, cassandra,
> kafka and elasticsearch but I cannot find a good overall design regarding
> service discovery. I explain:
> Generally, the service discovery is managed by a HAproxy instance on each
> node which redirects traffic from service ports to the real assigned network
> ports. Currently I am not using it because the cluster is quite small and I
> don't need to deploy lots of services, but I am thinking about a future
> design that will allow me to scale.
> The problem with HAproxy dealing with all network traffic is that I am
> afraid it will break the data locality which is so important in the big
> data world regarding performance.
> For example when Spark tries to connect to elasticsearch, it will discover
> the elasticsearch topology and try to launch tasks next to elasticsearch
> shards. If HAproxy intercepts network flows, what would be the result?
> Will HAproxy masquerade the elasticsearch IP/ports? Same thing for Kafka
> and Cassandra?
>
> I assume it depends on each connector but it's very hard to find any
> information. Thanks for your help if you have any experience in it.
> Regards
>
>
>


Re: make slaves not getting tasks anymore

2015-12-30 Thread Shuai Lin
>
> I need to wait until all tasks are done and during this time no new tasks
> should be started on this slave


This is exactly what maintenance mode is designed for. But to achieve
this, it requires the cooperation of the framework. When the operator adds
a maintenance schedule for a slave, the mesos master first sends "inverse
offers" to all frameworks that have tasks running on that slave, and the
frameworks are "assumed to" move the tasks away to other slaves.

But a framework can ignore the inverse offers as well; for example, I
can't find any code that handles them in the marathon code.




> Also the maintenance mode seems not to be an option:
>
> When maintenance is triggered by the operator, all agents on the machine
> are told to shutdown


Be aware that the maintenance process is a two-phase process:

- the first step is "adding the maintenance schedule": the operator tells the
master "I will take slaveX down for maintenance in 1 hour, please ask the
frameworks to move their tasks to other slaves", as I described above
- the second step is "starting the maintenance": the operator tells the
master "I'm taking this slave down RIGHT NOW". The master then kills all
tasks on that slave and asks the mesos-slave process to exit, as described
in the paragraph you quoted in the original message.
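
The two steps above can be sketched as the payloads the operator would send. This is only a sketch: the layout below follows the maintenance schedule format as I remember it from the Mesos maintenance documentation (windows, machine_ids, unavailability in nanoseconds), the hostname and IP are made up, and the endpoint paths mentioned in the comments should be checked against your Mesos version.

```python
import json

NANOS_PER_SECOND = 1000000000


def maintenance_schedule(machines, start_epoch_seconds, duration_seconds):
    """Build a schedule with one maintenance window (step one).

    machines -- list of (hostname, ip) tuples; the values used below are
                hypothetical
    """
    return {
        "windows": [{
            "machine_ids": machine_ids(machines),
            "unavailability": {
                "start": {"nanoseconds": start_epoch_seconds * NANOS_PER_SECOND},
                "duration": {"nanoseconds": duration_seconds * NANOS_PER_SECOND},
            },
        }]
    }


def machine_ids(machines):
    """Build the machine list used when starting maintenance (step two)."""
    return [{"hostname": hostname, "ip": ip} for hostname, ip in machines]


machines = [("slavex.example.com", "10.0.0.1")]

# Step one: assumed to be POSTed to the master's maintenance schedule endpoint.
schedule = maintenance_schedule(machines, 1451476800, 3600)
print(json.dumps(schedule, indent=2))

# Step two: assumed to be POSTed to the master's "machine down" endpoint.
down_request = machine_ids(machines)
print(json.dumps(down_request, indent=2))
```

Between the two steps the operator would typically wait for the frameworks to react to the inverse offers, which, as noted, depends on the frameworks in use.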

In a word, it mostly depends on the frameworks you use.


On Wed, Dec 30, 2015 at 7:43 PM, Mike Michel  wrote:

> Hi,
>
>
>
> I need to update slaves from time to time and am looking for a way to take
> them out of the cluster without killing the running tasks. I need to
> wait until all tasks are done, and during this time no new tasks should be
> started on this slave. My first idea was to set a constraint
> „status:online“ for every task I start and then change the attribute of the
> slave to „offline“ and restart the slave process while the executor still
> runs the tasks, but it seems that if you change the attributes of a slave it
> cannot connect to the cluster without an rm -rf /tmp first, which will kill
> all tasks.
>
>
>
> Also the maintenance mode seems not to be an option:
>
>
>
> „When maintenance is triggered by the operator, all agents on the machine
> are told to shutdown. These agents are subsequently removed from the master
> which causes tasks to be updated as TASK_LOST. Any agents from machines
> in maintenance are also prevented from registering with the master.“
>
>
>
> Is there another way?
>
>
>
>
>
> Cheers
>
>
>
> Mike
>


Re: Mesos fetcher in dockerized slave

2015-12-18 Thread Shuai Lin
The problem happens for me when I don't specify the --docker_mesos_image flag.
However, specifying the flag only makes things worse: the task fails
again and again, although a container does exist for this task.

The master and zookeeper are running on the host, and the slave is running
inside a docker container:

```
sudo docker run -it --rm \
-e MESOS_HOSTNAME=localhost \
-e MESOS_IP=127.0.0.1 \
-e MESOS_MASTER=zk://127.0.0.1:2181/mesos \
-v /sys/fs/cgroup:/sys/fs/cgroup \
-v /var/run/docker.sock:/var/run/docker.sock \
--name mesos-slave \
--net host \
--privileged \
mesoscloud/mesos-slave:0.24.1-ubuntu-14.04

```

However, my setup may affect the outcome: the master is 0.25.0 and the slave
is 0.24.1 (I can't find a public docker image for mesos 0.25)

Output of http http://127.0.0.1:8080/v2/apps (irrelevant parts omitted)

```
{
  "apps": [
    {
      "container": {
        "docker": {
          "parameters": [],
          "privileged": false,
          "network": "BRIDGE",
          "image": "testapp:latest"
        },
        "volumes": [],
        "type": "DOCKER"
      },
      "uris": [
        "https://google.com/robots.txt"
      ]
    }
  ]
}
```

On Fri, Dec 18, 2015 at 7:11 PM, Grzegorz Graczyk 
wrote:

> I've tried to use this flag, but cannot really run any container when this
> flag is set.
> I've raised this issue here:
> https://www.mail-archive.com/user@mesos.apache.org/msg04975.html and
> here:
> https://github.com/mesosphere/docker-containers/issues/6#issuecomment-155364351
>  but
> sadly no one was able to help me...
>
> pt., 18.12.2015 o 11:33 użytkownik Marica Antonacci <
> marica.antona...@ba.infn.it> napisał:
>
>> OK, the problem I spotted is related to the usage of the
>> flag --docker_mesos_image that allows the executor to
>>
>>
>> --docker_mesos_image=VALUE  The docker image used to launch this mesos
>> slave instance. If an image is specified, the docker containerizer assumes
>> the slave is running in a docker container, and launches executors with
>> docker containers in order to recover them when the slave restarts and
>> recovers.
>> Has anyone used this flag and tested the behavior of the fetcher?
>>
>> Thank you
>> Marica
>>
>>
>> Il giorno 18/dic/2015, alle ore 10:38, tommy xiao  ha
>> scritto:
>>
>> no docker_mesos_image flag in my docker run,  and the docker image is
>> build by myself.
>>
>>
>>
>> 2015-12-18 17:20 GMT+08:00 Marica Antonacci 
>> :
>>
>> Yes, I did check inside the container and the csv file was not downloaded
>>> as shown also by the app details (see the screenshot below).
>>>
>>> Are you running your slave with the --docker_mesos_image flag? Can you
>>> please provide me the docker run command you are using to run your
>>> dockerized slave?
>>>
>>> Thank you very much
>>>
>> Marica
>>>
>>>
>>> 
>>>
>>
>>>
>>> Il giorno 18/dic/2015, alle ore 10:00, tommy xiao  ha
>>> scritto:
>>>
>>> Hi Marica,
>>>
>>> using your test-app json, I can run it correctly; the csv is truly
>>> downloaded by the mesos slave. Please check the task details at
>>> mesos-master:5050 to see the downloaded files.
>>>
>>> As for why the csv is not found in the app container: the csv is
>>> downloaded into the slave container's folder, not into the app container.
>>> So if you run
>>>
>>> cd $MESOS_SANDBOX;
>>>
>>> the folder in the app container is the default value:
>>>
>>> MESOS_SANDBOX=/mnt/mesos/sandbox
>>> but in reality, the sandbox is in the slave container, not in the app
>>> container.
>>>
>>>
>>>
>>> 2015-12-18 16:11 GMT+08:00 Marica Antonacci >> >:
>>>
 Thank you very much,

 I’m using a sample application definition file, just for testing
 purpose:

 {
  "id": "test-app",
  "container": {
"type": "DOCKER",
"docker": {
  "image": "libmesos/ubuntu"
}
  },
  "cpus": 1,
  "mem": 512,
   *"uris": [
  "http://www.stat.cmu.edu/~cshalizi/402/lectures/16-glm-practicals/snoqualmie.csv"
  ],*
  "cmd": "cd $MESOS_SANDBOX; ls -latr; while sleep 10; do date -u +%T;
 done"
 }

 Here is the docker run command line:

 # docker run -d -e MESOS_HOSTNAME= -e MESOS_IP= -e
 MESOS_MASTER=zk://:2181,:2181,:2181/mesos
 -e MESOS_CONTAINERIZERS=docker,mesos \
   -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins -e
 MESOS_LOG_DIR=/var/log -e MESOS_docker_mesos_image=mesos-slave
   -v /sys/fs/cgroup:/sys/fs/cgroup -v
 /var/run/docker.sock:/var/run/docker.sock --name slave --net host
 --privileged --pid host mesos-slave


 As already mentioned, if I remove the environment variable
 MESOS_docker_mesos_image the fetcher works fine and I can see the file
 snoqualmie.csv inside the sandbox.

 Thank 

Re: Team organization around Mesos cluster

2015-12-01 Thread Shuai Lin
I think that depends on how you would use the mesos cluster.

We have a mesos cluster of ~20 nodes to run all the production web
services. From the POV of other teams, the mesos cluster is like an
internal PaaS, and they only need to know how to manage their own apps -
how to create app instances and upgrade them (we do that with a slack chat
bot), much like the way you use a public PaaS.

The operational team does all the heavy lifting - server provisioning and
monitoring, shared service management (e.g. mysql/cache/mq) for apps
running in mesos cluster.

If you create your own mesos framework, it would require closer
interaction between the project team and the operational team.

Hope that helps.

Regards,
Shuai

On Tue, Dec 1, 2015 at 12:55 AM, aurelien.de...@gmail.com <
aurelien.de...@gmail.com> wrote:

> Hello.
>
>
> I'm in the process of demonstrating and talking about mesos all around my
> company. Everybody is quite interested, but every time we talk, they always
> raise the "operation" problem.
>
>
> We are a quite big company (100k in France), we're doing operation and
> system management the "old way", with a big operation team taking care of a
> lot of projects, with dozens of operating procedures for each task (from
> Apache restart to database restoration). Project teams think that Mesos fits
> quite badly with this way of doing things, and wonder how "real people"
> running a Mesos cluster are doing.
>
>
> Therefore, if you don't mind sharing how you are doing things for operations,
> without disclosing any sensitive information of course, any info would be
> appreciated.
>
>
> Thanks.
>
>


Re: Mesos Events Calendar

2015-11-22 Thread Shuai Lin
+1, very useful!

On Sat, Nov 21, 2015 at 2:58 PM, Michael Park  wrote:

> I can definitely add a finite list of admins, but I don't think I can just
> open it up to the public (we probably don't want to anyway).
> I've added your gmail account as one of the admins to start, I think
> perhaps we organically grow the list for interested individuals?
>
> Suggestions welcome for better approaches to this.
>
> On Fri, Nov 20, 2015 at 8:54 AM Benjamin Mahler 
> wrote:
>
>> Nice, is there a way to open up the calendar so that others can add
>> events?
>>
>> On Fri, Nov 20, 2015 at 8:42 AM, Michael Park  wrote:
>>
>>> Hello, I've created a public events calendar:
>>>
>>>
>>> https://calendar.google.com/calendar/embed?src=2hecvndc0mnaqlir34cqnfvtak%40group.calendar.google.com&ctz=America/Los_Angeles
>>>
>>> The intent is to capture the community sync schedules there as well as
>>> other events such as MesosCon and meet-ups.
>>>
>>> The following review request https://reviews.apache.org/r/40531/ embeds
>>> the Events Calendar in our Community page.
>>>
>>> [image: Screen Shot 2015-11-19 at 11.34.17 PM.png]
>>>
>>
>>


Re: Deploying containers to every mesos slave node

2015-03-12 Thread Shuai Lin
We do the same thing: running consul on each mesos slave, and using saltstack
to provision it. Why do you want to get rid of salt? You always need some
tool to provision your servers, right?

Regards,
Shuai

On Thu, Mar 12, 2015 at 4:54 PM, Aaron Carey aca...@ilm.com wrote:

  Hi All,

 In setting up our cluster, we require things like consul to be running on
 all of our nodes. I was just wondering if there was any sort of best
 practice (or a scheduler perhaps) that people could share for this sort of
 thing?

 Currently the approach is to use salt to provision each node and add
 consul/mesos slave process and so on to it, but it'd be nice to remove the
 dependency on salt.

 Thanks,
 Aaron



Re: cluster wide init

2015-01-21 Thread Shuai Lin
You can always write the init wrapper scripts for marathon. There is an
official debian package, which you can find in mesos's apt repo.

On Thu, Jan 22, 2015 at 4:20 AM, CCAAT cc...@tampabay.rr.com wrote:

 Hello all,

 I was reading about Marathon: Marathon scheduler processes were started
 outside of Mesos using init, upstart, or a similar tool [1]

This means



 So my related questions are

 Does Marathon work with mesos + Openrc as the init system?

 Are there any other frameworks that work with Mesos + Openrc?


 James



 [1] http://mesosphere.github.io/marathon/



Re: mesos and coreos?

2015-01-18 Thread Shuai Lin
Nope. First, mesos is not a framework. A framework is what you use in
your application to help build the app itself, like spring, rails, or
django. Mesos is more fundamental.

- mesos gathers all the resources (cpus/mem/disks) of the nodes in your
cluster and makes them a resource pool
- your app doesn't even know it's scheduled and managed (e.g.
started/stopped) by mesos (to be exact, by any framework running on mesos,
like marathon)

So you can think of mesos as a distributed operating system, just as
mesosphere's slogan says.


On Mon, Jan 19, 2015 at 6:27 AM, Victor L vlyamt...@gmail.com wrote:

 Does that mean mesos is framework to prepare my app to take advantage of
 clustering environment?

 On Sun, Jan 18, 2015 at 1:43 PM, Tom Arnfeld t...@duedil.com wrote:

 The way I see it, Mesos is an API and framework for building and running
 distributed systems. CoreOS is an API and framework for running them.

 --

 Tom Arnfeld
 Developer // DueDil

 (+44) 7525940046
 25 Christopher Street, London, EC2A 2BS


 On Sun, Jan 18, 2015 at 3:01 PM, Jason Giedymin jason.giedy...@gmail.com
  wrote:

 The value of coreos that immediately comes to mind since I do much work
 with these tools:

 - the small foot print, it is a minimal os, meant to run containers. So
 it throws everything not needed for that out.
 - containers are the launch vehicle, thus deps are in container land. I
 can run and test containers with ease, not having to worry about multiple
 OSes.
 - with etcd and fleet, coordinating the launch and modification of both
 machines and cluster make it a breeze. Allowing you to do dynamic mesos
 scaling up or down. I add nodes at will, across multiple cloud platforms,
 ready to launch multitude of containers or just mesos.
 - security. There is a defined write strategy. You cannot write willy
 nilly to any location.
 - all the above further allow auto OS updates, which is supported today
 on all platforms that deploy coreos. This means more frequent updates since
 the os is minimal, which should increase the security effectiveness when
 compared to big box superstore OSes like Redhat or Ubuntu. Some platforms
 charge quite a bit for managed updates of this frequency and level of
 testing.

 Coreos allows me to keep apps in a configured container that I trust,
 tested, and works time and time again.

 I see coreos as a compliment.

 As a fyi I'm available for questions, debugging, and client work in this
 area.

 Hope this helps some, from real world usage.

 Sent from my iPad

  On Jan 18, 2015, at 9:16 AM, Victor L vlyamt...@gmail.com wrote:
 
  I am confused: what's the value of mesos on the top of coreos cluster?
 Mesos provides distributed resource management, fault tolerance, etc., but
 doesn't coreos provides the same things already?
  Thanks






Re: Recommended resources for master / scheduler machines

2015-01-10 Thread Shuai Lin
Hi Itamar,

You should really run zookeeper on more than one node (3, 5, or 7 nodes
is very common). Otherwise, in your case, if the node running your
zookeeper service goes down for any reason, your whole mesos installation
would stop working until you bring that node back.
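
As a small illustration of why odd ensemble sizes are the usual choice: a zookeeper ensemble needs a majority of its nodes alive, so n nodes tolerate (n - 1) // 2 failures. The sketch below also builds the zk:// URL that would go into /etc/mesos/zk; the host names zk1..zk3 are hypothetical.

```python
def tolerated_failures(ensemble_size):
    """Number of nodes that can fail while a majority is still alive."""
    return (ensemble_size - 1) // 2


def mesos_zk_url(hosts, port=2181, path="/mesos"):
    """Build a zk:// URL listing every ensemble member."""
    members = ",".join("%s:%d" % (host, port) for host in hosts)
    return "zk://" + members + path


for size in (1, 2, 3, 4, 5, 7):
    print("%d node(s) -> survives %d failure(s)" % (size, tolerated_failures(size)))

# Hypothetical three-node ensemble.
zk_url = mesos_zk_url(["zk1", "zk2", "zk3"])
print(zk_url)
```

A fourth node buys nothing over three in terms of tolerated failures, which is why even sizes are rarely used.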

Regards,
Shuai



On Sat, Jan 10, 2015 at 9:56 PM, Itamar Ostricher ita...@yowza3d.com
wrote:

 Interesting. I knew I needed to look into ZooKeeper more than I did :-)

 I don't know what's distributed mode in ZooKeeper. I can tell you we use
 a single host for the master, and configure all machines with
 zk://master-host-name:2181/mesos in /etc/mesos/zk before the mesos
 services are started.

 We don't assign a dedicated device to ZooKeeper, so maybe it bites us...

 On Thu, Jan 8, 2015 at 9:33 PM, Tomas Barton barton.to...@gmail.com
 wrote:

 Is ZooKeeper running in distributed mode?

 ZooKeeper periodically writes all data to disk (transaction log), so
 the bottleneck could be ZooKeeper rather than
 not enough CPUs. ZooKeeper limits each key to 1MB, typically 512MB should
 be enough for ZooKeeper (or 4GB
 might not be enough, depends on your use-case).

 from ZooKeeper docs:

 ZooKeeper's transaction log must be on a dedicated device. (A dedicated
 partition is not enough.) ZooKeeper writes the log sequentially, without
 seeking. Sharing your log device with other processes can cause seeks and
 contention, which in turn can cause multi-second delays.

  In particular, you should not create a situation in which ZooKeeper
 swaps to disk. The disk is death to ZooKeeper. Everything is ordered, so if
 processing one request swaps the disk, all other queued requests will
 probably do the same. DON'T SWAP.


 On 8 January 2015 at 16:47, Itamar Ostricher ita...@yowza3d.com wrote:

 Thanks Tomas.

 We're still quite far from the 10k-20k machines limit :-)

 Currently, our framework scheduler generates many (millions) of mostly
 small tasks (some in the ~100ms, some in the few seconds).
 I understand that the network is the main bottleneck, but we sometimes
 experience lost tasks, and sometimes I see master logs indicating that the
 master is unable to talk with the zookeeper service (which is on the same
 host), and I was wondering if it's related to CPU/RAM of the master machine.
 Is 1 CPU enough? 2? 4?
 1GiB RAM? 4? 8?

 On Thu, Jan 8, 2015 at 5:00 PM, Tomas Barton barton.to...@gmail.com
 wrote:

 Hi Itamar,

 there's definitely a certain limit on the number of machines a Mesos master
 can handle. This limit is between 10 000 - 20 000 (that's the number
 reported by Twitter). This bottleneck is caused by the event loop which
 handles communication at the master.

 With hundreds of machines you should be fine. Only in case that your
 framework scheduler would demand
 too many resources for computing allocations you might encounter some
 problems.

 How does the strength of the master  scheduler machines affect the
 overall cluster performance?


 I would say that the network is usually the main bottleneck. Adding
 extra RAM won't improve mesos-master
 performance. Of course if there's high CPU load on the master you might
 observe a performance regression. Also
 this depends on the granularity of your tasks: whether you have a few long
 running tasks or many short tasks (which run
 for just hundreds of ms).

 Tomas


 On 6 January 2015 at 10:12, Itamar Ostricher ita...@yowza3d.com
 wrote:

 Are there recommendations regarding master / scheduler machines
 resources as function of cluster size?

 Say I have a cluster with hundreds of slave machines and thousands of
 CPUs, with a single framework that will schedule millions of tasks.
 How does the strength of the master  scheduler machines affect the
 overall cluster performance?

 Thanks,
 - Itamar.








Re: Following Mesos

2015-01-08 Thread Shuai Lin
Hi Dave, thanks for the link to markmail and planet mesos, especially the
latter, it has many good articles about mesos!

On Fri, Jan 9, 2015 at 7:38 AM, Dave Lester daveles...@gmail.com wrote:

 Hi James,

 Thanks for asking about this! I think there's a lot of room for
 improvement here. Have you seen the Mesos markmail archive [user@
 http://markmail.org/search/?q=mesos#query:mesos%20list%3Aorg.apache.incubator.mesos-user+page:1+state:facets
 , dev@
 http://markmail.org/search/?q=mesos#query:mesos%20list%3Aorg.apache.incubator.mesos-dev+page:1+state:facets],
 or Planet Mesos http://planet.apache.org/mesos/? The former
 consolidates emails from the user and developer email lists, the latter
 syndicates blogs from community members. In my experience, markmail has
 refreshed within a few seconds of publishing so I think it's safe to trust.

 Best,
 Dave

 On Thu, Jan 8, 2015 at 3:29 PM, CCAAT cc...@tampabay.rr.com wrote:

 Hello one and all,

 What I'm looking for is an easy interface to follow this group
 timely as in an archive for others not subscribed to this group to follow
 this group, real-time. Most archives have a significant lag on the
 availability of posts to the archive; so a more robust (real time) archive
 access to the postings is desired by others I work with.

 I also vaguely recall that some participants herein have aggregated mesos
 related information and resources into singular sites, so if what
 I'm looking for is there, just remind me with the link.


 TIA,
 James





Re: conf files location of mesos.

2015-01-07 Thread Shuai Lin
Yes, the mesos-master or mesos-slave binary itself only accepts command
line options, and does not read configs from places like /etc/mesos/ or
/etc/default/mesos.

If you compile from source, you either write a wrapper script to read these
configs/options and pass them to mesos-master/mesos-slave like the
official deb package does, or pass the configs directly to the mesos programs
on the command line.
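
A minimal sketch of such a wrapper, assuming a simple convention of one file per option (file name = option name, file content = value). This is only loosely modeled on what the packaged wrapper does and is not a copy of it; the function name is hypothetical, and the demo uses a temporary directory in place of something like /etc/mesos-master.

```python
import tempfile
from pathlib import Path


def flags_from_dir(conf_dir):
    """Turn each regular file in conf_dir into a --name=value argument."""
    args = []
    for entry in sorted(Path(conf_dir).iterdir()):
        if entry.is_file():
            value = entry.read_text().strip()
            args.append("--%s=%s" % (entry.name, value))
    return args


# Demo with a throwaway directory standing in for the real config dir.
with tempfile.TemporaryDirectory() as conf_dir:
    Path(conf_dir, "work_dir").write_text("/var/lib/mesos\n")
    Path(conf_dir, "quorum").write_text("1\n")
    demo_args = flags_from_dir(conf_dir)

# A real wrapper would exec the binary with these arguments,
# e.g. via os.execvp("mesos-master", command).
command = ["mesos-master"] + demo_args
print(" ".join(command))
```

With a source build, the same idea works with the binaries under your --prefix directory.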

On Thu, Jan 8, 2015 at 1:47 AM, Dick Davies d...@hellooperator.net wrote:

 Might be worth getting a packaged release for your OS, especially
 if you're new to this.

 On 7 January 2015 at 16:53, Dan Dong dongda...@gmail.com wrote:

 Hi, Brian,
   It's not there:
 ls /etc/default/mesos
 ls: cannot access /etc/default/mesos: No such file or directory

 I installed mesos from source tar ball by configure;make;make install as
 normal user.

 Cheers,
 Dan


 2015-01-07 10:43 GMT-06:00 Brian Devins brian.dev...@dealer.com:

  Try ls /etc/default/mesos instead

   From: Dan Dong dongda...@gmail.com
 Reply-To: user@mesos.apache.org user@mesos.apache.org
 Date: Wednesday, January 7, 2015 at 11:38 AM
 To: user@mesos.apache.org user@mesos.apache.org
 Subject: Re: conf files location of mesos.

Hi, All,
Thanks for your helps, I'm using version 0.21.0 of mesos. But I do
 not see any of the dirs of 'etc' or 'var' under my build directory(and any
 subdirs). What is the default conf files location for mesos 0.21.0?

 ls ~/mesos-0.21.0/build/
 3rdparty  bin  config.log  config.lt  config.status  ec2  include  lib
 libexec  libtool  Makefile  mesos.pc  mpi  sbin  share  src

Cheers,
Dan

 2015-01-07 9:47 GMT-06:00 Tomas Barton barton.to...@gmail.com:

 Hi Dan,

  this depends on your distribution. Mesosphere package comes with
 wrapper script which uses configuration
 placed in /etc/default/mesos and /etc/mesos-master, /etc/mesos-slave


 https://github.com/mesosphere/mesos-deb-packaging/blob/master/mesos-init-wrapper

  which distribution do you use?

  Tomas

 On 7 January 2015 at 16:23, Dan Dong dongda...@gmail.com wrote:

   Hi,
After installation of mesos on my cluster, where could I find the
 location of configuration files?
  E.g: mesos.conf, masters, slaves etc. I could not find any of them
 under the prefix dir and subdirs (configure
 --prefix=/home/dan/mesos-0.21.0/build/). Are there examples for the conf
 files? Thanks!

  Cheers,
  Dan





 Brian Devins* |* Java Developer
 brian.dev...@dealer.com

 [image: Dealer.com]






Re: python error when configuring mesos on CentOS6.6.

2015-01-05 Thread Shuai Lin
On Tue, Jan 6, 2015 at 4:55 AM, Dan Dong dongda...@gmail.com wrote:

 Hi, All,
   When I configure mesos 0.21.0 on CentOS6.6, I got python lib error as
 following. python2.7 and python-dev packages have been installed already.
 Any hints(e.g: ENVs)?
 (python-devel-2.6.6-52.el6.x86_64
 python27-2.7.3-6.2.el6.nux.x86_64)


Your python is 2.7, but python-devel is 2.6. That may be the problem; try
installing python-devel for 2.7 and see if that works.
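
A quick way to check this is to ask the interpreter itself where it expects its headers and whether Python.h is actually there. A minimal sketch (the function name is hypothetical); run it with the same interpreter that configure picks up, /usr/bin/python in the log below:

```python
import os
import sys
import sysconfig


def python_dev_status():
    """Report the interpreter version and whether its headers are installed."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "include_dir": include_dir,
        "has_headers": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


status = python_dev_status()
print("python version :", status["version"])
print("include dir    :", status["include_dir"])
print("Python.h found :", status["has_headers"])
```

If Python.h is missing for the 2.7 interpreter, the development package installed is for a different python version.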





 checking for python version... 2.7
 checking for python platform... linux2
 checking for python script directory...
 ${prefix}/lib/python2.7/site-packages
 checking for python extension module directory...
 ${exec_prefix}/lib64/python2.7/site-packages
 checking for python2.7... (cached) /usr/bin/python
 checking for a version of Python >= '2.1.0'... yes
 checking for a version of Python >= '2.6'... yes
 checking for the distutils Python package... yes
 checking for Python include path... -I/usr/include/python2.7
 checking for Python library path... -L/usr/lib64 -lpython2.7
 checking for Python site-packages path... /usr/lib/python2.7/site-packages
 checking python extra libraries... -lpthread -ldl  -lutil
 checking python extra linking flags... -Xlinker -export-dynamic
 checking consistency of all components of python development
 environment... no
 configure: error: in `/home/dan/mesos-0.21.0':
 configure: error:
   Could not link test program to Python. Maybe the main Python library has
 been
   installed in some non-standard library path. If so, pass it to configure,
   via the LDFLAGS environment variable.
   Example: ./configure LDFLAGS=-L/usr/non-standard-path/python/lib

 
ERROR!
You probably have to install the development version of the Python
 package
for your distribution.  The exact name of this package varies among
 them.

 

 Cheers,
 Dan