Re: The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master

2015-12-28 Thread Nan Xiao
Hi Klaus,

Firstly, thanks very much for your answer!

The km processes are all running:
root 129474 128024  2 22:26 pts/0    00:00:00 km apiserver
--address=15.242.100.60 --etcd-servers=http://15.242.100.60:4001
--service-cluster-ip-range=10.10.10.0/24 --port=
--cloud-provider=mesos --cloud-config=mesos-cloud.conf --secure-port=0
--v=1
root 129509 128024  2 22:26 pts/0    00:00:00 km
controller-manager --master=15.242.100.60: --cloud-provider=mesos
--cloud-config=./mesos-cloud.conf --v=1
root 129538 128024  0 22:26 pts/0    00:00:00 km scheduler
--address=15.242.100.60 --mesos-master=15.242.100.56:5050
--etcd-servers=http://15.242.100.60:4001 --mesos-user=root
--api-servers=15.242.100.60: --cluster-dns=10.10.10.10
--cluster-domain=cluster.local --v=2

All the other logs also seem OK, except scheduler.log:
..
I1228 22:26:37.883092  129538 messenger.go:381] Receiving message
mesos.internal.InternalMasterChangeDetected from
scheduler(1)@15.242.100.60:33077
I1228 22:26:37.883225  129538 scheduler.go:374] New master
master@15.242.100.56:5050 detected
I1228 22:26:37.883268  129538 scheduler.go:435] No credentials were
provided. Attempting to register scheduler without authentication.
I1228 22:26:37.883356  129538 scheduler.go:928] Registering with
master: master@15.242.100.56:5050
I1228 22:26:37.883460  129538 messenger.go:187] Sending message
mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
I1228 22:26:37.883504  129538 scheduler.go:881] will retry
registration in 1.209320575s if necessary
I1228 22:26:37.883758  129538 http_transporter.go:193] Sending message
to master@15.242.100.56:5050 via http
I1228 22:26:37.883873  129538 http_transporter.go:587] libproc target
URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
I1228 22:26:39.093560  129538 scheduler.go:928] Registering with
master: master@15.242.100.56:5050
I1228 22:26:39.093659  129538 messenger.go:187] Sending message
mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
I1228 22:26:39.093702  129538 scheduler.go:881] will retry
registration in 3.762036352s if necessary
I1228 22:26:39.093765  129538 http_transporter.go:193] Sending message
to master@15.242.100.56:5050 via http
I1228 22:26:39.093847  129538 http_transporter.go:587] libproc target
URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
..

From the log, the Mesos master rejected k8s's registration, and
k8s keeps retrying.

Have you met this issue before? Thanks very much in advance!
Best Regards
Nan Xiao
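The scheduler.log excerpt above shows the same RegisterFrameworkMessage being re-sent with a growing delay. A small Python sketch for pulling those delays out of a longer log, assuming the glog-style line format shown above (the helper name retry_delays is made up for illustration). The regex tolerates the line wrapping seen in the mailed excerpt:

```python
import re

# Matches the scheduler's "will retry registration in <N>s" messages.
# \s+ also matches newlines, so wrapped lines like the mailed excerpt still match.
RETRY_RE = re.compile(r"will\s+retry\s+registration\s+in\s+([0-9.]+)s")


def retry_delays(log_text):
    """Return every registration retry delay (in seconds), in log order."""
    return [float(value) for value in RETRY_RE.findall(log_text)]


SAMPLE = """\
I1228 22:26:37.883504  129538 scheduler.go:881] will retry
registration in 1.209320575s if necessary
I1228 22:26:39.093702  129538 scheduler.go:881] will retry
registration in 3.762036352s if necessary
"""

if __name__ == "__main__":
    delays = retry_delays(SAMPLE)
    print(delays)
    # A long, uninterrupted run of scheduled retries suggests that no
    # registration acknowledgement reached the scheduler.
    print(f"{len(delays)} retries, last delay {delays[-1]:.3f}s")
```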


On Mon, Dec 28, 2015 at 7:26 PM, Klaus Ma  wrote:
> It seems Kubernetes is down; could you check Kubernetes' status
> (km)?

Re: More filters on /master/tasks endpoint and filters on /master/state

2015-12-28 Thread Klaus Ma
+1

It'll also reduce the master's workload; but instead of labels, I'd like to
keep the master simpler: return tasks page by page and let the
framework/dashboard filter them itself.



Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform Symphony/DCOS Development & Support, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Tue, Dec 29, 2015 at 6:09 AM, Diogo Gomes  wrote:

> Hi guys, I would like your opinion about a future feature proposal.
>
> Currently, we can use the HTTP API to list all the tasks running in our
> cluster via /master/tasks, but we can only list all tasks or page through
> the list with limit/offset; we cannot filter it. I would like to filter
> it, using labels for example. The use case is to use Mesos to feed our
> load balancer with task data.
>
>
> Marathon currently provides something like this, but only for its tasks,
> using /v2/apps/?label=[key]==[value]
>
>
> Diogo Gomes
>
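Klaus's alternative (the master only pages, the client filters) can be sketched in Python. This assumes /master/tasks accepts the limit/offset parameters Diogo mentions, returns a JSON object with a "tasks" list, and serializes task labels as [{"key": ..., "value": ...}]; the helper names are hypothetical:

```python
import json
import urllib.request


def fetch_page(master, limit, offset):
    """Fetch one page of tasks from the master's /master/tasks endpoint."""
    url = f"http://{master}/master/tasks?limit={limit}&offset={offset}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("tasks", [])


def iter_tasks(fetch, page_size=100):
    """Yield all tasks, page by page, until a short (final) page arrives."""
    offset = 0
    while True:
        page = fetch(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


def has_label(task, key, value):
    """Assumes labels are serialized as [{"key": ..., "value": ...}, ...]."""
    return any(lbl.get("key") == key and lbl.get("value") == value
               for lbl in task.get("labels", []))


def tasks_with_label(fetch, key, value, page_size=100):
    """Client-side filtering: the master only pages, the caller filters."""
    return [t for t in iter_tasks(fetch, page_size) if has_label(t, key, value)]


if __name__ == "__main__":
    # In-memory stand-in for fetch_page, so the sketch runs without a cluster.
    fake = [{"id": f"t{i}",
             "labels": [{"key": "lb", "value": "public" if i % 2 else "internal"}]}
            for i in range(5)]
    fetch = lambda limit, offset: fake[offset:offset + limit]
    print([t["id"] for t in tasks_with_label(fetch, "lb", "public", page_size=2)])
```

Against a real master one would pass something like lambda limit, offset: fetch_page("master.example.com:5050", limit, offset), where the address is a placeholder. Because each page is a separate request, tasks that start or finish between requests can be skipped or seen twice; that is inherent to limit/offset paging and is the price of keeping the master simple.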


Re: mesos-elasticsearch vs Elasticsearch with Marathon

2015-12-28 Thread Alex Rukletsov
Craig,

would you mind elaborating on how exactly you run Elasticsearch in Marathon?

On Mon, Dec 28, 2015 at 8:36 PM, craig w  wrote:
> In terms of discovery, elasticsearch provides that out of the box
> https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-discovery.html.
> We deploy elasticsearch via Marathon and it works great.
>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links


More filters on /master/tasks endpoint and filters on /master/state

2015-12-28 Thread Diogo Gomes
Hi guys, I would like your opinion about a future feature proposal.

Currently, we can use the HTTP API to list all the tasks running in our cluster
via /master/tasks, but we can only list all tasks or page through the list
with limit/offset; we cannot filter it. I would like to filter it, using
labels for example. The use case is to use Mesos to feed our load balancer
with task data.


Marathon currently provides something like this, but only for its tasks,
using /v2/apps/?label=[key]==[value]


Diogo Gomes
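To make the proposal concrete, a Python sketch of a Marathon-style label filter applied to Mesos task JSON. It handles only key==value, key!=value and a bare key (label present); this is a simplified subset written for illustration, not Marathon's parser, and the task label shape is an assumption:

```python
def labels_of(task):
    """Flatten the assumed [{"key": k, "value": v}, ...] label list to a dict."""
    return {lbl["key"]: lbl.get("value", "") for lbl in task.get("labels", [])}


def parse_selector(expr):
    """Turn "key==value", "key!=value" or a bare "key" into a predicate."""
    for op in ("==", "!="):
        if op in expr:
            key, value = (part.strip() for part in expr.split(op, 1))
            if op == "==":
                return lambda labels: labels.get(key) == value
            # A task without the key counts as "not equal" here.
            return lambda labels: labels.get(key) != value
    key = expr.strip()
    return lambda labels: key in labels


def select(tasks, expr):
    """Return the tasks whose labels satisfy the selector expression."""
    matches = parse_selector(expr)
    return [t for t in tasks if matches(labels_of(t))]


if __name__ == "__main__":
    tasks = [
        {"id": "a", "labels": [{"key": "env", "value": "prod"}]},
        {"id": "b", "labels": [{"key": "env", "value": "dev"}]},
        {"id": "c"},
    ]
    for expr in ("env==prod", "env!=prod", "env"):
        print(expr, [t["id"] for t in select(tasks, expr)])
```

Whether this matching runs inside the master (as proposed here) or in the client (as Klaus suggests in his reply) only changes where select is called, not the selector semantics.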


Re: mesos-elasticsearch vs Elasticsearch with Marathon

2015-12-28 Thread craig w
In terms of discovery, elasticsearch provides that out of the box
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-discovery.html.
We deploy elasticsearch via Marathon and it works great.

On Mon, Dec 28, 2015 at 2:17 PM, Eric LEMOINE  wrote:

> That makes great sense Alex. Thanks for chiming in.
>



-- 

https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links
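For readers wondering what deploying Elasticsearch via Marathon can look like: a hypothetical Marathon app definition for Elasticsearch in Docker, built as a Python dict. The image tag, resource sizes and app id are illustrative assumptions, not craig's actual setup; the field names follow Marathon's /v2/apps JSON format (Docker container, bridge networking, dynamic host ports):

```python
import json


def es_app(app_id="/elasticsearch", instances=3, image="elasticsearch:1.7"):
    """Build an illustrative Marathon app definition for Elasticsearch.

    All values are assumptions for the sketch, not a tested configuration.
    """
    return {
        "id": app_id,
        "instances": instances,
        "cpus": 1.0,
        "mem": 2048,
        # At most one Elasticsearch instance per agent host.
        "constraints": [["hostname", "UNIQUE"]],
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0: Marathon assigns a host port from the offer.
                "portMappings": [
                    {"containerPort": 9200, "hostPort": 0, "protocol": "tcp"},  # HTTP
                    {"containerPort": 9300, "hostPort": 0, "protocol": "tcp"},  # transport
                ],
            },
        },
    }


if __name__ == "__main__":
    # JSON body one would POST to Marathon's /v2/apps endpoint.
    print(json.dumps(es_app(), indent=2))
```

With hostPort 0, each instance gets a different host port from its offer, so the nodes still have to learn each other's addresses; that is the discovery question discussed in this thread.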


Re: mesos-elasticsearch vs Elasticsearch with Marathon

2015-12-28 Thread Eric LEMOINE
On Mon, Dec 28, 2015 at 7:55 PM, Alex Rukletsov  wrote:
> Eric—
>
> give me a chance to answer that before you fall into frustration : ).
> Also, you can write directly to the framework developers
> (mesos...@container-solutions.com) and they can either confirm or bust my
> guess. Or maybe one of the authors, Frank, will chime in on this
> thread.
>
> Marathon has no idea about application logic, hence a "scale"
> operation just starts more application instances. But sometimes you
> may want to do extra work (track instances, report the ip:port of a new
> instance to existing instances, and so on). That's when a dedicated
> framework makes sense. Each framework has a scheduler that is able to
> track each instance and do all the aforementioned actions.
>
> How does this map to your question? AFAIK, all Elasticsearch nodes should
> see each other, hence once a new node is started, it should somehow be
> advertised to the other nodes. You can do this by wrapping the
> Elasticsearch command in a shell script and maintaining some sort of
> out-of-band registry; take a look at one of the first efforts [1] to run
> Elasticsearch on Mesos to get an impression of what it may look like. But
> you can use a dedicated framework instead : ).
>
> [1] https://github.com/mesosphere/elasticsearch-mesos


That makes great sense Alex. Thanks for chiming in.


Re: mesos-elasticsearch vs Elasticsearch with Marathon

2015-12-28 Thread Alex Rukletsov
Eric—

give me a chance to answer that before you fall into frustration : ).
Also, you can write directly to the framework developers
(mesos...@container-solutions.com) and they can either confirm or bust my
guess. Or maybe one of the authors, Frank, will chime in on this
thread.

Marathon has no idea about application logic, hence a "scale"
operation just starts more application instances. But sometimes you
may want to do extra work (track instances, report the ip:port of a new
instance to existing instances, and so on). That's when a dedicated
framework makes sense. Each framework has a scheduler that is able to
track each instance and do all the aforementioned actions.

How does this map to your question? AFAIK, all Elasticsearch nodes should
see each other, hence once a new node is started, it should somehow be
advertised to the other nodes. You can do this by wrapping the
Elasticsearch command in a shell script and maintaining some sort of
out-of-band registry; take a look at one of the first efforts [1] to run
Elasticsearch on Mesos to get an impression of what it may look like. But
you can use a dedicated framework instead : ).

[1] https://github.com/mesosphere/elasticsearch-mesos

On Wed, Dec 23, 2015 at 10:30 AM, Eric LEMOINE  wrote:
> On Tue, Dec 22, 2015 at 10:05 AM, craig w  wrote:
>> We'd like to use the framework once some more features are available (see
>> the road map).
>>
>> Currently we deploy ES in docker using marathon.
>
>
>
> Thank you all for your responses. I get that the situation is not as
> clear as I expected :)
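The wrapper-plus-registry approach Alex describes can be sketched in Python. Everything here is hypothetical: Registry stands in for whatever shared store a real wrapper would use (ZooKeeper, a shared file, ...), and the -Des. command-line settings assume Elasticsearch 1.x, where zen discovery is configured through discovery.zen.ping.unicast.hosts:

```python
class Registry:
    """Hypothetical out-of-band registry of Elasticsearch transport addresses.

    In-memory only, to keep the sketch self-contained; a real wrapper script
    would need a store shared by all nodes.
    """

    def __init__(self):
        self._nodes = set()

    def register(self, addr):
        self._nodes.add(addr)

    def peers(self, exclude=None):
        """All registered addresses except `exclude`, in a stable order."""
        return sorted(a for a in self._nodes if a != exclude)


def es_command(registry, host, transport_port):
    """Register this node, then build an ES 1.x command line listing its peers."""
    me = f"{host}:{transport_port}"
    registry.register(me)
    args = ["bin/elasticsearch",
            "-Des.discovery.zen.ping.multicast.enabled=false"]
    peers = registry.peers(exclude=me)
    if peers:
        args.append("-Des.discovery.zen.ping.unicast.hosts=" + ",".join(peers))
    return args


if __name__ == "__main__":
    registry = Registry()
    # Placeholder addresses for three nodes started one after another.
    for host in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
        print(" ".join(es_command(registry, host, 9300)))
```

Each node only lists the nodes registered before it. With zen discovery a new node generally only needs to reach one existing member to join, so this ordering is workable in principle; what the sketch leaves out, such as removing dead nodes from the registry, is the kind of bookkeeping a dedicated framework's scheduler would take over.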


Re: Role-related configuration in Mesos

2015-12-28 Thread Jeff Schroeder
Perhaps we could also support HTTP PATCH, so you could update just one small
thing instead of using PUT's get-and-set approach.
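To illustrate the PUT/PATCH distinction, a toy in-memory model of a hypothetical /weights resource in Python (not Mesos code). With PUT the body replaces the whole resource, so changing one role means reading all weights first and sending them all back; PATCH sends only the change:

```python
class WeightsResource:
    """Toy model of a hypothetical /weights endpoint, keyed by role name."""

    def __init__(self):
        self._weights = {}

    def get(self):
        # GET: return a copy of the current state.
        return dict(self._weights)

    def put(self, body):
        # PUT: the body replaces everything; omitted roles disappear.
        self._weights = dict(body)

    def patch(self, body):
        # PATCH: only the roles named in the body change.
        self._weights.update(body)

    def delete(self, role):
        # DELETE: drop one role's weight configuration.
        self._weights.pop(role, None)


if __name__ == "__main__":
    weights = WeightsResource()
    weights.put({"dev": 1.0, "prod": 3.0})

    # PATCH-style update of a single role.
    weights.patch({"dev": 2.0})
    print(weights.get())

    # PUT-style update of a single role: read, modify, write everything back.
    current = weights.get()
    current["prod"] = 4.0
    weights.put(current)
    print(weights.get())
```

The read-modify-write sequence is also racy when two operators update different roles at the same time, which is one practical argument for PATCH.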

On Thursday, December 17, 2015, Adam Bordelon  wrote:

> First off, if we're going to have a /reservations endpoint, we should
> follow the same PUT+DELETE pattern for reserve+unreserve, instead of
> POST+PUT. And we should consider converting /create and /destroy to
> PUT+DELETE verbs on a /volumes endpoint.
>
> Secondly, we're going to have to support the previous endpoints
> through a deprecation cycle (~6mo), so there's no rush to get those
> changes in at the same time as or before dynamic weights.
>
> Finally, it seems like the only real difference between the two
> proposals is whether (1) /roles will be the catch-all "show me
> everything about each role" endpoint that admins/users will request
> when they want to see the current state of their
> reservations/quota/weights(/volumes?), or (2) each endpoint with
> create/update (PUT/POST) and DELETE actions will also have a GET
> action that lists the current state of quotas or weights or whatever,
> and /roles can (continue to) show whatever subset of information it
> wants.
>
> In the long-run, I like the idea of consistency among these types of
> endpoints, but for the near-term scope of dynamic weights, I think you
> can leave the other endpoints alone (including /roles) and just
> implement the PUT/POST+DELETE for /weights to create/update+delete
> weight configurations. Since weights are already displayed in /roles,
> you can leave them there and not worry about creating a GET for
> /weights. That's the least amount of work/disruption needed to
> deliver the feature/functionality, and it involves no wasted work no
> matter whether we go with your proposal 1 or 2 in the long run.
> On that note, we should create a JIRA Epic for defining a proper
> RESTful API for these actions and migrating all relevant endpoints to
> the new model.
>
> Cheers,
> -Adam-
>
> P.S. Seems like RESTful APIs prefer plural nouns over singular, so it
> would be /weights instead of /weight.

Re: Role-related configuration in Mesos

2015-12-28 Thread Alex Rukletsov
An example that clarifies Benjamin's point: quota is indeed set per role,
but that may change in the future (I can envision quotas for individual
frameworks as well).

I think:
  * It would be great to merge related actions into one endpoint and
express the difference via HTTP verbs ("/reservation" and "/volumes").
  * All services that are not strictly related to roles should have their
own endpoints.
  * All services somehow related to roles should be "linked" in
"/roles". For example, if quota is set for role "A", as an operator I should
be able to see it when I hit "/roles". Otherwise it's very tedious to track
all "visible" roles.
  * All services strictly related to roles (role weight) do not necessarily
need their own endpoints and can be managed via "/roles".
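The third point above (anything role-related should be visible from "/roles") amounts to a join across per-service data. A Python sketch with hypothetical inputs (plain dicts keyed by role name; the default weight of 1.0 is an assumption) shows the idea:

```python
def roles_view(weights, quotas, reservations, default_weight=1.0):
    """Merge per-service, per-role data into a /roles-style list.

    Any role known to any service appears, so a role that only has a quota
    is still visible to the operator.
    """
    roles = set(weights) | set(quotas) | set(reservations)
    return [
        {
            "name": role,
            "weight": weights.get(role, default_weight),
            "quota": quotas.get(role),  # None if no quota is set
            "reserved": reservations.get(role, {}),
        }
        for role in sorted(roles)
    ]


if __name__ == "__main__":
    view = roles_view(
        weights={"prod": 3.0},
        quotas={"A": {"cpus": 4}},
        reservations={"prod": {"cpus": 2}},
    )
    for entry in view:
        print(entry)
```

It also makes Benjamin's concern visible: the shape of each merged record becomes part of the "/roles" interface, so changing how a service relates to roles changes this endpoint too.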


On Fri, Dec 18, 2015 at 3:05 PM, Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi,
>
> as you write, we use roles for a number of pretty loosely coupled
> concerns (allocation, quota, reservations).
>
> While denormalizing the endpoints as you suggest in Proposal (1)
> simplifies querying information, it limits how that coupling can evolve
> in the future (at least if we’d like to avoid breaking the interface). That
> would be much less of a problem for lightweight endpoints that each deal
> with a single service.
>
>
> Cheers,
>
> Benjamin
>
>
> > On Dec 16, 2015, at 1:02 PM, Yongqiao Wang wrote:
> >
> > Hi guys,
> >
> > Currently, Mesos uses the following ways to configure role-related
> > objects:
> > 1. To dynamically reserve resources for a role, the /reserve endpoint is
> > used to reserve and a separate /unreserve endpoint is used to unreserve;
> > a third endpoint to show a role's resource reservations may need to be
> > added later, since someone has already raised that requirement.
> >
> > 2. To configure quota for a role, a single endpoint, /quota, is provided
> > to set/remove/show quota information.
> >
> > 3. For role information, the /roles endpoint only shows information
> > (role name, weight, and the registered frameworks and their used
> > resources) about the roles the master is configured with (specified by
> > --roles at Mesos master startup); the configured roles cannot be changed
> > through this endpoint at runtime (without restarting the Mesos master).
> > Currently there are two proposals in progress to support reconfiguring
> > roles at runtime:
> > - Dynamic Roles (MESOS-3177): roles are stored in the registry and
> > added/deleted/removed/shown via /roles HTTP endpoints by authorized
> > principals.
> > - Implicit Roles (MESOS-3988): any role will be allowed, subject to the
> > ACL/authorization system.
> >
> > After some discussion, we all prefer to implement Implicit Roles rather
> > than Dynamic Roles, but dynamic weight is out of the scope of Implicit
> > Roles, so a new project will need to be opened for dynamic weight, and,
> > like quota, a new endpoint (such as /weight) will be added to update a
> > role's weight at runtime.
> >
> > The designs and implementations above all behave differently. To
> > improve the user experience, these endpoints should be made to behave
> > consistently. I have two proposals, as below:
> >
> > Proposal 1: use the /roles endpoint to show all role information
> > centrally, and use the other endpoints (/weight, /quota, /reservation)
> > only to set role-related configuration.
> > - Implement Implicit Roles to support dynamically and implicitly
> > adding/removing roles at runtime, and enhance the /roles endpoint to show
> > all role information centrally, including role name, weight, resource
> > reservations, quota, etc.
> > - For reservation, merge /reserve and /unreserve: the end user can use
> > one endpoint, /reservation (a RESTful endpoint should preferably be a
> > noun), to reserve (POST method) and unreserve (PUT method) resources;
> > showing reservations is not supported through this endpoint.
> > - For setting quota, the end user can only use the /quota endpoint to set
> > and remove quota; showing quota is not supported through this endpoint.
> > - For dynamic weight, add a new endpoint, /weight, which the end user can
> > use to update a role's weight; showing weights is not supported through
> > this endpoint.
> >
> >
> > Proposal 2: keep the old behaviour of the /roles endpoint and use the
> > other endpoints (/weight, /quota, /reservation) to both set and show
> > role-related configuration.
> > - Implement Implicit Roles to support dynamically and implicitly
> > configuring roles at runtime, and keep the old behaviour of /roles, which
> > only shows role information: role name, weight, and the registered
> > frameworks and their used resources.
> > - For reservation, merge /reserve and /unreserve: the end user can use
> > one endpoint, /reservation, to reserve (POST method), unreserve (PUT
> > method) and show (GET method) reserved resources.
> > - For setting quota, keep the current behaviour: the end user can use the
> > /quota endpoint to set (PUT method), remove (DELETE method) and show (GET
> > method) quota

Re: The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master

2015-12-28 Thread Klaus Ma
It seems Kubernetes is down; could you check Kubernetes' status
(km)?


Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform Symphony/DCOS Development & Support, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao  wrote:

> Hi all,
>
> Greetings from me!
>
> I am trying to follow this tutorial
> (https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
> to deploy "k8s on Mesos" on local machines: k8s is the latest
> master branch, and Mesos is version 0.26.
>
> After starting the Mesos master (IP: 15.242.100.56), the Mesos
> slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I see the
> following logs on the Mesos master:
>
> ..
> I1227 22:52:34.494478  8069 master.cpp:4269] Received update of slave
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
> (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
> I1227 22:52:34.494940  8065 hierarchical.cpp:400] Slave
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
> (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
> resources  (total: cpus(*):32; mem(*):127878; disk(*):4336;
> ports(*):[31000-32000], allocated: )
> I1227 22:53:06.740757  8053 http.cpp:334] HTTP GET for
> /master/state.json from 15.242.100.60:56219 with
> User-Agent='Go-http-client/1.1'
> I1227 22:53:07.736419 8065 http.cpp:334] HTTP GET for
> /master/state.json from 15.242.100.60:56241 with
> User-Agent='Go-http-client/1.1'
> I1227 22:53:07.767196  8070 http.cpp:334] HTTP GET for
> /master/state.json from 15.242.100.60:56252 with
> User-Agent='Go-http-client/1.1'
> I1227 22:53:08.808171  8053 http.cpp:334] HTTP GET for
> /master/state.json from 15.242.100.60:56272 with
> User-Agent='Go-http-client/1.1'
> I1227 22:53:08.815811  8060 master.cpp:2176] Received SUBSCRIBE call
> for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
> I1227 22:53:08.816182 8060 master.cpp:2247] Subscribing framework
> Kubernetes with checkpointing enabled and capabilities [  ]
> I1227 22:53:08.817294  8052 hierarchical.cpp:195] Added framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
> I1227 22:53:08.817464  8050 master.cpp:1122] Framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
> scheduler(1)@15.242.100.60:59488 disconnected
> E1227 22:53:08.817497 8073 process.cpp:1911] Failed to shutdown
> socket with fd 17: Transport endpoint is not connected
> I1227 22:53:08.817533  8050 master.cpp:2472] Disconnecting framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
> scheduler(1)@15.242.100.60:59488
> I1227 22:53:08.817595 8050 master.cpp:2496] Deactivating framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
> scheduler(1)@15.242.100.60:59488
> I1227 22:53:08.817797 8050 master.cpp:1146] Giving framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
> scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
> W1227 22:53:08.818389 8062 master.cpp:4840] Master returning
> resources offered to framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- because the framework has
> terminated or is inactive
> I1227 22:53:08.818397  8052 hierarchical.cpp:273] Deactivated
> framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
> I1227 22:53:08.819046  8066 hierarchical.cpp:744] Recovered
> cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000]
> (total: cpus(*):32; mem(*):127878; disk(*):4336;
> ports(*):[31000-32000], allocated: ) on slave
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework
> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
> ..
>
> I can't figure out why the Mesos master complains "Failed to shutdown
> socket with fd 17: Transport endpoint is not connected".
> Could someone give me some clues about this issue?
>
> Thanks very much in advance!
>
> Best Regards
> Nan Xiao
>


The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master

2015-12-28 Thread Nan Xiao
Hi all,

Greetings from me!

I am trying to follow this tutorial
(https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
to deploy "k8s on Mesos" on local machines: k8s is the latest
master branch, and Mesos is version 0.26.

After starting the Mesos master (IP: 15.242.100.56), the Mesos
slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I see the
following logs on the Mesos master:

..
I1227 22:52:34.494478  8069 master.cpp:4269] Received update of slave
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
(pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
I1227 22:52:34.494940  8065 hierarchical.cpp:400] Slave
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
(pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
resources  (total: cpus(*):32; mem(*):127878; disk(*):4336;
ports(*):[31000-32000], allocated: )
I1227 22:53:06.740757  8053 http.cpp:334] HTTP GET for
/master/state.json from 15.242.100.60:56219 with
User-Agent='Go-http-client/1.1'
I1227 22:53:07.736419  8065 http.cpp:334] HTTP GET for
/master/state.json from 15.242.100.60:56241 with
User-Agent='Go-http-client/1.1'
I1227 22:53:07.767196  8070 http.cpp:334] HTTP GET for
/master/state.json from 15.242.100.60:56252 with
User-Agent='Go-http-client/1.1'
I1227 22:53:08.808171  8053 http.cpp:334] HTTP GET for
/master/state.json from 15.242.100.60:56272 with
User-Agent='Go-http-client/1.1'
I1227 22:53:08.815811  8060 master.cpp:2176] Received SUBSCRIBE call
for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
I1227 22:53:08.816182  8060 master.cpp:2247] Subscribing framework
Kubernetes with checkpointing enabled and capabilities [  ]
I1227 22:53:08.817294  8052 hierarchical.cpp:195] Added framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
I1227 22:53:08.817464  8050 master.cpp:1122] Framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
scheduler(1)@15.242.100.60:59488 disconnected
E1227 22:53:08.817497  8073 process.cpp:1911] Failed to shutdown
socket with fd 17: Transport endpoint is not connected
I1227 22:53:08.817533  8050 master.cpp:2472] Disconnecting framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
scheduler(1)@15.242.100.60:59488
I1227 22:53:08.817595  8050 master.cpp:2496] Deactivating framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
scheduler(1)@15.242.100.60:59488
I1227 22:53:08.817797  8050 master.cpp:1146] Giving framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at
scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
W1227 22:53:08.818389  8062 master.cpp:4840] Master returning
resources offered to framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- because the framework has
terminated or is inactive
I1227 22:53:08.818397  8052 hierarchical.cpp:273] Deactivated
framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
I1227 22:53:08.819046  8066 hierarchical.cpp:744] Recovered
cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000]
(total: cpus(*):32; mem(*):127878; disk(*):4336;
ports(*):[31000-32000], allocated: ) on slave
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework
9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
..

I can't figure out why the Mesos master complains "Failed to shutdown
socket with fd 17: Transport endpoint is not connected".
Could someone give me some clues about this issue?

Thanks very much in advance!

Best Regards
Nan Xiao
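For reference, the text in that error is the C library's message for ENOTCONN; on Linux, strerror(ENOTCONN) is exactly "Transport endpoint is not connected". A minimal Python sketch reproducing it by calling shutdown() on a TCP socket that has no connected peer. It only illustrates what the errno means; it does not by itself explain why the master's socket to the scheduler was in that state:

```python
import errno
import os
import socket


def shutdown_unconnected():
    """Call shutdown() on a TCP socket with no peer and return the errno."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return None  # shutdown() succeeded, which is not expected here


if __name__ == "__main__":
    code = shutdown_unconnected()
    if code is None:
        print("shutdown() unexpectedly succeeded")
    else:
        print(errno.errorcode[code], "-", os.strerror(code))
```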


Re: Sync Mesos-Master to Slaves

2015-12-28 Thread Alex Rukletsov
Hi Fred,

hm, if the bug depends on the Ubuntu version, my random guess is that it's
systemd related. Were you able to solve the problem? If not, it would be
helpful if you could provide more context and describe a minimal setup that
reproduces the issue.

On Thu, Dec 10, 2015 at 10:15 AM, Frederic LE BRIS 
wrote:

> Thanks Alex.
>
> As for the context, we use Spark on Mesos, and Marathon to launch some
> Elasticsearch instances.
>
> I kill each leader one by one.
>
> By the way, as I said, to reproduce this behaviour we use a configuration
> with the Mesos master on Ubuntu 12 and the Mesos slave on Ubuntu 14.
>
> When I deploy both master and slave on Ubuntu 14 only, the issue disappears …
>
> Fred
>
>
>
>
>
>
> On 09 Dec 2015, at 16:30, Alex Rukletsov  wrote:
>
> Frederic,
>
> I have skimmed through the logs and they do not seem to be complete
> (especially for master1). Could you please say which task was killed
> (its id) and which master failover triggered that? I see at least three
> failovers in the logs : ). Also, could you please share some background
> about your setup? I believe you're on systemd; do you use Docker tasks?
>
> To connect our conversation to particular events, let me post here the
> chain of (potentially) interesting events and some info I mined from the
> logs.
> master1: 192.168.37.59 ?
> master2: 192.168.37.58
> master3: 192.168.37.104
>
> timestamp  observed by  event
> 13:48:38   master1      master1 killed by sigterm
> 13:48:48   master2,3    new leader elected (192.168.37.104), id=5
> 13:49:25   master2      master2 killed by sigterm
> 13:50:44   master2,3    new leader elected (192.168.37.59), id=7
> 14:23:34   master1      master1 killed by sigterm
> 14:23:44   master2,3    new leader elected (192.168.37.58), id=8
>
> One interesting thing I cannot understand is why master3 did not commit
> suicide when it lost leadership.
>
>
> On Mon, Dec 7, 2015 at 4:08 PM, Frederic LE BRIS 
> wrote:
>
>> With the context .. sorry
>>
>>
>
>