Re: The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master
Hi Klaus,

Firstly, thanks very much for your answer! The km processes are all live:

root 129474 128024 2 22:26 pts/0 00:00:00 km apiserver --address=15.242.100.60 --etcd-servers=http://15.242.100.60:4001 --service-cluster-ip-range=10.10.10.0/24 --port= --cloud-provider=mesos --cloud-config=mesos-cloud.conf --secure-port=0 --v=1
root 129509 128024 2 22:26 pts/0 00:00:00 km controller-manager --master=15.242.100.60: --cloud-provider=mesos --cloud-config=./mesos-cloud.conf --v=1
root 129538 128024 0 22:26 pts/0 00:00:00 km scheduler --address=15.242.100.60 --mesos-master=15.242.100.56:5050 --etcd-servers=http://15.242.100.60:4001 --mesos-user=root --api-servers=15.242.100.60: --cluster-dns=10.10.10.10 --cluster-domain=cluster.local --v=2

All the logs also seem OK, except the logs from scheduler.log:

..
I1228 22:26:37.883092 129538 messenger.go:381] Receiving message mesos.internal.InternalMasterChangeDetected from scheduler(1)@15.242.100.60:33077
I1228 22:26:37.883225 129538 scheduler.go:374] New master master@15.242.100.56:5050 detected
I1228 22:26:37.883268 129538 scheduler.go:435] No credentials were provided. Attempting to register scheduler without authentication.
I1228 22:26:37.883356 129538 scheduler.go:928] Registering with master: master@15.242.100.56:5050
I1228 22:26:37.883460 129538 messenger.go:187] Sending message mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
I1228 22:26:37.883504 129538 scheduler.go:881] will retry registration in 1.209320575s if necessary
I1228 22:26:37.883758 129538 http_transporter.go:193] Sending message to master@15.242.100.56:5050 via http
I1228 22:26:37.883873 129538 http_transporter.go:587] libproc target URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
I1228 22:26:39.093560 129538 scheduler.go:928] Registering with master: master@15.242.100.56:5050
I1228 22:26:39.093659 129538 messenger.go:187] Sending message mesos.internal.RegisterFrameworkMessage to master@15.242.100.56:5050
I1228 22:26:39.093702 129538 scheduler.go:881] will retry registration in 3.762036352s if necessary
I1228 22:26:39.093765 129538 http_transporter.go:193] Sending message to master@15.242.100.56:5050 via http
I1228 22:26:39.093847 129538 http_transporter.go:587] libproc target URL http://15.242.100.56:5050/master/mesos.internal.RegisterFrameworkMessage
..

From the log, the Mesos master rejected the k8s registration, and k8s retries constantly. Have you met this issue before?

Thanks very much in advance!

Best Regards
Nan Xiao

On Mon, Dec 28, 2015 at 7:26 PM, Klaus Ma wrote:
> It seems Kubernetes is down; would you help to check kubernetes's status
> (km)?
>
> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
> Platform Symphony/DCOS Development & Support, STG, IBM GCG
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
>
> On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao wrote:
>>
>> Hi all,
>>
>> Greetings from me!
>>
>> I am trying to follow this tutorial
>> (https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
>> to deploy "k8s on Mesos" on local machines: The k8s is the newest
>> master branch, and Mesos is the 0.26 edition.
>>
>> After running Mesos master(IP:15.242.100.56), Mesos
>> slave(IP:15.242.100.16), and the k8s(IP:15.242.100.60), I can see the
>> following logs from Mesos master:
>>
>> ..
>> I1227 22:52:34.494478 8069 master.cpp:4269] Received update of slave
>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051
>> (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
>> I1227 22:52:34.494940 8065 hierarchical.cpp:400] Slave
>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0
>> (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed
>> resources (total: cpus(*):32; mem(*):127878; disk(*):4336;
>> ports(*):[31000-32000], allocated: )
>> I1227 22:53:06.740757 8053 http.cpp:334] HTTP GET for
>> /master/state.json from 15.242.100.60:56219 with
>> User-Agent='Go-http-client/1.1'
>> I1227 22:53:07.736419 8065 http.cpp:334] HTTP GET for
>> /master/state.json from 15.242.100.60:56241 with
>> User-Agent='Go-http-client/1.1'
>> I1227 22:53:07.767196 8070 http.cpp:334] HTTP GET for
>> /master/state.json from 15.242.100.60:56252 with
>> User-Agent='Go-http-client/1.1'
>> I1227 22:53:08.808171 8053 http.cpp:334] HTTP GET for
>> /master/state.json from 15.242.100.60:56272 with
>> User-Agent='Go-http-client/1.1'
>> I1227 22:53:08.815811 8060 master.cpp:2176] Received SUBSCRIBE call
>> for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
>> I1227 22:53:08.816182 8060 master.cpp:2247] Subscribing framework
>> Kubernetes with checkpointing enabled and capabilities [ ]
>> I1227 22:53:08.817294 8052 hierarchical.cpp:195] Added framework
>> 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
>> I1227 22:53:08.817464 8050 master.cpp:1122] Framework
>> 9c3c6c78-0b62-4eaa-b27a-498f172e
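The scheduler log in this thread shows registration retries at roughly 1.2s and then 3.8s, i.e. a growing, jittered retry interval. As a rough illustration only (this is not the actual mesos-go code; the function name and parameters are hypothetical), randomized exponential backoff can be sketched as:

```python
import random

def backoff_schedule(base=1.0, factor=2.0, cap=60.0, attempts=5, jitter=0.5):
    """Yield randomized retry delays: min(cap, base * factor**n), plus jitter."""
    for n in range(attempts):
        delay = min(cap, base * factor ** n)
        # Spread retries out by up to `jitter` (50%) to avoid thundering
        # herds, similar in spirit to the 1.209s / 3.762s intervals above.
        yield delay * (1.0 + random.uniform(0.0, jitter))

delays = list(backoff_schedule())
```

Each delay lands in a randomized band above the deterministic geometric value, so repeated registration attempts back off but never synchronize exactly.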
Re: More filters on /master/tasks endpoint and filters on /master/state
+1 It'll also reduce the master's workload; but instead of labels, I'd like to keep the master simpler: return tasks page by page and let the framework/dashboard filter them itself.

Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform Symphony/DCOS Development & Support, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Tue, Dec 29, 2015 at 6:09 AM, Diogo Gomes wrote:
> Hi guys, I would like your opinion about a future feature proposal.
>
> Currently, we can use HTTP API to list all our tasks running in our
> cluster using /master/tasks, but you have to list all tasks or limit/offset
> this list, we cannot filter this. I would like to filter this, using
> labels, for example. The use case will be to use mesos to fill our load
> balancer with tasks data.
>
> Marathon currently provides something like this, but only for his tasks,
> using /v2/apps/?label=[key]==[value]
>
> Diogo Gomes
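The "paginate on the master, filter in the client" approach suggested here could look roughly like the following sketch. The helper name is hypothetical, and the label layout (a list of protobuf-style key/value pairs, as tasks carry in the master's JSON output) is an assumption about the response shape:

```python
def filter_tasks_by_label(tasks, key, value):
    """Client-side equivalent of a hypothetical ?label=key==value filter.

    `tasks` is a list of task dicts as returned by one /master/tasks page;
    each task is assumed to carry labels as [{"key": ..., "value": ...}].
    """
    matched = []
    for task in tasks:
        labels = task.get("labels", [])
        if any(l.get("key") == key and l.get("value") == value for l in labels):
            matched.append(task)
    return matched

# Sample data shaped like a /master/tasks page:
tasks = [
    {"id": "web-1", "labels": [{"key": "lb", "value": "external"}]},
    {"id": "batch-1", "labels": []},
]
matched = filter_tasks_by_label(tasks, "lb", "external")
```

A load-balancer filler would call this once per limit/offset page and concatenate the matches, which keeps the filtering logic entirely out of the master.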
Re: mesos-elasticsearch vs Elasticsearch with Marathon
Craig, mind elaborating, how exactly do you run elasticsearch in Marathon? On Mon, Dec 28, 2015 at 8:36 PM, craig w wrote: > In terms of discovery, elasticsearch provides that out of the box > https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-discovery.html. > We deploy elasticsearch via Marathon and it works great. > > On Mon, Dec 28, 2015 at 2:17 PM, Eric LEMOINE wrote: >> >> On Mon, Dec 28, 2015 at 7:55 PM, Alex Rukletsov >> wrote: >> > Eric— >> > >> > give me a chance to answer that before you fall into frustration : ). >> > Also, you can directly write to framework developers >> > (mesos...@container-solutions.com) and they either confirm or bust my >> > guess. Or maybe one of the authors — Frank — will chime in in this >> > thread. >> > >> > Marathon has no idea about application logic, hence a "scale" >> > operation just starts more application instances. But sometimes you >> > may want to do extra job (track instances, report ip:port of a new >> > instance to existing instances, and so on). That's when a dedicated >> > framework makes sense. Each framework has a scheduler that is able to >> > track each instance and do all aforementioned actions. >> > >> > How this maps to your question? AFAIK, all Elasticsearch nodes should >> > see each other, hence once a new node is started, it should be somehow >> > advertised to other nodes. You can do it by wrapping Elasticsearch >> > command in a shell script and maintain some sort of an out-of-band >> > registry, take a look at one of the first efforts [1] to run >> > Elasticsearch on Mesos to get an impression how it may look like. But >> > you can use a dedicated framework instead : ). >> > >> > [1] https://github.com/mesosphere/elasticsearch-mesos >> >> >> That makes great sense Alex. Thanks for chiming in. > > > > > -- > > https://github.com/mindscratch > https://www.google.com/+CraigWickesser > https://twitter.com/mind_scratch > https://twitter.com/craig_links
More filters on /master/tasks endpoint and filters on /master/state
Hi guys, I would like your opinion about a future feature proposal.

Currently, we can use the HTTP API to list all the tasks running in our cluster via /master/tasks, but we can only list all tasks or limit/offset the list; we cannot filter it. I would like to be able to filter, using labels for example. The use case would be to use Mesos task data to fill our load balancer.

Marathon currently provides something like this, but only for its own tasks, using /v2/apps/?label=[key]==[value]

Diogo Gomes
Re: mesos-elasticsearch vs Elasticsearch with Marathon
In terms of discovery, elasticsearch provides that out of the box https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-discovery.html. We deploy elasticsearch via Marathon and it works great. On Mon, Dec 28, 2015 at 2:17 PM, Eric LEMOINE wrote: > On Mon, Dec 28, 2015 at 7:55 PM, Alex Rukletsov > wrote: > > Eric— > > > > give me a chance to answer that before you fall into frustration : ). > > Also, you can directly write to framework developers > > (mesos...@container-solutions.com) and they either confirm or bust my > > guess. Or maybe one of the authors — Frank — will chime in in this > > thread. > > > > Marathon has no idea about application logic, hence a "scale" > > operation just starts more application instances. But sometimes you > > may want to do extra job (track instances, report ip:port of a new > > instance to existing instances, and so on). That's when a dedicated > > framework makes sense. Each framework has a scheduler that is able to > > track each instance and do all aforementioned actions. > > > > How this maps to your question? AFAIK, all Elasticsearch nodes should > > see each other, hence once a new node is started, it should be somehow > > advertised to other nodes. You can do it by wrapping Elasticsearch > > command in a shell script and maintain some sort of an out-of-band > > registry, take a look at one of the first efforts [1] to run > > Elasticsearch on Mesos to get an impression how it may look like. But > > you can use a dedicated framework instead : ). > > > > [1] https://github.com/mesosphere/elasticsearch-mesos > > > That makes great sense Alex. Thanks for chiming in. > -- https://github.com/mindscratch https://www.google.com/+CraigWickesser https://twitter.com/mind_scratch https://twitter.com/craig_links
Re: mesos-elasticsearch vs Elasticsearch with Marathon
On Mon, Dec 28, 2015 at 7:55 PM, Alex Rukletsov wrote: > Eric— > > give me a chance to answer that before you fall into frustration : ). > Also, you can directly write to framework developers > (mesos...@container-solutions.com) and they either confirm or bust my > guess. Or maybe one of the authors — Frank — will chime in in this > thread. > > Marathon has no idea about application logic, hence a "scale" > operation just starts more application instances. But sometimes you > may want to do extra job (track instances, report ip:port of a new > instance to existing instances, and so on). That's when a dedicated > framework makes sense. Each framework has a scheduler that is able to > track each instance and do all aforementioned actions. > > How this maps to your question? AFAIK, all Elasticsearch nodes should > see each other, hence once a new node is started, it should be somehow > advertised to other nodes. You can do it by wrapping Elasticsearch > command in a shell script and maintain some sort of an out-of-band > registry, take a look at one of the first efforts [1] to run > Elasticsearch on Mesos to get an impression how it may look like. But > you can use a dedicated framework instead : ). > > [1] https://github.com/mesosphere/elasticsearch-mesos That makes great sense Alex. Thanks for chiming in.
Re: mesos-elasticsearch vs Elasticsearch with Marathon
Eric—

give me a chance to answer that before you fall into frustration : ). Also, you can write directly to the framework developers (mesos...@container-solutions.com) and they can either confirm or bust my guess. Or maybe one of the authors — Frank — will chime in on this thread.

Marathon has no idea about application logic, hence a "scale" operation just starts more application instances. But sometimes you may want to do extra work (track instances, report the ip:port of a new instance to the existing instances, and so on). That's when a dedicated framework makes sense. Each framework has a scheduler that can track each instance and perform all the aforementioned actions.

How does this map to your question? AFAIK, all Elasticsearch nodes should see each other, hence once a new node is started, it should somehow be advertised to the other nodes. You can do this by wrapping the Elasticsearch command in a shell script and maintaining some sort of out-of-band registry; take a look at one of the first efforts [1] to run Elasticsearch on Mesos to get an impression of how it may look. But you can use a dedicated framework instead : ).

[1] https://github.com/mesosphere/elasticsearch-mesos

On Wed, Dec 23, 2015 at 10:30 AM, Eric LEMOINE wrote:
> On Tue, Dec 22, 2015 at 10:05 AM, craig w wrote:
>> We'd like to use the framework once some more features are available (see
>> the road map).
>>
>> Currently we deploy ES in docker using marathon.
>
> Thank you all for your responses. I get that the situation is not as
> clear as I expected :)
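The "out-of-band registry" idea above can be sketched in a few lines: each wrapper script registers its node's transport ip:port somewhere shared (a file, ZooKeeper, etc.) and renders the unicast discovery setting from it. This is only an illustration; the registry layout and the helper are hypothetical, and the `-Des.*` flag style assumes the ES 1.x settings syntax referenced earlier in the thread:

```python
def unicast_hosts_setting(registry):
    """Render the ES 1.x unicast-discovery JVM flag from a node registry.

    `registry` maps a node name to the "ip:port" of its ES transport port;
    where the registry lives (file, ZooKeeper, ...) is up to the wrapper.
    """
    hosts = ",".join(sorted(registry.values()))
    return "-Des.discovery.zen.ping.unicast.hosts=%s" % hosts

registry = {"node-1": "10.0.0.5:9300", "node-2": "10.0.0.7:9300"}
setting = unicast_hosts_setting(registry)
```

A dedicated framework does essentially the same bookkeeping in its scheduler, which is why it removes the need for the wrapper-script approach.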
Re: Role-related configuration in Mesos
Perhaps we could also support HTTP PATCH, so you could just update one small thing, versus PUT's get-and-set approach.

On Thursday, December 17, 2015, Adam Bordelon wrote:
> First off, if we're going to have a /reservations endpoint, we should
> follow the same PUT+DELETE pattern for reserve+unreserve, instead of
> POST+PUT. And we should consider converting /create and /destroy to
> PUT+DELETE verbs on a /volumes endpoint.
>
> Secondly, we're going to have to support the previous endpoints
> through a deprecation cycle (~6mo), so there's no rush to get those
> changes in at the same time as or before dynamic weights.
>
> Finally, it seems like the only real difference between the two
> proposals is whether (1) /roles will be the catch-all "show me
> everything about each role" endpoint that admins/users will request
> when they want to see the current state of their
> reservations/quota/weights(/volumes?), or (2) each endpoint with
> create/update (PUT/POST) and DELETE actions will also have a GET
> action that lists the current state of quotas or weights or whatever,
> and /roles can (continue to) show whatever subset of information it
> wants.
>
> In the long run, I like the idea of consistency among these types of
> endpoints, but for the near-term scope of dynamic weights, I think you
> can leave the other endpoints alone (including /roles) and just
> implement PUT/POST+DELETE for /weights to create/update+delete
> weight configurations. Since weights are already displayed in /roles,
> you can leave them there and not worry about creating a GET for
> /weights. That's the least amount of work/disruption you have to do to
> deliver the feature/functionality, and it includes no wasted work no
> matter whether we go with your proposal 1 or 2 in the long run.
> On that note, we should create a JIRA Epic for defining a proper
> RESTful API for these actions and migrating all relevant endpoints to
> the new model.
>
> Cheers,
> -Adam-
>
> P.S. Seems like RESTful APIs prefer plural nouns over singular, so it
> would be /weights instead of /weight.
>
> On Wed, Dec 16, 2015 at 4:02 AM, Yongqiao Wang wrote:
> > Hi guys,
> >
> > Currently, Mesos uses the following ways to configure role-related objects:
> > 1. For dynamically reserving resources for a role, the /reserve endpoint is
> > used to reserve and another /unreserve endpoint is used to unreserve; maybe a
> > third endpoint should be added later to show the resource reservation of a
> > role, since someone has raised a requirement for this.
> >
> > 2. For configuring quota for a role, only one endpoint /quota is provided to
> > set/remove/show quota information.
> >
> > 3. For role information, the /roles endpoint is only provided to show role
> > information (role name, weight, and the registered frameworks and their used
> > resources) that the master is configured with (specified by --roles at Mesos
> > master startup), and the configured roles cannot be changed through this
> > endpoint at runtime (without restarting the Mesos master). Currently there
> > are two proposals in progress to support re-configuring roles at runtime:
> > - Dynamic Roles (MESOS-3177): roles are stored in the registry and
> > added/deleted/removed/shown via /roles HTTP endpoints with authorized
> > principals.
> > - Implicit Roles (MESOS-3988): any role will be allowed, subject to the
> > ACL/authorization system.
> >
> > After having a discussion, we all prefer to implement Implicit Roles rather
> > than Dynamic Roles, but dynamic weight is out of scope for Implicit Roles,
> > so a new project will need to be started for dynamic weight, and, like quota,
> > a new endpoint (such as /weight) will be added to update the weight of a role
> > at runtime.
> >
> > The above designs and implementations are all different. In order to improve
> > the user experience, the behaviour of these endpoints should be made
> > consistent. I have two proposals as below:
> >
> > Proposal 1, use the /roles endpoint to centrally show all role information
> > and use the other endpoints (/weight, /quota, /reservation) to only set the
> > role-related configuration.
> > - Implement Implicit Roles to support dynamically, implicitly adding/removing
> > roles at runtime, and enhance the /roles endpoint to centrally show all role
> > information (role name, weight, resource reservation, quota, etc.).
> > - For reservation, merge /reserve and /unreserve together; the end user can
> > use one endpoint /reservation (a RESTful endpoint should preferably be a
> > noun) to reserve (POST method) and unreserve (PUT method) resources, and
> > this endpoint does not support showing reservations;
> > - For setting quota, the end user can only use the /quota endpoint to set
> > and remove quota, and this endpoint does not support showing quota;
> > - For dynamic weight, add a new endpoint /weight that the end user can use
> > to update the weight of a role, and this endpoint does not support showing
> > weights.
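The HTTP PATCH suggestion in this thread boils down to a difference in update semantics: PUT replaces the whole resource, while a merge-style PATCH (in the spirit of RFC 7386) touches only the fields the client sends. A small illustrative sketch, with a hypothetical role resource (no real Mesos endpoint is modeled here):

```python
def put(resource, new_state):
    """PUT semantics: full replacement; the client must send the complete resource."""
    return dict(new_state)

def patch(resource, partial):
    """Merge-style PATCH semantics: update only the fields the client sends."""
    merged = dict(resource)
    merged.update(partial)
    return merged

# Hypothetical role resource for illustration:
role = {"name": "analytics", "weight": 1.0, "quota": "cpus:4"}

patched = patch(role, {"weight": 2.5})                       # quota survives
replaced = put(role, {"name": "analytics", "weight": 2.5})   # quota is lost
```

With PUT, a client that only wants to bump the weight must first GET the current resource so it can resend the untouched fields; PATCH avoids that read-modify-write round trip.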
Re: Role-related configuration in Mesos
An example that clarifies Benjamin's point: quota is set per role indeed, but that may change in the future (I can envision quotas for individual frameworks as well). I think:

* It would be great to merge related actions into one endpoint and express the difference via HTTP verbs ("/reservation" and "/volumes").
* All services that are not strictly related to roles should have their own endpoints.
* All services somehow related to roles should be somehow "linked" in "/roles". For example, if quota is set for role "A", as an operator I should be able to see it when I hit "/roles". Otherwise it's very tedious to track all "visible" roles.
* All services strictly related to roles (role weight) do not necessarily need their own endpoints and can be managed via "/roles".

On Fri, Dec 18, 2015 at 3:05 PM, Benjamin Bannier <benjamin.bann...@mesosphere.io> wrote:
> Hi,
>
> like you write we use roles for a number of pretty loosely coupled
> concerns (allocation, quota, reservations).
>
> While denormalizing the endpoints like you suggest in Proposal (1)
> simplifies querying information, it limits how that coupling can be evolved
> in the future (at least if we'd like to avoid breaking the interface). That
> would be much less a problem for lightweight endpoints dealing with single
> services each.
>
> Cheers,
>
> Benjamin
>
> > On Dec 16, 2015, at 1:02 PM, Yongqiao Wang wrote:
> >
> > Hi guys,
> >
> > Currently, Mesos uses the following ways to configure role-related objects:
> > 1. For dynamically reserving resources for a role, the /reserve endpoint is
> > used to reserve and another /unreserve endpoint is used to unreserve; maybe
> > a third endpoint should be added later to show the resource reservation of a
> > role, since someone has raised a requirement for this.
> >
> > 2. For configuring quota for a role, only one endpoint /quota is provided
> > to set/remove/show quota information.
> >
> > 3. For role information, the /roles endpoint is only provided to show role
> > information (role name, weight, and the registered frameworks and their used
> > resources) that the master is configured with (specified by --roles at Mesos
> > master startup), and the configured roles cannot be changed through this
> > endpoint at runtime (without restarting the Mesos master). Currently there
> > are two proposals in progress to support re-configuring roles at runtime:
> > - Dynamic Roles (MESOS-3177): roles are stored in the registry and
> > added/deleted/removed/shown via /roles HTTP endpoints with authorized
> > principals.
> > - Implicit Roles (MESOS-3988): any role will be allowed, subject to the
> > ACL/authorization system.
> >
> > After having a discussion, we all prefer to implement Implicit Roles rather
> > than Dynamic Roles, but dynamic weight is out of scope for Implicit Roles,
> > so a new project will need to be started for dynamic weight, and, like
> > quota, a new endpoint (such as /weight) will be added to update the weight
> > of a role at runtime.
> >
> > The above designs and implementations are all different. In order to
> > improve the user experience, the behaviour of these endpoints should be
> > made consistent. I have two proposals as below:
> >
> > Proposal 1, use the /roles endpoint to centrally show all role information
> > and use the other endpoints (/weight, /quota, /reservation) to only set the
> > role-related configuration.
> > - Implement Implicit Roles to support dynamically, implicitly
> > adding/removing roles at runtime, and enhance the /roles endpoint to
> > centrally show all role information (role name, weight, resource
> > reservation, quota, etc.).
> > - For reservation, merge /reserve and /unreserve together; the end user can
> > use one endpoint /reservation (a RESTful endpoint should preferably be a
> > noun) to reserve (POST method) and unreserve (PUT method) resources, and
> > this endpoint does not support showing reservations;
> > - For setting quota, the end user can only use the /quota endpoint to set
> > and remove quota, and this endpoint does not support showing quota;
> > - For dynamic weight, add a new endpoint /weight that the end user can use
> > to update the weight of a role, and this endpoint does not support showing
> > weights.
> >
> > Proposal 2, keep the old behaviour of the /roles endpoint and use the other
> > endpoints (/weight, /quota, /reservation) to set and show the role-related
> > configuration.
> > - Implement Implicit Roles to support dynamically, implicitly configuring
> > roles at runtime, and keep the old behaviour of /roles to only show role
> > information (role name, weight, and the registered frameworks and their
> > used resources).
> > - For reservation, merge /reserve and /unreserve together; the end user can
> > use one endpoint /reservation to reserve (POST method) resources,
> > unreserve (PUT method) resources, and show reserved resources (GET method);
> > - For setting quota, keep the current behaviour: the end user can use the
> > /quota endpoint to set (PUT method), remove (DELETE method) and show (GET
> > method) quota
Re: The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master
It seems Kubernetes is down; would you help to check kubernetes's status (km)? Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer Platform Symphony/DCOS Development & Support, STG, IBM GCG +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me On Mon, Dec 28, 2015 at 6:35 PM, Nan Xiao wrote: > Hi all, > > Greetings from me! > > I am trying to follow this tutorial > ( > https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md > ) > to deploy "k8s on Mesos" on local machines: The k8s is the newest > master branch, and Mesos is the 0.26 edition. > > After running Mesos master(IP:15.242.100.56), Mesos > slave(IP:15.242.100.16),, and the k8s(IP:15.242.100.60), I can see the > following logs from Mesos master: > > .. > I1227 22:52:34.494478 8069 master.cpp:4269] Received update of slave > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051 > (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources > I1227 22:52:34.494940 8065 hierarchical.cpp:400] Slave > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 > (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed > resources (total: cpus(*):32; mem(*):127878; disk(*):4336; > ports(*):[31000-32000], allocated: ) > I1227 22:53:06.740757 8053 http.cpp:334] HTTP GET for > /master/state.json from 15.242.100.60:56219 with > User-Agent='Go-http-client/1.1' > I1227 22:53:07.736419 8065 http.cpp:334] HTTP GET for > /master/state.json from 15.242.100.60:56241 with > User-Agent='Go-http-client/1.1' > I1227 22:53:07.767196 8070 http.cpp:334] HTTP GET for > /master/state.json from 15.242.100.60:56252 with > User-Agent='Go-http-client/1.1' > I1227 22:53:08.808171 8053 http.cpp:334] HTTP GET for > /master/state.json from 15.242.100.60:56272 with > User-Agent='Go-http-client/1.1' > I1227 22:53:08.815811 8060 master.cpp:2176] Received SUBSCRIBE call > for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488 > I1227 22:53:08.816182 8060 master.cpp:2247] Subscribing 
framework > Kubernetes with checkpointing enabled and capabilities [ ] > I1227 22:53:08.817294 8052 hierarchical.cpp:195] Added framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- > I1227 22:53:08.817464 8050 master.cpp:1122] Framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at > scheduler(1)@15.242.100.60:59488 disconnected > E1227 22:53:08.817497 8073 process.cpp:1911] Failed to shutdown > socket with fd 17: Transport endpoint is not connected > I1227 22:53:08.817533 8050 master.cpp:2472] Disconnecting framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at > scheduler(1)@15.242.100.60:59488 > I1227 22:53:08.817595 8050 master.cpp:2496] Deactivating framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at > scheduler(1)@15.242.100.60:59488 > I1227 22:53:08.817797 8050 master.cpp:1146] Giving framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at > scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover > W1227 22:53:08.818389 8062 master.cpp:4840] Master returning > resources offered to framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- because the framework has > terminated or is inactive > I1227 22:53:08.818397 8052 hierarchical.cpp:273] Deactivated > framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- > I1227 22:53:08.819046 8066 hierarchical.cpp:744] Recovered > cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000] > (total: cpus(*):32; mem(*):127878; disk(*):4336; > ports(*):[31000-32000], allocated: ) on slave > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework > 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- > .. > > I can't figure out why Mesos master complains "Failed to shutdown > socket with fd 17: Transport endpoint is not connected". > Could someone give some clues on this issue? > > Thanks very much in advance! > > Best Regards > Nan Xiao >
The issue of "Failed to shutdown socket with fd xx: Transport endpoint is not connected" on Mesos master
Hi all,

Greetings from me!

I am trying to follow this tutorial
(https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md)
to deploy "k8s on Mesos" on local machines: the k8s is the newest master branch, and Mesos is version 0.26.

After running the Mesos master (IP: 15.242.100.56), the Mesos slave (IP: 15.242.100.16), and k8s (IP: 15.242.100.60), I can see the following logs from the Mesos master:

..
I1227 22:52:34.494478 8069 master.cpp:4269] Received update of slave 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 at slave(1)@15.242.100.16:5051 (pqsfc016.ftc.rdlabs.hpecorp.net) with total oversubscribed resources
I1227 22:52:34.494940 8065 hierarchical.cpp:400] Slave 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 (pqsfc016.ftc.rdlabs.hpecorp.net) updated with oversubscribed resources (total: cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000], allocated: )
I1227 22:53:06.740757 8053 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:56219 with User-Agent='Go-http-client/1.1'
I1227 22:53:07.736419 8065 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:56241 with User-Agent='Go-http-client/1.1'
I1227 22:53:07.767196 8070 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:56252 with User-Agent='Go-http-client/1.1'
I1227 22:53:08.808171 8053 http.cpp:334] HTTP GET for /master/state.json from 15.242.100.60:56272 with User-Agent='Go-http-client/1.1'
I1227 22:53:08.815811 8060 master.cpp:2176] Received SUBSCRIBE call for framework 'Kubernetes' at scheduler(1)@15.242.100.60:59488
I1227 22:53:08.816182 8060 master.cpp:2247] Subscribing framework Kubernetes with checkpointing enabled and capabilities [ ]
I1227 22:53:08.817294 8052 hierarchical.cpp:195] Added framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
I1227 22:53:08.817464 8050 master.cpp:1122] Framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at scheduler(1)@15.242.100.60:59488 disconnected
E1227 22:53:08.817497 8073 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
I1227 22:53:08.817533 8050 master.cpp:2472] Disconnecting framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at scheduler(1)@15.242.100.60:59488
I1227 22:53:08.817595 8050 master.cpp:2496] Deactivating framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at scheduler(1)@15.242.100.60:59488
I1227 22:53:08.817797 8050 master.cpp:1146] Giving framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- (Kubernetes) at scheduler(1)@15.242.100.60:59488 7625.14222623576weeks to failover
W1227 22:53:08.818389 8062 master.cpp:4840] Master returning resources offered to framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6- because the framework has terminated or is inactive
I1227 22:53:08.818397 8052 hierarchical.cpp:273] Deactivated framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
I1227 22:53:08.819046 8066 hierarchical.cpp:744] Recovered cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000] (total: cpus(*):32; mem(*):127878; disk(*):4336; ports(*):[31000-32000], allocated: ) on slave 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-S0 from framework 9c3c6c78-0b62-4eaa-b27a-498f172e7fe6-
..

I can't figure out why the Mesos master complains "Failed to shutdown socket with fd 17: Transport endpoint is not connected".
Could someone give some clues on this issue?

Thanks very much in advance!

Best Regards
Nan Xiao
Re: Sync Mesos-Master to Slaves
Hi Fred,

hm, if the bug depends on the Ubuntu version, my random guess is that it's systemd related. Were you able to solve the problem? If not, it would be helpful if you could provide more context and describe a minimal setup that reproduces the issue.

On Thu, Dec 10, 2015 at 10:15 AM, Frederic LE BRIS wrote:
> Thanks Alex.
>
> About the context, we use Spark on Mesos, and Marathon to launch some
> Elasticsearch instances.
>
> I kill each leader one-by-one.
>
> By the way, as I said, to reproduce this behaviour we are on a config with
> the Mesos master on Ubuntu 12 and the Mesos slave on Ubuntu 14.
>
> When I deploy only on Ubuntu 14 (master+slave), the issue disappears …
>
> Fred
>
> On 09 Dec 2015, at 16:30, Alex Rukletsov wrote:
>
> Frederic,
>
> I have skimmed through the logs and they do not seem to be complete
> (especially for master1). Could you please say which task has been killed
> (id) and which master failover triggered that? I see at least three
> failovers in the logs : ). Also, could you please share some background
> about your setup? I believe you're on systemd; do you use docker tasks?
>
> To connect our conversation to particular events, let me post here the
> chain of (potentially) interesting events and some info I mined from the
> logs.
> master1: 192.168.37.59 ?
> master2: 192.168.37.58
> master3: 192.168.37.104
>
> timestamp  observed by  event
> 13:48:38   master1      master1 killed by sigterm
> 13:48:48   master2,3    new leader elected (192.168.37.104), id=5
> 13:49:25   master2      master2 killed by sigterm
> 13:50:44   master2,3    new leader elected (192.168.37.59), id=7
> 14:23:34   master1      master1 killed by sigterm
> 14:23:44   master2,3    new leader elected (192.168.37.58), id=8
>
> One interesting thing I cannot understand is why master3 did not commit
> suicide when it lost leadership?
>
> On Mon, Dec 7, 2015 at 4:08 PM, Frederic LE BRIS wrote:
>> With the context .. sorry