
[jira] [Commented] (MESOS-1529) Handle a network partition between Master and Slave

2014-06-26 Thread Tobias Weingartner (JIRA)

[
https://issues.apache.org/jira/browse/MESOS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044782#comment-14044782
]

Tobias Weingartner commented on MESOS-1529:
---

Reading point #3 above, I believe you mean =. Otherwise you could wait
forever for a ping that will arrive at some point in the future. :)

I think in the end, the most robust solution will be for the master to not be
responsible for initiating/opening any connections to frameworks and/or slaves.
If we do this, then staying connected would be the slave's (framework's)
responsibility.

For example, using the HTTP CONNECT method, a slave could request direct
access to a master's particular pid endpoint, something like:
{noformat}
CONNECT pid1@master HTTP/1.0
Content-Transfer-Encoding: application/x-mesos-protobuf-v1
Authorization: token=..., ...

{noformat}

The server would respond with (only while the connection is being established):
{noformat}
HTTP/1.1 200 Connection established
X-Welcome-Message: Welcome to the cloud

{noformat}
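A minimal client-side sketch of this proposed handshake, assuming a plain TCP socket and the header names from the example above. {{build_connect_request}} and {{parse_connect_response}} are hypothetical helpers, not libprocess or Mesos APIs, and only the {{token=}} part of the elided {{Authorization}} value is modeled. Any bytes after the blank line that ends the response already belong to the binary tunnel, so the parser hands them back.

```python
def build_connect_request(pid: str, token: str) -> bytes:
    """Build the proposed CONNECT request for a master pid endpoint (hypothetical)."""
    lines = [
        f"CONNECT {pid} HTTP/1.0",
        "Content-Transfer-Encoding: application/x-mesos-protobuf-v1",
        f"Authorization: token={token}",  # the rest of the value is elided in the proposal
        "",  # these two empty entries produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_connect_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split the reply into status code, lower-cased headers, and leftover bytes.

    Leftover bytes after the blank line are already part of the binary tunnel.
    """
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete response header")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    parts = status_line.split(" ", 2)  # e.g. ["HTTP/1.1", "200", "Connection established"]
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"unexpected status line: {status_line!r}")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(parts[1]), headers, rest
```

A slave would write the request to the socket, pass the first chunk it reads back to {{parse_connect_response}}, and treat status 200 as "tunnel established".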

At this point, the connection becomes a pure binary TCP connection, which the
master can then use to send protobuf-over-TCP requests (including ping/pong,
etc.). If multiple pid endpoints are required, they could be multiplexed over
this single link: instead of connecting directly to a particular pid, you would
connect to a mux pid, which would then shunt the messages to the correct pids.
Not sure if this makes any sense.
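A sketch of the mux idea, assuming a made-up length-prefixed framing (2-byte pid-name length and 4-byte payload length, big-endian, followed by the name and the payload). Nothing in Mesos defines this format; {{encode_frame}} and {{Mux}} are illustrative only. The mux buffers partial reads, since TCP does not preserve message boundaries.

```python
import struct
from typing import Callable


def encode_frame(pid: str, payload: bytes) -> bytes:
    """Encode one frame as [name_len:u16][payload_len:u32][name][payload] (made-up format)."""
    name = pid.encode("utf-8")
    return struct.pack(">HI", len(name), len(payload)) + name + payload


class Mux:
    """Demultiplex frames arriving on one byte stream to handlers keyed by pid name."""

    HEADER = struct.Struct(">HI")  # 6 bytes: name length, payload length

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[bytes], None]] = {}
        self.buffer = b""

    def register(self, pid: str, handler: Callable[[bytes], None]) -> None:
        self.handlers[pid] = handler

    def feed(self, data: bytes) -> None:
        """Append received bytes and dispatch every complete frame in the buffer."""
        self.buffer += data
        while len(self.buffer) >= self.HEADER.size:
            name_len, payload_len = self.HEADER.unpack_from(self.buffer)
            total = self.HEADER.size + name_len + payload_len
            if len(self.buffer) < total:
                break  # frame not complete yet; wait for more bytes
            start = self.HEADER.size
            pid = self.buffer[start:start + name_len].decode("utf-8")
            payload = self.buffer[start + name_len:total]
            self.buffer = self.buffer[total:]
            handler = self.handlers.get(pid)
            if handler is not None:
                handler(payload)
            # Frames for unregistered pids are silently dropped in this sketch.
```

Dropping frames for unknown pids keeps the sketch short; a real design would probably report an error back over the link.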

Anyway, I gather this would be a rather large rewrite, and changing protocols
in a live system is... well, interesting.
Note: RFC 6455 (WebSocket) might be another option, albeit much more involved...

Handle a network partition between Master and Slave
---

Key: MESOS-1529
URL: https://issues.apache.org/jira/browse/MESOS-1529
Project: Mesos
Issue Type: Bug
Reporter: Dominic Hamon

If a network partition occurs between a Master and Slave, the Master will
remove the Slave (as it fails health check) and mark the tasks being run
there as LOST. However, the Slave is not aware that it has been removed so
the tasks will continue to run.
(To clarify a little bit: neither the master nor the slave receives an 'exited'
event, indicating that the connection between the master and slave is not
closed.)
There are at least two possible approaches to solving this issue:
1. Introduce a health check from Slave to Master so they have a consistent
view of a network partition. We may still see this issue should a one-way
connection error occur.
2. Be less aggressive about marking tasks and Slaves as lost. Wait until the
Slave reappears and reconcile then. We'd still need to mark Slaves and tasks
as potentially lost (zombie state) but maybe the Scheduler can make a more
intelligent decision.
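Approach 2 can be pictured as a small task-state model: on a partition the master moves running tasks to a hypothetical UNKNOWN (zombie) state instead of LOST, and reconciles when the slave reappears with the list of tasks it is still running. The state names, {{mark_partitioned}} and {{reconcile}} are illustrative assumptions, not Mesos APIs.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    UNKNOWN = "unknown"  # hypothetical zombie state: slave unreachable, task may still run
    LOST = "lost"


def mark_partitioned(tasks: dict[str, TaskState]) -> None:
    """On a partition, mark every running task on the slave UNKNOWN instead of LOST."""
    for task_id, state in tasks.items():
        if state is TaskState.RUNNING:
            tasks[task_id] = TaskState.UNKNOWN


def reconcile(tasks: dict[str, TaskState], reported_running: set[str]) -> None:
    """When the slave reappears, trust its report of which tasks are still running."""
    for task_id, state in tasks.items():
        if state is TaskState.UNKNOWN:
            tasks[task_id] = TaskState.RUNNING if task_id in reported_running else TaskState.LOST
```

A scheduler seeing UNKNOWN could then decide for itself whether to launch a replacement or wait for reconciliation.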

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MESOS-1529) Handle a network partition between Master and Slave

2014-06-25 Thread Tobias Weingartner (JIRA)

[
https://issues.apache.org/jira/browse/MESOS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043853#comment-14043853
]

Tobias Weingartner commented on MESOS-1529:
---

2) What does an exit event signify? Why would we need to check that it was
for a leading master?

3) How is the 75 seconds determined? Does this lock us into a phased upgrade
path if this timeout value needs to change? If we get a ping from a
non-leading master, we should likely ignore it and not immediately trigger
re-registration, i.e., let the timeout take effect.
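The behaviour suggested in point 3 can be sketched as a slave-side watchdog: a ping only resets the timer when it comes from the master the slave currently believes is leading, so a ping from a non-leading master neither triggers re-registration nor hides a timeout. {{PingWatchdog}} is hypothetical; the 75 s default comes from the discussion, and time is passed in explicitly to keep the sketch deterministic.

```python
class PingWatchdog:
    """Track pings from the leading master and report when they stop arriving."""

    def __init__(self, leader: str, timeout_secs: float = 75.0, now: float = 0.0):
        self.leader = leader
        self.timeout_secs = timeout_secs
        self.last_ping = now

    def on_ping(self, from_master: str, now: float) -> bool:
        """Record a ping; ignore it (return False) unless it is from the leader."""
        if from_master != self.leader:
            return False  # do not re-register; let the timeout take effect
        self.last_ping = now
        return True

    def on_new_leader(self, leader: str, now: float) -> None:
        """A new leading master was detected (e.g. via ZooKeeper); restart the clock."""
        self.leader = leader
        self.last_ping = now

    def timed_out(self, now: float) -> bool:
        """True once no ping from the leader has been seen for longer than the timeout."""
        return now - self.last_ping > self.timeout_secs
```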


[jira] [Commented] (MESOS-1529) Handle a network partition between Master and Slave

2014-06-25 Thread Tobias Weingartner (JIRA)

[
https://issues.apache.org/jira/browse/MESOS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043985#comment-14043985
]

Tobias Weingartner commented on MESOS-1529:
---

{quote}
An exited event signifies that a link between slave -- master is broken.
This could be due to network partition or master failover. We need to check if
it was from the leading master because, before exited event is received by
the slave, the slave might have received a new master detected event from zk
and re-registered with a new master. In that case, the slave can safely ignore
the exited event.
{quote}
This sounds like it would be racy in the face of possibly having multiple
masters connected to a slave while a master fail-over is happening.
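To make the ordering dependence concrete, a toy model of the check described in the quote: the slave remembers the master it is registered with and ignores an {{exited}} event from any other pid. {{Slave}}, its counter, and the assumption that an {{exited}} from the current master leads to a re-registration attempt are illustrative, not the real slave code.

```python
class Slave:
    """Toy model of how the slave reacts to master-related events."""

    def __init__(self, master: str):
        self.master = master  # master the slave is currently registered with
        self.reregistrations = 0  # number of re-registration attempts made

    def new_master_detected(self, master: str) -> None:
        """ZooKeeper reported a new leader: switch to it and re-register."""
        self.master = master
        self.reregistrations += 1

    def exited(self, pid: str) -> bool:
        """Handle a broken link; return True if it led to a re-registration attempt."""
        if pid != self.master:
            return False  # stale event from a master we already left: ignore it
        self.reregistrations += 1  # assumed reaction: try to re-register
        return True
```

Delivering the same two events in opposite orders yields one re-registration attempt in one case and two in the other.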

{quote}
| Does this lock us into a phased upgrade path if this timeout value needs to
change?
I don't see why it would lock us into an upgrade path.
{quote}
What I meant here was: what if the operator decided that a 75s delay was too
long, or too short, and needed to be changed in a running cluster? At this
point, it looks like deploying that change would be more involved, possibly
requiring the coordination of thousands of machines. If the option is not
surfaced to the operator (no flags, etc.), then if/when this single static
number changes (e.g., becoming adaptive based on the number of slaves),
modifying it will likely require a lot of planning and prep.

I see this as having a constant in two places without one informing the other
what the constant should be. When it changes in one place, it can have a
detrimental effect on the rest of the system. Say a new master release goes
with 150s pings due to load issues: if the masters roll before all the slaves
have rolled to the new code, they'll end up flapping, etc.
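The flapping scenario reduces to simple arithmetic. Assuming a ping at t = 0 and then every {{ping_interval}} seconds, and a slave that gives up once {{slave_timeout}} seconds pass without one (no jitter or network delay modeled), the slave times out once per gap whenever the timeout is shorter than the interval. {{count_timeouts}} is a hypothetical helper.

```python
def count_timeouts(ping_interval: float, slave_timeout: float, horizon: float) -> int:
    """Count slave timeouts during [0, horizon] with pings at 0, ping_interval, ...

    A timeout fires at k * ping_interval + slave_timeout for k = 0, 1, ... whenever
    the next ping would arrive later than that; after each timeout the slave is
    assumed to keep waiting for the next ping.
    """
    if slave_timeout >= ping_interval:
        return 0  # every ping arrives before the timer expires
    if horizon < slave_timeout:
        return 0  # the first timeout has not fired yet
    return int((horizon - slave_timeout) // ping_interval) + 1
```

With slaves still on a 75 s timeout and upgraded masters pinging every 150 s, {{count_timeouts(150, 75, 600)}} gives 4 spurious timeouts in ten minutes; keeping the two values consistent is exactly what a shared, surfaced setting would guarantee.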


[jira] [Commented] (MESOS-1529) Handle a network partition between Master and Slave

2014-06-24 Thread Tobias Weingartner (JIRA)

[
https://issues.apache.org/jira/browse/MESOS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042284#comment-14042284
]

Tobias Weingartner commented on MESOS-1529:
---

I don't think that #2 is an option. We may be able to add extra
information/messages to let the frameworks know that something has been lost
potentially, and allow the frameworks to choose which side of CAP they land
on. With the current assumptions and implementation, I believe that modifying
#2 would be a mistake.


[jira] [Created] (MESOS-1335) Make state.json information partially accessible as well as via text

2014-05-14 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1335:
-

 Summary: Make state.json information partially accessible as well 
as via text
 Key: MESOS-1335
 URL: https://issues.apache.org/jira/browse/MESOS-1335
 Project: Mesos
  Issue Type: Improvement
  Components: master, slave
Reporter: Tobias Weingartner
Priority: Minor


The information returned by {{http://localhost:5051/slave(1)/state.json}} is
rather voluminous, especially if you're trying to do something simple like
finding out which version of a slave happens to be running.

Possible improvement: allow addressing portions of the endpoint:
{noformat}
curl -s 'localhost:5051/slave(1)/state/version.json'
curl -s 'localhost:5051/slave(1)/state/attributes.txt'
{noformat}

The above would return something like:
{noformat}
{"version": "0.18.0"}
{noformat}
{noformat}
/attributes/host some-hostname
/attributes/rack some-other-rack
/attributes/attr-name attr-value
{noformat}
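Addressing a portion of the endpoint could amount to walking the parsed state.json with the URL's path components and, for the {{.txt}} form, flattening the selected subtree into {{/path value}} lines like the example above. {{select}} and {{to_text}} are hypothetical helpers, the mapping from URL to path is an assumption, and the sample document in the test is made up.

```python
def select(state: dict, path: str):
    """Return the subtree of `state` named by a slash-separated path like 'attributes'."""
    node = state
    for part in filter(None, path.split("/")):  # skip empty segments
        node = node[part]  # a KeyError here would map to a 404
    return node


def to_text(node, prefix: str = "") -> list[str]:
    """Flatten a JSON subtree into sorted '/a/b value' lines (lists are printed as-is)."""
    if isinstance(node, dict):
        lines = []
        for key in sorted(node):
            lines.extend(to_text(node[key], f"{prefix}/{key}"))
        return lines
    return [f"{prefix} {node}"]
```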

Possibly an interim solution to the volume of data would be to pull out certain
information into another endpoint (something like stats.json, maybe
version.json or environment.json?). In particular, the keys I'd be looking for
would be:
{noformat}
attributes
flags
build_*
hostname
id
log_dir
master_hostname
pid
resources
start_time
version
{noformat}
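Such an interim endpoint could simply project these keys out of the full state document. The sketch below treats {{build_*}} as a shell-style pattern via the standard {{fnmatch}} module so it matches every build key; {{summarize}} and the sample input in the test are illustrative.

```python
from fnmatch import fnmatchcase

SUMMARY_KEYS = ("attributes", "flags", "build_*", "hostname", "id", "log_dir",
                "master_hostname", "pid", "resources", "start_time", "version")


def summarize(state: dict, patterns=SUMMARY_KEYS) -> dict:
    """Keep only the top-level keys of `state` that match one of the patterns."""
    return {key: value for key, value in state.items()
            if any(fnmatchcase(key, pattern) for pattern in patterns)}
```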




[jira] [Commented] (MESOS-1253) Make HTTP endpoint browsable

2014-04-29 Thread Tobias Weingartner (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984426#comment-13984426
 ] 

Tobias Weingartner commented on MESOS-1253:
---

Not necessarily a help endpoint, but an index at the parent of the leaves. So
when I'm presented with:
{noformat}
http://host/parent/leaf
{noformat}
and I wish to explore for more/other things, I usually try a level up:
{noformat}
http://host/parent
{noformat}
and see what other options I have: a directory of places I can go
to...
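An index page at a parent path only needs the list of registered routes: strip the parent prefix and keep the next path segment of every match. {{children}} is a hypothetical helper, and apart from {{/metrics/snapshot}}, which appears in the issue, the routes in the test are invented.

```python
def children(routes: list[str], parent: str) -> list[str]:
    """List the distinct next path segments under `parent`, e.g. for an index page."""
    prefix = parent.rstrip("/") + "/"
    found = set()
    for route in routes:
        if route.startswith(prefix):
            segment = route[len(prefix):].split("/", 1)[0]
            if segment:
                found.add(segment)
    return sorted(found)
```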

 Make HTTP endpoint browsable
 

 Key: MESOS-1253
 URL: https://issues.apache.org/jira/browse/MESOS-1253
 Project: Mesos
  Issue Type: Bug
  Components: statistics
Reporter: Tobias Weingartner
Priority: Minor

 A number of the paths in the master/slave do not have index pages, making it
 harder to browse by cutting the URL path down. For example, if you're
 looking at:
 {noformat}
 http://host:port/metrics/snapshot
 {noformat}
 and decide to see what other options there are for metrics, it's not easy to
 get that by just cutting off the last URL path part.




[jira] [Commented] (MESOS-1253) Make HTTP endpoint browsable

2014-04-28 Thread Tobias Weingartner (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983487#comment-13983487
 ] 

Tobias Weingartner commented on MESOS-1253:
---

I respectfully disagree.  :)


[jira] [Created] (MESOS-1258) 0.18.0-rc3: F0427 02:48:30.603756 62192 group.cpp:326] Check failed: state == CONNECTED || state == AUTHENTICATED || state == READY 1

2014-04-28 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1258:
-

 Summary: 0.18.0-rc3: F0427 02:48:30.603756 62192 group.cpp:326] 
Check failed: state == CONNECTED || state == AUTHENTICATED || state == READY 1
 Key: MESOS-1258
 URL: https://issues.apache.org/jira/browse/MESOS-1258
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Tobias Weingartner







[jira] [Created] (MESOS-1253) Make HTTP endpoint browsable

2014-04-27 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1253:
-

 Summary: Make HTTP endpoint browsable
 Key: MESOS-1253
 URL: https://issues.apache.org/jira/browse/MESOS-1253
 Project: Mesos
  Issue Type: Bug
  Components: statistics
Reporter: Tobias Weingartner
Priority: Minor


A number of the paths in the master/slave do not have index pages, making it
harder to browse by cutting the URL path down. For example, if you're
looking at:
{noformat}
http://host:port/metrics/snapshot
{noformat}
and decide to see what other options there are for metrics, it's not easy to
get that by just cutting off the last URL path part.




[jira] [Created] (MESOS-1254) JSON endpoints should have url pointers to other locations

2014-04-27 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1254:
-

 Summary: JSON endpoints should have url pointers to other locations
 Key: MESOS-1254
 URL: https://issues.apache.org/jira/browse/MESOS-1254
 Project: Mesos
  Issue Type: Improvement
Reporter: Tobias Weingartner
Priority: Minor


Hitting an endpoint like:
{noformat}
http://host:port/state.json
{noformat}
returns a lot of information, including a list of slaves, etc. However, if you'd
like to hit a slave's JSON endpoint, you're basically left with grabbing the
{{pid}} of the slave you're looking for and interpreting that string in
order to construct the URL of the slave's JSON endpoint.

Having a key-value pair that allows traversing from JSON endpoint to JSON
endpoint should make this easier and less error-prone.
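The manual step described above can be sketched as follows, assuming the {{name@host:port}} pid format (e.g. {{slave(1)@10.0.0.1:5051}}) and that the slave serves {{state.json}} under its process name, as in the URLs used elsewhere in these issues. {{pid_to_url}} is hypothetical; a URL-valued key emitted by the master would make this parsing unnecessary.

```python
def pid_to_url(pid: str, endpoint: str = "state.json") -> str:
    """Turn a pid of the form 'name@host:port' into an HTTP URL for one of its endpoints."""
    name, sep, address = pid.partition("@")
    if not sep or not name or ":" not in address:
        raise ValueError(f"not a pid: {pid!r}")
    host, _, port = address.rpartition(":")
    int(port)  # raises ValueError if the port is not numeric
    return f"http://{host}:{port}/{name}/{endpoint}"
```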




[jira] [Updated] (MESOS-1255) Master UI should show Mesos version

2014-04-27 Thread Tobias Weingartner (JIRA)

[
https://issues.apache.org/jira/browse/MESOS-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tobias Weingartner updated MESOS-1255:
--

Description:
The master UI should show the Mesos version on the main UI screen.

It would be awesome if there were a tab with the ability to browse software
versions and/or attributes/resources in a coordinated, easy fashion.

That is: being able to answer visually (and via curl to a JSON endpoint) how
many of the slaves and masters are running which version of Mesos, and how
many have which attributes and/or resources.

was:
The master UI should show the Mesos version on the main UI screen.

Would be awesome if there was a tab with the ability to browse software
versions and/or attributes/resources in a coordinated easy fashion.

IE: being able to visually answer (and via curl to JSON endpoint) how many of
the slaves are which version of Mesos. How many have which attributes and/or
resources.

Master UI should show Mesos version
---

Key: MESOS-1255
URL: https://issues.apache.org/jira/browse/MESOS-1255
Project: Mesos
Issue Type: Improvement
Reporter: Tobias Weingartner
Priority: Trivial

The master UI should show the Mesos version on the main UI screen.
It would be awesome if there were a tab with the ability to browse software
versions and/or attributes/resources in a coordinated, easy fashion.
That is: being able to answer visually (and via curl to a JSON endpoint) how
many of the slaves and masters are running which version of Mesos, and how
many have which attributes and/or resources.
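Once each node's state has been fetched, answering "how many run which version" is a small aggregation. The sketch assumes a list of already-parsed per-node state documents with {{version}} and {{attributes}} keys, as in a slave's state.json; fetching them is left out, and both helpers are hypothetical.

```python
from collections import Counter


def version_histogram(states: list[dict]) -> Counter:
    """Count nodes per Mesos version."""
    return Counter(state.get("version", "unknown") for state in states)


def attribute_histogram(states: list[dict], name: str) -> Counter:
    """Count nodes per value of one attribute, e.g. 'rack'; missing values count as 'unset'."""
    return Counter(state.get("attributes", {}).get(name, "unset") for state in states)
```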


[jira] [Created] (MESOS-1255) Master UI should show Mesos version

2014-04-27 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1255:
-

 Summary: Master UI should show Mesos version
 Key: MESOS-1255
 URL: https://issues.apache.org/jira/browse/MESOS-1255
 Project: Mesos
  Issue Type: Improvement
Reporter: Tobias Weingartner
Priority: Trivial


The master UI should show the Mesos version on the main UI screen.

It would be awesome if there were a tab with the ability to browse software 
versions and/or attributes/resources in a coordinated, easy fashion.

That is: being able to answer visually (and via curl to a JSON endpoint) how 
many of the slaves are running which version of Mesos, and how many have which 
attributes and/or resources.




[jira] [Created] (MESOS-1225) Allow definition/use of shared resources

2014-04-19 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1225:
-

 Summary: Allow definition/use of shared resources
 Key: MESOS-1225
 URL: https://issues.apache.org/jira/browse/MESOS-1225
 Project: Mesos
  Issue Type: Improvement
  Components: allocation, containerization, framework, isolation, slave
Reporter: Tobias Weingartner
Priority: Minor


It would be nice to be able to define a set of shared resources for a set of 
slaves (such as IP addresses, power, rack bandwidth, etc.) that would be managed 
by the master/slaves and exported to the frameworks for their use.
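One way to picture a shared resource is a pool attached to a group of slaves rather than to a single slave, from which a task on any member slave allocates. {{SharedPool}} is a hypothetical sketch of that bookkeeping for a countable resource such as rack bandwidth, not an existing Mesos allocator interface; a set-valued resource such as IP addresses would track a set of addresses instead of a number.

```python
class SharedPool:
    """A countable resource pool shared by a set of slaves (e.g. one rack's uplink)."""

    def __init__(self, name: str, capacity: float, slaves: set[str]):
        self.name = name
        self.capacity = capacity
        self.slaves = slaves
        self.allocations: dict[str, float] = {}  # task id -> amount held

    def available(self) -> float:
        return self.capacity - sum(self.allocations.values())

    def allocate(self, slave: str, task_id: str, amount: float) -> bool:
        """Grant `amount` to a task on a member slave if the pool has enough left."""
        if slave not in self.slaves or amount > self.available():
            return False
        self.allocations[task_id] = self.allocations.get(task_id, 0.0) + amount
        return True

    def release(self, task_id: str) -> None:
        """Return everything the task held to the pool."""
        self.allocations.pop(task_id, None)
```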




[jira] [Created] (MESOS-1164) URL encoded urls do not work in slave

2014-03-31 Thread Tobias Weingartner (JIRA)

Tobias Weingartner created MESOS-1164:
-

 Summary: URL encoded urls do not work in slave
 Key: MESOS-1164
 URL: https://issues.apache.org/jira/browse/MESOS-1164
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Tobias Weingartner
Priority: Minor


As shown here:
{noformat}
MiniMe:incubator-aurora weingart$ curl -X HEAD -sI 
'http://192.168.33.4:5051/slave%281%29/state.json'
HTTP/1.1 404 Not Found
Date: Mon, 31 Mar 2014 06:17:38 GMT
Content-Length: 0

MiniMe:incubator-aurora weingart$ curl -X HEAD -sI 
'http://192.168.33.4:5051/slave(1)/state.json'
HTTP/1.1 200 OK
Date: Mon, 31 Mar 2014 06:17:50 GMT
Content-Length: 8015
Content-Type: application/json

{noformat}
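The 404 suggests that the slave matches the raw request path against its route names without percent-decoding it first. A sketch of the fix in routing terms, using {{urllib.parse.unquote}} from Python's standard library; the route table and {{route}} function are illustrative stand-ins for the slave's HTTP dispatch, not its actual code.

```python
from urllib.parse import unquote

ROUTES = {"/slave(1)/state.json": "state handler"}  # illustrative route table


def route(raw_path: str):
    """Look up a handler after decoding %XX escapes, so %28/%29 match '(' and ')'."""
    return ROUTES.get(unquote(raw_path))
```

A real implementation would rather decode each path segment separately, so that an encoded {{%2F}} inside a segment is not turned into a path separator.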





[jira] [Commented] (MESOS-780) Adding support for 3rd party performance and health monitoring.

2013-11-05 Thread Tobias Weingartner (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814227#comment-13814227
 ] 

Tobias Weingartner commented on MESOS-780:
--

I don't think that pluggable support is a must. Having an endpoint that you 
can query/scrape should be enough. There is nothing preventing the running of 
an agent that scrapes these endpoints and then pushes the data (if push is 
wanted) or offers the data up in whatever manner the health monitoring present 
within the infrastructure requires.

In many ways, I think that the support for 3rd party performance and health 
monitoring is already there. Certainly there are improvements that can be made 
(exporting more information, etc.), but I think that the basic framework is 
present and usable.
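Such an agent can be very small. The sketch below flattens a metrics snapshot, assumed here to be a flat JSON object of slash-separated names to numbers like the one served at {{/metrics/snapshot}}, into Graphite plaintext-protocol lines of the form {{path value timestamp}}. Fetching the JSON and opening the TCP connection to Graphite are left out, and the {{mesos}} prefix and {{to_graphite}} name are arbitrary choices.

```python
def to_graphite(snapshot: dict, timestamp: int, prefix: str = "mesos") -> list[str]:
    """Convert {'master/cpus_total': 4.0, ...} into Graphite plaintext lines."""
    lines = []
    for name in sorted(snapshot):
        value = snapshot[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue  # Graphite stores numbers only; skip strings and booleans
        path = ".".join([prefix] + name.split("/"))  # 'master/cpus_total' -> 'mesos.master.cpus_total'
        lines.append(f"{path} {value} {timestamp}")
    return lines
```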

 Adding support for 3rd party performance and health monitoring.
 ---

 Key: MESOS-780
 URL: https://issues.apache.org/jira/browse/MESOS-780
 Project: Mesos
  Issue Type: Improvement
  Components: framework
Reporter: Bernardo Gomez Palacio

 User Story:
 As a SysAdmin I should be able to monitor Mesos (Masters and Slaves) with
 3rd party tools such as:
 * [Ganglia|http://ganglia.sourceforge.net/]
 * [Graphite|http://graphite.wikidot.com/]
 * [Nagios|http://www.nagios.org/]
 * [Zabbix|http://www.zabbix.com/]



--
This message was sent by Atlassian JIRA
(v6.1#6144)
