[jira] [Created] (MESOS-3831) Document operator HTTP endpoints
Neil Conway created MESOS-3831: -- Summary: Document operator HTTP endpoints Key: MESOS-3831 URL: https://issues.apache.org/jira/browse/MESOS-3831 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Neil Conway Priority: Minor These are not exhaustively documented; they probably should be. Some endpoints have docs: e.g., `/reserve` and `/unreserve` are described in the reservation doc page. But it would be good to have a single page that lists all the endpoints and their semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-2455: --- Shepherd: Michael Park (was: Adam B) > Add operator endpoints to create/destroy persistent volumes. > > > Key: MESOS-2455 > URL: https://issues.apache.org/jira/browse/MESOS-2455 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Neil Conway >Priority: Critical > Labels: mesosphere, persistent-volumes > > Persistent volumes will not be released automatically. > So we probably need an endpoint for operators to forcefully release > persistent volumes. We probably need to add principal to Persistence struct > and use ACLs to control who can release what. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-2455: - Summary: Add operator endpoints to create/destroy persistent volumes. (was: Add operator endpoint to destroy persistent volumes.) > Add operator endpoints to create/destroy persistent volumes. > > > Key: MESOS-2455 > URL: https://issues.apache.org/jira/browse/MESOS-2455 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Michael Park >Priority: Critical > Labels: mesosphere, persistent-volumes > > Persistent volumes will not be released automatically. > So we probably need an endpoint for operators to forcefully release > persistent volumes. We probably need to add principal to Persistence struct > and use ACLs to control who can release what. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-2455: -- Assignee: Neil Conway (was: Michael Park) > Add operator endpoints to create/destroy persistent volumes. > > > Key: MESOS-2455 > URL: https://issues.apache.org/jira/browse/MESOS-2455 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Neil Conway >Priority: Critical > Labels: mesosphere, persistent-volumes > > Persistent volumes will not be released automatically. > So we probably need an endpoint for operators to forcefully release > persistent volumes. We probably need to add principal to Persistence struct > and use ACLs to control who can release what. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.
[ https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-2455: - Description: Persistent volumes will not be released automatically. So we probably need an endpoint for operators to forcefully release persistent volumes. We probably need to add principal to Persistence struct and use ACLs to control who can release what. Additionally, it would be useful to have an endpoint for operators to create persistent volumes. was: Persistent volumes will not be released automatically. So we probably need an endpoint for operators to forcefully release persistent volumes. We probably need to add principal to Persistence struct and use ACLs to control who can release what. > Add operator endpoints to create/destroy persistent volumes. > > > Key: MESOS-2455 > URL: https://issues.apache.org/jira/browse/MESOS-2455 > Project: Mesos > Issue Type: Task >Reporter: Jie Yu >Assignee: Neil Conway >Priority: Critical > Labels: mesosphere, persistent-volumes > > Persistent volumes will not be released automatically. > So we probably need an endpoint for operators to forcefully release > persistent volumes. We probably need to add principal to Persistence struct > and use ACLs to control who can release what. > Additionally, it would be useful to have an endpoint for operators to create > persistent volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3829) Error on gracefully shutdown task
Rafael Capucho created MESOS-3829: - Summary: Error on gracefully shutdown task Key: MESOS-3829 URL: https://issues.apache.org/jira/browse/MESOS-3829 Project: Mesos Issue Type: Bug Affects Versions: 0.25.0 Environment: Marathon: 0.12.0-RC1 Mesos: 0.25.0 Docker 1.9.0 Reporter: Rafael Capucho Hello, I'm suffering from the same error reported here [1]. I have configured my mesos-slave environment as in [2], setting DOCKER_STOP_TIMEOUT and EXECUTOR_SHUTDOWN_GRACE_PERIOD. When I look at the sandbox stdout, I can see in the first line that --stop_timeout="30secs" is properly configured, but when I click "Destroy App" in Marathon the stdout keeps repeating messages [3] such as "Killing docker task" and "Shutting down". My code handles SIGTERM, but the signal never reaches it. Thank you. [1] - https://groups.google.com/forum/?hl=en#!topic/marathon-framework/Oy0dN0Lron0 [2] - https://paste.ee/r/grRyS [3] - https://paste.ee/r/SghOr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.
[ https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990461#comment-14990461 ] Michael Park commented on MESOS-2353: - [~vinodkone] Is Monday acceptable to you? > Improve performance of the master's state.json endpoint for large clusters. > --- > > Key: MESOS-2353 > URL: https://issues.apache.org/jira/browse/MESOS-2353 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Mahler > Labels: newbie, scalability, twitter > > The master's state.json endpoint consistently takes a long time to compute > the JSON result, for large clusters: > {noformat} > $ time curl -s -o /dev/null localhost:5050/master/state.json > Mon Jan 26 22:38:50 UTC 2015 > real 0m13.174s > user 0m0.003s > sys 0m0.022s > {noformat} > This can cause the master to get backlogged if there are many state.json > requests in flight. > Looking at {{perf}} data, it seems most of the time is spent doing memory > allocation / de-allocation. This ticket will try to capture any low hanging > fruit to speed this up. Possibly we can leverage moves if they are not > already being used by the compiler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
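The "leverage moves" idea at the end of the ticket can be sketched like this (the names are purely illustrative, not Mesos code):

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustration of the "leverage moves" note in the ticket: when
// assembling a large response, moving intermediate strings into the
// result avoids the copy/allocation churn that dominates the profile.
std::vector<std::string> buildState() {
  std::vector<std::string> entries;
  std::string framework(1024, 'x');  // stand-in for a serialized framework
  entries.push_back(std::move(framework));  // transfers the buffer, no copy
  return entries;  // moved (or elided) on return, again no copy
}
```

Each moved push_back transfers ownership of the heap buffer instead of duplicating it, which is where such endpoints typically spend their time.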
[jira] [Updated] (MESOS-3293) Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest
[ https://issues.apache.org/jira/browse/MESOS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-3293: -- Shepherd: Kapil Arya (was: Vinod Kone) > Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest > -- > > Key: MESOS-3293 > URL: https://issues.apache.org/jira/browse/MESOS-3293 > Project: Mesos > Issue Type: Bug > Components: containerization, docker, test >Affects Versions: 0.23.0, 0.24.0 > Environment: CentOS Linux release 7.1 > Linux 3.10.0 >Reporter: Marco Massenzio >Assignee: Jian Qiu >Priority: Blocker > Labels: flaky-test, tech-debt > Attachments: 20150818-mesos-tests.log > > > h2. LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids > This is one of several ROOT failing tests: we want to track them > *individually* and for each of them decide whether to: > * fix; > * remove; OR > * redesign. > (full verbose logs attached) > h2. Steps to Reproduce > Completely cleaned the build, removed directory, clean pull from {{master}} > (SHA: {{fb93d93}}) - same results, 9 failed tests: > {noformat} > [==] 751 tests from 114 test cases ran. (231218 ms total) > [ PASSED ] 742 tests. > [ FAILED ] 9 tests, listed below: > [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids > [ FAILED ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where > TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess > [ FAILED ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework > [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem > [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox > [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost > [ FAILED ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint > [ FAILED ] > LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem > [ FAILED ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs > 9 FAILED TESTS > YOU HAVE 10 DISABLED TESTS > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3831) Document operator HTTP endpoints
[ https://issues.apache.org/jira/browse/MESOS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-3831: --- Description: These are not exhaustively documented; they probably should be. Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described in the reservation doc page. But it would be good to have a single page that lists all the endpoints and their semantics. was: These are not exhaustively documented; they probably should be. Some endpoints have docs: e.g., `/reserve` and `/unreserve` are described in the reservation doc page. But it would be good to have a single page that lists all the endpoints and their semantics. > Document operator HTTP endpoints > > > Key: MESOS-3831 > URL: https://issues.apache.org/jira/browse/MESOS-3831 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Priority: Minor > Labels: documentation, mesosphere, newbie > > These are not exhaustively documented; they probably should be. > Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described > in the reservation doc page. But it would be good to have a single page that > lists all the endpoints and their semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3830) Provide a means to do async data transfer with async back-pressure.
Benjamin Mahler created MESOS-3830: -- Summary: Provide a means to do async data transfer with async back-pressure. Key: MESOS-3830 URL: https://issues.apache.org/jira/browse/MESOS-3830 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Benjamin Mahler I had started thinking about this while implementing http::Pipe and more recently when seeing the docker registry client streaming to a file and the fetcher cache refactoring work. This should still be seen as an active thought process and this description will be edited along the way. The overall idea here is to provide a composable abstraction to support asynchronously streaming data in libprocess with asynchronous back-pressure. The following characteristics are desired: {noformat} (1) Ability to express both finite and infinite streams of data. (2) Asynchronous zero-copy data transfer. (3) Asynchronous back-pressure. (4) Support for composition (prefer this to polymorphism seen in other implementations): (a) Allow data to flow down through a "pipeline" of transformations. (b) Allow back-pressure to flow back up the "pipeline". (c) Allow failures and closures to flow back up the "pipeline". {noformat} The existing [http::Pipe|https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/include/process/http.hpp#L172] is part of the way there (needs enhancements for zero-copy, async back-pressure, and composition) and can be pulled up to become process::Pipe. Composition allows us to create a pipeline of data transfer: {code} // Stream the contents of the url to a file. Future complete = http::download(url) | io::fileWriter(file); // Signatures. Pipe::Reader http::download(const Url&); Pipe::Writer io::fileWriter(const Path&); {code} In this example, composition occurs by connecting the read end of pipe1 (from http::download) with the write end of pipe2 (from io::fileWriter). This composition terminates the pipeline (i.e. A | B | C is not allowed here) and returns a future for the caller to detect completion or trigger a discard to close down the pipeline. Data will now flow through the pipeline without copies, and back-pressure from the file writing will slow down the consumption of data from the TCP socket downloading the URL contents. Another form of composition occurs when we want to apply a transformation: {code} // Stream the contents of the url to a file. Future complete = http::download(url) | zip | io::fileWriter(file); // TODO: Figure out 'zip' type signature to enable composition. {code} Here (A | B) returns another Pipe::Reader because B is a transformation rather than a Pipe::Writer. This enables A | B | C where composing with C terminates the pipeline once again. Related work that is interesting to examine: Node.js's Stream: https://nodejs.org/api/stream.html Reactive Streams: http://www.reactive-streams.org/ Netty's Channels / Java NIO ByteBuffer: http://seeallhearall.blogspot.com/2012/05/netty-tutorial-part-1-introduction-to.html We may also want typed Pipes to capture the process::Stream abstraction we've wanted in the past. process::Stream and process::Pipe discussed here share much of the same functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
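The composition described in the ticket can be approximated with a toy, fully synchronous sketch (no futures, no back-pressure, no zero-copy; all names are illustrative and not the proposed libprocess API):

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy synchronous model: a Reader produces chunks, a Transform rewrites
// them, and a Writer consumes them.
using Chunk = std::string;
using Reader = std::function<std::vector<Chunk>()>;
using Transform = std::function<Chunk(const Chunk&)>;
using Writer = std::function<void(const Chunk&)>;

// (A | B): composing a Reader with a Transform yields another Reader.
Reader operator|(Reader reader, Transform transform) {
  return [=]() {
    std::vector<Chunk> out;
    for (const Chunk& chunk : reader()) {
      out.push_back(transform(chunk));
    }
    return out;
  };
}

// (A | C): composing a Reader with a Writer terminates the pipeline.
void operator|(Reader reader, Writer writer) {
  for (const Chunk& chunk : reader()) {
    writer(chunk);
  }
}

// Demo: "download" two chunks, transform them, "write" them to a sink.
std::vector<Chunk> runPipeline() {
  std::vector<Chunk> sink;
  Reader download = [] { return std::vector<Chunk>{"ab", "cd"}; };
  Transform zip = [](const Chunk& chunk) { return "<" + chunk + ">"; };
  Writer fileWriter = [&sink](const Chunk& chunk) { sink.push_back(chunk); };

  download | zip | fileWriter;  // A | B | C

  return sink;
}
```

The key structural point survives even in this toy form: (Reader | Transform) yields a Reader, so transformations chain, while (Reader | Writer) terminates the pipeline.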
[jira] [Updated] (MESOS-3832) Scheduler HTTP API does not redirect to leading master
[ https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-3832: -- Shepherd: Vinod Kone Target Version/s: 0.26.0 Labels: newbie (was: ) Description: The documentation for the Scheduler HTTP API says: {quote}If requests are made to a non-leading master a “HTTP 307 Temporary Redirect” will be received with the “Location” header pointing to the leading master.{quote} While the redirect functionality has been implemented, it was not actually used in the handler for the HTTP api. A probable fix could be: - Check if the current master is the leading master. - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}} was: The documentation for the HTTP api says: {quote}If requests are made to a non-leading master a “HTTP 307 Temporary Redirect” will be received with the “Location” header pointing to the leading master.{quote} While the redirect functionality has been implemented, it was not actually used in the handler for the HTTP api. Summary: Scheduler HTTP API does not redirect to leading master (was: HTTP API does not redirect to leading master) > Scheduler HTTP API does not redirect to leading master > -- > > Key: MESOS-3832 > URL: https://issues.apache.org/jira/browse/MESOS-3832 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0, 0.24.1, 0.25.0 >Reporter: Dario Rexin >Assignee: Dario Rexin > Labels: newbie > > The documentation for the Scheduler HTTP API says: > {quote}If requests are made to a non-leading master a “HTTP 307 Temporary > Redirect” will be received with the “Location” header pointing to the leading > master.{quote} > While the redirect functionality has been implemented, it was not actually > used in the handler for the HTTP api. > A probable fix could be: > - Check if the current master is the leading master. 
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
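The fix outlined above (check for leadership, then redirect) might look roughly like this; the types and names here are hypothetical stand-ins, not the actual src/master/http.cpp code:

```cpp
#include <string>

// Minimal stand-in for an HTTP response.
struct Response {
  int status;
  std::string location;  // "Location" header, when redirecting
};

// Before handling a scheduler call, check whether this master is the
// leader; if not, answer with the documented "HTTP 307 Temporary
// Redirect" whose Location header points at the leading master.
Response handleSchedulerCall(bool isLeading, const std::string& leaderUrl) {
  if (!isLeading) {
    return Response{307, leaderUrl};
  }
  return Response{200, ""};  // proceed with normal handling
}
```
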
[jira] [Commented] (MESOS-3829) Error on gracefully shutdown task
[ https://issues.apache.org/jira/browse/MESOS-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990959#comment-14990959 ] haosdent commented on MESOS-3829: - DOCKER_STOP_TIMEOUT is used for http://docs.docker.com/engine/reference/commandline/stop/. According to the Docker documentation, the task receives SIGTERM before SIGKILL. > Error on gracefully shutdown task > - > > Key: MESOS-3829 > URL: https://issues.apache.org/jira/browse/MESOS-3829 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Marathon: 0.12.0-RC1 > Mesos: 0.25.0 > Docker 1.9.0 >Reporter: Rafael Capucho > > Hello, > I'm suffering from the same error reported here[1]. I have configured my > mesos-slave environment as [2] setting DOCKER_STOP_TIMEOUT and > EXECUTOR_SHUTDOWN_GRACE_PERIOD. > When I see the sandbox stdout, I can see in the first line the declaration: > --stop_timeout="30secs" > properly configured, but when I click in "Destroy App" in Marathon the stdout > keep showing weird things[3] like repeatedly "Killing docker task Shutting > down". > In my code I deal with SIGTERM and it isn't being reached. > Thank you. > [1] - > https://groups.google.com/forum/?hl=en#!topic/marathon-framework/Oy0dN0Lron0 > [2] - https://paste.ee/r/grRyS > [3] - https://paste.ee/r/SghOr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
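The shutdown sequence haosdent describes (SIGTERM first, SIGKILL only after the stop timeout) means the task itself should trap SIGTERM. A generic POSIX sketch of that pattern, not Mesos or Marathon code:

```cpp
#include <atomic>
#include <csignal>

// Flag flipped by the handler; the main loop polls it and exits cleanly.
std::atomic<bool> shuttingDown(false);

// Keep the handler trivial: set a flag only, do real cleanup outside it.
void onSigterm(int) {
  shuttingDown.store(true);
}

void installSigtermHandler() {
  std::signal(SIGTERM, onSigterm);
}
```

With the handler installed, `docker stop` gives the process up to the configured stop timeout to notice the flag and shut down before SIGKILL arrives.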
[jira] [Commented] (MESOS-3826) Add an optional unique identifier for resource reservations
[ https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990952#comment-14990952 ] Guangya Liu commented on MESOS-3826: Thanks [~neilc], that's also what I noticed: it seems difficult to add an ID to a dynamic reservation, as dynamic reservations might be merged. [~mcypark] any comments? Thanks. > Add an optional unique identifier for resource reservations > --- > > Key: MESOS-3826 > URL: https://issues.apache.org/jira/browse/MESOS-3826 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Sargun Dhillon >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere > > Thanks to the resource reservation primitives, frameworks can reserve > resources. These reservations are per role, which means multiple frameworks > can share reservations. This can get very hairy, as multiple reservations can > occur on each agent. > It would be nice to be able to optionally, uniquely identify reservations by > ID, much like persistent volumes are today. This could be done by adding a > new protobuf field, such as Resource.ReservationInfo.id, that if set upon > reservation time, would come back when the reservation is advertised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3833) /help endpoints do not work for nested paths
Anand Mazumdar created MESOS-3833: - Summary: /help endpoints do not work for nested paths Key: MESOS-3833 URL: https://issues.apache.org/jira/browse/MESOS-3833 Project: Mesos Issue Type: Bug Components: HTTP API Reporter: Anand Mazumdar Priority: Minor Mesos displays the list of all supported endpoints starting at a given path prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. It seems that the {{help}} functionality is broken for URLs with nested paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is: {quote} Malformed URL, expecting '/help/id/name/' {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3829) Error on gracefully shutdown task
[ https://issues.apache.org/jira/browse/MESOS-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990960#comment-14990960 ] haosdent commented on MESOS-3829: - Maybe it's because of https://github.com/docker/docker/pull/3240? I think it may be a Docker issue. > Error on gracefully shutdown task > - > > Key: MESOS-3829 > URL: https://issues.apache.org/jira/browse/MESOS-3829 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Marathon: 0.12.0-RC1 > Mesos: 0.25.0 > Docker 1.9.0 >Reporter: Rafael Capucho > > Hello, > I'm suffering from the same error reported here[1]. I have configured my > mesos-slave environment as [2] setting DOCKER_STOP_TIMEOUT and > EXECUTOR_SHUTDOWN_GRACE_PERIOD. > When I see the sandbox stdout, I can see in the first line the declaration: > --stop_timeout="30secs" > properly configured, but when I click in "Destroy App" in Marathon the stdout > keep showing weird things[3] like repeatedly "Killing docker task Shutting > down". > In my code I deal with SIGTERM and it isn't being reached. > Thank you. > [1] - > https://groups.google.com/forum/?hl=en#!topic/marathon-framework/Oy0dN0Lron0 > [2] - https://paste.ee/r/grRyS > [3] - https://paste.ee/r/SghOr -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3832) HTTP API does not redirect to leading master
Dario Rexin created MESOS-3832: -- Summary: HTTP API does not redirect to leading master Key: MESOS-3832 URL: https://issues.apache.org/jira/browse/MESOS-3832 Project: Mesos Issue Type: Bug Components: HTTP API Affects Versions: 0.25.0, 0.24.1, 0.24.0 Reporter: Dario Rexin Assignee: Dario Rexin The documentation for the HTTP api says: {quote}If requests are made to a non-leading master a “HTTP 307 Temporary Redirect” will be received with the “Location” header pointing to the leading master.{quote} While the redirect functionality has been implemented, it was not actually used in the handler for the HTTP api. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3826) Add an optional unique identifier for resource reservations
[ https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990783#comment-14990783 ] Neil Conway commented on MESOS-3826: There are some subtle issues here. Right now, reservations do not have identity. For example, suppose a slave has 8 CPUs and 8192 MB of RAM, and a framework makes two dynamic reservations for 2 CPUs and 2048 MB of RAM for role 'foo'. The result is that 4 CPUs and 4096MB of RAM on that slave are reserved for 'foo': there are *not* two distinct reservations that might themselves be assigned an ID. Offhand, my initial impression is that this ticket would not be a reasonable thing to implement (unless we redefine how reservations work). > Add an optional unique identifier for resource reservations > --- > > Key: MESOS-3826 > URL: https://issues.apache.org/jira/browse/MESOS-3826 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Sargun Dhillon >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere > > Thanks to the resource reservation primitives, frameworks can reserve > resources. These reservations are per role, which means multiple frameworks > can share reservations. This can get very hairy, as multiple reservations can > occur on each agent. > It would be nice to be able to optionally, uniquely identify reservations by > ID, much like persistent volumes are today. This could be done by adding a > new protobuf field, such as Resource.ReservationInfo.id, that if set upon > reservation time, would come back when the reservation is advertised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
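The merging behavior Neil describes can be modeled with a toy example (not the real mesos::Resource protobufs): reservations for the same role on the same agent coalesce into a single quantity, so there is no per-request identity left for an ID to attach to.

```cpp
#include <string>

// Toy stand-in for reserved resources on one agent.
struct Reserved {
  std::string role;
  double cpus;
  double mem;
};

// Two reservations for the same role simply sum; the result is one
// reservation, not two identifiable ones.
Reserved operator+(const Reserved& left, const Reserved& right) {
  return Reserved{left.role, left.cpus + right.cpus, left.mem + right.mem};
}

// The example from the comment: two reservations of 2 CPUs / 2048 MB for
// role 'foo' become a single 4 CPU / 4096 MB reservation.
Reserved mergedExample() {
  Reserved first{"foo", 2, 2048};
  Reserved second{"foo", 2, 2048};
  return first + second;
}
```
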
[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths
[ https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991229#comment-14991229 ] Guangya Liu commented on MESOS-3833: [~bmahler] /help/master works well and lists all master-related endpoints: {code} /master/api/v1/scheduler /master/flags /master/frameworks /master/health /master/machine/down /master/machine/up /master/maintenance/schedule /master/maintenance/status /master/observe /master/redirect /master/reserve /master/roles /master/roles.json /master/slaves /master/state /master/state-summary /master/state.json /master/tasks /master/tasks.json /master/teardown /master/unreserve {code} But clicking a link whose path has more than two segments makes Mesos report an error. The reason is that the current https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/help.cpp#L120-L127 logic can only handle three path segments (/help/master/) in an endpoint, so for endpoints with four or more segments (/help/master/xxx/xxx) Mesos reports an error. Comments? > /help endpoints do not work for nested paths > > > Key: MESOS-3833 > URL: https://issues.apache.org/jira/browse/MESOS-3833 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Anand Mazumdar >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, newbie > > Mesos displays the list of all supported endpoints starting at a given path > prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. > It seems that the {{help}} functionality is broken for URL's having nested > paths e.g. {{master:5050/help/master/machine/down}}. The response returned is: > {quote} > Malformed URL, expecting '/help/id/name/' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths
[ https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991240#comment-14991240 ] Benjamin Mahler commented on MESOS-3833: Let's fix Help::help to handle these. > /help endpoints do not work for nested paths > > > Key: MESOS-3833 > URL: https://issues.apache.org/jira/browse/MESOS-3833 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Anand Mazumdar >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, newbie > > Mesos displays the list of all supported endpoints starting at a given path > prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. > It seems that the {{help}} functionality is broken for URL's having nested > paths e.g. {{master:5050/help/master/machine/down}}. The response returned is: > {quote} > Malformed URL, expecting '/help/id/name/' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
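A hypothetical sketch of such a Help::help fix, with an illustrative function name and shape rather than the actual libprocess code: instead of insisting on exactly '/help/id/name', split the path and treat everything after the id as the (possibly nested) endpoint name.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse '/help/<id>/<name...>' where <name...> may contain several
// segments, e.g. '/help/master/machine/down' -> ("master", "/machine/down").
std::pair<std::string, std::string> parseHelpPath(const std::string& path) {
  std::vector<std::string> tokens;
  std::istringstream stream(path);
  std::string token;
  while (std::getline(stream, token, '/')) {
    if (!token.empty()) {
      tokens.push_back(token);
    }
  }

  // tokens[0] is "help" and tokens[1] is the process id; the remaining
  // tokens, joined back with '/', form the nested endpoint name.
  std::string id = tokens.size() > 1 ? tokens[1] : "";
  std::string name;
  for (size_t i = 2; i < tokens.size(); ++i) {
    name += "/" + tokens[i];
  }
  return std::make_pair(id, name);
}
```

This accepts an arbitrary number of trailing segments, so both two-segment and deeper endpoint names resolve without the 'Malformed URL' error.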
[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths
[ https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991037#comment-14991037 ] Guangya Liu commented on MESOS-3833: There are two solutions for this: the first is to stop using nested paths and flatten each of them into a single segment; the second is to update https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/help.cpp#L120-L127 to support nested paths. The latter is harder to handle, since the current nested paths contain two segments and it is not clear whether deeper nesting will be added in the future. [~bmahler] any comments? Thanks. > /help endpoints do not work for nested paths > > > Key: MESOS-3833 > URL: https://issues.apache.org/jira/browse/MESOS-3833 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Anand Mazumdar >Priority: Minor > Labels: mesosphere, newbie > > Mesos displays the list of all supported endpoints starting at a given path > prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. > It seems that the {{help}} functionality is broken for URL's having nested > paths e.g. {{master:5050/help/master/machine/down}}. The response returned is: > {quote} > Malformed URL, expecting '/help/id/name/' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3833) /help endpoints do not work for nested paths
[ https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangya Liu reassigned MESOS-3833: -- Assignee: Guangya Liu > /help endpoints do not work for nested paths > > > Key: MESOS-3833 > URL: https://issues.apache.org/jira/browse/MESOS-3833 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Anand Mazumdar >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, newbie > > Mesos displays the list of all supported endpoints starting at a given path > prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. > It seems that the {{help}} functionality is broken for URL's having nested > paths e.g. {{master:5050/help/master/machine/down}}. The response returned is: > {quote} > Malformed URL, expecting '/help/id/name/' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3834) slave upgrade framework checkpoint incompatibility
[ https://issues.apache.org/jira/browse/MESOS-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991116#comment-14991116 ] James Peach commented on MESOS-3834: I'm gonna take a crack at a patch for us that restores the compatibility check and also rewrites the framework checkpoint once it is recovered. If the latter is a terrible idea for some reason, I'd love to be educated about it ;) > slave upgrade framework checkpoint incompatibility > --- > > Key: MESOS-3834 > URL: https://issues.apache.org/jira/browse/MESOS-3834 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.1 >Reporter: James Peach >Assignee: James Peach > > We are upgrading from 0.22 to 0.25 and experienced the following crash in the > 0.24 slave: > {code} > F1104 05:20:49.162701 1153 slave.cpp:4175] Check failed: > frameworkInfo.has_id() > *** Check failure stack trace: *** > @ 0x7fef9c294650 google::LogMessage::Fail() > @ 0x7fef9c29459f google::LogMessage::SendToLog() > @ 0x7fef9c293fb0 google::LogMessage::Flush() > @ 0x7fef9c296ce4 google::LogMessageFatal::~LogMessageFatal() > @ 0x7fef9b9a5492 mesos::internal::slave::Slave::recoverFramework() > @ 0x7fef9b9a3314 mesos::internal::slave::Slave::recover() > @ 0x7fef9b9d069c > _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_ > @ 0x7fef9ba039f4 > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > {code} > As near as I can tell, what happened was this: > - 0.22 wrote {{framework.info}} without the FrameworkID > - 0.23 had a compatibility check so it was ok with it > - 0.24 removed the compatibility check in MESOS-2259 > - the framework checkpoint doesn't get rewritten during recovery so when the > 0.24 slave starts it reads the 0.22 
version > - 0.24 asserts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
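The restored compatibility check the comment proposes could be sketched with a toy stand-in for the checkpointed message (not the real mesos protobufs): tolerate a FrameworkInfo checkpointed without an id and mark it for rewrite, instead of CHECK-failing during recovery.

```cpp
#include <string>

// Toy model of a checkpointed FrameworkInfo; a 0.22 slave wrote it
// without the id field.
struct CheckpointedFrameworkInfo {
  bool hasId;
  std::string id;
};

// True when the checkpoint is usable as-is; false means it predates the
// id field and should be rewritten (with the id filled in) after
// recovery rather than triggering a CHECK failure.
bool checkpointIsCurrent(const CheckpointedFrameworkInfo& info) {
  return info.hasId && !info.id.empty();
}
```
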
[jira] [Created] (MESOS-3834) slave upgrade framework checkpoint incompatibility
James Peach created MESOS-3834: -- Summary: slave upgrade framework checkpoint incompatibility Key: MESOS-3834 URL: https://issues.apache.org/jira/browse/MESOS-3834 Project: Mesos Issue Type: Bug Affects Versions: 0.24.1 Reporter: James Peach Assignee: James Peach We are upgrading from 0.22 to 0.25 and experienced the following crash in the 0.24 slave: {code} F1104 05:20:49.162701 1153 slave.cpp:4175] Check failed: frameworkInfo.has_id() *** Check failure stack trace: *** @ 0x7fef9c294650 google::LogMessage::Fail() @ 0x7fef9c29459f google::LogMessage::SendToLog() @ 0x7fef9c293fb0 google::LogMessage::Flush() @ 0x7fef9c296ce4 google::LogMessageFatal::~LogMessageFatal() @ 0x7fef9b9a5492 mesos::internal::slave::Slave::recoverFramework() @ 0x7fef9b9a3314 mesos::internal::slave::Slave::recover() @ 0x7fef9b9d069c _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_ @ 0x7fef9ba039f4 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ {code} As near as I can tell, what happened was this: - 0.22 wrote {{framework.info}} without the FrameworkID - 0.23 had a compatibility check so it was ok with it - 0.24 removed the compatibility check in MESOS-2259 - the framework checkpoint doesn't get rewritten during recovery so when the 0.24 slave starts it reads the 0.22 version - 0.24 asserts -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths
[ https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991218#comment-14991218 ] Benjamin Mahler commented on MESOS-3833: [~gyliu] it doesn't look that difficult to update the help code in your link to support an arbitrary number of tokens. However, I'm a bit surprised that the code behaves this way. What is listed when you hit /help/master? Taking a quick glance at the code, it looks like the help code handles multi-token paths during the call to [Help::add|https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/src/help.cpp#L81], so it seems that [Help::help|https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/src/help.cpp#L109] should handle these as well. > /help endpoints do not work for nested paths > > > Key: MESOS-3833 > URL: https://issues.apache.org/jira/browse/MESOS-3833 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Anand Mazumdar >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, newbie > > Mesos displays the list of all supported endpoints starting at a given path > prefix using the {{/help}} suffix, e.g. {{master:5050/help}}. > It seems that the {{help}} functionality is broken for URLs having nested > paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is: > {quote} > Malformed URL, expecting '/help/id/name/' > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
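The multi-token fix suggested above could look roughly like this. This is a standalone sketch; `parseHelpPath` is a hypothetical helper, not the actual libprocess code:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: instead of expecting exactly '/help/<id>/<name>', split the path
// into tokens and treat everything after the process id as the (possibly
// nested) endpoint name, e.g.
//   '/help/master/machine/down' -> id="master", name="/machine/down".
bool parseHelpPath(const std::string& path, std::string* id, std::string* name) {
  std::vector<std::string> tokens;
  std::stringstream ss(path);
  std::string token;
  while (std::getline(ss, token, '/')) {
    if (!token.empty()) {
      tokens.push_back(token);
    }
  }

  if (tokens.size() < 2 || tokens[0] != "help") {
    return false;  // not a help path at all
  }

  *id = tokens[1];
  name->clear();
  for (size_t i = 2; i < tokens.size(); ++i) {
    *name += "/" + tokens[i];  // rejoin the remaining tokens
  }
  return true;
}
```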
[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environmnent_variables
[ https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Maloney updated MESOS-3751: Fix Version/s: 0.26.0 > MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with > --executor_environmnent_variables > --- > > Key: MESOS-3751 > URL: https://issues.apache.org/jira/browse/MESOS-3751 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 0.24.1, 0.25.0 >Reporter: Cody Maloney >Assignee: Gilbert Song > Labels: mesosphere, newbie > Fix For: 0.26.0 > > > When using --executor_environment_variables, and having > MESOS_NATIVE_JAVA_LIBRARY in the environment of mesos-slave, the mesos > containerizer does not set MESOS_NATIVE_JAVA_LIBRARY itself. > Relevant code: > https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281 > It checks whether the variable is set in the mesos-slave's own process > environment (via os::getenv), rather than checking whether it is set in the > environment variable set being built for the executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
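A minimal sketch of the fix described above: the decision should be based on the environment map being built for the executor, not on the slave's own process environment. `maybeSetNativeLibrary` is an illustrative name, not the actual containerizer code:

```cpp
#include <map>
#include <string>

// Sketch: only inject MESOS_NATIVE_JAVA_LIBRARY if the executor's
// environment set does not already define it. A value supplied via
// --executor_environment_variables therefore wins.
void maybeSetNativeLibrary(std::map<std::string, std::string>& executorEnv,
                           const std::string& libraryPath) {
  if (executorEnv.count("MESOS_NATIVE_JAVA_LIBRARY") == 0) {
    executorEnv["MESOS_NATIVE_JAVA_LIBRARY"] = libraryPath;
  }
}
```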
[jira] [Commented] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
[ https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991266#comment-14991266 ] Guangya Liu commented on MESOS-2077: [~bmahler] can you please help shepherd this? Thanks! > Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason. > - > > Key: MESOS-2077 > URL: https://issues.apache.org/jira/browse/MESOS-2077 > Project: Mesos > Issue Type: Improvement > Components: master, slave >Reporter: Benjamin Mahler >Assignee: Guangya Liu > Labels: twitter > > For maintenance, sometimes operators will force the drain of a slave (via > SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary > (e.g. bad hardware). > To eliminate alerting noise, we'd like to add a 'Reason' that expresses the > forced drain of the slave, so that these are not considered to be a generic > slave removal TASK_LOST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2647) Slave should validate tasks using oversubscribed resources
[ https://issues.apache.org/jira/browse/MESOS-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991270#comment-14991270 ] Guangya Liu commented on MESOS-2647: [~vi...@twitter.com] can you please help review? Thanks! > Slave should validate tasks using oversubscribed resources > -- > > Key: MESOS-2647 > URL: https://issues.apache.org/jira/browse/MESOS-2647 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Guangya Liu > Labels: twitter > > The latest oversubscribed resource estimate might render a revocable task > launch invalid. Slave should check this and send TASK_LOST with appropriate > REASON. > We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3828) Strategy for Utilizing Docker 1.9 Multihost Networking
[ https://issues.apache.org/jira/browse/MESOS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Omernik updated MESOS-3828: Labels: Docker isolation network plugins (was: Docker feaa isolation network plugins) > Strategy for Utilizing Docker 1.9 Multihost Networking > -- > > Key: MESOS-3828 > URL: https://issues.apache.org/jira/browse/MESOS-3828 > Project: Mesos > Issue Type: Story > Components: isolation >Affects Versions: 0.26.0 >Reporter: John Omernik > Labels: Docker, isolation, network, plugins > > This is a user story to discuss the strategy for Mesos to use the new > Docker 1.9 feature: Multihost Networking. > http://blog.docker.com/2015/11/docker-multi-host-networking-ga/ > Basically we should determine if this is something we want to work with from > a standpoint of container isolation and, going forward, how we can best > integrate. > The space for networking in Mesos is growing fast with IP per Container and > other networking modules being worked on. Projects like Project Calico offer > services from outside the Mesos community that plug nicely or will plug > nicely into Mesos. > So how about Multihost networking? An option to work with? With Docker being > a first-class citizen of Mesos, this is something we should be considering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3828) Strategy for Utilizing Docker 1.9 Multihost Networking
John Omernik created MESOS-3828: --- Summary: Strategy for Utilizing Docker 1.9 Multihost Networking Key: MESOS-3828 URL: https://issues.apache.org/jira/browse/MESOS-3828 Project: Mesos Issue Type: Story Components: isolation Affects Versions: 0.26.0 Reporter: John Omernik This is a user story to discuss the strategy for Mesos to use the new Docker 1.9 feature: Multihost Networking. http://blog.docker.com/2015/11/docker-multi-host-networking-ga/ Basically we should determine if this is something we want to work with from a standpoint of container isolation and, going forward, how we can best integrate. The space for networking in Mesos is growing fast with IP per Container and other networking modules being worked on. Projects like Project Calico offer services from outside the Mesos community that plug nicely or will plug nicely into Mesos. So how about Multihost networking? An option to work with? With Docker being a first-class citizen of Mesos, this is something we should be considering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.
[ https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989814#comment-14989814 ] Vinod Kone commented on MESOS-2353: --- [~mcypark] Do you have an ETA on when you would get the design and reviews out? This is causing issues for us in production, so want to fix this asap. > Improve performance of the master's state.json endpoint for large clusters. > --- > > Key: MESOS-2353 > URL: https://issues.apache.org/jira/browse/MESOS-2353 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Mahler > Labels: newbie, scalability, twitter > > The master's state.json endpoint consistently takes a long time to compute > the JSON result, for large clusters: > {noformat} > $ time curl -s -o /dev/null localhost:5050/master/state.json > Mon Jan 26 22:38:50 UTC 2015 > real 0m13.174s > user 0m0.003s > sys 0m0.022s > {noformat} > This can cause the master to get backlogged if there are many state.json > requests in flight. > Looking at {{perf}} data, it seems most of the time is spent doing memory > allocation / de-allocation. This ticket will try to capture any low hanging > fruit to speed this up. Possibly we can leverage moves if they are not > already being used by the compiler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
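The "leverage moves" idea floated in the ticket description can be illustrated with a small sketch. This is purely illustrative, not the master's actual serialization code; the point is that moving per-framework fragments into the response transfers each buffer instead of duplicating it:

```cpp
#include <string>
#include <utility>
#include <vector>

// Sketch: assemble a large state response out of per-framework fragments,
// moving each fragment into place so no per-entry copy is made.
std::vector<std::string> buildState(std::vector<std::string> entries) {
  std::vector<std::string> response;
  response.reserve(entries.size());
  for (std::string& e : entries) {
    response.push_back(std::move(e));  // transfers the buffer, no copy
  }
  return response;
}
```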
[jira] [Updated] (MESOS-3388) Add an interface to allow Slave Modules to checkpoint/restore state.
[ https://issues.apache.org/jira/browse/MESOS-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-3388: - Labels: external-volumes mesosphere module (was: ) > Add an interface to allow Slave Modules to checkpoint/restore state. > > > Key: MESOS-3388 > URL: https://issues.apache.org/jira/browse/MESOS-3388 > Project: Mesos > Issue Type: Bug >Reporter: Kapil Arya >Assignee: Greg Mann > Labels: external-volumes, mesosphere, module > > * This is to restore module-specific in-memory data structures that might be > required by the modules to do cleanup on task exit, etc. > * We need to define the interaction when an Agent is restarted with a > different set of modules. > One open question is how does an Agent identify a certain module? One > possibility is to assign a UID to the module and pass it in during > `create()`?. The UID is used to assign a ckpt directory during ckpt/restore. > (Something like /tmp/mesos/...//modules/). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3817) Rename offers to outstanding offers
[ https://issues.apache.org/jira/browse/MESOS-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-3817: Labels: newbie (was: ) > Rename offers to outstanding offers > --- > > Key: MESOS-3817 > URL: https://issues.apache.org/jira/browse/MESOS-3817 > Project: Mesos > Issue Type: Bug > Components: webui >Reporter: haosdent > Labels: newbie > > As discussed in http://search-hadoop.com/m/0Vlr6NFAux1DPmxp , we need to rename > offers to outstanding offers in the webui to avoid user confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3815) docker executor not works when SSL enable
[ https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-3815: Summary: docker executor not works when SSL enable (was: os environment variables not passing to docker-executor environment variables correctly) > docker executor not works when SSL enable > - > > Key: MESOS-3815 > URL: https://issues.apache.org/jira/browse/MESOS-3815 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3827) Improve compilation speed of GMock tests
[ https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-3827: -- Assignee: Neil Conway > Improve compilation speed of GMock tests > > > Key: MESOS-3827 > URL: https://issues.apache.org/jira/browse/MESOS-3827 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > The GMock docs suggest that moving the definition of mock classes' > constructors and destructors to a separate compilation unit can improve > compile performance: > https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2079) IO.Write test is flaky on OS X 10.10.
[ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989947#comment-14989947 ] James Peach commented on MESOS-2079: https://reviews.apache.org/r/39938/ https://reviews.apache.org/r/39940/ https://reviews.apache.org/r/39941/ > IO.Write test is flaky on OS X 10.10. > - > > Key: MESOS-2079 > URL: https://issues.apache.org/jira/browse/MESOS-2079 > Project: Mesos > Issue Type: Task > Components: libprocess, technical debt, test > Environment: OS X 10.10 > {noformat} > $ clang++ --version > Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn) > Target: x86_64-apple-darwin14.0.0 > Thread model: posix > {noformat} >Reporter: Benjamin Mahler >Assignee: James Peach > Labels: flaky > > [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. > Unfortunately, we don't have a stacktrace for SIGPIPE currently: > {noformat} > [ RUN ] IO.Write > make[5]: *** [check-local] Broken pipe: 13 > {noformat} > Running in gdb, seems to always occur here: > {code} > Program received signal SIGPIPE, Broken pipe. > [Switching to process 56827 thread 0x60b] > 0x7fff9a011132 in __psynch_cvwait () > (gdb) where > #0 0x7fff9a011132 in __psynch_cvwait () > #1 0x7fff903e7ea0 in _pthread_cond_wait () > #2 0x00010062f27c in Gate::arrive (this=0x101908a10, old=14780) at > gate.hpp:82 > #3 0x000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373 > #4 0x7fff903e72fc in _pthread_body () > #5 0x7fff903e7279 in _pthread_start () > #6 0x7fff903e54b1 in thread_start () > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3815) docker executor not works when SSL enable
[ https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-3815: Description: Because the docker executor does not pass SSL-related environment variables, mesos-docker-executor does not work correctly when SSL is enabled. More details can be found in http://search-hadoop.com/m/0Vlr6DsslDSvVs72 > docker executor not works when SSL enable > - > > Key: MESOS-3815 > URL: https://issues.apache.org/jira/browse/MESOS-3815 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent > > Because the docker executor does not pass SSL-related environment variables, > mesos-docker-executor does not work correctly when SSL is enabled. More details > can be found in http://search-hadoop.com/m/0Vlr6DsslDSvVs72 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2079) IO.Write test is flaky on OS X 10.10.
[ https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989947#comment-14989947 ] James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM: - These patches global ignore {{SIGPIPE}} during libprocess initialization, document {{SIGPIPE}} behavior a bit more, and remove various signal manipulations that were formerly necessary for disabling {{SIGPIPE}} delivery. https://reviews.apache.org/r/39938/ https://reviews.apache.org/r/39940/ https://reviews.apache.org/r/39941/ was (Author: jamespeach): https://reviews.apache.org/r/39938/ https://reviews.apache.org/r/39940/ https://reviews.apache.org/r/39941/ > IO.Write test is flaky on OS X 10.10. > - > > Key: MESOS-2079 > URL: https://issues.apache.org/jira/browse/MESOS-2079 > Project: Mesos > Issue Type: Task > Components: libprocess, technical debt, test > Environment: OS X 10.10 > {noformat} > $ clang++ --version > Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn) > Target: x86_64-apple-darwin14.0.0 > Thread model: posix > {noformat} >Reporter: Benjamin Mahler >Assignee: James Peach > Labels: flaky > > [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. > Unfortunately, we don't have a stacktrace for SIGPIPE currently: > {noformat} > [ RUN ] IO.Write > make[5]: *** [check-local] Broken pipe: 13 > {noformat} > Running in gdb, seems to always occur here: > {code} > Program received signal SIGPIPE, Broken pipe. 
> [Switching to process 56827 thread 0x60b] > 0x7fff9a011132 in __psynch_cvwait () > (gdb) where > #0 0x7fff9a011132 in __psynch_cvwait () > #1 0x7fff903e7ea0 in _pthread_cond_wait () > #2 0x00010062f27c in Gate::arrive (this=0x101908a10, old=14780) at > gate.hpp:82 > #3 0x000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373 > #4 0x7fff903e72fc in _pthread_body () > #5 0x7fff903e7279 in _pthread_start () > #6 0x7fff903e54b1 in thread_start () > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
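The approach the patches above take — globally ignoring {{SIGPIPE}} during libprocess initialization — can be reduced to a small sketch (hypothetical function name; the real initialization does more). With the signal ignored, a write to a half-closed pipe or socket returns -1 with errno set to EPIPE instead of killing the process:

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Sketch: ignore SIGPIPE process-wide so broken-pipe writes surface as
// EPIPE errors rather than as a fatal signal.
void initializeSignals() {
  ::signal(SIGPIPE, SIG_IGN);
}
```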
[jira] [Commented] (MESOS-3388) Add an interface to allow Slave Modules to checkpoint/restore state.
[ https://issues.apache.org/jira/browse/MESOS-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989964#comment-14989964 ] Greg Mann commented on MESOS-3388: -- Regarding Agent restart, I'm trying to decide if it makes sense for us to garbage collect the checkpointed state of undetected modules on Agent startup. On the one hand, it's good to leave the Agent in a clean state whenever we can. On the other, it's possible that a user may restart the Agent multiple times with different modules present, and it could be useful for them to have old checkpointed module data hanging around. If our long-term vision is that Agent restart should be a seldom-used operator action, then perhaps garbage collecting old module checkpoint data isn't such a big deal. If we imagine Agents being restarted frequently in order to accomplish different Attribute/Resource/Module configurations, then cleanup would be wise. Regarding module UIDs, how will we maintain association of a given module with its ID through an Agent failover or restart? i.e., if we assign a module a UID, checkpoint some state, and then restart the Agent, how do we know what that module's UID was? Perhaps we could use a hash on the module name? > Add an interface to allow Slave Modules to checkpoint/restore state. > > > Key: MESOS-3388 > URL: https://issues.apache.org/jira/browse/MESOS-3388 > Project: Mesos > Issue Type: Bug >Reporter: Kapil Arya >Assignee: Greg Mann > > * This is to restore module-specific in-memory data structures that might be > required by the modules to do cleanup on task exit, etc. > * We need to define the interaction when an Agent is restarted with a > different set of modules. > One open question is how does an Agent identify a certain module? One > possibility is to assign a UID to the module and pass it in during > `create()`?. The UID is used to assign a ckpt directory during ckpt/restore. > (Something like /tmp/mesos/...//modules/). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
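The hash-on-module-name idea raised in the comment could look like the following sketch. Illustrative only; note that {{std::hash}} is not guaranteed stable across standard-library implementations, so a real implementation would need a stable hash (or simply a sanitized copy of the module name) for the directory:

```cpp
#include <functional>
#include <string>

// Sketch: derive a checkpoint directory for a module from its name, so the
// association survives agent restarts without persisting a separate UID.
// CAVEAT: std::hash is implementation-defined; a production version should
// use a stable hash or the (escaped) module name itself.
std::string moduleCheckpointDir(const std::string& workDir,
                                const std::string& moduleName) {
  const size_t uid = std::hash<std::string>{}(moduleName);
  return workDir + "/modules/" + std::to_string(uid);
}
```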
[jira] [Commented] (MESOS-3817) Rename offers to outstanding offers
[ https://issues.apache.org/jira/browse/MESOS-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989991#comment-14989991 ] haosdent commented on MESOS-3817: - I think renaming "Offers" to "Outstanding Offers" in offers.html and index.html should be enough. > Rename offers to outstanding offers > --- > > Key: MESOS-3817 > URL: https://issues.apache.org/jira/browse/MESOS-3817 > Project: Mesos > Issue Type: Bug > Components: webui >Reporter: haosdent > Labels: newbie > > As discussed in http://search-hadoop.com/m/0Vlr6NFAux1DPmxp , we need to rename > offers to outstanding offers in the webui to avoid user confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.
[ https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989200#comment-14989200 ] Felix Bechstein commented on MESOS-2353: We are experiencing severe issues too. The master is using most of its CPU cycles for answering the master/state.json and metrics/snapshot requests. It takes up to 30s to fetch the state. We observe that the master gets slow at sending offers because of this. We noticed that restarting the leader to force re-election makes the problem go away for some time. > Improve performance of the master's state.json endpoint for large clusters. > --- > > Key: MESOS-2353 > URL: https://issues.apache.org/jira/browse/MESOS-2353 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Mahler > Labels: newbie, scalability, twitter > > The master's state.json endpoint consistently takes a long time to compute > the JSON result, for large clusters: > {noformat} > $ time curl -s -o /dev/null localhost:5050/master/state.json > Mon Jan 26 22:38:50 UTC 2015 > real 0m13.174s > user 0m0.003s > sys 0m0.022s > {noformat} > This can cause the master to get backlogged if there are many state.json > requests in flight. > Looking at {{perf}} data, it seems most of the time is spent doing memory > allocation / de-allocation. This ticket will try to capture any low hanging > fruit to speed this up. Possibly we can leverage moves if they are not > already being used by the compiler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3815) docker executor not works when SSL enable
[ https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984317#comment-14984317 ] haosdent edited comment on MESOS-3815 at 11/4/15 6:04 PM: -- Patch: https://reviews.apache.org/r/39944/ https://reviews.apache.org/r/39945/ was (Author: haosd...@gmail.com): Patch: https://reviews.apache.org/r/39837/ > docker executor not works when SSL enable > - > > Key: MESOS-3815 > URL: https://issues.apache.org/jira/browse/MESOS-3815 > Project: Mesos > Issue Type: Bug >Reporter: haosdent >Assignee: haosdent > > Because the docker executor does not pass SSL-related environment variables, > mesos-docker-executor does not work correctly when SSL is enabled. More details > can be found in http://search-hadoop.com/m/0Vlr6DsslDSvVs72 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
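The fix referenced by these patches amounts to forwarding the agent's SSL-related variables into the executor's environment. A hedged sketch with illustrative names, not the actual patch:

```cpp
#include <map>
#include <string>

// Sketch: copy the agent's SSL_* environment variables into the environment
// handed to mesos-docker-executor, so the executor can speak SSL too.
void passThroughSSL(const std::map<std::string, std::string>& agentEnv,
                    std::map<std::string, std::string>& executorEnv) {
  for (const auto& kv : agentEnv) {
    if (kv.first.compare(0, 4, "SSL_") == 0) {
      executorEnv[kv.first] = kv.second;
    }
  }
}
```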
[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests
[ https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990064#comment-14990064 ] James Peach commented on MESOS-3827: Did you measure this? I tried it and it didn't make much difference for me :-/ > Improve compilation speed of GMock tests > > > Key: MESOS-3827 > URL: https://issues.apache.org/jira/browse/MESOS-3827 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > The GMock docs suggest that moving the definition of mock classes' > constructors and destructors to a separate compilation unit can improve > compile performance: > https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3827) Improve compilation speed of GMock tests
[ https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-3827: --- Shepherd: Joris Van Remoortere > Improve compilation speed of GMock tests > > > Key: MESOS-3827 > URL: https://issues.apache.org/jira/browse/MESOS-3827 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > The GMock docs suggest that moving the definition of mock classes' > constructors and destructors to a separate compilation unit can improve > compile performance: > https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests
[ https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990099#comment-14990099 ] James Peach commented on MESOS-3827: Yup. I'd agree it is worth it even if it made no difference :) > Improve compilation speed of GMock tests > > > Key: MESOS-3827 > URL: https://issues.apache.org/jira/browse/MESOS-3827 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > The GMock docs suggest that moving the definition of mock classes' > constructors and destructors to a separate compilation unit can improve > compile performance: > https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests
[ https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990130#comment-14990130 ] Neil Conway commented on MESOS-3827: One irritant is that we can't use this technique with TestAllocator (because it is templatized, so def'n needs to be in a header), which is a fairly expensive thing to compile: just compiling the things that use it takes ~175 seconds of CPU time (versus ~1300 for the whole test suite). Not sure if there's any easy fix for this, though. > Improve compilation speed of GMock tests > > > Key: MESOS-3827 > URL: https://issues.apache.org/jira/browse/MESOS-3827 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > The GMock docs suggest that moving the definition of mock classes' > constructors and destructors to a separate compilation unit can improve > compile performance: > https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
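One technique that can help even with templated mocks like the TestAllocator mentioned above is an extern template declaration: the definition stays in the header, but each including translation unit stops re-instantiating it, provided the needed instantiations are known ahead of time. A sketch under that assumption; `MockWrapper` is a stand-in, not TestAllocator itself:

```cpp
// Sketch: in the header, after the template definition, declare the known
// instantiations 'extern' so including TUs do not instantiate them; then
// instantiate each one explicitly in exactly one .cpp file.
template <typename T>
struct MockWrapper {
  T value{};
  T get() const { return value; }
};

// Header: suppress implicit instantiation in every including TU.
extern template struct MockWrapper<int>;

// Exactly one translation unit: the single explicit instantiation.
template struct MockWrapper<int>;
```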
[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests
[ https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990136#comment-14990136 ] Neil Conway commented on MESOS-3827: https://reviews.apache.org/r/39946/ https://reviews.apache.org/r/39947/ > Improve compilation speed of GMock tests > > > Key: MESOS-3827 > URL: https://issues.apache.org/jira/browse/MESOS-3827 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere, tech-debt, testing > > The GMock docs suggest that moving the definition of mock classes' > constructors and destructors to a separate compilation unit can improve > compile performance: > https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989299#comment-14989299 ] Peter Kolloch commented on MESOS-3793: -- The last log line ( Failed to locate systemd runtime directory: /run/systemd/system) looks as if mesos depended on systemd? Is that correct and expected? > Cannot start mesos local on a Debian GNU/Linux 8 docker machine > --- > > Key: MESOS-3793 > URL: https://issues.apache.org/jira/browse/MESOS-3793 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Debian GNU/Linux 8 docker machine >Reporter: Matthias Veit >Assignee: Jojy Varghese > Labels: mesosphere > > We updated the mesos version to 0.25.0 in our Marathon docker image, that > runs our integration tests. > We use mesos local for those tests. This fails with this message: > {noformat} > root@a06e4b4eb776:/marathon# mesos local > I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms > I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns > I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns > I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in > 1287ns > I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the > db in ns > I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery > I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status > I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to > STARTING > I1022 18:42:26.855655 138 master.cpp:376] Master > a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on > 172.17.0.14:5050 > I1022 
18:42:26.855680 138 master.cpp:378] Flags at startup: > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" > --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs" > I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated > frameworks to register > I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated > slaves to register > I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' > authenticator > W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, > authentication requests will be refused > I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL > I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.888185ms > I1022 18:42:26.856987 140 replica.cpp:323] Persisted replica status to > STARTING > I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status > I1022 18:42:26.857270 140 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING > 
I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 371121ns > I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to > VOTING > I1022 18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos > group > I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated > I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is > master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8 > I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master! > I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar > I1022 18:42:26.862560 137 registrar.cpp:309] Recovering
[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989308#comment-14989308 ]

Peter Kolloch commented on MESOS-3793:
--------------------------------------

[~karlkfi] Is it correct that you encountered this problem, too? Did you find a workaround?

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---------------------------------------------------------------
>
>                 Key: MESOS-3793
>                 URL: https://issues.apache.org/jira/browse/MESOS-3793
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.25.0
>        Environment: Debian GNU/Linux 8 docker machine
>            Reporter: Matthias Veit
>            Assignee: Jojy Varghese
>             Labels: mesosphere
>
> We updated the Mesos version to 0.25.0 in our Marathon docker image, which runs our integration tests.
> We use mesos local for those tests. This fails with this message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in 1287ns
> I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the db in ns
> I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
> I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from a replica in EMPTY status
> I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to STARTING
> I1022 18:42:26.855655 138 master.cpp:376] Master a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 172.17.0.14:5050
> I1022 18:42:26.855680 138 master.cpp:378] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated frameworks to register
> I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated slaves to register
> I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' authenticator
> W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, authentication requests will be refused
> I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 1.888185ms
> I1022 18:42:26.856987 140 replica.cpp:323] Persisted replica status to STARTING
> I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270 140 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
> I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from a replica in STARTING status
> I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 371121ns
> I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to VOTING
> I1022 18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos group
> I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master!
> I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar
> I1022 18:42:26.862560 137 registrar.cpp:309] Recovering registrar
> Failed to create a containerizer: Could not create
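Since the failure happens while creating the containerizer, one workaround worth trying (an assumption based on the systemd discussion in this ticket, not a confirmed fix for MESOS-3793) is to force the POSIX launcher so the slave code does not probe for systemd. Agent options are commonly passed to `mesos local` through `MESOS_`-prefixed environment variables; verify the flag name against `mesos-slave --help` for your build:

```shell
# Hypothetical workaround sketch, not a confirmed fix for MESOS-3793:
# force the POSIX launcher so the slave does not try to use systemd.
# The MESOS_LAUNCHER variable maps to the agent's --launcher flag.
MESOS_LAUNCHER=posix mesos local
```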
[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989332#comment-14989332 ]

Peter Kolloch commented on MESOS-3793:
--------------------------------------

I found this related CHANGELOG entry (https://github.com/apache/mesos/blob/master/CHANGELOG#L109):

{code}
  * [MESOS-3425] - Modify LinuxLauncher to support Systemd.
{code}

Maybe MESOS-3425 introduced a hard dependency on systemd utilities? MESOS-1159 may be about fixing that, but I am not sure.
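If the suspicion about MESOS-3425 is right, the failure would depend on whether systemd is actually running inside the container. A quick way to check from inside the Debian 8 container (a sketch: the `/run/systemd/system` directory is the conventional marker that systemd is the running init, per `sd_booted(3)`; the exact check Mesos performs may differ):

```shell
# Detect whether systemd is the running init. Docker containers normally run
# the entrypoint as PID 1, so this usually reports "absent" even when the
# systemd package is installed in the image.
if [ -d /run/systemd/system ]; then
  echo "systemd: running"
else
  echo "systemd: absent"
fi
ps -p 1 -o comm=   # show what PID 1 actually is
```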