[jira] [Commented] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250472#comment-16250472 ] Greg Mann commented on MESOS-8198: -- Review here: https://reviews.apache.org/r/63768/ > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman >Assignee: Greg Mann > Labels: mesosphere > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8198: - Shepherd: Vinod Kone > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman >Assignee: Greg Mann > Labels: mesosphere > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8198) Update the ReconcileOfferOperations protos
[ https://issues.apache.org/jira/browse/MESOS-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-8198: Assignee: Greg Mann > Update the ReconcileOfferOperations protos > -- > > Key: MESOS-8198 > URL: https://issues.apache.org/jira/browse/MESOS-8198 > Project: Mesos > Issue Type: Task >Reporter: Gastón Kleiman >Assignee: Greg Mann > Labels: mesosphere > > Some protos have been committed, but they follow an event-based API. > We decided to follow the request/response model for this API, so we need to > update the protos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8172) Agent --authenticate_http_executors commandline flag unrecognized in 1.4.0
[ https://issues.apache.org/jira/browse/MESOS-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244125#comment-16244125 ] Greg Mann commented on MESOS-8172: -- Was Mesos built with SSL enabled? Executor authentication requires SSL for now, but we could improve the error messaging in this case. > Agent --authenticate_http_executors commandline flag unrecognized in 1.4.0 > -- > > Key: MESOS-8172 > URL: https://issues.apache.org/jira/browse/MESOS-8172 > Project: Mesos > Issue Type: Bug > Components: executor, security >Affects Versions: 1.4.0 > Environment: Ubuntu 16.04.3 with Mesos 1.4.0 compiled from source > tarball. >Reporter: Dan Leary >Assignee: Greg Mann > > Apparently the mesos-agent authenticate_http_executors commandline arg was > introduced in 1.3.0 by MESOS-6365. But running "mesos-agent > --authenticate_http_executors ..." in 1.4.0 yields > {noformat} > Failed to load unknown flag 'authenticate_http_executors' > {noformat} > ...followed by a usage report that does not include > "--authenticate_http_executors". > Presumably this means executor authentication is no longer configurable. > It is still documented at > https://mesos.apache.org/documentation/latest/authentication/#agent -- This message was sent by Atlassian JIRA (v6.4.14#64029)
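Editor's note: the comment above suggests that the flag is only known to binaries built with SSL support. The following is a minimal sketch of how a compile-time switch can make a flag "unknown" at runtime. It is an illustration only: `SKETCH_WITH_SSL`, `knownFlags`, and `loadFlag` are hypothetical names, not the Mesos build macro or flag-loading API.

```cpp
#include <set>
#include <string>

// Returns the flag names this sketch "binary" recognizes.
// SKETCH_WITH_SSL is a hypothetical macro standing in for whatever
// build option enables SSL support.
std::set<std::string> knownFlags() {
  std::set<std::string> flags;
  flags.insert("work_dir");
#ifdef SKETCH_WITH_SSL
  // Only registered when the build has SSL support.
  flags.insert("authenticate_http_executors");
#endif
  return flags;
}

// Returns an error message for unregistered flags, or an empty string
// if the flag is known. The message text mirrors the one in the report.
std::string loadFlag(const std::string& name) {
  if (knownFlags().count(name) == 0) {
    return "Failed to load unknown flag '" + name + "'";
  }
  return "";
}
```

Under this assumption, a build without the macro rejects the flag with the same kind of message quoted in the report, which is why a more specific error ("built without SSL") would be easier to diagnose.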
[jira] [Commented] (MESOS-8132) Design a library to send offer operation status updates
[ https://issues.apache.org/jira/browse/MESOS-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235219#comment-16235219 ] Greg Mann commented on MESOS-8132: -- Short design doc [here|https://docs.google.com/a/mesosphere.io/document/d/1hGPQA2pGjUwiR93J1mZuANByXupv42PRYZgi-HU8mb8/edit?usp=sharing]. > Design a library to send offer operation status updates > --- > > Key: MESOS-8132 > URL: https://issues.apache.org/jira/browse/MESOS-8132 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Major > Labels: mesosphere > > As detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#], > we need to add a library to do the following: > * Send offer operation status updates > * Checkpoint pending/unacknowledged operations > * Retry operation status updates until an acknowledgement is received > This should be a common library which can be used by the agent (for its > default resources) and by local resource providers. In the future, it can > also be used by external resource providers. > We should write a short design doc to explore precisely how this will be > implemented. It can probably be modeled after the task status update manager > in the agent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
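Editor's note: the three responsibilities listed in the ticket (send updates, checkpoint pending ones, retry until acknowledged) can be sketched roughly as below. This is not the design from the linked doc; `OperationUpdate` and `UpdateManagerSketch` are hypothetical names, the "checkpoint" is an in-memory map standing in for durable storage, and retries are triggered manually rather than by a timer.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for an operation status update.
struct OperationUpdate {
  std::string operationId;
  std::string state;  // Illustrative only, e.g. "FINISHED".
};

// Sketch of a manager that records, sends, and re-sends updates until
// each one is acknowledged. Names are assumptions, not Mesos APIs.
class UpdateManagerSketch {
public:
  typedef std::function<void(const OperationUpdate&)> Sender;

  explicit UpdateManagerSketch(Sender sender) : send_(std::move(sender)) {}

  // Record the update as pending (stand-in for checkpointing to disk),
  // then forward it once.
  void update(const OperationUpdate& u) {
    pending_[u.operationId] = u;
    send_(u);
  }

  // Re-send every update that has not been acknowledged yet. A real
  // implementation would likely drive this from a timer with backoff.
  void retry() {
    for (const auto& entry : pending_) {
      send_(entry.second);
    }
  }

  // Drop the update once it is acknowledged. Returns false if the
  // acknowledgement does not match any pending update.
  bool acknowledge(const std::string& operationId) {
    return pending_.erase(operationId) > 0;
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  Sender send_;
  std::map<std::string, OperationUpdate> pending_;
};
```

Passing the sender in as a callback is one way to let the agent and resource providers share the same logic while delivering updates over different channels, in line with the "common library" goal stated in the ticket.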
[jira] [Commented] (MESOS-8130) Add placeholder handlers for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225743#comment-16225743 ] Greg Mann commented on MESOS-8130: -- Review here: https://reviews.apache.org/r/63322/ > Add placeholder handlers for offer operation feedback > - > > Key: MESOS-8130 > URL: https://issues.apache.org/jira/browse/MESOS-8130 > Project: Mesos > Issue Type: Task > Components: agent, master >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > In order to sketch out the flow of messages necessary to facilitate offer > operation feedback, we should add some empty placeholder handlers to the > master and agent as detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8130) Add placeholder handlers for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225744#comment-16225744 ] Greg Mann commented on MESOS-8130: -- {code} commit 6ecbf02c21d3cfdb74c56cbdde5d2c5879149ae9 Author: Greg Mann g...@mesosphere.io Date: Mon Oct 30 13:02:18 2017 -0700 Added placeholder handlers and other changes for operation updates. This patch adds empty placeholder handler functions which will be used for offer operation status updates as well as their acknowledgement and reconciliation. A number of switch statements are also updated to handle new enum values and validation code is added. Review: https://reviews.apache.org/r/63322/ {code} > Add placeholder handlers for offer operation feedback > - > > Key: MESOS-8130 > URL: https://issues.apache.org/jira/browse/MESOS-8130 > Project: Mesos > Issue Type: Task > Components: agent, master >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > In order to sketch out the flow of messages necessary to facilitate offer > operation feedback, we should add some empty placeholder handlers to the > master and agent as detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8131) Add new protobuf messages for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225731#comment-16225731 ] Greg Mann commented on MESOS-8131: -- {code} commit e6bec836af3a672a0838cd6a1b7687f087d5594f Author: Greg Mann Date: Mon Oct 30 13:00:58 2017 -0700 Added protobuf messages for V1 scheduler operation feedback. This patch adds new and updated protobuf messages to facilitate offer operation status updates, as well as acknowledgement of those updates and operation status reconciliation. Review: https://reviews.apache.org/r/63321/ {code} > Add new protobuf messages for offer operation feedback > -- > > Key: MESOS-8131 > URL: https://issues.apache.org/jira/browse/MESOS-8131 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > Fix For: 1.5.0 > > > We should add the necessary protobuf messages for offer operation feedback as > detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8131) Add new protobuf messages for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225729#comment-16225729 ] Greg Mann commented on MESOS-8131: -- Review here: https://reviews.apache.org/r/63321/ > Add new protobuf messages for offer operation feedback > -- > > Key: MESOS-8131 > URL: https://issues.apache.org/jira/browse/MESOS-8131 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > We should add the necessary protobuf messages for offer operation feedback as > detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8054) Feedback for offer operations
[ https://issues.apache.org/jira/browse/MESOS-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8054: - Shepherd: Greg Mann > Feedback for offer operations > - > > Key: MESOS-8054 > URL: https://issues.apache.org/jira/browse/MESOS-8054 > Project: Mesos > Issue Type: Epic >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > > Only LAUNCH operations provide feedback on success or failure. All operations > should do so. RESERVE, UNRESERVE, CREATE, DESTROY, CREATE_VOLUME, and > DESTROY_VOLUME should all provide feedback on success or failure. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8140) Executors should clear their auth tokens
[ https://issues.apache.org/jira/browse/MESOS-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8140: - Shepherd: Greg Mann Labels: security (was: ) > Executors should clear their auth tokens > > > Key: MESOS-8140 > URL: https://issues.apache.org/jira/browse/MESOS-8140 > Project: Mesos > Issue Type: Bug > Components: executor, security >Reporter: James Peach >Assignee: James Peach > Labels: security > Fix For: 1.5.0 > > > The built-in executors should clear {{MESOS_EXECUTOR_AUTHENTICATION_TOKEN}} > from their environment since otherwise tasks running as the same user in the > same container can trivially inspect it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
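Editor's note: a minimal sketch of the idea in this ticket, assuming a POSIX system: copy the variable's value, then remove it from the process environment so that processes launched afterwards do not inherit it. `takeFromEnvironment` is a hypothetical helper, not the fix that was committed to Mesos.

```cpp
#include <stdlib.h>  // POSIX: getenv, unsetenv

#include <string>

// Copies the named environment variable into `value`, then removes it
// from this process's environment so that child processes started
// later do not inherit it. Returns false if the variable was not set
// or could not be removed.
bool takeFromEnvironment(const char* name, std::string* value) {
  const char* raw = ::getenv(name);
  if (raw == nullptr) {
    return false;
  }

  // Copy first: `raw` points into the environment being modified.
  *value = raw;

  return ::unsetenv(name) == 0;
}
```

An executor could call this once at startup, for example `takeFromEnvironment("MESOS_EXECUTOR_AUTHENTICATION_TOKEN", &token)`, before launching any task. Note that this only changes what later children inherit; it does not necessarily erase the copy of the initial environment that the operating system may still expose for the process.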
[jira] [Commented] (MESOS-8130) Add placeholder handlers for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220059#comment-16220059 ] Greg Mann commented on MESOS-8130: -- Review here: https://reviews.apache.org/r/63322/ > Add placeholder handlers for offer operation feedback > - > > Key: MESOS-8130 > URL: https://issues.apache.org/jira/browse/MESOS-8130 > Project: Mesos > Issue Type: Task > Components: agent, master >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > In order to sketch out the flow of messages necessary to facilitate offer > operation feedback, we should add some empty placeholder handlers to the > master and agent as detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Issue Comment Deleted] (MESOS-8130) Add placeholder handlers for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8130: - Comment: was deleted (was: Review here: https://reviews.apache.org/r/63322/) > Add placeholder handlers for offer operation feedback > - > > Key: MESOS-8130 > URL: https://issues.apache.org/jira/browse/MESOS-8130 > Project: Mesos > Issue Type: Task > Components: agent, master >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > In order to sketch out the flow of messages necessary to facilitate offer > operation feedback, we should add some empty placeholder handlers to the > master and agent as detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8132) Design a library to send offer operation status updates
[ https://issues.apache.org/jira/browse/MESOS-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-8132: Assignee: Greg Mann > Design a library to send offer operation status updates > --- > > Key: MESOS-8132 > URL: https://issues.apache.org/jira/browse/MESOS-8132 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > As detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#], > we need to add a library to do the following: > * Send offer operation status updates > * Checkpoint pending/unacknowledged operations > * Retry operation status updates until an acknowledgement is received > This should be a common library which can be used by the agent (for its > default resources) and by local resource providers. In the future, it can > also be used by external resource providers. > We should write a short design doc to explore precisely how this will be > implemented. It can probably be modeled after the task status update manager > in the agent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8132) Design a library to send offer operation status updates
Greg Mann created MESOS-8132: Summary: Design a library to send offer operation status updates Key: MESOS-8132 URL: https://issues.apache.org/jira/browse/MESOS-8132 Project: Mesos Issue Type: Task Reporter: Greg Mann As detailed in the [offer operation feedback design doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#], we need to add a library to do the following: * Send offer operation status updates * Checkpoint pending/unacknowledged operations * Retry operation status updates until an acknowledgement is received This should be a common library which can be used by the agent (for its default resources) and by local resource providers. In the future, it can also be used by external resource providers. We should write a short design doc to explore precisely how this will be implemented. It can probably be modeled after the task status update manager in the agent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8131) Add new protobuf messages for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219160#comment-16219160 ] Greg Mann commented on MESOS-8131: -- This ticket refers to the framework API parts of the offer operation feedback feature. Storage-related work has already resulted in a couple of reviews for other protobufs in the design: * https://reviews.apache.org/r/63001/ * https://reviews.apache.org/r/63094/ > Add new protobuf messages for offer operation feedback > -- > > Key: MESOS-8131 > URL: https://issues.apache.org/jira/browse/MESOS-8131 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > We should add the necessary protobuf messages for offer operation feedback as > detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8131) Add new protobuf messages for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8131: - Shepherd: Jie Yu > Add new protobuf messages for offer operation feedback > -- > > Key: MESOS-8131 > URL: https://issues.apache.org/jira/browse/MESOS-8131 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > We should add the necessary protobuf messages for offer operation feedback as > detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8131) Add new protobuf messages for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-8131: Assignee: Greg Mann > Add new protobuf messages for offer operation feedback > -- > > Key: MESOS-8131 > URL: https://issues.apache.org/jira/browse/MESOS-8131 > Project: Mesos > Issue Type: Task >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > We should add the necessary protobuf messages for offer operation feedback as > detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8131) Add new protobuf messages for offer operation feedback
Greg Mann created MESOS-8131: Summary: Add new protobuf messages for offer operation feedback Key: MESOS-8131 URL: https://issues.apache.org/jira/browse/MESOS-8131 Project: Mesos Issue Type: Task Reporter: Greg Mann We should add the necessary protobuf messages for offer operation feedback as detailed in the [offer operation feedback design doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8130) Add placeholder handlers for offer operation feedback
Greg Mann created MESOS-8130: Summary: Add placeholder handlers for offer operation feedback Key: MESOS-8130 URL: https://issues.apache.org/jira/browse/MESOS-8130 Project: Mesos Issue Type: Task Components: agent, master Reporter: Greg Mann In order to sketch out the flow of messages necessary to facilitate offer operation feedback, we should add some empty placeholder handlers to the master and agent as detailed in the [offer operation feedback design doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8130) Add placeholder handlers for offer operation feedback
[ https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-8130: Assignee: Greg Mann > Add placeholder handlers for offer operation feedback > - > > Key: MESOS-8130 > URL: https://issues.apache.org/jira/browse/MESOS-8130 > Project: Mesos > Issue Type: Task > Components: agent, master >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > In order to sketch out the flow of messages necessary to facilitate offer > operation feedback, we should add some empty placeholder handlers to the > master and agent as detailed in the [offer operation feedback design > doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8126) Consider decoupling the authorization logic from response creation.
[ https://issues.apache.org/jira/browse/MESOS-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217127#comment-16217127 ] Greg Mann commented on MESOS-8126: -- Agreed - I think breaking the authorization code out of {{createAgentResponse}} would clean things up. If we use a helper which modifies the {{GetAgents::Agent}} in place, like we do in {{convertResourceFormat}}, then we could avoid extra copies as a result of the refactor. > Consider decoupling the authorization logic from response creation. > --- > > Key: MESOS-8126 > URL: https://issues.apache.org/jira/browse/MESOS-8126 > Project: Mesos > Issue Type: Task >Reporter: Michael Park > > Currently the {{createAgentResponse}} function performs some authorization, > given an optional {{rolesAcceptor}}. The {{_getAgents}} function uses this helper > *with* a {{rolesAcceptor}}. {{createAgentAdded}} on the other hand uses the > helper *without* a {{rolesAcceptor}} and is passed to > {{Master::Subscriber::send}} > for authorization post-hoc. > At first glance, it seemed like there were two authorizations being done for > no > reason, and it seems like it could be beneficial to actually pull the > authorization > logic out of the response creation logic, rather than coupling them and > bypassing > authorization when we want *custom* authorization logic. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
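Editor's note: the refactor discussed above can be sketched as two separate steps, one that builds the response with no authorization logic and one that filters it in place. The types and functions here (`AgentEntry`, `createAgentEntry`, `filterUnauthorizedRoles`) are hypothetical simplifications, not the Mesos `GetAgents::Agent` message or the `rolesAcceptor` interface.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a per-agent response entry.
struct AgentEntry {
  std::string id;
  std::vector<std::string> roles;  // Roles visible in this entry.
};

// Step 1: build the response entry with no authorization logic at all.
AgentEntry createAgentEntry(
    const std::string& id,
    const std::vector<std::string>& roles) {
  AgentEntry entry;
  entry.id = id;
  entry.roles = roles;
  return entry;
}

// Step 2: apply authorization separately by filtering the entry in
// place, so the refactor does not introduce an extra copy. `accept`
// stands in for whatever authorization check the caller wants.
void filterUnauthorizedRoles(
    AgentEntry* entry,
    const std::function<bool(const std::string&)>& accept) {
  entry->roles.erase(
      std::remove_if(
          entry->roles.begin(),
          entry->roles.end(),
          [&accept](const std::string& role) { return !accept(role); }),
      entry->roles.end());
}
```

With this split, a caller that wants the default check and a caller that wants custom authorization both start from the same unfiltered entry and differ only in the predicate they pass to the second step.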
[jira] [Commented] (MESOS-6985) os::getenv() can segfault
[ https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215880#comment-16215880 ] Greg Mann commented on MESOS-6985: -- Hey [~ipronin]! The approach you proposed here back in January sounds good to me. Do you have any cycles to work on this at present? If so, I can shepherd the ticket. > os::getenv() can segfault > - > > Key: MESOS-6985 > URL: https://issues.apache.org/jira/browse/MESOS-6985 > Project: Mesos > Issue Type: Bug > Components: stout > Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without > libevent/SSL >Reporter: Greg Mann >Assignee: Ilya Pronin > Labels: reliability, stout > Attachments: > MasterMaintenanceTest.InverseOffersFilters-truncated.txt, > MasterTest.MultipleExecutors.txt > > > This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 > and has been produced by the tests {{MasterTest.MultipleExecutors}} and > {{MasterMaintenanceTest.InverseOffersFilters}}. 
In both cases, > {{os::getenv()}} segfaults with the same stack trace: > {code} > *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are > using GNU date *** > PC: @ 0x2ad59e3ae82d (unknown) > I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0 > *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; > stack trace: *** > I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: > executor(75)@172.17.0.2:45752 with pid 28591 > @ 0x2ad5ab953197 (unknown) > @ 0x2ad5ab957479 (unknown) > @ 0x2ad59e165330 (unknown) > @ 0x2ad59e3ae82d (unknown) > @ 0x2ad594631358 os::getenv() > @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment() > @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor() > @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run() > @ 0x2ad59ac1ec10 > _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_ > @ 0x2ad59ac1e6bf > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2ad59bce2304 std::function<>::operator()() > @ 0x2ad59bcc9824 process::ProcessBase::visit() > @ 0x2ad59bd4028e process::DispatchEvent::visit() > @ 0x2ad594616df1 process::ProcessBase::serve() > @ 0x2ad59bcc72b7 process::ProcessManager::resume() > @ 0x2ad59bcd567c > process::ProcessManager::init_threads()::$_2::operator()() > @ 0x2ad59bcd5585 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x2ad59bcd std::_Bind_simple<>::operator()() > @ 0x2ad59bcd552c 
std::thread::_Impl<>::_M_run() > @ 0x2ad59d9e6a60 (unknown) > @ 0x2ad59e15d184 start_thread > @ 0x2ad59e46d37d (unknown) > make[4]: *** [check-local] Segmentation fault > {code} > Find attached the full log from a failed run of > {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of > {{MasterMaintenanceTest.InverseOffersFilters}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-6985) os::getenv() can segfault
[ https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6985: - Shepherd: Greg Mann > os::getenv() can segfault > - > > Key: MESOS-6985 > URL: https://issues.apache.org/jira/browse/MESOS-6985 > Project: Mesos > Issue Type: Bug > Components: stout > Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without > libevent/SSL >Reporter: Greg Mann > Labels: reliability, stout > Attachments: > MasterMaintenanceTest.InverseOffersFilters-truncated.txt, > MasterTest.MultipleExecutors.txt > > > This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 > and has been produced by the tests {{MasterTest.MultipleExecutors}} and > {{MasterMaintenanceTest.InverseOffersFilters}}. In both cases, > {{os::getenv()}} segfaults with the same stack trace: > {code} > *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are > using GNU date *** > PC: @ 0x2ad59e3ae82d (unknown) > I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0 > *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; > stack trace: *** > I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: > executor(75)@172.17.0.2:45752 with pid 28591 > @ 0x2ad5ab953197 (unknown) > @ 0x2ad5ab957479 (unknown) > @ 0x2ad59e165330 (unknown) > @ 0x2ad59e3ae82d (unknown) > @ 0x2ad594631358 os::getenv() > @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment() > @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor() > @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run() > @ 0x2ad59ac1ec10 > _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_ > @ 0x2ad59ac1e6bf > 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2ad59bce2304 std::function<>::operator()() > @ 0x2ad59bcc9824 process::ProcessBase::visit() > @ 0x2ad59bd4028e process::DispatchEvent::visit() > @ 0x2ad594616df1 process::ProcessBase::serve() > @ 0x2ad59bcc72b7 process::ProcessManager::resume() > @ 0x2ad59bcd567c > process::ProcessManager::init_threads()::$_2::operator()() > @ 0x2ad59bcd5585 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x2ad59bcd std::_Bind_simple<>::operator()() > @ 0x2ad59bcd552c std::thread::_Impl<>::_M_run() > @ 0x2ad59d9e6a60 (unknown) > @ 0x2ad59e15d184 start_thread > @ 0x2ad59e46d37d (unknown) > make[4]: *** [check-local] Segmentation fault > {code} > Find attached the full log from a failed run of > {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of > {{MasterMaintenanceTest.InverseOffersFilters}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-6985) os::getenv() can segfault
[ https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-6985: Assignee: Ilya Pronin > os::getenv() can segfault > - > > Key: MESOS-6985 > URL: https://issues.apache.org/jira/browse/MESOS-6985 > Project: Mesos > Issue Type: Bug > Components: stout > Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without > libevent/SSL >Reporter: Greg Mann >Assignee: Ilya Pronin > Labels: reliability, stout > Attachments: > MasterMaintenanceTest.InverseOffersFilters-truncated.txt, > MasterTest.MultipleExecutors.txt > > > This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 > and has been produced by the tests {{MasterTest.MultipleExecutors}} and > {{MasterMaintenanceTest.InverseOffersFilters}}. In both cases, > {{os::getenv()}} segfaults with the same stack trace: > {code} > *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are > using GNU date *** > PC: @ 0x2ad59e3ae82d (unknown) > I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0 > *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; > stack trace: *** > I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: > executor(75)@172.17.0.2:45752 with pid 28591 > @ 0x2ad5ab953197 (unknown) > @ 0x2ad5ab957479 (unknown) > @ 0x2ad59e165330 (unknown) > @ 0x2ad59e3ae82d (unknown) > @ 0x2ad594631358 os::getenv() > @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment() > @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor() > @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run() > @ 0x2ad59ac1ec10 > _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_ > @ 0x2ad59ac1e6bf > 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x2ad59bce2304 std::function<>::operator()() > @ 0x2ad59bcc9824 process::ProcessBase::visit() > @ 0x2ad59bd4028e process::DispatchEvent::visit() > @ 0x2ad594616df1 process::ProcessBase::serve() > @ 0x2ad59bcc72b7 process::ProcessManager::resume() > @ 0x2ad59bcd567c > process::ProcessManager::init_threads()::$_2::operator()() > @ 0x2ad59bcd5585 > _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE > @ 0x2ad59bcd std::_Bind_simple<>::operator()() > @ 0x2ad59bcd552c std::thread::_Impl<>::_M_run() > @ 0x2ad59d9e6a60 (unknown) > @ 0x2ad59e15d184 start_thread > @ 0x2ad59e46d37d (unknown) > make[4]: *** [check-local] Segmentation fault > {code} > Find attached the full log from a failed run of > {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of > {{MasterMaintenanceTest.InverseOffersFilters}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-8117) Update Getting Started documentation
[ https://issues.apache.org/jira/browse/MESOS-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212940#comment-16212940 ] Greg Mann commented on MESOS-8117: -- {code} commit 8386e22f20d9d20836df6111221cb3afdaf2a3ba Author: Andrew Schwartzmeyer Date: Fri Oct 20 10:37:24 2017 -0700 Moved building docs to `building.md`. The existing "Getting Started" documentation does not cover how to "get started" with Mesos, but instead how to build it from source on multiple platforms. Also added a link to `configuration.md` in the build documentation section, as it was not obvious. Review: https://reviews.apache.org/r/63093/ {code} {code} commit b71478750dce4a26d84c1840a4e6d73349a6f0db Author: Andrew Schwartzmeyer Date: Fri Oct 20 10:37:25 2017 -0700 Added the Getting Started landing page. After moving the build documentation to its own page, we can now have a real "Getting Started" page suitable for anyone to get started with Mesos. It is purposefully short, and therefore not overwhelming. Review: https://reviews.apache.org/r/63095/ {code} > Update Getting Started documentation > > > Key: MESOS-8117 > URL: https://issues.apache.org/jira/browse/MESOS-8117 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: Andrew Schwartzmeyer >Assignee: Andrew Schwartzmeyer > Labels: docuentation, microsoft > Fix For: 1.5.0 > > > Our "getting started" landing page is not how to get started on Mesos, it's > how to build. Build instructions should exist in their own file, and the > getting started page should be more like a landing page such as the community > page is. As someone who has onboarded other developers, I speak from > experience saying we need real "getting started" info. > This work was started at the docathon at Mesosphere a couple weeks ago. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7434: - Story Points: 5 (was: 2) > SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky. > - > > Key: MESOS-7434 > URL: https://issues.apache.org/jira/browse/MESOS-7434 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 > CentOS 6 > other Linux distros >Reporter: Greg Mann >Assignee: Greg Mann > Labels: flaky, flaky-test, mesosphere > Attachments: RestartSlaveRequireExecutorAuthentication is > flaky_failure_log_centos6.txt, > RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, > SlaveTest.RestartSlaveRequireExecAuth-Ubuntu-16.txt > > > This test failure has been observed on an internal CI system. It occurs on a > variety of Linux distributions. It seems that using {{cat}} as the task > command may be problematic; see attached log file > {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212903#comment-16212903 ] Greg Mann commented on MESOS-7434: -- This was observed recently on our internal CI, on Ubuntu 16; logs attached to this ticket. > SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky. > - > > Key: MESOS-7434 > URL: https://issues.apache.org/jira/browse/MESOS-7434 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 > CentOS 6 > other Linux distros >Reporter: Greg Mann >Assignee: Greg Mann > Labels: flaky, flaky-test, mesosphere > Attachments: RestartSlaveRequireExecutorAuthentication is > flaky_failure_log_centos6.txt, > RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, > SlaveTest.RestartSlaveRequireExecAuth-Ubuntu-16.txt > > > This test failure has been observed on an internal CI system. It occurs on a > variety of Linux distributions. It seems that using {{cat}} as the task > command may be problematic; see attached log file > {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7434: - Attachment: SlaveTest.RestartSlaveRequireExecAuth-Ubuntu-16.txt > SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky. > - > > Key: MESOS-7434 > URL: https://issues.apache.org/jira/browse/MESOS-7434 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 > CentOS 6 > other Linux distros >Reporter: Greg Mann >Assignee: Greg Mann > Labels: flaky, flaky-test, mesosphere > Attachments: RestartSlaveRequireExecutorAuthentication is > flaky_failure_log_centos6.txt, > RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, > SlaveTest.RestartSlaveRequireExecAuth-Ubuntu-16.txt > > > This test failure has been observed on an internal CI system. It occurs on a > variety of Linux distributions. It seems that using {{cat}} as the task > command may be problematic; see attached log file > {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8091) Allow the KillPolicy to specify a signal
[ https://issues.apache.org/jira/browse/MESOS-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8091: - Story Points: 3 (was: 1) Description: As specified in the design doc of MESOS-7951, the default executor should be updated to allow the framework to specify a particular signal to be used when initiating task termination. (was: The {{KillPolicy}} protobuf message should be updated to match the design doc of MESOS-7951.) Summary: Allow the KillPolicy to specify a signal (was: Update the KillPolicy protobuf message) > Allow the KillPolicy to specify a signal > > > Key: MESOS-8091 > URL: https://issues.apache.org/jira/browse/MESOS-8091 > Project: Mesos > Issue Type: Improvement >Reporter: Greg Mann > Labels: mesosphere > > As specified in the design doc of MESOS-7951, the default executor should be > updated to allow the framework to specify a particular signal to be used when > initiating task termination. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8092) Allow the KillPolicy to specify a command
Greg Mann created MESOS-8092: Summary: Allow the KillPolicy to specify a command Key: MESOS-8092 URL: https://issues.apache.org/jira/browse/MESOS-8092 Project: Mesos Issue Type: Improvement Reporter: Greg Mann As specified in the design doc of MESOS-7951, the default executor should be extended to allow the specification of a command in the {{KillPolicy}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8091) Update the KillPolicy protobuf message
Greg Mann created MESOS-8091: Summary: Update the KillPolicy protobuf message Key: MESOS-8091 URL: https://issues.apache.org/jira/browse/MESOS-8091 Project: Mesos Issue Type: Improvement Reporter: Greg Mann The {{KillPolicy}} protobuf message should be updated to match the design doc of MESOS-7951. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-564) Update Contribution Documentation
[ https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202150#comment-16202150 ] Greg Mann commented on MESOS-564: - Review here: https://reviews.apache.org/r/62548/ > Update Contribution Documentation > - > > Key: MESOS-564 > URL: https://issues.apache.org/jira/browse/MESOS-564 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: Dave Lester >Assignee: Greg Mann > Labels: documentation, mesosphere > > Our contribution guide is currently fairly verbose, and it focuses on the > ReviewBoard workflow for making code contributions. It would be helpful for > new contributors to have a first-time contribution guide which focuses on > using GitHub PRs to make small contributions, since that workflow has a > smaller barrier to entry for new users. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7914) Replace usage of `ObjectApprover` with `AuthorizationAcceptor`
[ https://issues.apache.org/jira/browse/MESOS-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7914: - Sprint: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64 (was: Mesosphere Sprint 62, Mesosphere Sprint 63, Mesosphere Sprint 64, Mesosphere Sprint 65) > Replace usage of `ObjectApprover` with `AuthorizationAcceptor` > -- > > Key: MESOS-7914 > URL: https://issues.apache.org/jira/browse/MESOS-7914 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.4.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: authorization, mesosphere > > Now that the {{AuthorizationAcceptor}} class has been added, we can replace > all occurrences of {{getObjectApprover}} with > {{AuthorizationAcceptor::create}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8067) Extended KillPolicy
[ https://issues.apache.org/jira/browse/MESOS-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-8067: - Epic Name: Extended KillPolicy (was: Extend the KillPolicy) > Extended KillPolicy > --- > > Key: MESOS-8067 > URL: https://issues.apache.org/jira/browse/MESOS-8067 > Project: Mesos > Issue Type: Epic >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8067) Extended KillPolicy
[ https://issues.apache.org/jira/browse/MESOS-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-8067: Assignee: Greg Mann > Extended KillPolicy > --- > > Key: MESOS-8067 > URL: https://issues.apache.org/jira/browse/MESOS-8067 > Project: Mesos > Issue Type: Epic >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8067) Extended KillPolicy
Greg Mann created MESOS-8067: Summary: Extended KillPolicy Key: MESOS-8067 URL: https://issues.apache.org/jira/browse/MESOS-8067 Project: Mesos Issue Type: Epic Reporter: Greg Mann -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7951) Design Doc for Extended KillPolicy
[ https://issues.apache.org/jira/browse/MESOS-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7951: - Summary: Design Doc for Extended KillPolicy (was: Extend the KillPolicy) > Design Doc for Extended KillPolicy > -- > > Key: MESOS-7951 > URL: https://issues.apache.org/jira/browse/MESOS-7951 > Project: Mesos > Issue Type: Improvement > Components: agent, executor, HTTP API >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > Fix For: 1.5.0 > > > After introducing the {{KillPolicy}} in MESOS-4909, some interactions with > framework developers have led to the suggestion of a couple possible > improvements to this interface. Namely, > * Allowing the framework to specify a command to be run to initiate > termination, rather than a signal to be sent, would allow some developers to > avoid wrapping their application in a signal handler. This is useful because > a signal handler wrapper modifies the application's process tree, which may > make introspection and debugging more difficult in the case of well-known > services with standard debugging procedures. > * In the case of terminations which do begin with a signal, it would be > useful to allow the framework to specify the signal to be sent, rather than > assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, > each initiated with a [different > signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
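The second proposal in the issue above (let the framework choose the signal instead of assuming SIGTERM) could look roughly like the following. This is only a sketch under stated assumptions: `SignalKillPolicy` and `terminateChild` are hypothetical names, not part of the `KillPolicy` protobuf or of any Mesos executor, and only the POSIX calls `kill` and `waitpid` are real APIs.

```cpp
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a kill policy that carries a signal. The real
// `KillPolicy` message fields are not shown in this thread.
struct SignalKillPolicy {
  int signal;                             // e.g. SIGTERM, SIGINT, SIGQUIT
  std::chrono::milliseconds gracePeriod;  // how long to wait before SIGKILL
};

// Sends the configured signal to `pid`, polls for exit until the grace period
// elapses, then escalates to SIGKILL. Returns true only if the child was
// reaped before escalation. `pid` must be a direct child of the caller.
bool terminateChild(pid_t pid, const SignalKillPolicy& policy) {
  assert(pid > 0);

  if (::kill(pid, policy.signal) != 0) {
    return false;  // The signal could not be delivered.
  }

  const auto deadline = std::chrono::steady_clock::now() + policy.gracePeriod;
  while (std::chrono::steady_clock::now() < deadline) {
    const pid_t result = ::waitpid(pid, nullptr, WNOHANG);
    if (result == pid) {
      return true;  // Exited within the grace period.
    }
    if (result == -1) {
      return false;  // Not our child, or it was already reaped elsewhere.
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }

  // Grace period elapsed: force termination and reap the child.
  ::kill(pid, SIGKILL);
  ::waitpid(pid, nullptr, 0);
  return false;
}
```

With PostgreSQL as the example from the issue, a framework would pick whichever signal corresponds to the shutdown type it wants and pass it as `policy.signal`; the escalation path stays the same regardless of the signal chosen.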
[jira] [Commented] (MESOS-7951) Extend the KillPolicy
[ https://issues.apache.org/jira/browse/MESOS-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179729#comment-16179729 ] Greg Mann commented on MESOS-7951: -- Design doc here: https://docs.google.com/document/d/1xRaOEe2K7OIVrDTOY9UDwwJbCIwXF3wZUrXYl8Pqy24/edit?usp=sharing > Extend the KillPolicy > - > > Key: MESOS-7951 > URL: https://issues.apache.org/jira/browse/MESOS-7951 > Project: Mesos > Issue Type: Improvement > Components: agent, executor, HTTP API >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > After introducing the {{KillPolicy}} in MESOS-4909, some interactions with > framework developers have led to the suggestion of a couple possible > improvements to this interface. Namely, > * Allowing the framework to specify a command to be run to initiate > termination, rather than a signal to be sent, would allow some developers to > avoid wrapping their application in a signal handler. This is useful because > a signal handler wrapper modifies the application's process tree, which may > make introspection and debugging more difficult in the case of well-known > services with standard debugging procedures. > * In the case of terminations which do begin with a signal, it would be > useful to allow the framework to specify the signal to be sent, rather than > assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, > each initiated with a [different > signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-564) Update Contribution Documentation
[ https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-564: Summary: Update Contribution Documentation (was: Update 'Mesos Developers Guide' Contribution Documentation) > Update Contribution Documentation > - > > Key: MESOS-564 > URL: https://issues.apache.org/jira/browse/MESOS-564 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: Dave Lester >Assignee: Greg Mann > Labels: documentation, mesosphere > > Our contribution guide is currently fairly verbose, and it focuses on the > ReviewBoard workflow for making code contributions. It would be helpful for > new contributors to have a first-time contribution guide which focuses on > using GitHub PRs to make small contributions, since that workflow has a > smaller barrier to entry for new users. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-564) Update 'Mesos Developers Guide' Contribution Documentation
[ https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-564: Sprint: Mesosphere Sprint 64 Labels: documentation mesosphere (was: twitter) Description: Our contribution guide is currently fairly verbose, and it focuses on the ReviewBoard workflow for making code contributions. It would be helpful for new contributors to have a first-time contribution guide which focuses on using GitHub PRs to make small contributions, since that workflow has a smaller barrier to entry for new users. Component/s: documentation Issue Type: Improvement (was: Bug) > Update 'Mesos Developers Guide' Contribution Documentation > -- > > Key: MESOS-564 > URL: https://issues.apache.org/jira/browse/MESOS-564 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: Dave Lester >Assignee: Greg Mann > Labels: documentation, mesosphere > > Our contribution guide is currently fairly verbose, and it focuses on the > ReviewBoard workflow for making code contributions. It would be helpful for > new contributors to have a first-time contribution guide which focuses on > using GitHub PRs to make small contributions, since that workflow has a > smaller barrier to entry for new users. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-564) Update 'Mesos Developers Guide' Contribution Documentation
[ https://issues.apache.org/jira/browse/MESOS-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-564: --- Assignee: Greg Mann > Update 'Mesos Developers Guide' Contribution Documentation > -- > > Key: MESOS-564 > URL: https://issues.apache.org/jira/browse/MESOS-564 > Project: Mesos > Issue Type: Bug >Reporter: Dave Lester >Assignee: Greg Mann > Labels: twitter > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7914) Replace usage of `ObjectApprover` with `AuthorizationAcceptor`
[ https://issues.apache.org/jira/browse/MESOS-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172487#comment-16172487 ] Greg Mann commented on MESOS-7914: -- Reviews here: https://reviews.apache.org/r/61924/ https://reviews.apache.org/r/61925/ > Replace usage of `ObjectApprover` with `AuthorizationAcceptor` > -- > > Key: MESOS-7914 > URL: https://issues.apache.org/jira/browse/MESOS-7914 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.4.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: authorization, mesosphere > > Now that the {{AuthorizationAcceptor}} class has been added, we can replace > all occurrences of {{getObjectApprover}} with > {{AuthorizationAcceptor::create}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7941) Send TASK_STARTING status from built-in executors
[ https://issues.apache.org/jira/browse/MESOS-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7941: - Sprint: (was: Mesosphere Sprint 63) > Send TASK_STARTING status from built-in executors > - > > Key: MESOS-7941 > URL: https://issues.apache.org/jira/browse/MESOS-7941 > Project: Mesos > Issue Type: Bug >Reporter: Benno Evers >Assignee: Benno Evers > > All executors have the option to send out a TASK_STARTING status update to > signal to the scheduler that they received the command to launch the task. > It would be good if our built-in executors would do this, for reasons laid > out in > https://mail-archives.apache.org/mod_mbox/mesos-dev/201708.mbox/%3CCA%2B9TLTzkEVM0CKvY%2B%3D0%3DwjrN6hYFAt0401Y7b8tysDWx1WZzdw%40mail.gmail.com%3E > This will also fix MESOS-6790. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7601) Some container launch failures are mistakenly treated as errors.
[ https://issues.apache.org/jira/browse/MESOS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann updated MESOS-7601:
-----------------------------
Shepherd: Greg Mann (was: Jie Yu)

> Some container launch failures are mistakenly treated as errors.
> ----------------------------------------------------------------
>
> Key: MESOS-7601
> URL: https://issues.apache.org/jira/browse/MESOS-7601
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Affects Versions: 1.3.0
> Reporter: Alexander Rukletsov
> Assignee: Alexander Rukletsov
> Labels: containerizer, mesosphere, tech-debt
>
> I've observed a case where a scheduler stops (i.e. calls TEARDOWN) while some of its tasks are being launched. While this is valid behaviour, the agent prints an error and increases the container launch error metrics.
> Below are log excerpts for such a framework, {{6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092}}.
> *Master log*
> {noformat}
> [centos@ip-172-31-6-200 ~]$ journalctl _PID=29716 --since "2 hours ago" --no-pager | grep "6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092"
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.226218 29724 master.cpp:6072] Updating info for framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.226356 29728 hierarchical.cpp:274] Added framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.226405 29728 hierarchical.cpp:379] Deactivated framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.228570 29728 hierarchical.cpp:343] Activated framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.246068 29721 master.cpp:7105] Sending 1 offers to framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.247851 29721 master.cpp:7194] Sending 1 inverse offers to framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:58 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:58.912937 29728 master.cpp:4806] Processing DECLINE call for offers: [ 92434aef-27da-4fd1-a5c4-b286d640d5b3-O509464 ] for framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:59 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:59.804184 29727 master.cpp:7105] Sending 2 offers to framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:32:59 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:32:59.804411 29727 master.cpp:7194] Sending 2 inverse offers to framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:33:01.248924 29721 master.cpp:7105] Sending 2 offers to framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:33:01.249289 29721 master.cpp:7194] Sending 2 inverse offers to framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at scheduler-3b84262b-e1a6-47a8-ac0f-00af50b24f5c@172.31.7.83:45531
> Jun 01 11:33:01 ip-172-31-6-200.us-west-2.compute.internal mesos-master[29716]: I0601 11:33:01.249724 29721 master.cpp:3851] Processing ACCEPT call for offers: [ 92434aef-27da-4fd1-a5c4-b286d640d5b3-O509469 ] on agent 36a25adb-4ea2-49d3-a195-448cff1dc146-S35 at slave(1)@172.31.13.122:5051 (172.31.13.122) for framework 6dd898d6-7f3a-406c-8ead-24b4d55ed262-0018-driver-20170601113252-0092 (TeraValidate) at
> s
[jira] [Commented] (MESOS-7916) Improve the test coverage of the DefaultExecutor.
[ https://issues.apache.org/jira/browse/MESOS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165041#comment-16165041 ] Greg Mann commented on MESOS-7916: -- {code} commit e7df335a484131450ff15bcd2ee325ea40dc8155 Author: Gastón Kleiman gas...@mesosphere.io Date: Wed Sep 13 09:21:23 2017 -0700 Cleaned up DefaultExecutor tests. Updated the DefaultExecutor tests to use test helpers where possible. Also made the boilerplate initialization code consistent across tests. Review: https://reviews.apache.org/r/61982/ {code} > Improve the test coverage of the DefaultExecutor. > - > > Key: MESOS-7916 > URL: https://issues.apache.org/jira/browse/MESOS-7916 > Project: Mesos > Issue Type: Improvement > Components: executor >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > > We should write tests for the {{DefaultExecutor}} to cover the following > common scenarios: > # -Start a task that uses a GPU, and make sure that it is made available to > the task.- > # -Launch a Docker task with a health check.- > # -Launch two tasks and verify that they can access a volume owned by the > Executor via {{sandbox_path}} volumes.- > # -Launch two tasks, each one in its own task group, and verify that they can > access a volume owned by the Executor via {{sandbox_path}} volumes.- > # -Launch a task that uses an env secret, make sure that it is accessible.- > # Launch a task using a URI and make sure that the artifact is accessible. > # Launch a task using a Docker image + URIs, make sure that the fetched > artifact is accessible. > # Launch one task and ensure that (health) checks can read from a persistent > volume. > # Ensure that the executor's env is NOT inherited by the nested tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7877) Audit test code for undefined behavior in accessing container elements
[ https://issues.apache.org/jira/browse/MESOS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165042#comment-16165042 ]

Greg Mann commented on MESOS-7877:
----------------------------------

{code}
commit 1f4d7ef27e0e4936c1ea15d4e56d778e35a92507
Author: Gastón Kleiman gas...@mesosphere.io
Date: Wed Sep 13 09:21:20 2017 -0700

    Added new overloads for the `createExecutorInfo` test helper method.

    These new overloads make it possible to specify framework ID, executor
    resources, and executor ID as a protobuf message rather than a string.

    Review: https://reviews.apache.org/r/62197/
{code}
{code}
commit 2a6f6b7aedf05b23ae0fe04364159c87f6c5cea8
Author: Gastón Kleiman gas...@mesosphere.io
Date: Wed Sep 13 09:21:25 2017 -0700

    Changed `EXPECT` to `ASSERT` when relying on the assertion afterwards.

    A common pattern in our tests is to check that at least one offer is
    received using: 'EXPECT_FALSE(offers->offers().empty())'

    The test then accesses the first element of the array returned by
    `offers->offers()` to extract information such as the agent ID.

    This patch makes the tests that follow this pattern use `ASSERT_FALSE`
    instead of `EXPECT_FALSE` to avoid invalid memory accesses when the
    array is empty.

    Review: https://reviews.apache.org/r/62042/
{code}

> Audit test code for undefined behavior in accessing container elements
> ----------------------------------------------------------------------
>
> Key: MESOS-7877
> URL: https://issues.apache.org/jira/browse/MESOS-7877
> Project: Mesos
> Issue Type: Bug
> Components: test
> Reporter: Benjamin Bannier
> Assignee: Gastón Kleiman
> Priority: Minor
> Labels: mesosphere, newbie, tech-debt, test
>
> We do not always guard against accessing elements of empty containers, e.g., we use patterns like the following
> {code}
> Future<std::vector<Offer>> offers;
> // Satisfy offers.
> EXPECT_FALSE(offers->empty());
> const auto& offer = (*offers)[0];
> {code}
> While the intention here is to diagnose an empty {{offers}}, the code still exhibits undefined behavior in the element access if {{offers}} was indeed empty (compilers might aggressively exploit undefined behavior to e.g., remove "impossible" code). Instead one should prevent accessing any elements of an empty container, e.g.,
> {code}
> ASSERT_FALSE(offers->empty()); // Prevent execution of rest of test body.
> {code}
> We should audit and fix existing test code for such incorrect checks and variations involving e.g., {{EXPECT_NE}}.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
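The difference the issue above relies on is that a non-fatal check records a failure and keeps going, while a fatal check records it and returns from the current function. The sketch below emulates that behavior without googletest so the control flow is visible. The `SKETCH_*` macros, `Outcome`, and `Offer` are hypothetical stand-ins, not the real `EXPECT_FALSE`/`ASSERT_FALSE` implementations or Mesos types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an offer; only the field used here.
struct Offer {
  std::string agentId;
};

// Records what a test body did.
struct Outcome {
  int failures = 0;
  bool reachedAccess = false;
};

// Non-fatal check: records the failure and lets the body continue.
#define SKETCH_EXPECT_FALSE(outcome, cond) \
  do { if (cond) { ++(outcome).failures; } } while (false)

// Fatal check: records the failure and returns from the enclosing function.
#define SKETCH_ASSERT_FALSE(outcome, cond) \
  do { if (cond) { ++(outcome).failures; return; } } while (false)

void bodyWithExpect(const std::vector<Offer>& offers, Outcome& outcome) {
  SKETCH_EXPECT_FALSE(outcome, offers.empty());
  // Still reached when `offers` is empty. A real test would read offers[0]
  // here, which is undefined behavior for an empty vector.
  outcome.reachedAccess = true;
}

void bodyWithAssert(const std::vector<Offer>& offers, Outcome& outcome) {
  SKETCH_ASSERT_FALSE(outcome, offers.empty());
  // Only reached when `offers` is non-empty, so the access is safe.
  assert(!offers.empty());
  outcome.reachedAccess = true;
  const std::string& agentId = offers[0].agentId;
  (void)agentId;
}
```

Calling both bodies with an empty vector gives one recorded failure each, but only `bodyWithExpect` goes on to the point where the element access would happen.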
[jira] [Commented] (MESOS-7914) Replace usage of `ObjectApprover` with `AuthorizationAcceptor`
[ https://issues.apache.org/jira/browse/MESOS-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163828#comment-16163828 ] Greg Mann commented on MESOS-7914: -- Note that it would also be useful to add some error logging [here|https://github.com/apache/mesos/blob/5125b80ea50b5babd7636234605b66c627780834/src/common/http.cpp#L1212-L1216] in an {{onFailed}} handler to log the case where the authorizer fails to return a valid object approver. > Replace usage of `ObjectApprover` with `AuthorizationAcceptor` > -- > > Key: MESOS-7914 > URL: https://issues.apache.org/jira/browse/MESOS-7914 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.4.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: authorization, mesosphere > > Now that the {{AuthorizationAcceptor}} class has been added, we can replace > all occurrences of {{getObjectApprover}} with > {{AuthorizationAcceptor::create}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7951) Extend the KillPolicy
[ https://issues.apache.org/jira/browse/MESOS-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-7951: Assignee: Greg Mann > Extend the KillPolicy > - > > Key: MESOS-7951 > URL: https://issues.apache.org/jira/browse/MESOS-7951 > Project: Mesos > Issue Type: Improvement > Components: agent, executor, HTTP API >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > After introducing the {{KillPolicy}} in MESOS-4909, some interactions with > framework developers have led to the suggestion of a couple possible > improvements to this interface. Namely, > * Allowing the framework to specify a command to be run to initiate > termination, rather than a signal to be sent, would allow some developers to > avoid wrapping their application in a signal handler. This is useful because > a signal handler wrapper modifies the application's process tree, which may > make introspection and debugging more difficult in the case of well-known > services with standard debugging procedures. > * In the case of terminations which do begin with a signal, it would be > useful to allow the framework to specify the signal to be sent, rather than > assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, > each initiated with a [different > signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7951) Extend the KillPolicy
Greg Mann created MESOS-7951: Summary: Extend the KillPolicy Key: MESOS-7951 URL: https://issues.apache.org/jira/browse/MESOS-7951 Project: Mesos Issue Type: Improvement Components: agent, executor, HTTP API Reporter: Greg Mann After introducing the {{KillPolicy}} in MESOS-4909, some interactions with framework developers have led to the suggestion of a couple possible improvements to this interface. Namely, * Allowing the framework to specify a command to be run to initiate termination, rather than a signal to be sent, would allow some developers to avoid wrapping their application in a signal handler. This is useful because a signal handler wrapper modifies the application's process tree, which may make introspection and debugging more difficult in the case of well-known services with standard debugging procedures. * In the case of terminations which do begin with a signal, it would be useful to allow the framework to specify the signal to be sent, rather than assuming SIGTERM. PostgreSQL, for example, permits several shutdown types, each initiated with a [different signal|https://www.postgresql.org/docs/9.3/static/server-shutdown.html]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7916) Improve the test coverage of the DefaultExecutor.
[ https://issues.apache.org/jira/browse/MESOS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148260#comment-16148260 ]

Greg Mann commented on MESOS-7916:
----------------------------------

{code}
commit be7d2ca48a765c247b644fb54d142602ac487d61
Author: Gastón Kleiman
Date: Wed Aug 30 17:20:38 2017 -0700

    Added tests to ensure that tasks can access their parent's volumes.

    These tests verify that sibling tasks can share a persistent volume
    owned by their parent executor using 'sandbox_path' volumes.

    Review: https://reviews.apache.org/r/61921/
{code}
{code}
commit 065d2a801396e90adb619e839f062ae153249ca0
Author: Gastón Kleiman
Date: Wed Aug 30 17:20:36 2017 -0700

    Added a test that uses environment secrets and the DefaultExecutor.

    This test checks that environment secrets are properly resolved and
    exposed to tasks started by the DefaultExecutor.

    Review: https://reviews.apache.org/r/61920/
{code}

> Improve the test coverage of the DefaultExecutor.
> -------------------------------------------------
>
> Key: MESOS-7916
> URL: https://issues.apache.org/jira/browse/MESOS-7916
> Project: Mesos
> Issue Type: Improvement
> Components: executor
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Labels: mesosphere
>
> We should write tests for the {{DefaultExecutor}} to cover the following common scenarios:
> # Start a task that uses a GPU, and make sure that it is made available to the task.
> # Launch a Docker task with a health check.
> # Launch two tasks and verify that they can access a volume owned by the Executor via {{sandbox_path}} volumes.
> # Launch two tasks, each one in its own task group, and verify that they can access a volume owned by the Executor via {{sandbox_path}} volumes.
> # Launch one task and ensure that (health) checks can read from a persistent volume.
> # Launch a task using a URI and make sure that the artifact is accessible.
> # Launch a task using a Docker image + URIs, make sure that the fetched artifact is accessible.
> # Write a test that ensures that the executor's env is NOT inherited by the nested tasks.
> # Launch a task that uses an env secret, make sure that it is accessible.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7785) Pass Operator API subscription events through authorizer
[ https://issues.apache.org/jira/browse/MESOS-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147985#comment-16147985 ] Greg Mann commented on MESOS-7785: -- {code} commit e4d56bcb65f7bf9805eff18e6a9249eb7512f745 Author: Quinn Leng Date: Tue Aug 29 13:13:19 2017 -0700 Added authorization for V1 events. Added authorization filtering for the master V1 operator event stream. Subscribers will only receive events that their principal is authorized to see. The new test 'MasterAPITest.EventAuthorizationFiltering' verifies this behavior. Review: https://reviews.apache.org/r/61189/ {code} > Pass Operator API subscription events through authorizer > - > > Key: MESOS-7785 > URL: https://issues.apache.org/jira/browse/MESOS-7785 > Project: Mesos > Issue Type: Improvement >Reporter: Mathew Appelman >Assignee: Quinn > Fix For: 1.5.0 > > > In order to consume the subscription endpoint from the Operator API in the > DC/OS UI, we must ensure a user can only receive events they are authorized > to consume. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
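The filtering described in the commit above can be pictured with a small sketch. This is not the Mesos implementation: `Event`, `Approver`, and `filterEvents` are hypothetical names chosen for illustration, and the real authorizer works with principals and authorization objects rather than a plain role string.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical event: a type plus the role it concerns.
struct Event {
  std::string type;
  std::string role;
};

// Hypothetical approver: returns true if the subscriber may see the event.
using Approver = std::function<bool(const Event&)>;

// Returns only the events the approver accepts, preserving their order.
std::vector<Event> filterEvents(
    const std::vector<Event>& events,
    const Approver& approver) {
  std::vector<Event> visible;
  for (const Event& event : events) {
    if (approver(event)) {
      visible.push_back(event);
    }
  }
  return visible;
}
```

A subscriber whose approver accepts only one role would receive just the events for that role, and the rest would be dropped before delivery.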
[jira] [Updated] (MESOS-7914) Replace usage of `ObjectApprover` with `AuthorizationAcceptor`
[ https://issues.apache.org/jira/browse/MESOS-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7914: - Sprint: Mesosphere Sprint 62 > Replace usage of `ObjectApprover` with `AuthorizationAcceptor` > -- > > Key: MESOS-7914 > URL: https://issues.apache.org/jira/browse/MESOS-7914 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.4.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: authorization, mesosphere > > Now that the {{AuthorizationAcceptor}} class has been added, we can replace > all occurrences of {{getObjectApprover}} with > {{AuthorizationAcceptor::create}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7914) Replace usage of `ObjectApprover` with `AuthorizationAcceptor`
[ https://issues.apache.org/jira/browse/MESOS-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-7914: Assignee: Greg Mann > Replace usage of `ObjectApprover` with `AuthorizationAcceptor` > -- > > Key: MESOS-7914 > URL: https://issues.apache.org/jira/browse/MESOS-7914 > Project: Mesos > Issue Type: Improvement > Components: security >Affects Versions: 1.4.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: authorization, mesosphere > > Now that the {{AuthorizationAcceptor}} class has been added, we can replace > all occurrences of {{getObjectApprover}} with > {{AuthorizationAcceptor::create}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7913) Authorization Improvements
[ https://issues.apache.org/jira/browse/MESOS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7913: - Labels: authorization mesosphere (was: authorization) > Authorization Improvements > -- > > Key: MESOS-7913 > URL: https://issues.apache.org/jira/browse/MESOS-7913 > Project: Mesos > Issue Type: Epic > Components: security >Reporter: Greg Mann > Labels: authorization, mesosphere > > This epic is meant to collect tickets for improvements to authorization in > Mesos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7914) Replace usage of `ObjectApprover` with `AuthorizationAcceptor`
Greg Mann created MESOS-7914: Summary: Replace usage of `ObjectApprover` with `AuthorizationAcceptor` Key: MESOS-7914 URL: https://issues.apache.org/jira/browse/MESOS-7914 Project: Mesos Issue Type: Improvement Components: security Affects Versions: 1.4.0 Reporter: Greg Mann Now that the {{AuthorizationAcceptor}} class has been added, we can replace all occurrences of {{getObjectApprover}} with {{AuthorizationAcceptor::create}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7913) Authorization Improvements
Greg Mann created MESOS-7913: Summary: Authorization Improvements Key: MESOS-7913 URL: https://issues.apache.org/jira/browse/MESOS-7913 Project: Mesos Issue Type: Epic Components: security Reporter: Greg Mann This epic is meant to collect tickets for improvements to authorization in Mesos. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7888) Track fetcher task success and failures
[ https://issues.apache.org/jira/browse/MESOS-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137451#comment-16137451 ] Greg Mann commented on MESOS-7888: -- [~xujyan] [~jpe...@apache.org] FYI: this didn't make it into the {{1.4.0-rc1}} tag: https://github.com/apache/mesos/commits/1.4.0-rc1 I saw the fix version on this ticket and just wanted to make sure you knew. > Track fetcher task success and failures > --- > > Key: MESOS-7888 > URL: https://issues.apache.org/jira/browse/MESOS-7888 > Project: Mesos > Issue Type: Bug > Components: fetcher, statistics >Reporter: James Peach >Assignee: James Peach >Priority: Minor > Fix For: 1.4.0 > > > In MESOS-7524, we added fetcher metrics for total task fetches and failed > task fetches. For consistency with the similar metrics in MESOS-7842, we > should switch these to track the successful task fetches and the failed task > fetches. Operators can derive the total by adding the succeeded and failed > counts. > ie. replace {{containerizer/fetcher/task_fetches_total}} with > {{containerizer/fetcher/task_fetches_succeeded}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7861) Include check output in the DefaultExecutor log
[ https://issues.apache.org/jira/browse/MESOS-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135973#comment-16135973 ] Greg Mann commented on MESOS-7861: -- {code} commit 8347ec09f15989b822f48c43c6547c25b5f4 Author: Gastón Kleiman Date: Mon Aug 21 15:28:39 2017 -0700 Raised the logging level of some check and health check messages. Some users pointed out that always logging the result of checks and health checks makes it easier to debug problems. Review: https://reviews.apache.org/r/61791/ {code} {code} commit 6d778fa45a73723b857db9b2ce92c3d15fb3373f Author: Gastón Kleiman Date: Mon Aug 21 15:28:37 2017 -0700 Made the log output handling of TCP and HTTP checks consistent. Review: https://reviews.apache.org/r/61766/ {code} {code} commit 0a01bc38eba08da8ef8b4ae152c95a57c39d73f3 Author: Gastón Kleiman Date: Mon Aug 21 15:28:36 2017 -0700 Included nested command check output in the executor logs. This patch updates the checker and health checker to include the output of COMMAND checks and health checks in its logs by default. This has the effect of including these logs in the executor output for easier debugging. Review: https://reviews.apache.org/r/61697/ {code} > Include check output in the DefaultExecutor log > --- > > Key: MESOS-7861 > URL: https://issues.apache.org/jira/browse/MESOS-7861 > Project: Mesos > Issue Type: Improvement > Components: executor >Affects Versions: 1.3.0 >Reporter: Michael Browning >Assignee: Gastón Kleiman > Labels: check, default-executor, health-check, mesosphere > > With the default executor, health and readiness checks are run in their own > nested containers, whose sandboxes are cleaned up right before performing the > next check. This makes access to stdout/stderr of previous runs of the check > command effectively impossible. 
> Although the exit code of the command being run is reported in a task status, > it is often necessary to see the command's actual output when debugging a > framework issue, so the ability to access this output via the executor logs > would be helpful. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7861) Include check output in the DefaultExecutor log
[ https://issues.apache.org/jira/browse/MESOS-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7861: - Shepherd: Greg Mann > Include check output in the DefaultExecutor log > --- > > Key: MESOS-7861 > URL: https://issues.apache.org/jira/browse/MESOS-7861 > Project: Mesos > Issue Type: Bug > Components: executor >Affects Versions: 1.3.0 >Reporter: Michael Browning >Assignee: Gastón Kleiman >Priority: Minor > Labels: check, default-executor, health-check, mesosphere > > With the default executor, health and readiness checks are run in their own > nested containers, whose sandboxes are cleaned up right before performing the > next check. This makes access to stdout/stderr of previous runs of the check > command effectively impossible. > Although the exit code of the command being run is reported in a task status, > it is often necessary to see the command's actual output when debugging a > framework issue, so the ability to access this output via the executor logs > would be helpful. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-7661) Libprocess timers with long durations trigger immediately
[ https://issues.apache.org/jira/browse/MESOS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126403#comment-16126403 ] Greg Mann edited comment on MESOS-7661 at 8/14/17 9:19 PM: --- The commits here, along with those from MESOS-7660, will help us avoid many cases in which we would overflow. Note, however, that fundamentally this issue still exists, since we have not made changes to the arithmetic operators. After some discussion, we are opting to address this issue at the Mesos level, by restricting the lengths of durations that users can supply, so I'm closing this ticket as "Won't Fix". was (Author: greggomann): The commits here, along with those from MESOS-7660, will prevent us from overflowing. Note, however, that fundamentally this issue still exists, since we have not made changes to the arithmetic operators. > Libprocess timers with long durations trigger immediately > - > > Key: MESOS-7661 > URL: https://issues.apache.org/jira/browse/MESOS-7661 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > > {{process::delay()}} will schedule a method to be run right away when called > with a very long {{Duration}}. > This happens because [{{Timeout}} tries to add two long > durations|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/libprocess/include/process/timeout.hpp#L33-L38], > leading to an [integer overflow in > {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L116]. > I'd expect libprocess to either: > 1. Never run the method. > 2. Schedule it in the longest possible {{Duration}}. > {{Duration::operator+=()}} should probably also handle integer overflows > differently. If an addition leads to an integer overflow, it might make more > sense to return {{Duration::max()}} than a negative duration. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7661) Libprocess timers with long durations trigger immediately
[ https://issues.apache.org/jira/browse/MESOS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126403#comment-16126403 ] Greg Mann commented on MESOS-7661: -- The commits here, along with those from MESOS-7660, will prevent us from overflowing. Note, however, that fundamentally this issue still exists, since we have not made changes to the arithmetic operators. > Libprocess timers with long durations trigger immediately > - > > Key: MESOS-7661 > URL: https://issues.apache.org/jira/browse/MESOS-7661 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > > {{process::delay()}} will schedule a method to be run right away when called > with a very long {{Duration}}. > This happens because [{{Timeout}} tries to add two long > durations|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/libprocess/include/process/timeout.hpp#L33-L38], > leading to an [integer overflow in > {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L116]. > I'd expect libprocess to either: > 1. Never run the method. > 2. Schedule it in the longest possible {{Duration}}. > {{Duration::operator+=()}} should probably also handle integer overflows > differently. If an addition leads to an integer overflow, it might make more > sense to return {{Duration::max()}} than a negative duration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7661) Libprocess timers with long durations trigger immediately
[ https://issues.apache.org/jira/browse/MESOS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126397#comment-16126397 ] Greg Mann commented on MESOS-7661: -- {code} commit 1efe264ebcf998c248cb7eecba57bd65e2060645 Author: Gastón Kleiman Date: Mon Aug 14 13:52:50 2017 -0700 Stout: Made boundary checking in Duration consistent. Review: https://reviews.apache.org/r/61601/ {code} {code} commit f4348182c1c5b832743166cfdad9b1a84bc2824e Author: Gastón Kleiman Date: Mon Aug 14 13:52:49 2017 -0700 Stout: Made `Duration::parse()` handle durations out of range. Made `Duration::parse()` return an error if the argument is out of the range that a `Duration` can represent. Review: https://reviews.apache.org/r/60721/ {code} > Libprocess timers with long durations trigger immediately > - > > Key: MESOS-7661 > URL: https://issues.apache.org/jira/browse/MESOS-7661 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > > {{process::delay()}} will schedule a method to be run right away when called > with a very long {{Duration}}. > This happens because [{{Timeout}} tries to add two long > durations|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/libprocess/include/process/timeout.hpp#L33-L38], > leading to an [integer overflow in > {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L116]. > I'd expect libprocess to either: > 1. Never run the method. > 2. Schedule it in the longest possible {{Duration}}. > {{Duration::operator+=()}} should probably also handle integer overflows > differently. If an addition leads to an integer overflow, it might make more > sense to return {{Duration::max()}} than a negative duration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
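The ticket's suggestion that an overflowing addition should yield the maximum duration rather than a negative one can be illustrated with a saturating add. This is only a sketch under assumptions: `Nanos` and `saturatingAdd` are hypothetical names, a duration is assumed to be a signed 64-bit nanosecond count, and none of this is stout's `Duration` code (which, per the comment on this ticket, did not change its arithmetic operators).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a duration stored as signed 64-bit nanoseconds.
// This is not stout's `Duration` class.
struct Nanos {
  int64_t value;
};

// Adds two durations, clamping to the representable range instead of
// wrapping around when the exact sum would not fit.
inline Nanos saturatingAdd(Nanos a, Nanos b) {
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();

  // Check before adding, since signed overflow is undefined behavior.
  if (b.value > 0 && a.value > kMax - b.value) {
    return Nanos{kMax};
  }
  if (b.value < 0 && a.value < kMin - b.value) {
    return Nanos{kMin};
  }
  return Nanos{a.value + b.value};
}
```

With this shape, adding a very long delay to the current time would give a timeout at the far end of the range instead of one that appears to lie in the past.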
[jira] [Commented] (MESOS-7660) HierarchicalAllocator uses the default filter instead of a very long one
[ https://issues.apache.org/jira/browse/MESOS-7660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126392#comment-16126392 ] Greg Mann commented on MESOS-7660: -- {code} commit 2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4 Author: Gastón Kleiman Date: Mon Aug 14 13:52:54 2017 -0700 Added MESOS-7660 to the changelog. This patch adds MESOS-7660 to the changelog and adds a missing period to the existing text. Review: https://reviews.apache.org/r/61621/ {code} {code} commit 183cceef366586f4a55b6ba7144c4a8277eb9962 Author: Gastón Kleiman Date: Mon Aug 14 13:52:52 2017 -0700 Fixed the default filter used by the allocator. If a framework accepts/refuses an offer using a very long filter, the `HierarchicalAllocator` will use the default filter instead, meaning that it will filter the resources for only 5 seconds. This can happen when a framework sets `Filter::refuse_seconds` to a number of seconds larger than what fits in `Duration`. This patch makes the hierarchical allocator cap the filter duration to at most 365 days. Review: https://reviews.apache.org/r/60525/ {code} > HierarchicalAllocator uses the default filter instead of a very long one > > > Key: MESOS-7660 > URL: https://issues.apache.org/jira/browse/MESOS-7660 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Gastón Kleiman >Assignee: Gastón Kleiman > Labels: mesosphere > Fix For: 1.5.0 > > > If a framework accepts/refuses an offer using a very long filter, [the > {{HierarchicalAllocator}} will use the default {{Filter}} > instead|https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.cpp#L1046-L1052]. > Meaning that it will filter the resources for only 5 seconds. > This can happen when a framework sets {{Filter::refuse_seconds}} to a number > of seconds [larger than what fits in > {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L401-L405]. 
> The following [tests are > flaky|https://issues.apache.org/jira/browse/MESOS-7514] because of this: > {{ReservationTest.ReserveShareWithinRole}} and > {{ReservationTest.PreventUnreservingAlienResources}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
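The capping behavior the commit describes (limit the filter to at most 365 days instead of falling back to the 5 second default) can be sketched as follows. The function and constant names are hypothetical, `refuse_seconds` is treated here as a floating-point number of seconds, and the fallback for negative or NaN input is an assumption of this sketch rather than something stated in the ticket.

```cpp
#include <algorithm>
#include <cmath>

// 365 days expressed in seconds; the cap named in the commit message.
constexpr double kMaxRefuseSeconds = 365.0 * 24 * 60 * 60;

// The default filter duration mentioned in the ticket.
constexpr double kDefaultRefuseSeconds = 5.0;

// Returns the filter duration to apply for a requested `refuse_seconds`.
// Oversized requests are clamped to the cap rather than replaced by the
// default. Falling back to the default for NaN or negative input is an
// assumption made for this sketch.
inline double effectiveRefuseSeconds(double requested) {
  if (std::isnan(requested) || requested < 0.0) {
    return kDefaultRefuseSeconds;
  }
  return std::min(requested, kMaxRefuseSeconds);
}
```

The point of clamping is that a framework asking for an effectively infinite filter gets a very long one, which is much closer to its intent than 5 seconds.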
[jira] [Commented] (MESOS-7871) Agent fails assertion during request to '/state'
[ https://issues.apache.org/jira/browse/MESOS-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120820#comment-16120820 ] Greg Mann commented on MESOS-7871: -- {code} commit db8d097c9565e9b6f60531f9eb3f993a6c60fd72 Author: Greg Mann Date: Wed Aug 9 10:00:46 2017 -0700 Added a test to verify the fix for a failed agent assertion. This patch adds 'SlaveTest.GetStateTaskGroupPending', which confirms the fix for MESOS-7871. The test verifies that requests to the agent's '/state' endpoint are successful when there are pending tasks on the agent which were launched as part of a task group. Review: https://reviews.apache.org/r/61534 {code} {code} commit 4f4807394944d23d3a6f79249ce49e2494a88350 Author: Andrei Budnik Date: Wed Aug 9 11:06:40 2017 -0700 Moved task validation from `getExecutorInfo` to `runTask` on agent. Previously, `getExecutorInfo` was called only in `runTask`, so it asserted the invariant that a task should have either CommandInfo or ExecutorInfo set but not both. This is true for individual tasks, but it is not necessarily true for tasks which are part of a task group, since the master injects the task group's ExecutorInfo. Now `getExecutorInfo` is also called to calculate allocated resources of tasks which might be part of a task group, which could violate this invariant, so the assertion has been moved. Review: https://reviews.apache.org/r/61524/ {code} > Agent fails assertion during request to '/state' > > > Key: MESOS-7871 > URL: https://issues.apache.org/jira/browse/MESOS-7871 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Greg Mann >Assignee: Andrei Budnik > Labels: mesosphere > Fix For: 1.4.0 > > > While processing requests to {{/state}}, the Mesos agent calls > {{Framework::allocatedResources()}}, which in turn calls > {{Slave::getExecutorInfo()}} on executors associated with the framework's > pending tasks. 
> In the case of tasks launched as part of task groups, this leads to the > failure of the assertion > [here|https://github.com/apache/mesos/blob/a31dd52ab71d2a529b55cd9111ec54acf7550ded/src/slave/slave.cpp#L4983-L4985]. > This means that the check will fail if the agent processes a request to > {{/state}} at a time when it has pending tasks launched as part of a task > group. > This assertion should be removed since this helper function is now used with > task groups. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7871) Agent fails assertion during request to '/state'
[ https://issues.apache.org/jira/browse/MESOS-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120422#comment-16120422 ] Greg Mann commented on MESOS-7871: -- Test and comment updates: https://reviews.apache.org/r/61534/ https://reviews.apache.org/r/61535/ > Agent fails assertion during request to '/state' > > > Key: MESOS-7871 > URL: https://issues.apache.org/jira/browse/MESOS-7871 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Greg Mann >Assignee: Andrei Budnik > Labels: mesosphere > > While processing requests to {{/state}}, the Mesos agent calls > {{Framework::allocatedResources()}}, which in turn calls > {{Slave::getExecutorInfo()}} on executors associated with the framework's > pending tasks. > In the case of tasks launched as part of task groups, this leads to the > failure of the assertion > [here|https://github.com/apache/mesos/blob/a31dd52ab71d2a529b55cd9111ec54acf7550ded/src/slave/slave.cpp#L4983-L4985]. > This means that the check will fail if the agent processes a request to > {{/state}} at a time when it has pending tasks launched as part of a task > group. > This assertion should be removed since this helper function is now used with > task groups. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7871) Agent fails assertion during request to '/state'
[ https://issues.apache.org/jira/browse/MESOS-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-7871: Assignee: Greg Mann > Agent fails assertion during request to '/state' > > > Key: MESOS-7871 > URL: https://issues.apache.org/jira/browse/MESOS-7871 > Project: Mesos > Issue Type: Bug > Components: agent >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > > While processing requests to {{/state}}, the Mesos agent calls > {{Framework::allocatedResources()}}, which in turn calls > {{Slave::getExecutorInfo()}} on executors associated with the framework's > pending tasks. > In the case of tasks launched as part of task groups, this leads to the > failure of the assertion > [here|https://github.com/apache/mesos/blob/a31dd52ab71d2a529b55cd9111ec54acf7550ded/src/slave/slave.cpp#L4983-L4985]. > This means that the check will fail if the agent processes a request to > {{/state}} at a time when it has pending tasks launched as part of a task > group. > This assertion should be removed since this helper function is now used with > task groups. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7871) Agent fails assertion during request to '/state'
Greg Mann created MESOS-7871: Summary: Agent fails assertion during request to '/state' Key: MESOS-7871 URL: https://issues.apache.org/jira/browse/MESOS-7871 Project: Mesos Issue Type: Bug Components: agent Reporter: Greg Mann While processing requests to {{/state}}, the Mesos agent calls {{Framework::allocatedResources()}}, which in turn calls {{Slave::getExecutorInfo()}} on executors associated with the framework's pending tasks. In the case of tasks launched as part of task groups, this leads to the failure of the assertion [here|https://github.com/apache/mesos/blob/a31dd52ab71d2a529b55cd9111ec54acf7550ded/src/slave/slave.cpp#L4983-L4985]. This means that the check will fail if the agent processes a request to {{/state}} at a time when it has pending tasks launched as part of a task group. This assertion should be removed since this helper function is now used with task groups. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7416) Filter results of `/master/slaves` and the v1 call GET_AGENTS
[ https://issues.apache.org/jira/browse/MESOS-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111872#comment-16111872 ] Greg Mann commented on MESOS-7416: -- {code} commit e87569b2ae3c7f8303ce146f882c340b4fdd5ca4 Author: Alexander Rojas Date: Wed Aug 2 13:14:07 2017 -0700 Added full authz for non summarized fields of `/slaves` endpoint. Fields were authorized based on partial elements of each resource. Moreover, some fields which required authorization were not being authorized at all. This patch enables full authorization of all fields. Review: https://reviews.apache.org/r/61257/ {code} {code} commit 2fe2562455d899545f2f6cbace989489867b8ee7 Author: Alexander Rojas Date: Wed Aug 2 13:14:01 2017 -0700 Enabled filtering of the 'GET_AGENTS' v1 API call. Enables filtering of the results of calls to the 'GET_AGENTS' v1 API. It filters the contents of different resources entries based on the 'VIEW_ROLE' permissions of the principal doing the request based on resource roles, allocation roles and reservations. Review: https://reviews.apache.org/r/61171/ {code} > Filter results of `/master/slaves` and the v1 call GET_AGENTS > - > > Key: MESOS-7416 > URL: https://issues.apache.org/jira/browse/MESOS-7416 > Project: Mesos > Issue Type: Task > Components: HTTP API, master >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > Fix For: 1.4.0 > > > The results returned by both the endpoint {{/master/slaves}} and the API v1 > {{GET_AGENTS}} return full information about the agent state which probably > need to be filtered for certain uses, particularly in a multi-tenancy > scenario. > The kind of leaked data includes specific role names and their specific > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7851) Master stores old resource format in the registry
[ https://issues.apache.org/jira/browse/MESOS-7851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111663#comment-16111663 ] Greg Mann commented on MESOS-7851: -- Note that when this issue is resolved, the {{authorizeResource()}} helper introduced in [this patch|https://reviews.apache.org/r/61171/] should be updated. > Master stores old resource format in the registry > - > > Key: MESOS-7851 > URL: https://issues.apache.org/jira/browse/MESOS-7851 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Greg Mann > Labels: master, mesosphere, reservation > > We intend for the master to store all internal resource representations in > the new, post-reservation-refinement format. However, [when persisting > registered agents to the > registrar|https://github.com/apache/mesos/blob/498a000ac1bb8f51dc871f22aea265424a407a17/src/master/master.cpp#L5861-L5876], > the master does not convert the resources; agents provide resources in the > pre-reservation-refinement format, and these resources are stored as-is. This > means that after recovery, any agents in the master's {{slaves.recovered}} > map will have {{SlaveInfo.resources}} in the pre-reservation-refinement > format. > We should update the master to convert these resources before persisting them > to the registry. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7851) Master stores old resource format in the registry
Greg Mann created MESOS-7851: Summary: Master stores old resource format in the registry Key: MESOS-7851 URL: https://issues.apache.org/jira/browse/MESOS-7851 Project: Mesos Issue Type: Bug Components: master Reporter: Greg Mann We intend for the master to store all internal resource representations in the new, post-reservation-refinement format. However, [when persisting registered agents to the registrar|https://github.com/apache/mesos/blob/498a000ac1bb8f51dc871f22aea265424a407a17/src/master/master.cpp#L5861-L5876], the master does not convert the resources; agents provide resources in the pre-reservation-refinement format, and these resources are stored as-is. This means that after recovery, any agents in the master's {{slaves.recovered}} map will have {{SlaveInfo.resources}} in the pre-reservation-refinement format. We should update the master to convert these resources before persisting them to the registry. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7416) Filter results of `/master/slaves` and the v1 call GET_AGENTS
[ https://issues.apache.org/jira/browse/MESOS-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100838#comment-16100838 ] Greg Mann commented on MESOS-7416: -- [~arojas] I didn't catch this during review, but looking back at the above commit, it doesn't look like it touches the v1 handler at all? > Filter results of `/master/slaves` and the v1 call GET_AGENTS > - > > Key: MESOS-7416 > URL: https://issues.apache.org/jira/browse/MESOS-7416 > Project: Mesos > Issue Type: Task > Components: HTTP API, master >Reporter: Alexander Rojas >Assignee: Alexander Rojas > Labels: mesosphere, security > Fix For: 1.4.0 > > > The results returned by both the endpoint {{/master/slaves}} and the API v1 > {{GET_AGENTS}} return full information about the agent state which probably > need to be filtered for certain uses, particularly in a multi-tenancy > scenario. > The kind of leaked data includes specific role names and their specific > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7829) Improve completed task/framework garbage collection
Greg Mann created MESOS-7829: Summary: Improve completed task/framework garbage collection Key: MESOS-7829 URL: https://issues.apache.org/jira/browse/MESOS-7829 Project: Mesos Issue Type: Improvement Components: master Reporter: Greg Mann The Mesos master currently uses two flags to determine how it garbage collects completed tasks and frameworks from memory: * {{--max_completed_frameworks}} * {{--max_completed_tasks_per_framework}} Setting these parameters correctly can be difficult, since there may be a large variance in the size of Task and Framework objects kept in memory. Launching a framework which makes use of task labels to pass data of significant size can quickly lead to performance issues if the master is retaining a large number of completed tasks. We should explore other ways of garbage collecting completed frameworks and tasks, which could better handle the variation in the size of task metadata. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
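The count-based retention that these two flags describe can be sketched as a bounded history that evicts the oldest entry once a fixed capacity is reached. `CompletedTask` and `BoundedHistory` are hypothetical names for illustration; this is not the master's actual data structure.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical completed-task record; `labels` stands in for arbitrary
// framework-supplied metadata of varying size.
struct CompletedTask {
  std::string id;
  std::string labels;
};

// Keeps at most `capacity` completed tasks, evicting the oldest first.
class BoundedHistory {
public:
  explicit BoundedHistory(std::size_t capacity) : capacity_(capacity) {}

  void add(CompletedTask task) {
    if (capacity_ == 0) {
      return;  // Retention disabled.
    }
    if (tasks_.size() == capacity_) {
      tasks_.pop_front();
    }
    tasks_.push_back(std::move(task));
  }

  std::size_t size() const { return tasks_.size(); }

  const CompletedTask& oldest() const { return tasks_.front(); }

  // Total bytes of label data currently retained.
  std::size_t labelBytes() const {
    std::size_t total = 0;
    for (const CompletedTask& task : tasks_) {
      total += task.labels.size();
    }
    return total;
  }

private:
  std::size_t capacity_;
  std::deque<CompletedTask> tasks_;
};
```

The capacity bounds the number of retained tasks but not `labelBytes()`, which is the difficulty the ticket raises: two histories with the same capacity can differ widely in memory use depending on how much data each task carries.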
[jira] [Updated] (MESOS-6101) Add event for Framwork added to master operator API
[ https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6101: - Sprint: Mesosphere Sprint 60 > Add event for Framwork added to master operator API > --- > > Key: MESOS-6101 > URL: https://issues.apache.org/jira/browse/MESOS-6101 > Project: Mesos > Issue Type: Task >Reporter: Zhitao Li >Assignee: Quinn > > Consider the following case: > 1) a subscriber connects to master; > 2) a new scheduler registered as a new framework; > 3) a task is launched from this framework. > In this sequence, subscriber does not have a way to know the FrameworkInfo > belonging to the FrameworkId. > We should support an event (e.g. when framework info in master is > added/changed). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-6101) Add Framwork events to master's operator API
[ https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6101: - Summary: Add Framwork events to master's operator API (was: Add event for Framwork added to master operator API) > Add Framwork events to master's operator API > > > Key: MESOS-6101 > URL: https://issues.apache.org/jira/browse/MESOS-6101 > Project: Mesos > Issue Type: Task >Reporter: Zhitao Li >Assignee: Quinn > > Consider the following case: > 1) a subscriber connects to master; > 2) a new scheduler registered as a new framework; > 3) a task is launched from this framework. > In this sequence, subscriber does not have a way to know the FrameworkInfo > belonging to the FrameworkId. > We should support an event (e.g. when framework info in master is > added/changed). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-6101) Add event for Framework added to master operator API
[ https://issues.apache.org/jira/browse/MESOS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-6101: - Shepherd: Anand Mazumdar (was: Greg Mann) > Add event for Framework added to master operator API > --- > > Key: MESOS-6101 > URL: https://issues.apache.org/jira/browse/MESOS-6101 > Project: Mesos > Issue Type: Task >Reporter: Zhitao Li >Assignee: Quinn > > Consider the following case: > 1) a subscriber connects to the master; > 2) a new scheduler registers as a new framework; > 3) a task is launched from this framework. > In this sequence, the subscriber has no way to know the FrameworkInfo > belonging to the FrameworkId. > We should support an event (e.g. when framework info in master is > added/changed). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7818) Add more filtering options for unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095017#comment-16095017 ] Greg Mann commented on MESOS-7818: -- Let's collect specific requirements here and break out into additional tickets if necessary. cc [~klueska] [~cinchurge] > Add more filtering options for unversioned operator API > --- > > Key: MESOS-7818 > URL: https://issues.apache.org/jira/browse/MESOS-7818 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Greg Mann > Labels: api, mesosphere, operator > > The Mesos CLI hits {{/state}} to get the state of the Mesos cluster, which > can cause performance issues in large clusters. To optimize the CLI for large > clusters, we can add more filtering options to unversioned operator endpoints > like {{/tasks}}, so that the CLI can request results for only those tasks > which match certain criteria. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-2258) Enable filtering of task information in master/state.json
[ https://issues.apache.org/jira/browse/MESOS-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095014#comment-16095014 ] Greg Mann commented on MESOS-2258: -- Closing this in favor of MESOS-7818. > Enable filtering of task information in master/state.json > - > > Key: MESOS-2258 > URL: https://issues.apache.org/jira/browse/MESOS-2258 > Project: Mesos > Issue Type: Improvement >Reporter: Niklas Quarfot Nielsen > > The master's state endpoint can grow huge (several MB) in large > installations because it includes data for all running and completed tasks, while other > pieces of information (counters, attached slaves and frameworks) are still > useful to poll frequently. > We can add query parameters to state.json to filter out task information > and/or introduce a /metadata.json endpoint with all but task information. > Any thoughts? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7818) Add more filtering options for unversioned operator API
Greg Mann created MESOS-7818: Summary: Add more filtering options for unversioned operator API Key: MESOS-7818 URL: https://issues.apache.org/jira/browse/MESOS-7818 Project: Mesos Issue Type: Improvement Components: master Reporter: Greg Mann The Mesos CLI hits {{/state}} to get the state of the Mesos cluster, which can cause performance issues in large clusters. To optimize the CLI for large clusters, we can add more filtering options to unversioned operator endpoints like {{/tasks}}, so that the CLI can request results for only those tasks which match certain criteria. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7630: - Sprint: Mesosphere Sprint 59 > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > Fix For: 1.4.0 > > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7434: - Sprint: Mesosphere Sprint 58 (was: Mesosphere Sprint 58, Mesosphere Sprint 59) > SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky. > - > > Key: MESOS-7434 > URL: https://issues.apache.org/jira/browse/MESOS-7434 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 > CentOS 6 > other Linux distros >Reporter: Greg Mann >Assignee: Greg Mann > Labels: flaky, flaky-test, mesosphere > Attachments: > RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, > RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt > > > This test failure has been observed on an internal CI system. It occurs on a > variety of Linux distributions. It seems that using {{cat}} as the task > command may be problematic; see attached log file > {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092386#comment-16092386 ] Greg Mann commented on MESOS-7630: -- {code} commit 916a5c9fdbc7619b7c9356c21afb83e043feef88 Author: Quinn Leng Date: Tue Jul 18 17:07:02 2017 -0700 Added test cases for /slaves, /containers, /frameworks endpoints. Added query parameter test cases for '/slaves' and '/frameworks' on the master, and '/containers' on the agent. Review: https://reviews.apache.org/r/60847/ {code} {code} commit 8363449c130298b9c77560c5df583dc1226dd17c Author: Quinn Leng Date: Tue Jul 18 17:06:59 2017 -0700 Added filtering to /slaves, /containers and /frameworks endpoints. Added query parameter support for the '/slaves', '/frameworks' and '/containers' endpoints. This allows slaves, frameworks and containers to be queried by ID. If no ID is specified, all records are returned, consistent with current behavior. Review: https://reviews.apache.org/r/60822/ {code} {code} commit aa244baa45d8db84e98e6dca9944a3f679da70d1 Author: Quinn Leng Date: Tue Jul 18 17:06:56 2017 -0700 Added class definition for the 'IDAcceptor'. This commit contains the class definition for 'IDAcceptor', which is used to filter IDs in the '/master/frameworks', '/master/slaves', '/master/tasks', and '/slave/containers' endpoints. Review: https://reviews.apache.org/r/60820/ {code} > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. 
We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
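The behavior described in the commits above (query by ID, and return all records when no ID is given) could be sketched roughly as follows. This is not the Mesos 'IDAcceptor' class: `Record` and `filterById` are hypothetical names, and treating an empty string as "no ID specified" is an assumption made only to keep the sketch short.

```cpp
#include <string>
#include <vector>

// Hypothetical record type standing in for an agent, framework, or
// container entry; not a Mesos type.
struct Record {
  std::string id;
  std::string name;
};

// Returns the records whose ID matches `id`. An empty `id` is treated as
// "no ID specified" (an assumption of this sketch), in which case every
// record is returned, mirroring the unfiltered behavior.
std::vector<Record> filterById(
    const std::vector<Record>& records,
    const std::string& id)
{
  if (id.empty()) {
    return records;
  }

  std::vector<Record> matched;
  for (const Record& record : records) {
    if (record.id == id) {
      matched.push_back(record);
    }
  }
  return matched;
}
```

An unknown ID yields an empty result rather than an error in this sketch; whether the real endpoints behave that way is not stated in the messages above.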
[jira] [Commented] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086807#comment-16086807 ] Greg Mann commented on MESOS-7630: -- {code} commit 9e208293ba482d843e5c56a40d997ba18e764b58 Author: Quinn Leng Date: Thu Jul 13 17:44:03 2017 -0700 Refactored authorization acceptors into a single class. Replaced different authorization-related Acceptor classes with one AuthorizationAcceptor class. Removed the ObjectAcceptor parent class, since no inheritance features are provided by it. Review: https://reviews.apache.org/r/60716/ {code} {code} commit 15656be2f65cc4eeaf053b47133ca0bd43d5c166 Author: Quinn Leng Date: Thu Jul 13 17:43:59 2017 -0700 Added constructors for ObjectApprover::Object. Added new constructors and updated all places where ObjectApprover::Objects are constructed to use new constructors. Review: https://reviews.apache.org/r/60279/ {code} > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7602) Add filtering capabilities to the master/agent operator APIs
[ https://issues.apache.org/jira/browse/MESOS-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-7602: Assignee: Quinn > Add filtering capabilities to the master/agent operator APIs > > > Key: MESOS-7602 > URL: https://issues.apache.org/jira/browse/MESOS-7602 > Project: Mesos > Issue Type: Epic > Components: agent, HTTP API, master >Reporter: Greg Mann >Assignee: Quinn > Labels: api, http, mesosphere > > We would like to add filtering capabilities to both the unversioned operator > HTTP endpoints and the V1 operator APIs on the master and agent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7602) Add filtering capabilities to the master/agent operator APIs
[ https://issues.apache.org/jira/browse/MESOS-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7602: - Shepherd: Greg Mann > Add filtering capabilities to the master/agent operator APIs > > > Key: MESOS-7602 > URL: https://issues.apache.org/jira/browse/MESOS-7602 > Project: Mesos > Issue Type: Epic > Components: agent, HTTP API, master >Reporter: Greg Mann >Assignee: Quinn > Labels: api, http, mesosphere > > We would like to add filtering capabilities to both the unversioned operator > HTTP endpoints and the V1 operator APIs on the master and agent. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7434: - Shepherd: Till Toenshoff > SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky. > - > > Key: MESOS-7434 > URL: https://issues.apache.org/jira/browse/MESOS-7434 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 > CentOS 6 > other Linux distros >Reporter: Greg Mann >Assignee: Greg Mann > Labels: flaky, flaky-test, mesosphere > Attachments: > RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, > RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt > > > This test failure has been observed on an internal CI system. It occurs on a > variety of Linux distributions. It seems that using {{cat}} as the task > command may be problematic; see attached log file > {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070910#comment-16070910 ] Greg Mann commented on MESOS-7630: -- {code} commit 0d277bb64fa5a4d0b4f741daedf64095beab4773 Author: Quinn Leng Date: Fri Jun 30 16:58:34 2017 -0700 Added filtering to the '/tasks' endpoint. Added filtering to the '/tasks' endpoint. Review: https://reviews.apache.org/r/60107/ {code} > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7726) MasterTest.IgnoreOldAgentReregistration test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070853#comment-16070853 ] Greg Mann commented on MESOS-7726: -- The linked ticket, MESOS-7562, is similar but fails on the timeout of a different future. Leaving them both open for the time being in case these are discrete issues. > MasterTest.IgnoreOldAgentReregistration test is flaky > - > > Key: MESOS-7726 > URL: https://issues.apache.org/jira/browse/MESOS-7726 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Assignee: Neil Conway > Labels: flaky-test, mesosphere-oncall > > Observed this on ASF CI. > {code} > [ RUN ] MasterTest.IgnoreOldAgentReregistration > I0627 05:23:06.031154 4917 cluster.cpp:162] Creating default 'local' > authorizer > I0627 05:23:06.033433 4945 master.cpp:438] Master > a8778782-0da1-49a5-9cb8-9f6d11701733 (c43debbe7e32) started on > 172.17.0.4:41747 > I0627 05:23:06.033457 4945 master.cpp:440] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/2BARnF/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-1.4.0/_inst/share/mesos/webui" > --work_dir="/tmp/2BARnF/master" --zk_session_timeout="10secs" > I0627 05:23:06.033771 4945 master.cpp:490] Master only allowing > authenticated frameworks to register > I0627 05:23:06.033787 4945 master.cpp:504] Master only allowing > authenticated agents to register > I0627 05:23:06.033798 4945 master.cpp:517] Master only allowing > authenticated HTTP frameworks to register > I0627 05:23:06.033812 4945 credentials.hpp:37] Loading credentials for > authentication from '/tmp/2BARnF/credentials' > I0627 05:23:06.034080 4945 master.cpp:562] Using default 'crammd5' > authenticator > I0627 05:23:06.034221 4945 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0627 05:23:06.034409 4945 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0627 05:23:06.034569 4945 http.cpp:974] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0627 05:23:06.034688 4945 master.cpp:642] Authorization enabled > I0627 05:23:06.034862 4938 whitelist_watcher.cpp:77] No whitelist given > I0627 05:23:06.034868 4950 hierarchical.cpp:169] Initialized hierarchical > allocator process > I0627 05:23:06.037211 4957 master.cpp:2161] Elected as the leading master! 
> I0627 05:23:06.037236 4957 master.cpp:1700] Recovering from registrar > I0627 05:23:06.037333 4938 registrar.cpp:345] Recovering registrar > I0627 05:23:06.038146 4938 registrar.cpp:389] Successfully fetched the > registry (0B) in 768256ns > I0627 05:23:06.038290 4938 registrar.cpp:493] Applied 1 operations in > 30798ns; attempting to update the registry > I0627 05:23:06.038861 4938 registrar.cpp:550] Successfully updated the > registry in 510976ns > I0627 05:23:06.038960 4938 registrar.cpp:422] Successfully recovered > registrar > I0627 05:23:06.039364 4941 hierarchical.cpp:207] Skipping recovery of > hierarchical allocator: nothing to recover > I0627 05:23:06.039594 4958 master.cpp:1799] Recovered 0 agents from the > registry (129B); allowing 10mins for agents to re-register > I0627 05:23:06.043999 4917 containerizer.cpp:230] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret > W0627 05:23:06.044456 4917 backend.cpp:76] Failed to create 'aufs' backend: > AufsBackend requires root privileges > W0627 05:23:06.044548 4917 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0627 05:23:06.044580 4917 provisioner.cpp
[jira] [Updated] (MESOS-7562) MasterTest.IgnoreOldAgentReregistration is flaky
[ https://issues.apache.org/jira/browse/MESOS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7562: - Labels: flaky flaky-test mesosphere mesosphere-oncall (was: ) Component/s: test > MasterTest.IgnoreOldAgentReregistration is flaky > > > Key: MESOS-7562 > URL: https://issues.apache.org/jira/browse/MESOS-7562 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Neil Conway >Assignee: Neil Conway > Labels: flaky, flaky-test, mesosphere, mesosphere-oncall > > {noformat} > [ RUN ] MasterTest.IgnoreOldAgentReregistration > I0524 16:29:07.143152 29236 cluster.cpp:162] Creating default 'local' > authorizer > I0524 16:29:07.149690 29287 master.cpp:436] Master > 3912ae61-36a4-468c-bef5-82f082370f3d (core-dev) started on 10.0.49.2:42980 > I0524 16:29:07.149724 29287 master.cpp:438] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/gg4ie7/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --http_authenticators="basic" --http_framework_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" > --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --root_submissions="true" 
--user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/gg4ie7/master" > --zk_session_timeout="10secs" > I0524 16:29:07.149896 29287 master.cpp:488] Master only allowing > authenticated frameworks to register > I0524 16:29:07.149905 29287 master.cpp:502] Master only allowing > authenticated agents to register > I0524 16:29:07.149912 29287 master.cpp:515] Master only allowing > authenticated HTTP frameworks to register > I0524 16:29:07.149920 29287 credentials.hpp:37] Loading credentials for > authentication from '/tmp/gg4ie7/credentials' > I0524 16:29:07.150065 29287 master.cpp:560] Using default 'crammd5' > authenticator > I0524 16:29:07.150133 29287 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0524 16:29:07.150168 29287 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0524 16:29:07.150223 29287 http.cpp:975] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0524 16:29:07.150259 29287 master.cpp:640] Authorization enabled > I0524 16:29:07.151617 29274 master.cpp:2161] Elected as the leading master! 
> I0524 16:29:07.151644 29274 master.cpp:1700] Recovering from registrar > I0524 16:29:07.152218 29261 registrar.cpp:389] Successfully fetched the > registry (0B) in 505088ns > I0524 16:29:07.152268 29261 registrar.cpp:493] Applied 1 operations in > 4200ns; attempting to update the registry > I0524 16:29:07.152664 29261 registrar.cpp:550] Successfully updated the > registry in 371200ns > I0524 16:29:07.152703 29261 registrar.cpp:422] Successfully recovered > registrar > I0524 16:29:07.153328 29291 master.cpp:1799] Recovered 0 agents from the > registry (119B); allowing 10mins for agents to re-register > I0524 16:29:07.160094 29236 containerizer.cpp:230] Using isolation: > posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret > W0524 16:29:07.160295 29236 backend.cpp:76] Failed to create 'overlay' > backend: OverlayBackend requires root privileges > W0524 16:29:07.160326 29236 backend.cpp:76] Failed to create 'bind' backend: > BindBackend requires root privileges > I0524 16:29:07.160334 29236 provisioner.cpp:255] Using default backend 'copy' > I0524 16:29:07.161916 29236 cluster.cpp:448] Creating default 'local' > authorizer > I0524 16:29:07.162616 29276 slave.cpp:225] Mesos agent started on > (7738)@10.0.49.2:42980 > I0524 16:29:07.162644 29276 slave.cpp:226] Flags at startup: --acls="" > --appc_simple_discovery_uri_prefix="http://"; > --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticatee="crammd5" > --authentication_backoff_factor="1secs" --au
[jira] [Commented] (MESOS-7743) Authorization for framework effective capabilities.
[ https://issues.apache.org/jira/browse/MESOS-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069236#comment-16069236 ] Greg Mann commented on MESOS-7743: -- cc [~arojas] > Authorization for framework effective capabilities. > --- > > Key: MESOS-7743 > URL: https://issues.apache.org/jira/browse/MESOS-7743 > Project: Mesos > Issue Type: Bug > Components: modules, security >Reporter: James Peach > > As noted by [~greggomann], we should add an authorization hook to the > application of framework effective capabilities so that authorization modules > can make fine-grained decisions about which effective capabilities they are > willing to allow specific tasks to hold. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7680) Stop using EXIT() in master/agent initialization code
[ https://issues.apache.org/jira/browse/MESOS-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7680: - Description: The initialization of master/agent dependencies is currently inconsistent. For some dependencies, we initialize them outside of the actor and then inject them via the constructor; for example, in {{main.cpp}} and {{cluster.cpp}}. Some other dependencies are created/initialized within the master/slave's {{initialize()}} method. In this case, if the dependency creation fails, we use {{EXIT(EXIT_FAILURE)}} to terminate the process. In the case of tests, this is problematic. If I create multiple agents, for example, and one of their dependencies fails to initialize successfully, the entire test harness would exit :-( During some discussion, [~jieyu] proposed an alternative: instead of using {{EXIT}} when dependency creation fails, we could terminate the master/agent libprocess process. In the case of the production binaries, this would cause the executable to exit. In the case of our tests, this would allow a single test to fail, while the test harness continues running. was: The initialization of master/agent dependencies is currently inconsistent. For some dependencies, we initialize them outside of the actor and then inject them via the constructor; for example, in {{main.cpp}} and {{cluster.cpp}}. Some other dependencies are created/initialized within the master/slave's {{initialize()}} method. In this case, if the dependency creation fails, we use {{EXIT(EXIT_FAILURE)}} to terminate the process. In the case of tests, this is problematic. If I create multiple agents, for example, and one of their dependencies fails to initialize successfully, the entire test harness would exit :-( Instead of using {{EXIT}} when dependency creation fails, we should terminate the master/agent libprocess process. In the case of the production binaries, this will cause the executable to exit. 
In the case of our tests, this will allow a single test to fail, while the test harness continues running. > Stop using EXIT() in master/agent initialization code > - > > Key: MESOS-7680 > URL: https://issues.apache.org/jira/browse/MESOS-7680 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Greg Mann > Labels: mesosphere > > The initialization of master/agent dependencies is currently inconsistent. > For some dependencies, we initialize them outside of the actor and then > inject them via the constructor; for example, in {{main.cpp}} and > {{cluster.cpp}}. > Some other dependencies are created/initialized within the master/slave's > {{initialize()}} method. In this case, if the dependency creation fails, we > use {{EXIT(EXIT_FAILURE)}} to terminate the process. In the case of tests, > this is problematic. If I create multiple agents, for example, and one of > their dependencies fails to initialize successfully, the entire test harness > would exit :-( > During some discussion, [~jieyu] proposed an alternative: instead of using > {{EXIT}} when dependency creation fails, we could terminate the master/agent > libprocess process. In the case of the production binaries, this would cause > the executable to exit. In the case of our tests, this would allow a single > test to fail, while the test harness continues running. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-7680) Stop using EXIT() in master/agent initialization code
Greg Mann created MESOS-7680: Summary: Stop using EXIT() in master/agent initialization code Key: MESOS-7680 URL: https://issues.apache.org/jira/browse/MESOS-7680 Project: Mesos Issue Type: Improvement Components: agent, master Reporter: Greg Mann The initialization of master/agent dependencies is currently inconsistent. For some dependencies, we initialize them outside of the actor and then inject them via the constructor; for example, in {{main.cpp}} and {{cluster.cpp}}. Some other dependencies are created/initialized within the master/slave's {{initialize()}} method. In this case, if the dependency creation fails, we use {{EXIT(EXIT_FAILURE)}} to terminate the process. In the case of tests, this is problematic. If I create multiple agents, for example, and one of their dependencies fails to initialize successfully, the entire test harness would exit :-( Instead of using {{EXIT}} when dependency creation fails, we should terminate the master/agent libprocess process. In the case of the production binaries, this will cause the executable to exit. In the case of our tests, this will allow a single test to fail, while the test harness continues running. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
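As a plain C++ analogy for the proposal above (not libprocess code), the sketch below shows one component reporting an initialization failure and marking only itself as terminated, so that a caller hosting several components keeps running. `InitResult`, `FakeAgent`, and `startAll` are hypothetical names introduced for illustration; the real change would terminate the master/agent libprocess process instead.

```cpp
#include <string>
#include <vector>

// Hypothetical model only: these types are not Mesos or libprocess classes.
struct InitResult {
  bool ok;
  std::string error;
};

class FakeAgent {
public:
  explicit FakeAgent(bool dependencyAvailable)
    : dependencyAvailable_(dependencyAvailable) {}

  // Instead of exiting the whole program when a dependency cannot be
  // created, report the failure and mark only this agent as terminated.
  InitResult initialize() {
    if (!dependencyAvailable_) {
      terminated_ = true;
      return InitResult{false, "failed to create dependency"};
    }
    return InitResult{true, ""};
  }

  bool terminated() const { return terminated_; }

private:
  bool dependencyAvailable_;
  bool terminated_ = false;
};

// Initializes every agent and returns how many failed. The caller
// (for example a test harness) keeps running either way.
int startAll(std::vector<FakeAgent>& agents) {
  int failures = 0;
  for (FakeAgent& agent : agents) {
    if (!agent.initialize().ok) {
      ++failures;
    }
  }
  return failures;
}
```

A production binary could still choose to exit when its single agent reports failure, which matches the behavior the ticket expects for executables.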
[jira] [Updated] (MESOS-7434) SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7434: - Sprint: Mesosphere Sprint 58 > SlaveTest.RestartSlaveRequireExecutorAuthentication is flaky. > - > > Key: MESOS-7434 > URL: https://issues.apache.org/jira/browse/MESOS-7434 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 > CentOS 6 > other Linux distros >Reporter: Greg Mann >Assignee: Greg Mann > Labels: flaky, flaky-test, mesosphere > Attachments: > RestartSlaveRequireExecutorAuthentication_failure_log_debian8.txt, > RestartSlaveRequireExecutorAuthentication is flaky_failure_log_centos6.txt > > > This test failure has been observed on an internal CI system. It occurs on a > variety of Linux distributions. It seems that using {{cat}} as the task > command may be problematic; see attached log file > {{SlaveTest.RestartSlaveRequireExecutorAuthentication.txt}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7661) Libprocess timers with long durations trigger immediately
[ https://issues.apache.org/jira/browse/MESOS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-7661: - Summary: Libprocess timers with long durations trigger immediately (was: Libprocess runs long timers right ahead) > Libprocess timers with long durations trigger immediately > - > > Key: MESOS-7661 > URL: https://issues.apache.org/jira/browse/MESOS-7661 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Gastón Kleiman > Labels: mesosphere > > {{process::delay()}} will schedule a method to run immediately when called > with a very long {{Duration}}. > This happens because [{{Timeout}} tries to add two long > durations|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/libprocess/include/process/timeout.hpp#L33-L38], > leading to an [integer overflow in > {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L116]. > I'd expect libprocess to either: > 1. Never run the method. > 2. Schedule it after the longest possible {{Duration}}. > {{Duration::operator+=()}} should probably also handle integer overflows > differently. If an addition leads to an integer overflow, it might make more > sense to return {{Duration::max()}} than a negative duration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7661) Libprocess runs long timers right ahead
[ https://issues.apache.org/jira/browse/MESOS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048536#comment-16048536 ] Greg Mann commented on MESOS-7661: -- I could imagine something like:
{code}
Duration& operator+=(const Duration& that)
{
  if (max() - that < *this) {
    nanos = max().nanos;
  } else {
    nanos += that.nanos;
  }
  return *this;
}
{code}
cc [~bmahler] [~kaysoky] > Libprocess runs long timers right ahead > --- > > Key: MESOS-7661 > URL: https://issues.apache.org/jira/browse/MESOS-7661 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Gastón Kleiman > Labels: mesosphere > > {{process::delay()}} will schedule a method to run immediately when called > with a very long {{Duration}}. > This happens because [{{Timeout}} tries to add two long > durations|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/libprocess/include/process/timeout.hpp#L33-L38], > leading to an [integer overflow in > {{Duration}}|https://github.com/apache/mesos/blob/13cae29e7832d8bb879c68847ad0df449d227f17/3rdparty/stout/include/stout/duration.hpp#L116]. > I'd expect libprocess to either: > 1. Never run the method. > 2. Schedule it after the longest possible {{Duration}}. > {{Duration::operator+=()}} should probably also handle integer overflows > differently. If an addition leads to an integer overflow, it might make more > sense to return {{Duration::max()}} than a negative duration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
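To make the overflow discussion concrete, here is a minimal, self-contained sketch of a saturating addition. `SimpleDuration` is a hypothetical stand-in that stores signed 64-bit nanoseconds; it is not stout's {{Duration}} class. Beyond the comment above, it also clamps in the negative direction, which is an extra assumption about what would be desirable rather than something the ticket decides.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a nanosecond-based duration; this is NOT
// stout's Duration class, only a minimal model of the idea.
struct SimpleDuration {
  int64_t nanos;

  static SimpleDuration max() {
    return SimpleDuration{std::numeric_limits<int64_t>::max()};
  }

  static SimpleDuration min() {
    return SimpleDuration{std::numeric_limits<int64_t>::min()};
  }

  // Saturating addition: clamp to max()/min() instead of wrapping around.
  // Each guard is arranged so that the check itself cannot overflow.
  SimpleDuration& operator+=(const SimpleDuration& that) {
    if (that.nanos > 0 && nanos > max().nanos - that.nanos) {
      nanos = max().nanos;
    } else if (that.nanos < 0 && nanos < min().nanos - that.nanos) {
      nanos = min().nanos;
    } else {
      nanos += that.nanos;
    }
    return *this;
  }
};
```

With this shape, adding a very long duration to the current time would yield the largest representable value instead of a negative one, so a timer built on it would be scheduled far in the future rather than immediately.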
[jira] [Created] (MESOS-7629) Parsing to protobuf leads to API call validation errors
Greg Mann created MESOS-7629:
-----------------------------

  Summary: Parsing to protobuf leads to API call validation errors
  Key: MESOS-7629
  URL: https://issues.apache.org/jira/browse/MESOS-7629
  Project: Mesos
  Issue Type: Bug
  Components: stout
  Reporter: Greg Mann

The {{::protobuf::parse()}} function will [silently drop unrecognized fields|https://github.com/apache/mesos/blob/7ec3269d51d7d180aa857140097c170c469d7959/3rdparty/stout/include/stout/protobuf.hpp#L589], which makes sense in the context of maintaining backward compatibility across different Mesos versions which may add or remove fields from protobuf messages. However, since we [rely on this protobuf parsing|https://github.com/apache/mesos/blob/7ec3269d51d7d180aa857140097c170c469d7959/src/master/http.cpp#L514-L520] in some places for validation of user-supplied JSON, this can lead to API endpoints returning successful 2XX responses when in fact the JSON was malformed and the call has not been completed as submitted.

We should consider adding a parameter to API calls which allows users to enable/disable ignoring unrecognized fields in the call. If the default behavior for JSON requests were to return an error rather than ignore unrecognized fields, then our parsing code would catch malformed JSON submissions. The user could opt in to the "ignore unrecognized fields" behavior when backward compatibility is a concern.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
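The proposed behavior (strict by default, with an opt-in to ignore unrecognized fields) might look roughly like the sketch below. It works on plain sets of field names rather than real protobuf descriptors, and every name in it (`unrecognizedFields`, `acceptCall`, `ignoreUnknownFields`) is hypothetical; it does not reflect the actual `::protobuf::parse()` signature.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of stout): returns the request keys that
// the message schema does not define, in the order they were submitted.
std::vector<std::string> unrecognizedFields(
    const std::set<std::string>& schemaFields,
    const std::vector<std::string>& requestKeys)
{
  std::vector<std::string> unknown;
  for (const std::string& key : requestKeys) {
    if (schemaFields.count(key) == 0) {
      unknown.push_back(key);
    }
  }
  // Sanity check: we can never report more unknown keys than were submitted.
  assert(unknown.size() <= requestKeys.size());
  return unknown;
}

// Hypothetical validation step: strict by default, lenient only when the
// caller explicitly opts in (e.g. for backward compatibility).
bool acceptCall(
    const std::set<std::string>& schemaFields,
    const std::vector<std::string>& requestKeys,
    bool ignoreUnknownFields = false)
{
  return ignoreUnknownFields ||
         unrecognizedFields(schemaFields, requestKeys).empty();
}
```

Under this sketch, a request with a misspelled key would be rejected unless the caller passes `ignoreUnknownFields = true`, instead of being silently accepted with the field dropped.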
[jira] [Commented] (MESOS-7629) Parsing to protobuf leads to API call validation errors
[ https://issues.apache.org/jira/browse/MESOS-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039076#comment-16039076 ]

Greg Mann commented on MESOS-7629:
----------------------------------

cc [~bmahler]

> Parsing to protobuf leads to API call validation errors
> -------------------------------------------------------
>
>   Key: MESOS-7629
>   URL: https://issues.apache.org/jira/browse/MESOS-7629
>   Project: Mesos
>   Issue Type: Bug
>   Components: stout
>   Reporter: Greg Mann
>   Labels: api, json, mesosphere, parsing, protobuf
>
> The {{::protobuf::parse()}} function will [silently drop unrecognized
> fields|https://github.com/apache/mesos/blob/7ec3269d51d7d180aa857140097c170c469d7959/3rdparty/stout/include/stout/protobuf.hpp#L589],
> which makes sense in the context of maintaining backward compatibility
> across different Mesos versions which may add or remove fields from protobuf
> messages. However, since we [rely on this protobuf
> parsing|https://github.com/apache/mesos/blob/7ec3269d51d7d180aa857140097c170c469d7959/src/master/http.cpp#L514-L520]
> in some places for validation of user-supplied JSON, this can lead to API
> endpoints returning successful 2XX responses when in fact the JSON was
> malformed and the call has not been completed as submitted.
> We should consider adding a parameter to API calls which allows users to
> enable/disable ignoring unrecognized fields in the call. If the default
> behavior for JSON requests were to return an error rather than ignore
> unrecognized fields, then our parsing code would catch malformed JSON
> submissions. The user could opt in to the "ignore unrecognized fields"
> behavior when backward compatibility is a concern.
[jira] [Created] (MESOS-7602) Add filtering capabilities to the master/agent operator APIs
Greg Mann created MESOS-7602:
-----------------------------

  Summary: Add filtering capabilities to the master/agent operator APIs
  Key: MESOS-7602
  URL: https://issues.apache.org/jira/browse/MESOS-7602
  Project: Mesos
  Issue Type: Epic
  Components: agent, HTTP API, master
  Reporter: Greg Mann

We would like to add filtering capabilities to both the unversioned operator HTTP endpoints and the V1 operator APIs on the master and agent.
[jira] [Comment Edited] (MESOS-7542) Add executor reconnection retry logic to the agent
[ https://issues.apache.org/jira/browse/MESOS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025602#comment-16025602 ]

Greg Mann edited comment on MESOS-7542 at 5/26/17 12:48 AM:
------------------------------------------------------------

Implementation/tests of the agent-side behavior:
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59585/
https://reviews.apache.org/r/59586/
https://reviews.apache.org/r/59587/

was (Author: greggomann):
Implementation/tests of the agent-side behavior:
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59584/

> Add executor reconnection retry logic to the agent
> --------------------------------------------------
>
>   Key: MESOS-7542
>   URL: https://issues.apache.org/jira/browse/MESOS-7542
>   Project: Mesos
>   Issue Type: Improvement
>   Components: agent, executor
>   Reporter: Greg Mann
>   Assignee: Benjamin Mahler
>   Labels: mesosphere
>
> Currently, the agent sends a single {{ReconnectExecutorMessage}} to PID-based
> executors during recovery. It would be more robust to have the agent retry
> these messages until {{executor_reregister_timeout}} has elapsed.
[jira] [Commented] (MESOS-7542) Add executor reconnection retry logic to the agent
[ https://issues.apache.org/jira/browse/MESOS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025602#comment-16025602 ]

Greg Mann commented on MESOS-7542:
----------------------------------

Implementation/tests of the agent-side behavior:
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59584/
https://reviews.apache.org/r/59584/

> Add executor reconnection retry logic to the agent
> --------------------------------------------------
>
>   Key: MESOS-7542
>   URL: https://issues.apache.org/jira/browse/MESOS-7542
>   Project: Mesos
>   Issue Type: Improvement
>   Components: agent, executor
>   Reporter: Greg Mann
>   Assignee: Benjamin Mahler
>   Labels: mesosphere
>
> Currently, the agent sends a single {{ReconnectExecutorMessage}} to PID-based
> executors during recovery. It would be more robust to have the agent retry
> these messages until {{executor_reregister_timeout}} has elapsed.
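As a rough illustration of the retry behavior requested in MESOS-7542, the sketch below resends a message at a fixed interval until either the executor has reconnected or the timeout has elapsed. It is a simplified, blocking-style loop with injected callbacks, not the agent's actual libprocess-based code: `retryReconnect`, `send`, `reconnected`, `wait` and the fixed `interval` are all assumptions made for the example, with `timeout` playing the role of `executor_reregister_timeout`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using std::chrono::milliseconds;

// Hypothetical retry loop: calls `send` once per `interval` until
// `reconnected` reports success or `timeout` has elapsed.
// Returns the number of messages that were sent.
int retryReconnect(
    const std::function<void()>& send,
    const std::function<bool()>& reconnected,
    const std::function<void(milliseconds)>& wait,
    milliseconds timeout,
    milliseconds interval)
{
  // A non-positive interval would make the loop spin forever.
  assert(interval > milliseconds::zero());

  int sent = 0;
  for (milliseconds elapsed{0};
       elapsed < timeout && !reconnected();
       elapsed += interval) {
    send();          // E.g. deliver a ReconnectExecutorMessage.
    ++sent;
    wait(interval);  // Injected so tests can avoid real sleeping.
  }
  return sent;
}
```

Injecting `wait` keeps the sketch testable without real delays; in an actor-based system such as libprocess, the same effect would more likely be achieved by scheduling a delayed call rather than blocking.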