[jira] [Created] (MESOS-9448) Semantics of RECONCILE_OPERATIONS framework API call are incorrect
Benjamin Bannier created MESOS-9448: --- Summary: Semantics of RECONCILE_OPERATIONS framework API call are incorrect Key: MESOS-9448 URL: https://issues.apache.org/jira/browse/MESOS-9448 Project: Mesos Issue Type: Bug Components: framework, HTTP API, master Reporter: Benjamin Bannier The typical pattern in the framework HTTP API is that frameworks send calls to which the master responds with {{Accepted}} responses and which trigger events. The only designed exception to this is {{SUBSCRIBE}} calls, to which the master responds with an {{Ok}} response containing the assigned framework ID. This is even codified in {{src/scheduler.cpp:646ff}},
{code}
if (response->code == process::http::Status::OK) {
  // Only SUBSCRIBE call should get a "200 OK" response.
  CHECK_EQ(Call::SUBSCRIBE, call.type());
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
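The convention described above can be sketched as a small predicate; this is a minimal illustration with made-up names, not the real Mesos scheduler types: every call is expected to yield "202 Accepted" with results delivered as events, except SUBSCRIBE, which yields "200 OK" carrying the assigned framework ID.

```cpp
#include <cassert>

// Illustrative enum; the real type is mesos::v1::scheduler::Call::Type.
enum class CallType { SUBSCRIBE, ACCEPT, RECONCILE, RECONCILE_OPERATIONS };

// Does an HTTP response code match the framework API convention the
// report describes (and that the CHECK in scheduler.cpp enforces)?
bool matchesConvention(CallType call, int httpCode)
{
  if (call == CallType::SUBSCRIBE) {
    return httpCode == 200;  // "200 OK": body holds the framework ID.
  }
  return httpCode == 202;    // "202 Accepted": results arrive as events.
}
```

Under this convention, a "200 OK" reply to any non-SUBSCRIBE call (as RECONCILE_OPERATIONS produces) trips the hard check.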
[jira] [Created] (MESOS-9449) Support HTTP when pull UCR image using docker registry v2 API
haoyuan ge created MESOS-9449: - Summary: Support HTTP when pull UCR image using docker registry v2 API Key: MESOS-9449 URL: https://issues.apache.org/jira/browse/MESOS-9449 Project: Mesos Issue Type: Improvement Components: agent, containerization Reporter: haoyuan ge Many customers use Harbor as a Docker registry in their private clouds, and most of them expose the registry API over HTTP instead of HTTPS. However, SSL is currently assumed when fetching images/layers for Mesos containers, and the Mesos agent reports an error when the registry uses HTTP: Failed to launch container: Failed to perform 'curl': curl: (60) SSL certificate problem: unable to get local issuer certificate -- This message was sent by Atlassian JIRA (v7.6.3#76005)
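A hypothetical sketch of the behavior being requested: let the operator mark a registry as plain-HTTP instead of always assuming `https://`. The boolean flag and the helper name here are illustrative only and not part of the actual Mesos agent; only the `/v2/` manifest path is taken from the Docker registry v2 API.

```cpp
#include <string>

// Build the Docker registry v2 manifest URL for an image, choosing the
// scheme from a hypothetical per-registry "insecure" operator setting.
std::string manifestUrl(
    const std::string& registry,  // e.g. "harbor.internal:5000"
    const std::string& image,     // e.g. "library/nginx"
    const std::string& tag,       // e.g. "latest"
    bool insecureHttp)            // assumed operator flag, not a real one
{
  const std::string scheme = insecureHttp ? "http://" : "https://";

  // Docker registry v2 manifest endpoint: /v2/<name>/manifests/<reference>
  return scheme + registry + "/v2/" + image + "/manifests/" + tag;
}
```

With such a switch, the agent's fetch path could skip TLS for explicitly whitelisted registries instead of failing with the certificate error above.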
[jira] [Commented] (MESOS-9223) Storage local provider does not sufficiently handle container launch failures or errors
[ https://issues.apache.org/jira/browse/MESOS-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707318#comment-16707318 ] James DeFelice commented on MESOS-9223: --- MESOS-8380 addresses UI changes. The UI should not be the only place to easily observe/troubleshoot errors. Ideally there'd be an API that exposes such errors. > Storage local provider does not sufficiently handle container launch failures > or errors > --- > > Key: MESOS-9223 > URL: https://issues.apache.org/jira/browse/MESOS-9223 > Project: Mesos > Issue Type: Improvement > Components: agent, storage >Reporter: Benjamin Bannier >Priority: Critical > > The storage local resource provider as currently implemented does not handle > launch failures or task errors of its standalone containers well enough. If, > e.g., an RP container fails to come up during node start, a warning would be > logged, but an operator still needs to detect degraded functionality, > manually check the state of containers with {{GET_CONTAINERS}}, and decide > whether the agent needs restarting; I suspect they do not always have > enough context for this decision. It would be better if the provider would > either enforce a restart by failing over the whole agent, or retry the > operation (optionally up to some maximum number of retries). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
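The retry policy the description proposes can be sketched as follows; names are assumptions for illustration, and `launch` stands in for the real standalone-container launch path rather than any actual Mesos API:

```cpp
#include <functional>

// Retry a container launch up to a bounded number of attempts before
// giving up, at which point the caller would escalate (e.g. fail over
// the whole agent) instead of limping along in a degraded state.
bool launchWithRetries(const std::function<bool()>& launch, int maxRetries)
{
  for (int attempt = 0; attempt <= maxRetries; ++attempt) {
    if (launch()) {
      return true;  // The provider container came up.
    }
    // A real implementation would log the failure and back off here.
  }
  return false;  // Retries exhausted: escalate rather than stay degraded.
}
```

The point of the bound is exactly the trade-off in the description: transient failures heal without operator action, while persistent failures surface loudly instead of requiring a manual {{GET_CONTAINERS}} investigation.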
[jira] [Comment Edited] (MESOS-9157) cannot pull docker image from dockerhub
[ https://issues.apache.org/jira/browse/MESOS-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707338#comment-16707338 ] Andrei Budnik edited comment on MESOS-9157 at 12/3/18 3:11 PM: --- [~MichaelBowie] Can you please provide stderr and stdout logs of a failed Docker container from its sandbox? It would also be great if you provide Mesos agent logs containing the failing Docker task. was (Author: abudnik): [~MichaelBowie] Can you please provide stderr and stdout logs of a failed Docker container from its sandbox? > cannot pull docker image from dockerhub > --- > > Key: MESOS-9157 > URL: https://issues.apache.org/jira/browse/MESOS-9157 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.6.1 >Reporter: Michael Bowie >Priority: Blocker > Labels: containerization > > I am not able to pull docker images from docker hub through marathon/mesos. > I get one of two errors: > * `Aug 15 10:11:02 michael-b-dcos-agent-1 dockerd[5974]: > time="2018-08-15T10:11:02.770309104-04:00" level=error msg="Not continuing > with pull after error: context canceled"` > * `Failed to run docker -H ... Error: No such object: > mesos-d2f333a8-fef2-48fb-8b99-28c52c327790` > However, I can manually ssh into one of the agents and successfully pull the > image from the command line. > Any pointers in the right direction? > Thank you! > Similar Issues: > https://github.com/mesosphere/marathon/issues/3869 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9157) cannot pull docker image from dockerhub
[ https://issues.apache.org/jira/browse/MESOS-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707338#comment-16707338 ] Andrei Budnik commented on MESOS-9157: -- [~MichaelBowie] Can you please provide stderr and stdout logs of a failed Docker container from its sandbox? > cannot pull docker image from dockerhub > --- > > Key: MESOS-9157 > URL: https://issues.apache.org/jira/browse/MESOS-9157 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 1.6.1 >Reporter: Michael Bowie >Priority: Blocker > Labels: containerization > > I am not able to pull docker images from docker hub through marathon/mesos. > I get one of two errors: > * `Aug 15 10:11:02 michael-b-dcos-agent-1 dockerd[5974]: > time="2018-08-15T10:11:02.770309104-04:00" level=error msg="Not continuing > with pull after error: context canceled"` > * `Failed to run docker -H ... Error: No such object: > mesos-d2f333a8-fef2-48fb-8b99-28c52c327790` > However, I can manually ssh into one of the agents and successfully pull the > image from the command line. > Any pointers in the right direction? > Thank you! > Similar Issues: > https://github.com/mesosphere/marathon/issues/3869 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9318) Consider providing better operation status updates while an RP is recovering
[ https://issues.apache.org/jira/browse/MESOS-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707362#comment-16707362 ] Benjamin Bannier commented on MESOS-9318: - The flow for a possible fix could be:
* master sees a reconciliation request for an operation on some resource provider on a registered agent
* master forwards the reconciliation request to the agent
* agent forwards it to its resource provider manager
* resource provider manager either sends a {{ReconcileOperations}} event to the registered resource provider, or responds with an {{OPERATION_UNREACHABLE}} for a resource provider which is not subscribed. It could also respond with some status for resource providers marked gone, see MESOS-8403.
> Consider providing better operation status updates while an RP is recovering > > > Key: MESOS-9318 > URL: https://issues.apache.org/jira/browse/MESOS-9318 > Project: Mesos > Issue Type: Task >Affects Versions: 1.6.0, 1.7.0 >Reporter: Gastón Kleiman >Priority: Major > Labels: mesosphere, operation-feedback > > Consider the following scenario: > 1. A framework accepts an offer with an operation affecting SLRP resources. > 2. The master forwards it to the corresponding agent. > 3. The agent forwards it to the corresponding RP. > 4. The agent and the master fail over. > 5. The master recovers. > 6. The agent recovers while the RP is still recovering, so it doesn't include > the pending operation on the {{RegisterMessage}}. > 7. A framework performs an explicit operation status reconciliation. > In this case the master will currently respond with {{OPERATION_UNKNOWN}}, > but it should be possible to respond with a more fine-grained and useful > state, such as {{OPERATION_RECOVERING}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
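The last step of the proposed flow boils down to a dispatch on the provider's registration state. A minimal sketch, with illustrative enums rather than the real resource provider manager types; the OPERATION_GONE branch for providers marked gone is speculative (see MESOS-8403):

```cpp
// Registration state of a resource provider as seen by the manager.
enum class ProviderState { SUBSCRIBED, UNSUBSCRIBED, GONE };

enum class ReconcileResult
{
  SEND_RECONCILE_EVENT,   // Forward a ReconcileOperations event to the RP.
  OPERATION_UNREACHABLE,  // RP known but currently not subscribed.
  OPERATION_GONE          // RP has been marked gone (speculative).
};

ReconcileResult reconcile(ProviderState state)
{
  switch (state) {
    case ProviderState::SUBSCRIBED:
      return ReconcileResult::SEND_RECONCILE_EVENT;
    case ProviderState::UNSUBSCRIBED:
      return ReconcileResult::OPERATION_UNREACHABLE;
    case ProviderState::GONE:
      return ReconcileResult::OPERATION_GONE;
  }
  return ReconcileResult::OPERATION_UNREACHABLE;  // Defensive default.
}
```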
[jira] [Commented] (MESOS-9022) Race condition in task updates could cause missing event in streaming
[ https://issues.apache.org/jira/browse/MESOS-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707413#comment-16707413 ] Benno Evers commented on MESOS-9022: Confirmed, this is caused by the same underlying problem as MESOS-9000 and should be solved by https://reviews.apache.org/r/67575/ . > Race condition in task updates could cause missing event in streaming > - > > Key: MESOS-9022 > URL: https://issues.apache.org/jira/browse/MESOS-9022 > Project: Mesos > Issue Type: Bug > Components: HTTP API, master >Affects Versions: 1.6.0 >Reporter: Evelyn Liu >Assignee: Benno Evers >Priority: Blocker > Labels: events, foundations, mesos, mesosphere, race-condition, > streaming > > Master sends update event of {{TASK_STARTING}} when task's latest state is > already {{TASK_FAILED}}. Then when it handles the update of {{TASK_FAILED}}, > {{sendSubscribersUpdate}} is set to {{false}} because of > [this|https://github.com/apache/mesos/blob/1.6.x/src/master/master.cpp#L10805]. > The subscriber would not receive update event of {{TASK_FAILED}}. > This happened when a task failed very fast. Is there a race condition while > handling task updates? 
> {{*master log:*}} > {code:java} > I0622 13:08:29.189771 84079 master.cpp:8345] Status update TASK_STARTING > (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- from agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.189801 84079 master.cpp:8402] Forwarding status update > TASK_STARTING (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- > I0622 13:08:29.190004 84079 master.cpp:10843] Updating the state of task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (latest state: TASK_STARTING, > status update state: TASK_STARTING) > I0622 13:08:29.603857 84079 master.cpp:6195] Processing ACKNOWLEDGE call for > status eb091093-d303-4e82-b69f-e2ba1011ba76 for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (Aurora) on agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.615643 84079 master.cpp:8345] Status update TASK_STARTING > (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- from agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.615669 84079 master.cpp:8402] Forwarding status update > TASK_STARTING (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- > I0622 13:08:29.615783 84079 master.cpp:10843] Updating the state of task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (latest state: TASK_FAILED, status > update state: TASK_STARTING) > I0622 13:08:29.620837 84079 master.cpp:8345] Status update TASK_FAILED > (Status UUID: ac34f1e9-eaa4-4765-82ac-7398c2e6c835) for task > 
f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- from agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.620853 84079 master.cpp:8402] Forwarding status update > TASK_FAILED (Status UUID: ac34f1e9-eaa4-4765-82ac-7398c2e6c835) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- > I0622 13:08:29.620923 84079 master.cpp:10843] Updating the state of task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (latest state: TASK_FAILED, status > update state: TASK_FAILED) > I0622 13:08:29.630455 84079 master.cpp:6195] Processing ACKNOWLEDGE call for > status eb091093-d303-4e82-b69f-e2ba1011ba76 for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (Aurora) on agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.673051 84095 master.cpp:6195] Processing ACKNOWLEDGE call for > status ac34f1e9-eaa4-4765-82ac-7398c2e6c835 for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (Aurora) on agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
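The log above can be reduced to a minimal model of the race; this is illustrative code, not the actual master internals. Each status update carries both its own state and the task's latest known state, and (per the linked master.cpp condition) a subscriber event is only emitted when the stored state actually changes. A retransmitted TASK_STARTING already advances the store to TASK_FAILED, so the later TASK_FAILED update changes nothing and its event is dropped:

```cpp
#include <string>
#include <vector>

// Toy model of the master's per-task bookkeeping in the report.
struct MasterModel
{
  std::string taskState;              // Latest state stored for the task.
  std::vector<std::string> streamed;  // Events pushed to subscribers.

  // Process one status update; `latestState` piggybacks on the update,
  // and a subscriber event is only sent when the store changes.
  void updateTask(const std::string& updateState,
                  const std::string& latestState)
  {
    const bool sendSubscribersUpdate = (taskState != latestState);
    taskState = latestState;
    if (sendSubscribersUpdate) {
      streamed.push_back(updateState);
    }
  }
};
```

Replaying the logged sequence (TASK_STARTING/TASK_STARTING, retransmitted TASK_STARTING/TASK_FAILED, TASK_FAILED/TASK_FAILED) leaves the subscriber stream without any TASK_FAILED event, matching the reported symptom.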
[jira] [Commented] (MESOS-9338) Add asynchronous DNS facilities to libprocess.
[ https://issues.apache.org/jira/browse/MESOS-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707502#comment-16707502 ] Chun-Hung Hsiao commented on MESOS-9338: C-ares is implicitly bundled in the gRPC bundle. If we are going to bundle c-ares we should compile gRPC against the our c-ares bundle. > Add asynchronous DNS facilities to libprocess. > -- > > Key: MESOS-9338 > URL: https://issues.apache.org/jira/browse/MESOS-9338 > Project: Mesos > Issue Type: Improvement > Components: libprocess >Reporter: Benjamin Mahler >Priority: Major > Labels: foundations > > This would enable non-blocking DNS queries. One use case is during TLS peer > certificate verification, we need to perform a reverse DNS lookup to get the > peer's hostname. This blocks the event loop thread! > Some options: > (1) Linux provides {{getaddrinfo_a}}, however I don't see an equivalent one > for {{getnameinfo}}: > http://man7.org/linux/man-pages/man3/getaddrinfo_a.3.html > (2) A popular library is c-ares (MIT license): > https://c-ares.haxx.se/ > (3) ADNS (GPLv3): > https://www.gnu.org/software/adns/ > (4) c-ares has a list of other libraries: > https://c-ares.haxx.se/otherlibs.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
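Until a real async resolver such as c-ares is wired in, one generic workaround for the blocking `getnameinfo`/`getaddrinfo` calls mentioned in the ticket is to offload them to a worker thread and hand back a future, so the event-loop thread never blocks. A sketch under that assumption; the resolver argument stands in for the real blocking lookup, not an actual DNS call:

```cpp
#include <functional>
#include <future>
#include <string>

// Run a blocking reverse lookup on a worker thread and return a future,
// keeping the (single) event-loop thread free. `blockingResolve` is a
// stand-in for a real getnameinfo-style call.
std::future<std::string> asyncReverseLookup(
    const std::string& address,
    std::function<std::string(const std::string&)> blockingResolve)
{
  return std::async(std::launch::async, std::move(blockingResolve), address);
}
```

This is only a stopgap: it hides latency on a thread pool rather than making the lookup truly non-blocking the way c-ares does.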
[jira] [Comment Edited] (MESOS-9338) Add asynchronous DNS facilities to libprocess.
[ https://issues.apache.org/jira/browse/MESOS-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707502#comment-16707502 ] Chun-Hung Hsiao edited comment on MESOS-9338 at 12/3/18 4:44 PM: - C-ares is implicitly bundled in the gRPC bundle. If we are going to bundle c-ares we should compile gRPC against our c-ares bundle. was (Author: chhsia0): C-ares is implicitly bundled in the gRPC bundle. If we are going to bundle c-ares we should compile gRPC against the our c-ares bundle. > Add asynchronous DNS facilities to libprocess. > -- > > Key: MESOS-9338 > URL: https://issues.apache.org/jira/browse/MESOS-9338 > Project: Mesos > Issue Type: Improvement > Components: libprocess >Reporter: Benjamin Mahler >Priority: Major > Labels: foundations > > This would enable non-blocking DNS queries. One use case is during TLS peer > certificate verification, we need to perform a reverse DNS lookup to get the > peer's hostname. This blocks the event loop thread! > Some options: > (1) Linux provides {{getaddrinfo_a}}, however I don't see an equivalent one > for {{getnameinfo}}: > http://man7.org/linux/man-pages/man3/getaddrinfo_a.3.html > (2) A popular library is c-ares (MIT license): > https://c-ares.haxx.se/ > (3) ADNS (GPLv3): > https://www.gnu.org/software/adns/ > (4) c-ares has a list of other libraries: > https://c-ares.haxx.se/otherlibs.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9022) Race condition in task updates could cause missing event in streaming
[ https://issues.apache.org/jira/browse/MESOS-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707638#comment-16707638 ] Evelyn Liu commented on MESOS-9022: --- Thanks [~bennoe] [~vinodkone]! > Race condition in task updates could cause missing event in streaming > - > > Key: MESOS-9022 > URL: https://issues.apache.org/jira/browse/MESOS-9022 > Project: Mesos > Issue Type: Bug > Components: HTTP API, master >Affects Versions: 1.6.0 >Reporter: Evelyn Liu >Assignee: Benno Evers >Priority: Blocker > Labels: events, foundations, mesos, mesosphere, race-condition, > streaming > Fix For: 1.7.0 > > > Master sends update event of {{TASK_STARTING}} when task's latest state is > already {{TASK_FAILED}}. Then when it handles the update of {{TASK_FAILED}}, > {{sendSubscribersUpdate}} is set to {{false}} because of > [this|https://github.com/apache/mesos/blob/1.6.x/src/master/master.cpp#L10805]. > The subscriber would not receive update event of {{TASK_FAILED}}. > This happened when a task failed very fast. Is there a race condition while > handling task updates? 
> {{*master log:*}} > {code:java} > I0622 13:08:29.189771 84079 master.cpp:8345] Status update TASK_STARTING > (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- from agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.189801 84079 master.cpp:8402] Forwarding status update > TASK_STARTING (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- > I0622 13:08:29.190004 84079 master.cpp:10843] Updating the state of task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (latest state: TASK_STARTING, > status update state: TASK_STARTING) > I0622 13:08:29.603857 84079 master.cpp:6195] Processing ACKNOWLEDGE call for > status eb091093-d303-4e82-b69f-e2ba1011ba76 for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (Aurora) on agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.615643 84079 master.cpp:8345] Status update TASK_STARTING > (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- from agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.615669 84079 master.cpp:8402] Forwarding status update > TASK_STARTING (Status UUID: eb091093-d303-4e82-b69f-e2ba1011ba76) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- > I0622 13:08:29.615783 84079 master.cpp:10843] Updating the state of task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (latest state: TASK_FAILED, status > update state: TASK_STARTING) > I0622 13:08:29.620837 84079 master.cpp:8345] Status update TASK_FAILED > (Status UUID: ac34f1e9-eaa4-4765-82ac-7398c2e6c835) for task > 
f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- from agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.620853 84079 master.cpp:8402] Forwarding status update > TASK_FAILED (Status UUID: ac34f1e9-eaa4-4765-82ac-7398c2e6c835) for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- > I0622 13:08:29.620923 84079 master.cpp:10843] Updating the state of task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (latest state: TASK_FAILED, status > update state: TASK_FAILED) > I0622 13:08:29.630455 84079 master.cpp:6195] Processing ACKNOWLEDGE call for > status eb091093-d303-4e82-b69f-e2ba1011ba76 for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (Aurora) on agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587 > I0622 13:08:29.673051 84095 master.cpp:6195] Processing ACKNOWLEDGE call for > status ac34f1e9-eaa4-4765-82ac-7398c2e6c835 for task > f839055c-7a40-4e6c-9f53-22030f388c8c of framework > 4591ea8b-4adb-4acf-bb29-b70817663c4e- (Aurora) on agent > d2f1c7c2-668d-46e5-829b-ce614cca79ae-S1587{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9448) Semantics of RECONCILE_OPERATIONS framework API call are incorrect
[ https://issues.apache.org/jira/browse/MESOS-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707713#comment-16707713 ] Gastón Kleiman commented on MESOS-9448: --- These are the intended semantics for {{RECONCILE_OPERATIONS}}; we decided that we wanted to follow a Request/Response pattern instead of an event-based pattern like {{RECONCILE}}. {{send()}} is a {{void}} method, so we had to add the {{call()}} method in order to use this API call. We should update the description of {{send()}} in {{scheduler.hpp}} and {{scheduler.cpp}} to make it clear that it can't be used to send {{RECONCILE_OPERATIONS}} requests. > Semantics of RECONCILE_OPERATIONS framework API call are incorrect > -- > > Key: MESOS-9448 > URL: https://issues.apache.org/jira/browse/MESOS-9448 > Project: Mesos > Issue Type: Bug > Components: framework, HTTP API, master >Reporter: Benjamin Bannier >Priority: Major > > The typical pattern in the framework HTTP API is that frameworks send calls > to which the master responds with {{Accepted}} responses and which trigger > events. The only designed exception to this is {{SUBSCRIBE}} calls, to which > the master responds with an {{Ok}} response containing the assigned framework > ID. This is even codified in {{src/scheduler.cpp:646ff}}, > {code} > if (response->code == process::http::Status::OK) { > // Only SUBSCRIBE call should get a "200 OK" response. > CHECK_EQ(Call::SUBSCRIBE, call.type()); > {code} > Currently, the handling of {{RECONCILE_OPERATIONS}} calls does not follow > this pattern. Instead of sending events, the master immediately responds with > an {{Ok}} and a list of operations. This, e.g., leads to assertion failures in > the above hard check whenever one uses {{Scheduler::send}} instead of > {{Scheduler::call}}.
One can reproduce this by modifying the existing tests > in {{src/operation_reconciliation_tests.cpp}}, > {code} > mesos.send({createCallReconcileOperations(frameworkId, {operation})}); // ADD > THIS. > const Future result = > mesos.call({createCallReconcileOperations(frameworkId, {operation})}); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
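The send()/call() distinction in the comment can be illustrated with a simplified shape (not the real Scheduler interface): send() is fire-and-forget and returns void, so any call whose result comes back in the HTTP response body, rather than as an event, only makes sense through call().

```cpp
// Illustrative subset of the scheduler call types.
enum class Call { SUBSCRIBE, RECONCILE, RECONCILE_OPERATIONS };

// Can this call be issued via the fire-and-forget send()? send() cannot
// surface a synchronous response body, and the client CHECKs against a
// "200 OK" for anything but SUBSCRIBE, so request/response-style calls
// such as RECONCILE_OPERATIONS must go through call() instead.
bool usableViaSend(Call call)
{
  return call != Call::RECONCILE_OPERATIONS;
}
```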
[jira] [Commented] (MESOS-8983) SlaveRecoveryTest/0.PingTimeoutDuringRecovery flaky
[ https://issues.apache.org/jira/browse/MESOS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707749#comment-16707749 ] Vinod Kone commented on MESOS-8983: --- This is happening on ASF CI. {code}
15:49:24 3: [ RUN ] SlaveRecoveryTest/0.PingTimeoutDuringRecovery
15:49:24 3: I1203 15:49:24.425719 24686 cluster.cpp:173] Creating default 'local' authorizer
15:49:24 3: I1203 15:49:24.430784 24687 master.cpp:413] Master 620b2018-c90f-4b11-bbe3-8fa1c90f204d (5a45e7f918b2) started on 172.17.0.3:42912
15:49:24 3: I1203 15:49:24.430824 24687 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="1secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/PNxXC7/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="2" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/PNxXC7/master" --zk_session_timeout="10secs"
15:49:24 3: I1203 15:49:24.431120 24687 master.cpp:465] Master only allowing authenticated frameworks to register
15:49:24 3: I1203 15:49:24.431131 24687 master.cpp:471] Master only allowing authenticated agents to register
15:49:24 3: I1203 15:49:24.431139 24687 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
15:49:24 3: I1203 15:49:24.431149 24687 credentials.hpp:37] Loading credentials for authentication from '/tmp/PNxXC7/credentials'
15:49:24 3: I1203 15:49:24.431355 24687 master.cpp:521] Using default 'crammd5' authenticator
15:49:24 3: I1203 15:49:24.431514 24687 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
15:49:24 3: I1203 15:49:24.431659 24687 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
15:49:24 3: I1203 15:49:24.431778 24687 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
15:49:24 3: I1203 15:49:24.431896 24687 master.cpp:602] Authorization enabled
15:49:24 3: I1203 15:49:24.432276 24688 hierarchical.cpp:175] Initialized hierarchical allocator process
15:49:24 3: I1203 15:49:24.432498 24688 whitelist_watcher.cpp:77] No whitelist given
15:49:24 3: I1203 15:49:24.444337 24690 master.cpp:2105] Elected as the leading master!
15:49:24 3: I1203 15:49:24.444366 24690 master.cpp:1660] Recovering from registrar
15:49:24 3: I1203 15:49:24.445142 24687 registrar.cpp:339] Recovering registrar
15:49:24 3: I1203 15:49:24.445669 24687 registrar.cpp:383] Successfully fetched the registry (0B) in 472064ns
15:49:24 3: I1203 15:49:24.445785 24687 registrar.cpp:487] Applied 1 operations in 40517ns; attempting to update the registry
15:49:24 3: I1203 15:49:24.446497 24687 registrar.cpp:544] Successfully updated the registry in 660992ns
15:49:24 3: I1203 15:49:24.453212 24687 registrar.cpp:416] Successfully recovered registrar
15:49:24 3: I1203 15:49:24.453722 24692 master.cpp:1774] Recovered 0 agents from the registry (135B); allowing 10mins for agents to reregister
15:49:24 3: I1203 15:49:24.453984 24692 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
15:49:24 3: I1203 15:49:24.468710 24686 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
15:49:24 3: W1203 15:49:24.481513 24686 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
15:49:24 3: W1203 15:49:24.481549 24686 backend.cpp:76] Failed to create 'bind' backend: BindBackend requires root privileges
15:49:24 3: I1203 15:49:24.481591 24686 provisioner.cpp:298] Using default backend 'copy'
15:49:24 3: W1203 15:49:24.498661 24686 process.cpp:2829] Attempted to spawn already running proce
[jira] [Commented] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707765#comment-16707765 ] Vinod Kone commented on MESOS-7971: --- Saw this again. {code}
06:14:51 [ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove
06:14:51 I1203 06:14:50.630549 19784 cluster.cpp:173] Creating default 'local' authorizer
06:14:51 I1203 06:14:50.633529 19796 master.cpp:413] Master f1ffe054-ad44-45d4-9f39-84b048e1a359 (c16130e94783) started on 172.17.0.3:44340
06:14:51 I1203 06:14:50.633581 19796 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/4vMyjy/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" --webui_dir="/tmp/SRC/build/mesos-1.8.0/_inst/share/mesos/webui" --work_dir="/tmp/4vMyjy/master" --zk_session_timeout="10secs"
06:14:51 I1203 06:14:50.634217 19796 master.cpp:465] Master only allowing authenticated frameworks to register
06:14:51 I1203 06:14:50.634236 19796 master.cpp:471] Master only allowing authenticated agents to register
06:14:51 I1203 06:14:50.634253 19796 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
06:14:51 I1203 06:14:50.634270 19796 credentials.hpp:37] Loading credentials for authentication from '/tmp/4vMyjy/credentials'
06:14:51 I1203 06:14:50.634608 19796 master.cpp:521] Using default 'crammd5' authenticator
06:14:51 I1203 06:14:50.634840 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
06:14:51 I1203 06:14:50.635052 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
06:14:51 I1203 06:14:50.635200 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
06:14:51 I1203 06:14:50.635373 19796 master.cpp:602] Authorization enabled
06:14:51 W1203 06:14:50.635457 19796 master.cpp:665] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information
06:14:51 I1203 06:14:50.635991 19800 whitelist_watcher.cpp:77] No whitelist given
06:14:51 I1203 06:14:50.636032 19793 hierarchical.cpp:175] Initialized hierarchical allocator process
06:14:51 I1203 06:14:50.638939 19796 master.cpp:2105] Elected as the leading master!
06:14:51 I1203 06:14:50.638975 19796 master.cpp:1660] Recovering from registrar
06:14:51 I1203 06:14:50.639200 19792 registrar.cpp:339] Recovering registrar
06:14:51 I1203 06:14:50.639927 19792 registrar.cpp:383] Successfully fetched the registry (0B) in 672768ns
06:14:51 I1203 06:14:50.640069 19792 registrar.cpp:487] Applied 1 operations in 48006ns; attempting to update the registry
06:14:51 I1203 06:14:50.640718 19792 registrar.cpp:544] Successfully updated the registry in 582912ns
06:14:51 I1203 06:14:50.640852 19792 registrar.cpp:416] Successfully recovered registrar
06:14:51 I1203 06:14:50.641299 19800 master.cpp:1774] Recovered 0 agents from the registry (135B); allowing 10mins for agents to reregister
06:14:51 I1203 06:14:50.641340 19799 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
06:14:51 W1203 06:14:50.647153 19784 process.cpp:2829] Attempted to spawn already running process files@172.17.0.3:44340
06:14:51 I1203 06:14:50.648453 19784 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
06:14:51 W1203 06:14:50.649060 19784 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges
06:14:51 W1203 06:14:50.649088 19784 backend.c
[jira] [Comment Edited] (MESOS-7971) PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove test is flaky
[ https://issues.apache.org/jira/browse/MESOS-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707765#comment-16707765 ] Vinod Kone edited comment on MESOS-7971 at 12/3/18 8:50 PM: Saw this again. {noformat} 06:14:51 [ RUN ] PersistentVolumeEndpointsTest.EndpointCreateThenOfferRemove 06:14:51 I1203 06:14:50.630549 19784 cluster.cpp:173] Creating default 'local' authorizer 06:14:51 I1203 06:14:50.633529 19796 master.cpp:413] Master f1ffe054-ad44-45d4-9f39-84b048e1a359 (c16130e94783) started on 172.17.0.3:44340 06:14:51 I1203 06:14:50.633581 19796 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1000secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/4vMyjy/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --roles="role1" --root_submissions="true" --version="false" 
--webui_dir="/tmp/SRC/build/mesos-1.8.0/_inst/share/mesos/webui" --work_dir="/tmp/4vMyjy/master" --zk_session_timeout="10secs" 06:14:51 I1203 06:14:50.634217 19796 master.cpp:465] Master only allowing authenticated frameworks to register 06:14:51 I1203 06:14:50.634236 19796 master.cpp:471] Master only allowing authenticated agents to register 06:14:51 I1203 06:14:50.634253 19796 master.cpp:477] Master only allowing authenticated HTTP frameworks to register 06:14:51 I1203 06:14:50.634270 19796 credentials.hpp:37] Loading credentials for authentication from '/tmp/4vMyjy/credentials' 06:14:51 I1203 06:14:50.634608 19796 master.cpp:521] Using default 'crammd5' authenticator 06:14:51 I1203 06:14:50.634840 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' 06:14:51 I1203 06:14:50.635052 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' 06:14:51 I1203 06:14:50.635200 19796 http.cpp:1042] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' 06:14:51 I1203 06:14:50.635373 19796 master.cpp:602] Authorization enabled 06:14:51 W1203 06:14:50.635457 19796 master.cpp:665] The '--roles' flag is deprecated. This flag will be removed in the future. See the Mesos 0.27 upgrade notes for more information 06:14:51 I1203 06:14:50.635991 19800 whitelist_watcher.cpp:77] No whitelist given 06:14:51 I1203 06:14:50.636032 19793 hierarchical.cpp:175] Initialized hierarchical allocator process 06:14:51 I1203 06:14:50.638939 19796 master.cpp:2105] Elected as the leading master! 
06:14:51 I1203 06:14:50.638975 19796 master.cpp:1660] Recovering from registrar 06:14:51 I1203 06:14:50.639200 19792 registrar.cpp:339] Recovering registrar 06:14:51 I1203 06:14:50.639927 19792 registrar.cpp:383] Successfully fetched the registry (0B) in 672768ns 06:14:51 I1203 06:14:50.640069 19792 registrar.cpp:487] Applied 1 operations in 48006ns; attempting to update the registry 06:14:51 I1203 06:14:50.640718 19792 registrar.cpp:544] Successfully updated the registry in 582912ns 06:14:51 I1203 06:14:50.640852 19792 registrar.cpp:416] Successfully recovered registrar 06:14:51 I1203 06:14:50.641299 19800 master.cpp:1774] Recovered 0 agents from the registry (135B); allowing 10mins for agents to reregister 06:14:51 I1203 06:14:50.641340 19799 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover 06:14:51 W1203 06:14:50.647153 19784 process.cpp:2829] Attempted to spawn already running process files@172.17.0.3:44340 06:14:51 I1203 06:14:50.648453 19784 containerizer.cpp:305] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni } 06:14:51 W1203 06:14:50.649060 19784 backend.cpp:76] Failed to create 'aufs' backend: AufsBackend requires root privileges 06:14:51 W1203 06:14:50.649088 19784 backen
[jira] [Commented] (MESOS-3938) Consider allowing setting quotas for the default '*' role.
[ https://issues.apache.org/jira/browse/MESOS-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707949#comment-16707949 ] Meng Zhu commented on MESOS-3938: - Closing this unless there are new use cases. > Consider allowing setting quotas for the default '*' role. > -- > > Key: MESOS-3938 > URL: https://issues.apache.org/jira/browse/MESOS-3938 > Project: Mesos > Issue Type: Task >Reporter: Alexander Rukletsov >Priority: Major > > Investigate use cases and implications of the possibility to set quota for > the '*' role. For example, having quota for '*' set can effectively reduce > the scope of the quota capacity heuristic.
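For illustration, a minimal sketch of how a quota guarantee on '*' narrows the capacity heuristic. This is not the Mesos implementation; the function name, role names, and the simplifying assumption that the heuristic only checks that the sum of guarantees fits within cluster capacity are all mine:

```python
def quota_capacity_check(cluster_cpus, quotas, requested_cpus):
    """Illustrative sketch (not Mesos source): admit a new guarantee
    only if the sum of all guarantees stays within total capacity."""
    return sum(quotas.values()) + requested_cpus <= cluster_cpus

# Without a quota on '*', a role can still be granted 50 cpus:
print(quota_capacity_check(100, {"dev": 40}, 50))            # True

# A guarantee on '*' consumes headroom, shrinking what the
# heuristic will admit for any other role:
print(quota_capacity_check(100, {"dev": 40, "*": 30}, 50))   # False
```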
[jira] [Assigned] (MESOS-8045) Update Mesos executables output if there is a typo
[ https://issues.apache.org/jira/browse/MESOS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benno Evers reassigned MESOS-8045: -- Resolution: Fixed Assignee: Benno Evers This is resolved by MESOS-8728; now we only print the full help string when the "--help" option is specified. > Update Mesos executables output if there is a typo > -- > > Key: MESOS-8045 > URL: https://issues.apache.org/jira/browse/MESOS-8045 > Project: Mesos > Issue Type: Improvement >Reporter: Armand Grillet >Assignee: Benno Evers >Priority: Minor > > Current output if a user makes a typo while using one of the Mesos > executables: > {code} > build (master) $ ./bin/mesos-master.sh --ip=127.0.0.1 --workdir=/tmp > Failed to load unknown flag 'workdir' > Usage: mesos-master [options] > --acls=VALUE >The value could be a JSON-formatted string of ACLs > >or a file path containing the JSON-formatted ACLs used > >for authorization. Path could be of the form `file:///path/to/file` > >or `/path/to/file`. > >Note that if the flag `--authorizers` is provided with a value > >different than `local`, the ACLs contents > >will be ignored. > >See the ACLs protobuf in acls.proto for the expected format. > >Example: > >{ > > "register_frameworks": [ > >{ > > "principals": { "type": "ANY" }, > > "roles": { "values": ["a"] } > >} > > ], > > "run_tasks": [ > >{ > > "principals": { "values": ["a", "b"] }, > > "users": { "values": ["c"] } > >} > > ], > > "teardown_frameworks": [ > >{ > > "principals": { "values": ["a", "b"] }, > > "framework_principals": { "values": ["c"] } > >} > > ], > > "set_quotas": [ > >{ > > "principals": { "values": ["a"] }, > > "roles": { "values": ["a", "b"] } > >} > > ], > > "remove_quotas": [ > >{ >
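As a hedged sketch of the post-MESOS-8728 behavior the comment describes — a short error with a pointer to --help on a typo, rather than the full usage dump shown in the quoted {code} block. This is not the actual Mesos flag parser; `KNOWN_FLAGS` and `load_flag` are invented here for illustration:

```python
import sys

# Hypothetical subset of flags; the real list lives in the Mesos flags parser.
KNOWN_FLAGS = {"ip", "work_dir", "acls"}

def load_flag(name):
    """On an unknown flag, emit a brief error and point at --help
    instead of printing the entire usage text."""
    if name not in KNOWN_FLAGS:
        print(f"Failed to load unknown flag '{name}'", file=sys.stderr)
        print("See 'mesos-master --help' for available options.", file=sys.stderr)
        return False
    return True

load_flag("workdir")  # typo for --work_dir: brief error, not full help
```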
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707993#comment-16707993 ] Till Toenshoff commented on MESOS-4646: --- [~ipronin] do you have any cycles for looking into this? > PortMappingIsolatorTests get kernel stuck. > -- > > Key: MESOS-4646 > URL: https://issues.apache.org/jira/browse/MESOS-4646 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.8.0 > Environment: Linux Kernel 3.19.9-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Priority: Major > Labels: flaky, flaky-test > > {noformat} > $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*" > Source directory: /home/till/scratchpad/mesos > Build directory: /home/till/scratchpad/mesos/build > - > We cannot run any cgroups tests that require mounting > hierarchies because you have the following hierarchies mounted: > /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, > /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, > /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, > /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd > We'll disable the CgroupsNoHierarchyTest test fixture for now. 
> - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > No 'perf' command found so no 'perf' tests will be run > - > WARNING: perf not found for kernel 3.19.0-49 > You may need to install the following packages for this specific kernel: > linux-tools-3.19.0-49-generic > linux-cloud-tools-3.19.0-49-generic > You may also want to install one of the following packages to keep up to > date: > linux-tools-generic-lts- > linux-cloud-tools-generic-lts- > - > The 'perf' command wasn't found so tests using it > to sample the 'cycles' hardware event will not be run. > - > /bin/nc > /usr/local/bin/curl > Note: Google Test filter = > *PortMappingIsolatorTest*-HierarchicalAllocator_BENCHMARK_Test.DeclineOffers:MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward:PerfEventIsolatorTest.ROOT_CGROUPS_Sample:UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup:UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf:PerfTest.ROOT_Events:PerfTest.ROOT_Sample:PerfTest.Parse:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/0:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/1:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/2:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/3:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/4:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/5:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/6:SlaveAnd
FrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/7:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/8:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/9:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/10:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/11:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/12:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/13:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/14:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/15:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/16:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/17:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/18:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/19:SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.AddAndUpdateSlave/20:SlaveAndFrameworkCount/
[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.
[ https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708001#comment-16708001 ] Till Toenshoff commented on MESOS-4646: --- We should try this on a more recent Kernel -- [~ipronin] suggested using a 4.9 or 4.14 -- will give that a spin as soon as possible. > PortMappingIsolatorTests get kernel stuck. > -- > > Key: MESOS-4646 > URL: https://issues.apache.org/jira/browse/MESOS-4646 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.8.0 > Environment: Linux Kernel 3.19.9-49-generic, > libnl-3.2.27 >Reporter: Till Toenshoff >Priority: Major > Labels: flaky, flaky-test > > {noformat} > $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*" > Source directory: /home/till/scratchpad/mesos > Build directory: /home/till/scratchpad/mesos/build > - > We cannot run any cgroups tests that require mounting > hierarchies because you have the following hierarchies mounted: > /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, > /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, > /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, > /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd > We'll disable the CgroupsNoHierarchyTest test fixture for now. 
[jira] [Created] (MESOS-9450) MasterAuthorizationTest.SlaveRemovedDropped is flaky.
Till Toenshoff created MESOS-9450: - Summary: MasterAuthorizationTest.SlaveRemovedDropped is flaky. Key: MESOS-9450 URL: https://issues.apache.org/jira/browse/MESOS-9450 Project: Mesos Issue Type: Bug Components: test Affects Versions: 1.8.0 Environment: Debian 9, autotools, libevent + SSL Reporter: Till Toenshoff {noformat} 23:50:59 [ RUN ] MasterAuthorizationTest.SlaveRemovedDropped 23:50:59 I1203 23:50:59.123471 1137 master.cpp:414] Master 1f14ff95-e61f-4410-a724-dfec18eb52b0 (localhost) started on 127.0.0.1:33161 23:50:59 I1203 23:50:59.123558 1137 master.cpp:417] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/0p45nb/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/0p45nb/master" --zk_session_timeout="10secs" 23:50:59 W1203 23:50:59.123672 1137 master.cpp:420] 23:50:59 ** 23:50:59 Master bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address. 23:50:59 ** 23:50:59 I1203 23:50:59.123688 1137 master.cpp:466] Master only allowing authenticated frameworks to register 23:50:59 I1203 23:50:59.123695 1137 master.cpp:472] Master only allowing authenticated agents to register 23:50:59 I1203 23:50:59.123702 1137 master.cpp:478] Master only allowing authenticated HTTP frameworks to register 23:50:59 I1203 23:50:59.123708 1137 credentials.hpp:37] Loading credentials for authentication from '/tmp/0p45nb/credentials' 23:50:59 I1203 23:50:59.123761 1137 master.cpp:522] Using default 'crammd5' authenticator 23:50:59 I1203 23:50:59.123819 1137 http.cpp:1017] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' 23:50:59 I1203 23:50:59.123875 1137 http.cpp:1017] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' 23:50:59 I1203 23:50:59.123903 1137 http.cpp:1017] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' 23:50:59 I1203 23:50:59.123939 1137 master.cpp:603] Authorization enabled 23:50:59 I1203 23:50:59.124068 1133 hierarchical.cpp:175] Initialized hierarchical allocator process 23:50:59 I1203 23:50:59.124094 1138 whitelist_watcher.cpp:77] No whitelist given 23:50:59 I1203 23:50:59.124608 1137 master.cpp:2089] Elected as the leading master! 
23:50:59 I1203 23:50:59.124625 1137 master.cpp:1644] Recovering from registrar 23:50:59 I1203 23:50:59.124652 1136 registrar.cpp:339] Recovering registrar 23:50:59 I1203 23:50:59.124763 1136 registrar.cpp:383] Successfully fetched the registry (0B) in 97024ns 23:50:59 I1203 23:50:59.124807 1136 registrar.cpp:487] Applied 1 operations in 6279ns; attempting to update the registry 23:50:59 I1203 23:50:59.124967 1136 registrar.cpp:544] Successfully updated the registry in 143104ns 23:50:59 I1203 23:50:59.125001 1136 registrar.cpp:416] Successfully recovered registrar 23:50:59 I1203 23:50:59.125172 1137 master.cpp:1758] Recovered 0 agents from the registry (125B); allowing 10mins for agents to reregister 23:50:59 I1203 23:50:59.125355 1138 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover 23:50:59 W1203 23:50:59.126682 1117 process.cpp:2829] Attempted to spawn already running process files@127.0.0.1:33161 23:50:59 I1203 23:50:59.126904 1117 cluster.cpp:485] Creating default 'local' authorizer 23:50:59 I1203 23:50:59.127399 1131 slave.cpp:268] Mesos age