[jira] [Updated] (MESOS-3923) Implement AuthN handling for HTTP Scheduler API

2016-01-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3923:
--
Assignee: (was: Anand Mazumdar)

> Implement AuthN handling for HTTP Scheduler API
> ---
>
> Key: MESOS-3923
> URL: https://issues.apache.org/jira/browse/MESOS-3923
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API, master
>Affects Versions: 0.25.0
>Reporter: Ben Whitehead
>  Labels: mesosphere
>
> If authentication (AuthN) is enabled on a master, frameworks attempting to use 
> the HTTP Scheduler API can't register.
> {code}
> $ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
> --pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
> Accept:application/x-protobuf Content-Type:application/x-protobuf
> POST /api/v1/scheduler HTTP/1.1
> Connection: keep-alive
> Content-Type: application/x-protobuf
> Accept-Encoding: gzip, deflate
> Accept: application/x-protobuf
> Content-Length: 126
> User-Agent: HTTPie/0.9.0
> Host: localhost:5050
> Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==
> +-+
> | NOTE: binary data not shown in terminal |
> +-+
> HTTP/1.1 401 Unauthorized
> Date: Fri, 13 Nov 2015 20:00:45 GMT
> WWW-authenticate: Basic realm="Mesos master"
> Content-Length: 65
> HTTP schedulers are not supported when authentication is required
> {code}
> Authorization (AuthZ) is already supported for HTTP-based frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4449) SegFault on agent during executor startup

2016-01-21 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111027#comment-15111027
 ] 

Anand Mazumdar commented on MESOS-4449:
---

Thanks for reporting this. The culprit here seems to be an erroneous {{NULL}} 
dereference introduced by me.

Patch for the fix: https://reviews.apache.org/r/42605/

> SegFault on agent during executor startup
> -
>
> Key: MESOS-4449
> URL: https://issues.apache.org/jira/browse/MESOS-4449
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: Some setup details:
> - Master and agents running in separate docker containers on the same host.
> - Containers based upon Ubuntu 14.04 using Mesosphere-produced Mesos deb 
> files. For more details, see 
> https://github.com/ContainerSolutions/minimesos-docker
> - This only occurs with 0.26, not with 0.25.
>Reporter: Philip Winder
> Attachments: agent.txt, master.txt
>
>
> When repeatedly performing our system tests we have found that we get a 
> segfault on one of the agents, roughly one time in ten. I've attached the 
> full log from the agent that failed and the log from the master (although I 
> think the latter is less helpful).
> To reproduce:
> - I have no idea. It seems to occur at certain times, e.g. if a packet is 
> created right on a minute boundary. But I don't think it's caused by our 
> code, because the timestamps are stamped by Mesos. I was surprised not to 
> find a bug already open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4449) SegFault on agent during executor startup

2016-01-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-4449:
-

Assignee: Anand Mazumdar

> SegFault on agent during executor startup
> -
>
> Key: MESOS-4449
> URL: https://issues.apache.org/jira/browse/MESOS-4449
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: Some setup details:
> - Master and agents running in separate docker containers on the same host.
> - Containers based upon Ubuntu 14.04 using Mesosphere-produced Mesos deb 
> files. For more details, see 
> https://github.com/ContainerSolutions/minimesos-docker
> - This only occurs with 0.26, not with 0.25.
>Reporter: Philip Winder
>Assignee: Anand Mazumdar
> Attachments: agent.txt, master.txt
>
>
> When repeatedly performing our system tests we have found that we get a 
> segfault on one of the agents, roughly one time in ten. I've attached the 
> full log from the agent that failed and the log from the master (although I 
> think the latter is less helpful).
> To reproduce:
> - I have no idea. It seems to occur at certain times, e.g. if a packet is 
> created right on a minute boundary. But I don't think it's caused by our 
> code, because the timestamps are stamped by Mesos. I was surprised not to 
> find a bug already open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4451) Enable `-Wnull-dereference` when building Mesos

2016-01-21 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4451:
-

 Summary: Enable `-Wnull-dereference` when building Mesos
 Key: MESOS-4451
 URL: https://issues.apache.org/jira/browse/MESOS-4451
 Project: Mesos
  Issue Type: Task
  Components: build
Reporter: Anand Mazumdar
Priority: Minor


Currently we don't have {{-Wnull-dereference}} enabled for Mesos. This can 
let {{NULL}} dereference errors go unnoticed, like the one reported in 
MESOS-4449.

We do build with {{-Wall}} and treat warnings as errors via {{-Werror}}, but 
this particular warning is not included in {{-Wall}}.
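
For illustration, a minimal example of the class of bug this flag catches 
(this snippet is not from MESOS-4449; note that GCC documents 
{{-Wnull-dereference}} as effective only when optimization is enabled):

{code}
// With `g++ -O2 -Wnull-dereference -c example.cpp`, GCC warns that `p`
// is dereferenced on a path where it is known to be null.
int deref(const int* p)
{
  if (p == nullptr) {
    return *p;  // warning: null pointer dereference
  }
  return *p;
}
{code}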



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4449) SegFault on agent during executor startup

2016-01-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4449:
--
Shepherd: Joris Van Remoortere
  Sprint: Mesosphere Sprint 27
Story Points: 1
  Labels: mesosphere  (was: )

> SegFault on agent during executor startup
> -
>
> Key: MESOS-4449
> URL: https://issues.apache.org/jira/browse/MESOS-4449
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: Some setup details:
> - Master and agents running in separate docker containers on the same host.
> - Containers based upon Ubuntu 14.04 using Mesosphere-produced Mesos deb 
> files. For more details, see 
> https://github.com/ContainerSolutions/minimesos-docker
> - This only occurs with 0.26, not with 0.25.
>Reporter: Philip Winder
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
> Attachments: agent.txt, master.txt
>
>
> When repeatedly performing our system tests we have found that we get a 
> segfault on one of the agents, roughly one time in ten. I've attached the 
> full log from the agent that failed and the log from the master (although I 
> think the latter is less helpful).
> To reproduce:
> - I have no idea. It seems to occur at certain times, e.g. if a packet is 
> created right on a minute boundary. But I don't think it's caused by our 
> code, because the timestamps are stamped by Mesos. I was surprised not to 
> find a bug already open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint

2016-01-19 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2296:
--
Issue Type: Epic  (was: Task)

> Implement the Events stream on slave for Call endpoint
> --
>
> Key: MESOS-2296
> URL: https://issues.apache.org/jira/browse/MESOS-2296
> Project: Mesos
>  Issue Type: Epic
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors

2016-01-19 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4255:
--
Sprint:   (was: Mesosphere Sprint 26)

> Add mechanism for testing recovery of HTTP based executors
> --
>
> Key: MESOS-4255
> URL: https://issues.apache.org/jira/browse/MESOS-4255
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the slave process generates a new process ID every time it is 
> initialized, via a {{process::ID::generate}} function call. This is a problem 
> for testing HTTP executors: an executor can't retry its connection after an 
> agent restart, since the numeric part of the ID is incremented.
> {code}
> Agent PID before:
> slave(1)@127.0.0.1:43915
> Agent PID after restart:
> slave(2)@127.0.0.1:43915
> {code}
> There are a couple of ways to fix this:
> - Add a constructor to {{Slave}}, exclusively for testing, that passes on a 
> fixed {{ID}} instead of relying on {{ID::generate}} (see the sketch below).
> - Currently libprocess delegates to {{slave(1)}} when no process ID is 
> specified in the URL, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
> to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to 
> (1), we could default to the last known active ID.
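> A rough sketch of the first option (the constructor shown here is 
> hypothetical; the real {{Slave}} constructor takes many more arguments):
> {code}
> // Let tests inject a fixed libprocess ID instead of calling
> // process::ID::generate("slave"), so the agent keeps the same PID
> // (e.g. slave@127.0.0.1:43915) across restarts.
> class Slave : public process::ProcessBase
> {
> public:
>   explicit Slave(const std::string& id)
>     : process::ProcessBase(id) {}
> };
> {code}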



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4425) Introduce filtering test abstractions for HTTP events to libprocess

2016-01-19 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4425:
--
  Sprint: Mesosphere Sprint 27
Story Points: 3

> Introduce filtering test abstractions for HTTP events to libprocess
> ---
>
> Key: MESOS-4425
> URL: https://issues.apache.org/jira/browse/MESOS-4425
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need a test abstraction for {{HttpEvent}}, similar to the already existing 
> ones for {{DispatchEvent}} and {{MessageEvent}} in libprocess.
> The abstraction can be similar in semantics to the already existing 
> {{FUTURE_DISPATCH}}/{{FUTURE_MESSAGE}}.
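> A purely hypothetical sketch of the desired semantics (the 
> {{FUTURE_HTTP_REQUEST}} macro below does not exist yet; its shape mirrors 
> how {{FUTURE_DISPATCH}} is used today):
> {code}
> // Returns a future satisfied when a matching HTTP request reaches the
> // process with the given `pid`, so a test can block on it.
> Future<http::Request> subscribe =
>   FUTURE_HTTP_REQUEST(pid, "api/v1/scheduler");
> AWAIT_READY(subscribe);
> {code}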



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4433) Implement a callback testing interface for the Executor Library

2016-01-19 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4433:
-

 Summary: Implement a callback testing interface for the Executor 
Library
 Key: MESOS-4433
 URL: https://issues.apache.org/jira/browse/MESOS-4433
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar


Currently, we do not have a mocking-based callback interface for the executor 
library. It should look similar to the ongoing work in MESOS-3339, the 
corresponding issue for the scheduler library.

The interface should allow us to set expectations like we do for the driver. An 
example:

{code}
EXPECT_CALL(executor, connected())
  .Times(1);
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4425) Introduce filtering test abstractions for HTTP events to libprocess

2016-01-18 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4425:
-

 Summary: Introduce filtering test abstractions for HTTP events to 
libprocess
 Key: MESOS-4425
 URL: https://issues.apache.org/jira/browse/MESOS-4425
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar


We need a test abstraction for {{HttpEvent}}, similar to the already existing 
ones for {{DispatchEvent}} and {{MessageEvent}} in libprocess.

The abstraction can be similar in semantics to the already existing 
{{FUTURE_DISPATCH}}/{{FUTURE_MESSAGE}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102187#comment-15102187
 ] 

Anand Mazumdar commented on MESOS-3832:
---

I posted another patch that addressed the comments from Vinod on the earlier 
review. It should be able to make it into 0.27.

https://reviews.apache.org/r/42341/

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Vinod Kone
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}} 
> (sketched below).
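> A rough sketch of that check (illustrative only, not the actual patch under 
> review; {{elected()}} and {{redirect()}} already exist in the master code):
> {code}
> // At the top of the /api/v1/scheduler handler in src/master/http.cpp:
> // reply with a 307 pointing at the leader instead of processing the call.
> if (!master->elected()) {
>   return redirect(request);
> }
> {code}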



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3923) Implement AuthN handling for HTTP Scheduler API

2016-01-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3923:
--
Sprint:   (was: Mesosphere Sprint 27)

> Implement AuthN handling for HTTP Scheduler API
> ---
>
> Key: MESOS-3923
> URL: https://issues.apache.org/jira/browse/MESOS-3923
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API, master
>Affects Versions: 0.25.0
>Reporter: Ben Whitehead
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> If authentication (AuthN) is enabled on a master, frameworks attempting to use 
> the HTTP Scheduler API can't register.
> {code}
> $ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
> --pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
> Accept:application/x-protobuf Content-Type:application/x-protobuf
> POST /api/v1/scheduler HTTP/1.1
> Connection: keep-alive
> Content-Type: application/x-protobuf
> Accept-Encoding: gzip, deflate
> Accept: application/x-protobuf
> Content-Length: 126
> User-Agent: HTTPie/0.9.0
> Host: localhost:5050
> Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==
> +-+
> | NOTE: binary data not shown in terminal |
> +-+
> HTTP/1.1 401 Unauthorized
> Date: Fri, 13 Nov 2015 20:00:45 GMT
> WWW-authenticate: Basic realm="Mesos master"
> Content-Length: 65
> HTTP schedulers are not supported when authentication is required
> {code}
> Authorization (AuthZ) is already supported for HTTP-based frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4334) Add documentation for the registry

2016-01-15 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4334:
--
Shepherd: Benjamin Mahler

> Add documentation for the registry
> --
>
> Key: MESOS-4334
> URL: https://issues.apache.org/jira/browse/MESOS-4334
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: Neil Conway
>Assignee: Anand Mazumdar
>  Labels: documentation, mesosphere, registry
>
> What information does the master store in the registry? What do operators 
> need to know about managing the registry?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4398) Synchronously handle AuthZ errors for the Scheduler endpoint.

2016-01-15 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4398:
-

 Summary: Synchronously handle AuthZ errors for the Scheduler 
endpoint.
 Key: MESOS-4398
 URL: https://issues.apache.org/jira/browse/MESOS-4398
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
Reporter: Anand Mazumdar


Currently, any AuthZ errors for the {{/scheduler}} endpoint are handled 
asynchronously, via a {{FrameworkErrorMessage}} sent back on the event stream. 
Here is an example:

{code}
  if (authorizationError.isSome()) {
LOG(INFO) << "Refusing subscription of framework"
  << " '" << frameworkInfo.name() << "'"
  << ": " << authorizationError.get().message;

FrameworkErrorMessage message;
message.set_message(authorizationError.get().message);
http.send(message);
http.close();
return;
  }
{code}

We would like to handle such errors synchronously, when the request is 
received, similar to what other endpoints like {{/reserve}} and {{/quota}} do. 
We already have the relevant {{authorizeXXX}} functions in {{master.cpp}}; we 
should just let a request pass through only once the relevant {{Future}} from 
the {{authorizeXXX}} function is fulfilled.
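
A rough sketch of the synchronous shape this could take (the handler body and 
the {{authorizeFramework}} call are illustrative; {{Forbidden}}/{{Accepted}} 
are the libprocess HTTP responses):

{code}
// Inside the /scheduler request handler: defer the HTTP response on the
// authorization future instead of accepting the subscription first.
return authorizeFramework(principal, frameworkInfo)
  .then([](bool authorized) -> Future<Response> {
    if (!authorized) {
      return Forbidden("Framework is not authorized to subscribe");
    }

    return Accepted(); // Continue with the normal SUBSCRIBE processing.
  });
{code}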



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4409) MasterTest.MaxCompletedFrameworksFlag is flaky

2016-01-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102504#comment-15102504
 ] 

Anand Mazumdar commented on MESOS-4409:
---

[~klueska] Do you want to have a look at this?

> MasterTest.MaxCompletedFrameworksFlag is flaky
> --
>
> Key: MESOS-4409
> URL: https://issues.apache.org/jira/browse/MESOS-4409
> Project: Mesos
>  Issue Type: Bug
>  Components: master, tests
>Affects Versions: 0.26.0
> Environment: On Jenkins CI: gcc,--verbose,ubuntu:14.04,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere, tests
>
> Saw this failure on Jenkins CI:
> {code}
> [ RUN  ] MasterTest.MaxCompletedFrameworksFlag
> I0115 21:24:50.344116 31507 leveldb.cpp:174] Opened db in 2.062201ms
> I0115 21:24:50.344874 31507 leveldb.cpp:181] Compacted db in 716863ns
> I0115 21:24:50.344923 31507 leveldb.cpp:196] Created db iterator in 19087ns
> I0115 21:24:50.344949 31507 leveldb.cpp:202] Seeked to beginning of db in 
> 1897ns
> I0115 21:24:50.344965 31507 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 298ns
> I0115 21:24:50.345012 31507 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 21:24:50.345432 31536 recover.cpp:447] Starting replica recovery
> I0115 21:24:50.345657 31536 recover.cpp:473] Replica is in EMPTY status
> I0115 21:24:50.346535 31539 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (6089)@172.17.0.4:52665
> I0115 21:24:50.347028 31540 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 21:24:50.347554 31526 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 21:24:50.348175 31540 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 433937ns
> I0115 21:24:50.348215 31526 master.cpp:374] Master 
> bf6ba047-245f-4e65-986c-1880cef81248 (4e6fbf10d387) started on 
> 172.17.0.4:52665
> I0115 21:24:50.349417 31540 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 21:24:50.349630 31536 recover.cpp:473] Replica is in STARTING status
> I0115 21:24:50.349421 31526 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/2wURTY/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="0" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/2wURTY/master" --zk_session_timeout="10secs"
> I0115 21:24:50.349720 31526 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 21:24:50.349737 31526 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 21:24:50.349750 31526 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/2wURTY/credentials'
> I0115 21:24:50.350005 31526 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 21:24:50.350132 31526 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 21:24:50.350256 31526 master.cpp:569] Authorization enabled
> I0115 21:24:50.350546 31529 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 21:24:50.350626 31536 whitelist_watcher.cpp:77] No whitelist given
> I0115 21:24:50.350559 31538 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (6090)@172.17.0.4:52665
> I0115 21:24:50.351049 31534 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 21:24:50.351704 31537 recover.cpp:564] Updating replica status to VOTING
> I0115 21:24:50.352221 31532 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38ns
> I0115 21:24:50.352246 31532 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 21:24:50.352371 31541 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 21:24:50.352620 31541 recover.cpp:462] Recover process terminated
> I0115 21:24:50.353121 31528 master.cpp:1710] The newly elected leader is 
> master@172.17.0.4:52665 with id bf6ba047-245f-4e65-986c-1880cef81248
> I0115 21:24:50.353152 31528 master.cpp:1723] 

[jira] [Commented] (MESOS-4404) SlaveTest.HTTPSchedulerSlaveRestart is flaky

2016-01-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102530#comment-15102530
 ] 

Anand Mazumdar commented on MESOS-4404:
---

[~qiujian] Do you want to have a look at this? Not sure, but might be related 
to your recent patch to speed up this test.

> SlaveTest.HTTPSchedulerSlaveRestart is flaky
> 
>
> Key: MESOS-4404
> URL: https://issues.apache.org/jira/browse/MESOS-4404
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, slave
>Affects Versions: 0.26.0
> Environment: From the Jenkins CI: gcc,--verbose --enable-libevent 
> --enable-ssl,centos:7,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere
>
> Saw this failure on the Jenkins CI:
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0115 18:42:25.393354  1762 leveldb.cpp:174] Opened db in 3.456169ms
> I0115 18:42:25.394310  1762 leveldb.cpp:181] Compacted db in 922588ns
> I0115 18:42:25.394361  1762 leveldb.cpp:196] Created db iterator in 18529ns
> I0115 18:42:25.394378  1762 leveldb.cpp:202] Seeked to beginning of db in 
> 1933ns
> I0115 18:42:25.394390  1762 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 280ns
> I0115 18:42:25.394430  1762 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 18:42:25.394963  1791 recover.cpp:447] Starting replica recovery
> I0115 18:42:25.395396  1791 recover.cpp:473] Replica is in EMPTY status
> I0115 18:42:25.396589  1795 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (11302)@172.17.0.2:49129
> I0115 18:42:25.397101  1785 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 18:42:25.397721  1791 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 18:42:25.398764  1789 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 684584ns
> I0115 18:42:25.398807  1789 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 18:42:25.398947  1795 master.cpp:374] Master 
> 544823be-76b5-47be-b326-2cd6d6a700b8 (e648fe109cb1) started on 
> 172.17.0.2:49129
> I0115 18:42:25.399209  1788 recover.cpp:473] Replica is in STARTING status
> I0115 18:42:25.398980  1795 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/BOGaaq/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/BOGaaq/master" --zk_session_timeout="10secs"
> I0115 18:42:25.399435  1795 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 18:42:25.399451  1795 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 18:42:25.399461  1795 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/BOGaaq/credentials'
> I0115 18:42:25.399884  1795 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 18:42:25.400060  1795 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 18:42:25.400254  1795 master.cpp:569] Authorization enabled
> I0115 18:42:25.400439  1785 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 18:42:25.400470  1789 whitelist_watcher.cpp:77] No whitelist given
> I0115 18:42:25.400656  1792 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (11303)@172.17.0.2:49129
> I0115 18:42:25.400943  1781 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 18:42:25.401612  1791 recover.cpp:564] Updating replica status to VOTING
> I0115 18:42:25.402313  1785 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 458849ns
> I0115 18:42:25.402345  1785 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 18:42:25.402510  1788 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 18:42:25.402848  1788 recover.cpp:462] Recover process terminated
> I0115 18:42:25.402997  1784 master.cpp:1710] The newly elected leader is 
> 

[jira] [Updated] (MESOS-3550) Create a Executor Library based on the new Executor HTTP API

2016-01-14 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3550:
--
Target Version/s:   (was: 0.27.0)

> Create a Executor Library based on the new Executor HTTP API
> 
>
> Key: MESOS-3550
> URL: https://issues.apache.org/jira/browse/MESOS-3550
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Similar to the Scheduler Library ({{src/scheduler/scheduler.cpp}}), we would 
> need an Executor Library that speaks the new Executor HTTP API. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3578) ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky

2016-01-14 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3578:
--
Target Version/s:   (was: 0.27.0)

> ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky
> --
>
> Key: MESOS-3578
> URL: https://issues.apache.org/jira/browse/MESOS-3578
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/881/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
> {code}
> [ RUN  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
> Using temporary directory 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE'
> I0929 02:36:44.066397 30457 local_puller.cpp:127] Untarring image from 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/store/staging/aZND7C'
>  to 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/images/abc:latest.tar'
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:843: Failure
> (layers).failure(): Collect failed: Untar failed with exit code: exited with 
> status 2
> [  FAILED  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization 
> (181 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3583) Introduce sessions in HTTP Scheduler API Subscribed Responses

2016-01-12 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3583:
--
Target Version/s:   (was: 0.27.0)

> Introduce sessions in HTTP Scheduler API Subscribed Responses
> -
>
> Key: MESOS-3583
> URL: https://issues.apache.org/jira/browse/MESOS-3583
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere, tech-debt
>
> Currently, the HTTP Scheduler API has no concept of sessions, i.e. a 
> {{SessionID}} or {{TokenID}}. This matters in some failure scenarios: as of 
> now, if a framework fails over and then subscribes again with the same 
> {{FrameworkID}} and the {{force}} option set, the Mesos master would 
> subscribe it.
> If the previous instance of the framework/scheduler then sends a Call, e.g. 
> {{Call::KILL}}, with the same {{FrameworkID}} set, it would still be 
> accepted by the master, leading to erroneously killing a task.
> This is possible because we currently have no way of distinguishing 
> connections. This worked in the previous driver implementation because the 
> master also performed a {{UPID}} check to verify that they matched, and only 
> then allowed the call.
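> A hypothetical sketch of the master-side check once a session token exists 
> (the {{session_id}} field and {{sessionId}} member below do not exist today):
> {code}
> // Reject calls from a scheduler instance whose session token is stale,
> // even though its FrameworkID still matches the registered framework.
> if (call.session_id() != framework->sessionId) {
>   return Forbidden("Stale session: framework has re-subscribed");
> }
> {code}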



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2016-01-12 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3339:
--
Target Version/s:   (was: 0.27.0)

> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, our testing infrastructure does not have a mechanism for 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream. We need a {{DROP_HTTP_CALLS}} abstraction that can filter 
> out a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library tests and the Maintenance Primitives tests. 
> - The solution can be as trivial as moving this helper function to a common 
> test header.
> - Alternatively, implement a {{DROP_HTTP_CALLS}} similar to what we do for 
> other protobufs via {{DROP_CALLS}} (see the sketch below).
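> A hypothetical sketch of the usage (the macro does not exist yet; its shape 
> mirrors the existing {{DROP_CALLS}} semantics):
> {code}
> // Drop every HEARTBEAT event sent on the scheduler's event stream and
> // expose a future that a test can wait on.
> Future<Nothing> heartbeat = DROP_HTTP_CALLS(Event::HEARTBEAT, pid);
> {code}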



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4031) slave crashed in cgroupstatistics()

2016-01-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4031:
--
Description: 
Hi all, 
I have built a Mesos cluster with three slaves. Any slave may sporadically 
crash when I fetch the summary through the Mesos master UI. Here is the stack 
trace. 

{code}
 slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk 
usage 79.71%. Max allowed age: 17.279577136390834hrs
 slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk 
usage 79.71%. Max allowed age: 17.279577136390834hrs
 slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for 
/slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; 
Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
 docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET 
/v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json"
 docker[8409]: time="2015-12-01T11:55:38.941489332+08:00" level=info msg="GET 
/v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.1e01a4b3-a76e-4bf6-8ce0-a4a937faf236/json"
 slave.sh[13336]: ABORT: 
(../../3rdparty/libprocess/3rdparty/stout/include/stout/result.hpp:110): 
Result::get() but state == NONE*** Aborted at 1448942139 (unix time) try "date 
-d @1448942139" if you are using GNU date ***
 slave.sh[13336]: PC: @ 0x7f295218a107 (unknown)
 slave.sh[13336]: *** SIGABRT (@0x3419) received by PID 13337 (TID 
0x7f2948992700) from PID 13337; stack trace: ***
 slave.sh[13336]: @ 0x7f2952a2e8d0 (unknown)
 slave.sh[13336]: @ 0x7f295218a107 (unknown)
 slave.sh[13336]: @ 0x7f295218b4e8 (unknown)
 slave.sh[13336]: @   0x43dc59 _Abort()
 slave.sh[13336]: @   0x43dc87 _Abort()
 slave.sh[13336]: @ 0x7f2955e31c86 Result<>::get()
 slave.sh[13336]: @ 0x7f295637f017 
mesos::internal::slave::DockerContainerizerProcess::cgroupsStatistics()
 slave.sh[13336]: @ 0x7f295637dfea 
_ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUliE_clEi
 slave.sh[13336]: @ 0x7f295637e549 
_ZZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS_11ContainerIDEENKUlRKN6Docker9ContainerEE0_clES9_
 slave.sh[13336]: @ 0x7f295638453b
ZN5mesos8internal5slave26DockerContainerizerProcess5usageERKNS1_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEINS_6FutureINS1_18ResourceStatisticsEEESB_EEvENKUlSB_E_clESB_ENKUlvE_clEv
 slave.sh[13336]: @ 0x7f295638751d
FN7process6FutureIN5mesos18ResourceStatisticsEEEvEZZNKS0_9_DeferredIZNS2_8internal5slave26DockerContainerizerProcess5usageERKNS2_11ContainerIDEEUlRKN6Docker9ContainerEE0_EcvSt8functionIFT_T0_EEIS4_SG_EEvENKUlSG_E_clESG_EUlvE_E9_M_invoke

 slave.sh[13336]: @ 0x7f29563b53e7 std::function<>::operator()()
 slave.sh[13336]: @ 0x7f29563aa5dc 
_ZZN7process8dispatchIN5mesos18ResourceStatisticsEEENS_6FutureIT_EERKNS_4UPIDERKSt8functionIFS5_vEEENKUlPNS_11ProcessBaseEE_clESF_
 slave.sh[13336]: @ 0x7f29563bd667 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsEEENS0_6FutureIT_EERKNS0_4UPIDERKSt8functionIFS9_vEEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
 slave.sh[13336]: @ 0x7f2956b893c3 std::function<>::operator()()
 slave.sh[13336]: @ 0x7f2956b72ab0 process::ProcessBase::visit()
 slave.sh[13336]: @ 0x7f2956b7588e process::DispatchEvent::visit()
 slave.sh[13336]: @ 0x7f2955d7f972 process::ProcessBase::serve()
 slave.sh[13336]: @ 0x7f2956b6ef8e process::ProcessManager::resume()
 slave.sh[13336]: @ 0x7f2956b63555 process::internal::schedule()
 slave.sh[13336]: @ 0x7f2956bc0839 
_ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
 slave.sh[13336]: @ 0x7f2956bc0781 std::_Bind_simple<>::operator()()
 slave.sh[13336]: @ 0x7f2956bc06fe std::thread::_Impl<>::_M_run()
 slave.sh[13336]: @ 0x7f29527ca970 (unknown)
 slave.sh[13336]: @ 0x7f2952a270a4 start_thread
 slave.sh[13336]: @ 0x7f295223b04d (unknown)
{code}
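
For context on the {{ABORT}} above: stout's {{Result<T>}} has three states 
(SOME, NONE, ERROR), and calling {{get()}} on a NONE result aborts the 
process. The usual guard looks like this (illustrative only, not the actual 
fix; {{lookup()}} and {{use()}} are hypothetical):

{code}
Result<std::string> value = lookup();
if (value.isSome()) {
  use(value.get()); // Safe: only dereferenced in the SOME state.
}
{code}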

  was:
Hi all, 
I have built a mesos cluster with three slaves. Any slave may sporadically 
crash when I get the summary through mesos master ui. Here is the stack trace. 

```
 slave.sh[13336]: I1201 11:54:12.827975 13338 slave.cpp:3926] Current disk 
usage 79.71%. Max allowed age: 17.279577136390834hrs
 slave.sh[13336]: I1201 11:55:12.829792 13342 slave.cpp:3926] Current disk 
usage 79.71%. Max allowed age: 17.279577136390834hrs
 slave.sh[13336]: I1201 11:55:38.389614 13342 http.cpp:189] HTTP GET for 
/slave(1)/state from 192.168.100.1:64870 with User-Agent='Mozilla/5.0 (X11; 
Ubuntu; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
 docker[8409]: time="2015-12-01T11:55:38.934148017+08:00" level=info msg="GET 
/v1.20/containers/mesos-b25be32d-41e1-4e14-9b84-d33d733cef51-S3.79c206a6-d6b5-487b-9390-e09292c5b53a/json"
 docker[8409]: 

[jira] [Updated] (MESOS-4335) Investigate ubsan error in AnonymousTest.Running

2016-01-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4335:
--
Labels: mesosphere module newbie ubsan  (was: mesosphere module ubsan)

> Investigate ubsan error in AnonymousTest.Running
> 
>
> Key: MESOS-4335
> URL: https://issues.apache.org/jira/browse/MESOS-4335
> Project: Mesos
>  Issue Type: Task
>  Components: modules
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, module, newbie, ubsan
>
> {noformat}
> [ RUN  ] AnonymousTest.Running
> /mesos-2/3rdparty/libprocess/include/process/owned.hpp:202:3: runtime error: 
> member call on address 0x0be1dcc0 which does not point to an object of 
> type 'Anonymous'
> 0x0be1dcc0: note: object is of type 'TestAnonymous'
>  00 00 00 00  30 50 f9 db 48 7f 00 00  53 54 5f 41 4e 4f 4e 59  4d 4f 55 53 
> 00 00 00 00  21 00 00 00
>   ^~~
>   vptr for 'TestAnonymous'
> #0 0xb85f4d in process::Owned::Data::~Data() 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb85f4d)
> #1 0xb93d30 in 
> std::_Sp_counted_ptr (__gnu_cxx::_Lock_policy)2>::_M_dispose() 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb93d30)
> #2 0xb05a4c in 
> std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() 
> /usr/include/c++/5.3.0/bits/shared_ptr_base.h:150
> #3 0xb01a5e in 
> std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() 
> /usr/include/c++/5.3.0/bits/shared_ptr_base.h:659
> #4 0xb4ee7a in 
> std::__shared_ptr (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb4ee7a)
> #5 0xb4eede in 
> std::shared_ptr::~shared_ptr()
>  (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb4eede)
> #6 0xb4ef42 in process::Owned::~Owned() 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb4ef42)
> #7 0xb83e81 in void 
> std::_Destroy >(process::Owned*) 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb83e81)
> #8 0xb7aa43 in void 
> std::_Destroy_aux::__destroy(process::Owned*,
>  process::Owned*) 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb7aa43)
> #9 0xb70e77 in void 
> std::_Destroy(process::Owned*,
>  process::Owned*) 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb70e77)
> #10 0xb6505b in void 
> std::_Destroy process::Owned 
> >(process::Owned*, 
> process::Owned*, 
> std::allocator&) 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb6505b)
> #11 0xb57cbe in std::vector std::allocator >::~vector() 
> (/home/vagrant/build-mesos-2-ubsan/src/.libs/lt-mesos-tests+0xb57cbe)
> #12 0xb42097 in 
> mesos::internal::tests::AnonymousTest_Running_Test::TestBody() 
> /mesos-2/src/tests/anonymous_tests.cpp:71
> #13 0x2e397b1 in void 
> testing::internal::HandleSehExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> gmock-1.7.0/gtest/src/gtest.cc:2078
> #14 0x2e29993 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> gmock-1.7.0/gtest/src/gtest.cc:2114
> #15 0x2dc939d in testing::Test::Run() gmock-1.7.0/gtest/src/gtest.cc:2151
> #16 0x2dcb056 in testing::TestInfo::Run() 
> gmock-1.7.0/gtest/src/gtest.cc:2326
> #17 0x2dccb6a in testing::TestCase::Run() 
> gmock-1.7.0/gtest/src/gtest.cc:2444
> #18 0x2de6290 in testing::internal::UnitTestImpl::RunAllTests() 
> gmock-1.7.0/gtest/src/gtest.cc:4315
> #19 0x2e3bd7f in bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> gmock-1.7.0/gtest/src/gtest.cc:2078
> #20 0x2e2bd67 in bool 
> testing::internal::HandleExceptionsInMethodIfSupported  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> gmock-1.7.0/gtest/src/gtest.cc:2114
> #21 0x2ddf009 in testing::UnitTest::Run() 
> gmock-1.7.0/gtest/src/gtest.cc:3926
> #22 0x170b27b in RUN_ALL_TESTS() 
> ../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include/gtest/gtest.h:2288
> #23 0x170ab6d in main /mesos-2/src/tests/main.cpp:97
> #24 0x7f48df58760f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)
> #25 0xaf54b8 in _start 
> 

[jira] [Updated] (MESOS-3558) Make the CommandExecutor use the Executor Library speaking HTTP

2016-01-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3558:
--
Target Version/s:   (was: 0.27.0)

> Make the CommandExecutor use the Executor Library speaking HTTP
> ---
>
> Key: MESOS-3558
> URL: https://issues.apache.org/jira/browse/MESOS-3558
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Instead of using the {{MesosExecutorDriver}}, we should make the 
> {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor 
> HTTP library that we create in MESOS-3550. 
> This would act as a good validation of the {{HTTP API}} implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors

2016-01-04 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-4255:
-

Assignee: Anand Mazumdar

> Add mechanism for testing recovery of HTTP based executors
> --
>
> Key: MESOS-4255
> URL: https://issues.apache.org/jira/browse/MESOS-4255
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the slave process generates a new process ID every time it is 
> initialized, via a {{process::ID::generate}} function call. This is a problem 
> for testing HTTP executors: an executor can't retry its connection after an 
> agent restart, since the numeric part of the ID is incremented.
> {code}
> Agent PID before:
> slave(1)@127.0.0.1:43915
> Agent PID after restart:
> slave(2)@127.0.0.1:43915
> {code}
> There are a couple of ways to fix this:
> - Add a constructor to {{Slave}}, exclusively for testing, that passes on a 
> fixed {{ID}} instead of relying on {{ID::generate}}.
> - Currently libprocess delegates to {{slave(1)}} when no process ID is 
> specified in the URL, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
> to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to 
> (1), we could default to the last known active ID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors

2016-01-04 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4255:
--
Shepherd: Vinod Kone
  Sprint: Mesosphere Sprint 26

> Add mechanism for testing recovery of HTTP based executors
> --
>
> Key: MESOS-4255
> URL: https://issues.apache.org/jira/browse/MESOS-4255
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the slave process generates a new process ID every time it is 
> initialized, via a {{process::ID::generate}} function call. This is a problem 
> for testing HTTP executors: an executor can't retry its connection after an 
> agent restart, since the numeric part of the ID is incremented.
> {code}
> Agent PID before:
> slave(1)@127.0.0.1:43915
> Agent PID after restart:
> slave(2)@127.0.0.1:43915
> {code}
> There are a couple of ways to fix this:
> - Add a constructor to {{Slave}}, exclusively for testing, that passes on a 
> fixed {{ID}} instead of relying on {{ID::generate}}.
> - Currently libprocess delegates to {{slave(1)}} when no process ID is 
> specified in the URL, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
> to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to 
> (1), we could default to the last known active ID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4261) Remove docker auth server flag

2015-12-30 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4261:
--
Target Version/s: 0.27.0
  Labels: mesosphere  (was: )

> Remove docker auth server flag
> --
>
> Key: MESOS-4261
> URL: https://issues.apache.org/jira/browse/MESOS-4261
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere
>
> We currently use a configured docker auth server from a slave flag to get 
> token auth for docker registry. However this doesn't work for private 
> registries as docker registry supports sending down the correct auth server 
> to contact.
> We should remove docker auth server flag completely and ask the docker 
> registry for auth server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4163) SlaveTest.HTTPSchedulerSlaveRestart is slow

2015-12-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4163:
--
Shepherd: Timothy Chen

> SlaveTest.HTTPSchedulerSlaveRestart is slow
> ---
>
> Key: MESOS-4163
> URL: https://issues.apache.org/jira/browse/MESOS-4163
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Jian Qiu
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> The {{SlaveTest.HTTPSchedulerSlaveRestart}} test takes more than {{2s}} to 
> finish on my Mac OS 10.10.4:
> {code}
> SlaveTest.HTTPSchedulerSlaveRestart (2307 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3817) Rename offers to outstanding offers

2015-12-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3817:
--
Assignee: Diego Gomes

> Rename offers to outstanding offers
> ---
>
> Key: MESOS-3817
> URL: https://issues.apache.org/jira/browse/MESOS-3817
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>Assignee: Diego Gomes
>  Labels: newbie
>
> As discussed in http://search-hadoop.com/m/0Vlr6NFAux1DPmxp, we need to 
> rename offers to outstanding offers in the webui to avoid user confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3817) Rename offers to outstanding offers

2015-12-29 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3817:
--
Assignee: (was: Diego Gomes)

> Rename offers to outstanding offers
> ---
>
> Key: MESOS-3817
> URL: https://issues.apache.org/jira/browse/MESOS-3817
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>  Labels: newbie
>
> As discussed in http://search-hadoop.com/m/0Vlr6NFAux1DPmxp, we need to 
> rename offers to outstanding offers in the webui to avoid user confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4163) SlaveTest.HTTPSchedulerSlaveRestart is slow

2015-12-28 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073357#comment-15073357
 ] 

Anand Mazumdar commented on MESOS-4163:
---

Thanks for the patch. Can you find a shepherd for this by email/IRC?

http://mesos.apache.org/documentation/latest/submitting-a-patch/

> SlaveTest.HTTPSchedulerSlaveRestart is slow
> ---
>
> Key: MESOS-4163
> URL: https://issues.apache.org/jira/browse/MESOS-4163
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Jian Qiu
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> The {{SlaveTest.HTTPSchedulerSlaveRestart}} test takes more than {{2s}} to 
> finish on my Mac OS 10.10.4:
> {code}
> SlaveTest.HTTPSchedulerSlaveRestart (2307 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors

2015-12-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4255:
--
Description: 
Currently, the slave process generates a new process ID every time it is 
initialized, via a {{process::ID::generate}} function call. This is a problem 
for testing HTTP executors: an executor can't retry its connection after an 
agent restart, since the numeric part of the ID is incremented.

{code}
Agent PID before:
slave(1)@127.0.0.1:43915

Agent PID after restart:
slave(2)@127.0.0.1:43915
{code}

There are a couple of ways to fix this:
- Add a constructor to {{Slave}}, exclusively for testing, that passes on a 
fixed {{ID}} instead of relying on {{ID::generate}}.
- Currently libprocess delegates to {{slave(1)}} when no process ID is 
specified in the URL, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to (1), 
we could default to the last known active ID.

  was:
Currently, the slave process generates a process ID every time it is 
initialized via {{process::ID::generate}} function call. This is a problem for 
testing HTTP executors as it can't retry if there is a disconnection after an 
agent restart since the prefix is incremented. 

{code}
Agent PID before:
slave(1)@127.0.0.1:43915

Agent PID after restart:
slave(2)@127.0.0.1:43915

There are a couple of ways to fix this:
- Add a constructor to {{Slave}} exclusively for testing that passes on a fixed 
{{ID}} instead of relying on {{ID::generate}}.
- Currently we delegate to slave(1)@ i.e. (1) when nothing is specified as the 
URL in libprocess i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate to 
{{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to (1), we 
can default to the last known active ID.


> Add mechanism for testing recovery of HTTP based executors
> --
>
> Key: MESOS-4255
> URL: https://issues.apache.org/jira/browse/MESOS-4255
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the slave process generates a new process ID every time it is 
> initialized, via a {{process::ID::generate}} function call. This is a problem 
> for testing HTTP executors: an executor can't retry its connection after an 
> agent restart, since the numeric part of the ID is incremented.
> {code}
> Agent PID before:
> slave(1)@127.0.0.1:43915
> Agent PID after restart:
> slave(2)@127.0.0.1:43915
> {code}
> There are a couple of ways to fix this:
> - Add a constructor to {{Slave}}, exclusively for testing, that passes on a 
> fixed {{ID}} instead of relying on {{ID::generate}}.
> - Currently libprocess delegates to {{slave(1)}} when no process ID is 
> specified in the URL, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
> to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to 
> (1), we could default to the last known active ID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4255) Add mechanism for testing recovery of HTTP based executors

2015-12-28 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4255:
-

 Summary: Add mechanism for testing recovery of HTTP based executors
 Key: MESOS-4255
 URL: https://issues.apache.org/jira/browse/MESOS-4255
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Currently, the slave process generates a new process ID every time it is 
initialized, via a {{process::ID::generate}} function call. This is a problem 
for testing HTTP executors: an executor can't retry its connection after an 
agent restart, since the numeric part of the ID is incremented.

{code}
Agent PID before:
slave(1)@127.0.0.1:43915

Agent PID after restart:
slave(2)@127.0.0.1:43915
{code}

There are a couple of ways to fix this:
- Add a constructor to {{Slave}}, exclusively for testing, that passes on a 
fixed {{ID}} instead of relying on {{ID::generate}}.
- Currently libprocess delegates to {{slave(1)}} when no process ID is 
specified in the URL, i.e. {{127.0.0.1:43915/api/v1/executor}} would delegate 
to {{slave(1)@127.0.0.1:43915/api/v1/executor}}. Instead of defaulting to (1), 
we could default to the last known active ID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4192) Add documentation for API Versioning

2015-12-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4192:
--
Story Points: 3

> Add documentation for API Versioning
> 
>
> Key: MESOS-4192
> URL: https://issues.apache.org/jira/browse/MESOS-4192
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: documentation, mesosphere
>
> Currently, we don't have any documentation for:
> - How does Mesos implement API versioning?
> - How are protobufs versioned, and how does Mesos handle them internally?
> - What do contributors need to do when they change an external user-facing 
> protobuf?
> The relevant design doc:
> https://docs.google.com/document/d/1-iQjo6778H_fU_1Zi_Yk6szg8qj-wqYgVgnx7u3h6OU/edit#heading=h.2gkbjz6amn7b



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4192) Add documentation for API Versioning

2015-12-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-4192:
-

Assignee: Anand Mazumdar

> Add documentation for API Versioning
> 
>
> Key: MESOS-4192
> URL: https://issues.apache.org/jira/browse/MESOS-4192
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: documentation, mesosphere
>
> Currently, we don't have any documentation for:
> - How does Mesos implement API versioning?
> - How are protobufs versioned, and how does Mesos handle them internally?
> - What do contributors need to do when they change an external user-facing 
> protobuf?
> The relevant design doc:
> https://docs.google.com/document/d/1-iQjo6778H_fU_1Zi_Yk6szg8qj-wqYgVgnx7u3h6OU/edit#heading=h.2gkbjz6amn7b



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4192) Add documentation for API Versioning

2015-12-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4192:
--
Shepherd: Vinod Kone

> Add documentation for API Versioning
> 
>
> Key: MESOS-4192
> URL: https://issues.apache.org/jira/browse/MESOS-4192
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: documentation, mesosphere
>
> Currently, we don't have any documentation for:
> - How does Mesos implement API versioning?
> - How are protobufs versioned and how does Mesos handle them internally?
> - What do contributors need to do when they make a change to an external 
> user-facing protobuf?
> The relevant design doc:
> https://docs.google.com/document/d/1-iQjo6778H_fU_1Zi_Yk6szg8qj-wqYgVgnx7u3h6OU/edit#heading=h.2gkbjz6amn7b



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4203) Document that disk resource limits are not enforced by default

2015-12-21 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066866#comment-15066866
 ] 

Anand Mazumdar commented on MESOS-4203:
---

Resolving this ticket since we already have documentation in two places, as 
remarked by [~kaysoky] and [~jieyu].



> Document that disk resource limits are not enforced by default
> --
>
> Key: MESOS-4203
> URL: https://issues.apache.org/jira/browse/MESOS-4203
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, isolation
>Reporter: Neil Conway
>Assignee: Anand Mazumdar
>  Labels: isolation, mesosphere, persistent-volumes
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4198) Disk Resource Reservation is NOT Enforced for Persistent Volumes

2015-12-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4198:
--
Assignee: Jie Yu

> Disk Resource Reservation is NOT Enforced for Persistent Volumes
> 
>
> Key: MESOS-4198
> URL: https://issues.apache.org/jira/browse/MESOS-4198
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gabriel Hartmann
>Assignee: Jie Yu
>  Labels: isolation, mesosphere, persistent-volumes, reservations
>
> If I create a persistent volume on a reserved disk resource, I am able to 
> write data in excess of my reserved size.
> Disk resource reservation should be enforced just as "cpus" and "mem" 
> reservations are enforced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3851) Investigate recent crashes in Command Executor

2015-12-21 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3851:
--
Shepherd: Jie Yu  (was: Vinod Kone)

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900 (updating CommandExecutor to support 
> rootfs), some tests have been showing frequent crashes due to assert 
> violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race in which the executor receives a 
> {{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
> ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
> @ 0x7f4f715296ce  google::LogMessage::Flush()
> @ 0x7f4f7152c402  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> 

[jira] [Comment Edited] (MESOS-3851) Investigate recent crashes in Command Executor

2015-12-21 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045457#comment-15045457
 ] 

Anand Mazumdar edited comment on MESOS-3851 at 12/21/15 6:05 PM:
-

Patch to check if messages are delivered in order on the command executor: 
https://reviews.apache.org/r/40998/


was (Author: anandmazumdar):
Patch for bringing back the reverted commit: 
https://reviews.apache.org/r/40998/

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900 (updating CommandExecutor to support 
> rootfs), some tests have been showing frequent crashes due to assert 
> violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race in which the executor receives a 
> {{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of 

[jira] [Commented] (MESOS-4198) Disk Resource Reservation is NOT Enforced

2015-12-18 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064579#comment-15064579
 ] 

Anand Mazumdar commented on MESOS-4198:
---

[~gabriel.hartm...@gmail.com] Mesos does not enforce container disk quota 
limits by default. You would need to use the {{posix/disk}} isolator via the 
{{--isolation}} flag and set the {{--enforce_container_disk_quota}} flag. 

You can also set {{--container_disk_watch_interval}} to a suitable value. 

Are you seeing this behavior even after setting the above options?
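
For reference, an agent invocation with quota enforcement enabled might look 
like this (the master address, work dir, and watch interval are illustrative):

{code}
mesos-slave --master=127.0.0.1:5050 \
  --work_dir=/tmp/mesos \
  --isolation="posix/cpu,posix/mem,posix/disk" \
  --enforce_container_disk_quota \
  --container_disk_watch_interval=15secs
{code}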

> Disk Resource Reservation is NOT Enforced
> -
>
> Key: MESOS-4198
> URL: https://issues.apache.org/jira/browse/MESOS-4198
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gabriel Hartmann
>
> If I create a persistent volume on a reserved disk resource, I am able to 
> write data in excess of my reserved size.
> Disk resource reservation should be enforced just as "cpus" and "mem" 
> reservations are enforced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4192) Add documentation for API Versioning

2015-12-17 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4192:
-

 Summary: Add documentation for API Versioning
 Key: MESOS-4192
 URL: https://issues.apache.org/jira/browse/MESOS-4192
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Reporter: Anand Mazumdar


Currently, we don't have any documentation for:

- How does Mesos implement API versioning?
- How are protobufs versioned and how does Mesos handle them internally?
- What do contributors need to do when they make a change to an external 
user-facing protobuf?

The relevant design doc:
https://docs.google.com/document/d/1-iQjo6778H_fU_1Zi_Yk6szg8qj-wqYgVgnx7u3h6OU/edit#heading=h.2gkbjz6amn7b




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4177) Create a user doc for Executor HTTP API

2015-12-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4177:
--
Sprint: Mesosphere Sprint 24

> Create a user doc for Executor HTTP API
> ---
>
> Key: MESOS-4177
> URL: https://issues.apache.org/jira/browse/MESOS-4177
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need a user doc similar to the corresponding one for the Scheduler HTTP 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4143) Reserve/UnReserve Dynamic Reservation Endpoints allow reservations on non-existing roles

2015-12-16 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4143:
--
  Sprint: Mesosphere Sprint 24
Story Points: 2
Target Version/s: 0.27.0
  Labels: mesosphere reservations  (was: )

> Reserve/UnReserve Dynamic Reservation Endpoints allow reservations on 
> non-existing roles
> 
>
> Key: MESOS-4143
> URL: https://issues.apache.org/jira/browse/MESOS-4143
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Affects Versions: 0.25.0, 0.26.0
>Reporter: John Omernik
>Assignee: Neil Conway
>  Labels: mesosphere, reservations
>
> When working with dynamic reservations via the /reserve and /unreserve 
> endpoints, it is possible to reserve resources for roles that have not been 
> specified via the --roles flag on the master. However, these roles are not 
> usable because they have not been defined, nor are they added to the list of 
> available roles. 
> Per the mailing list, changing roles after the fact is not possible at this 
> time. (That may be another JIRA.) More importantly, the /reserve and 
> /unreserve endpoints should not allow reservation of roles not specified by 
> --roles.  
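
For context, a dynamic reservation request against the /reserve endpoint looks 
roughly like this (the master address, agent ID, credentials, and role name 
are illustrative):

{code}
curl -i -u principal:secret \
  -d slaveId=201601211204-1695027628-5050-5146-S0 \
  -d resources='[
    {
      "name": "cpus",
      "type": "SCALAR",
      "scalar": { "value": 1 },
      "role": "undefined-role",
      "reservation": { "principal": "principal" }
    }
  ]' \
  -X POST http://127.0.0.1:5050/master/reserve
{code}

Nothing in the endpoint checks "undefined-role" against the master's --roles 
list, which is the gap this ticket describes.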



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4177) Create a user doc for Executor HTTP API

2015-12-15 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4177:
-

 Summary: Create a user doc for Executor HTTP API
 Key: MESOS-4177
 URL: https://issues.apache.org/jira/browse/MESOS-4177
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar


We need a user doc similar to the corresponding one for the Scheduler HTTP API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4153) process::Connection does not invoke the disconnected callback when remote process exits

2015-12-14 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4153:
--
Description: 
The {{disconnected}} callback is never invoked when a local/remote libprocess 
{{process}} terminates. Here is a sample test showing that the future 
returned by {{disconnected()}} is never fulfilled.

{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(disconnected);
}
{code}

The {{Http}} class refers to the one used in the tests:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114

The following test passes, i.e. when the client explicitly invokes 
{{disconnect()}}:
{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(connection->disconnect());
  AWAIT_READY(disconnected);
}
{code}


  was:
The {{disconnected}} callback is never invoked when a local/remote libprocess 
{{process}} terminates. Here is a sample test showing that the future 
returned by {{disconnected()}} is never fulfilled.

{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(disconnected);
}
{code}

The {{Http}} class refers to the one used in the tests:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114

The following test passes:
{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(connection->disconnect());
  AWAIT_READY(disconnected);
}
{code}



> process::Connection does not invoke the disconnected callback when remote 
> process exits
> ---
>
> Key: MESOS-4153
> URL: https://issues.apache.org/jira/browse/MESOS-4153
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Anand Mazumdar
>  Labels: http
>
> The {{disconnected}} callback is never invoked when a local/remote libprocess 
> {{process}} terminates. Here is a sample test showing that the future 
> returned by {{disconnected()}} is never fulfilled.
> {code}
> TEST(HTTPConnectionTest, Disconnected)
> {
>   Option<http::Connection> connection;
>   Future<Nothing> disconnected;
>   {
> Http http;
> http::URL url = http::URL(
>   "http",
>   http.process->self().address.ip,
>   http.process->self().address.port,
>   http.process->self().id + "/get");
> Future<http::Connection> connect = http::connect(url);
> AWAIT_READY(connect);
> connection = connect.get();
> disconnected = connection->disconnected();
>   }
>   AWAIT_READY(disconnected);
> }
> {code}
> The {{Http}} class refers to the one used in the tests:
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114
> The following test passes, i.e. when the client explicitly invokes 
> {{disconnect()}}:
> {code}
> TEST(HTTPConnectionTest, Disconnected)
> {
>   Option<http::Connection> connection;
>   Future<Nothing> disconnected;
>   {
> Http http;
> http::URL url = http::URL(
>   "http",
>   http.process->self().address.ip,
>   http.process->self().address.port,
>   http.process->self().id + "/get");
> Future<http::Connection> connect = http::connect(url);
> AWAIT_READY(connect);
> connection = connect.get();
> disconnected = connection->disconnected();
>   }
>   AWAIT_READY(connection->disconnect());
>   AWAIT_READY(disconnected);
> }
> {code}



--
This message 

[jira] [Updated] (MESOS-4153) process::Connection does not invoke the disconnected callback when remote process exits

2015-12-14 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4153:
--
Description: 
The {{disconnected}} callback is never invoked when a local/remote libprocess 
{{process}} terminates. Here is a sample test showing that the future 
returned by {{disconnected()}} is never fulfilled.

{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(disconnected);
}
{code}

The {{Http}} class refers to the one used in the tests:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114

The following test passes:
{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(connection->disconnect());
  AWAIT_READY(disconnected);
}
{code}


  was:
The {{disconnected}} callback is never invoked when a local/remote libprocess 
{{process}} terminates. Here is a sample test showing that the future 
returned by {{disconnected()}} is never fulfilled.

{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(disconnected);
}
{code}

The {{Http}} class refers to the one used in the tests:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114


> process::Connection does not invoke the disconnected callback when remote 
> process exits
> ---
>
> Key: MESOS-4153
> URL: https://issues.apache.org/jira/browse/MESOS-4153
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Anand Mazumdar
>  Labels: http
>
> The {{disconnected}} callback is never invoked when a local/remote libprocess 
> {{process}} terminates. Here is a sample test showing that the future 
> returned by {{disconnected()}} is never fulfilled.
> {code}
> TEST(HTTPConnectionTest, Disconnected)
> {
>   Option<http::Connection> connection;
>   Future<Nothing> disconnected;
>   {
> Http http;
> http::URL url = http::URL(
>   "http",
>   http.process->self().address.ip,
>   http.process->self().address.port,
>   http.process->self().id + "/get");
> Future<http::Connection> connect = http::connect(url);
> AWAIT_READY(connect);
> connection = connect.get();
> disconnected = connection->disconnected();
>   }
>   AWAIT_READY(disconnected);
> }
> {code}
> The {{Http}} class refers to the one used in the tests:
> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114
> The following test passes:
> {code}
> TEST(HTTPConnectionTest, Disconnected)
> {
>   Option<http::Connection> connection;
>   Future<Nothing> disconnected;
>   {
> Http http;
> http::URL url = http::URL(
>   "http",
>   http.process->self().address.ip,
>   http.process->self().address.port,
>   http.process->self().id + "/get");
> Future<http::Connection> connect = http::connect(url);
> AWAIT_READY(connect);
> connection = connect.get();
> disconnected = connection->disconnected();
>   }
>   AWAIT_READY(connection->disconnect());
>   AWAIT_READY(disconnected);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4153) process::Connection does not invoke the disconnected callback when remote process exits

2015-12-14 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4153:
-

 Summary: process::Connection does not invoke the disconnected 
callback when remote process exits
 Key: MESOS-4153
 URL: https://issues.apache.org/jira/browse/MESOS-4153
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Anand Mazumdar


The {{disconnected}} callback is never invoked when a local/remote libprocess 
{{process}} terminates. Here is a sample test showing that the future 
returned by {{disconnected()}} is never fulfilled.

{code}
TEST(HTTPConnectionTest, Disconnected)
{
  Option<http::Connection> connection;
  Future<Nothing> disconnected;

  {
Http http;

http::URL url = http::URL(
  "http",
  http.process->self().address.ip,
  http.process->self().address.port,
  http.process->self().id + "/get");

Future<http::Connection> connect = http::connect(url);

AWAIT_READY(connect);

connection = connect.get();

disconnected = connection->disconnected();
  }

  AWAIT_READY(disconnected);
}
{code}

The {{Http}} class refers to the one used in the tests:
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/tests/http_tests.cpp#L114



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4109) HTTPConnectionTest.ClosingResponse is flaky

2015-12-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4109:
--
  Sprint: Mesosphere Sprint 24
Story Points: 1
  Labels: flaky flaky-test mesosphere newbie test  (was: flaky 
flaky-test newbie test)

> HTTPConnectionTest.ClosingResponse is flaky
> ---
>
> Key: MESOS-4109
> URL: https://issues.apache.org/jira/browse/MESOS-4109
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 0.26.0
> Environment: ASF Ubuntu 14 
> {{--enable-ssl --enable-libevent}}
>Reporter: Joseph Wu
>Assignee: Benjamin Mahler
>Priority: Minor
>  Labels: flaky, flaky-test, mesosphere, newbie, test
> Fix For: 0.27.0
>
>
> Output of the test:
> {code}
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> I1210 01:20:27.048532 26671 process.cpp:3077] Handling HTTP event for process 
> '(22)' with path: '/(22)/get'
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:919: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, get(_))...
>  Expected: to be called twice
>Actual: called once - unsatisfied and active
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (43 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3032) Document containerizer launch

2015-12-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3032:
--
Shepherd: Jie Yu  (was: Timothy Chen)

> Document containerizer launch 
> --
>
> Key: MESOS-3032
> URL: https://issues.apache.org/jira/browse/MESOS-3032
> Project: Mesos
>  Issue Type: Documentation
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>Priority: Minor
>  Labels: docathon, documentation, mesosphere
>
> We currently don't have enough documentation for the containerizer component. 
> This task adds documentation for the containerizer launch sequence.
> The main goals are:
> - Have diagrams (state, sequence, class etc) depicting the containerizer 
> launch process.
> - Make the documentation newbie friendly.
> - Usable for future design discussions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.

2015-12-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4029:
--
Sprint: Mesosphere Sprint 23  (was: Mesosphere Sprint 23, Mesosphere Sprint 
24)

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory 

[jira] [Updated] (MESOS-3515) Support Subscribe Call for HTTP based Executors

2015-12-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3515:
--
Story Points: 5  (was: 3)

> Support Subscribe Call for HTTP based Executors
> ---
>
> Key: MESOS-3515
> URL: https://issues.apache.org/jira/browse/MESOS-3515
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> We need to add a {{subscribe(...)}} method in {{src/slave/slave.cpp}} to 
> introduce the ability for HTTP-based executors to subscribe and then receive 
> events on the persistent HTTP connection. Most of the functionality needed 
> would be similar to {{Master::subscribe}} in {{src/master/master.cpp}}.
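
A rough, self-contained sketch of the shape such a handler could take (the 
types below are toy stand-ins, not the actual Mesos classes or the eventual 
implementation):

{code}
#include <functional>
#include <queue>
#include <string>

// Toy stand-ins for the executor event and the open HTTP response pipe.
struct Event { std::string type; };

struct HttpConnection
{
  // Writes one event onto the open chunked HTTP response.
  std::function<void(const Event&)> send;
};

class SlaveProcess
{
public:
  // Hypothetical subscribe(): record the connection, acknowledge the
  // subscription, then flush events queued while the executor was away.
  void subscribe(HttpConnection connection)
  {
    executorConnection = connection;
    executorConnection.send(Event{"SUBSCRIBED"});

    while (!pending.empty()) {
      executorConnection.send(pending.front());
      pending.pop();
    }
  }

private:
  HttpConnection executorConnection;
  std::queue<Event> pending;
};
{code}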



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor

2015-12-07 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045457#comment-15045457
 ] 

Anand Mazumdar commented on MESOS-3851:
---

Patch for bringing back the reverted commit: 
https://reviews.apache.org/r/40998/

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900 (updating CommandExecutor to support 
> rootfs), some tests have been showing frequent crashes due to assert 
> violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race in which the executor receives a 
> {{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
> ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
> @ 0x7f4f715296ce  google::LogMessage::Flush()
> @ 0x7f4f7152c402  google::LogMessageFatal::~LogMessageFatal()
> @   

[jira] [Updated] (MESOS-4002) ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky

2015-12-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4002:
--
Story Points: 1

> ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky
> ---
>
> Key: MESOS-4002
> URL: https://issues.apache.org/jira/browse/MESOS-4002
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: flaky-test, mesosphere, reservations
> Fix For: 0.27.0
>
>
> Showed up on ASF CI (the test kept looping on and on, ultimately failing the 
> build after 300 minutes):
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1269/changes
> {code}
> [ RUN  ] ReservationEndpointsTest.UnreserveAvailableAndOfferedResources
> I1124 01:07:20.050729 30260 leveldb.cpp:174] Opened db in 107.434842ms
> I1124 01:07:20.099630 30260 leveldb.cpp:181] Compacted db in 48.82312ms
> I1124 01:07:20.099722 30260 leveldb.cpp:196] Created db iterator in 29905ns
> I1124 01:07:20.099738 30260 leveldb.cpp:202] Seeked to beginning of db in 
> 3145ns
> I1124 01:07:20.099750 30260 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 279ns
> I1124 01:07:20.099804 30260 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 01:07:20.100637 30292 recover.cpp:447] Starting replica recovery
> I1124 01:07:20.100934 30292 recover.cpp:473] Replica is in EMPTY status
> I1124 01:07:20.103240 30288 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (6305)@172.17.18.107:37993
> I1124 01:07:20.103672 30292 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1124 01:07:20.104142 30292 recover.cpp:564] Updating replica status to 
> STARTING
> I1124 01:07:20.114534 30284 master.cpp:365] Master 
> ad27bc60-16d1-4239-9a65-235a991f9600 (9f2f81738d5e) started on 
> 172.17.18.107:37993
> I1124 01:07:20.114558 30284 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1000secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/I60I5f/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" --roles="role" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/I60I5f/master" --zk_session_timeout="10secs"
> I1124 01:07:20.114809 30284 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1124 01:07:20.114820 30284 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1124 01:07:20.114825 30284 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/I60I5f/credentials'
> I1124 01:07:20.115067 30284 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1124 01:07:20.115320 30284 master.cpp:493] Authorization enabled
> I1124 01:07:20.115792 30285 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1124 01:07:20.115855 30285 whitelist_watcher.cpp:77] No whitelist given
> I1124 01:07:20.118755 30285 master.cpp:1625] The newly elected leader is 
> master@172.17.18.107:37993 with id ad27bc60-16d1-4239-9a65-235a991f9600
> I1124 01:07:20.118788 30285 master.cpp:1638] Elected as the leading master!
> I1124 01:07:20.118809 30285 master.cpp:1383] Recovering from registrar
> I1124 01:07:20.119078 30285 registrar.cpp:307] Recovering registrar
> I1124 01:07:20.143256 30292 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38.787419ms
> I1124 01:07:20.143347 30292 replica.cpp:321] Persisted replica status to 
> STARTING
> I1124 01:07:20.143717 30292 recover.cpp:473] Replica is in STARTING status
> I1124 01:07:20.145454 30286 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (6307)@172.17.18.107:37993
> I1124 01:07:20.145979 30292 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I1124 01:07:20.146654 30292 recover.cpp:564] Updating replica status to VOTING
> I1124 01:07:20.182672 30286 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 35.422256ms
> I1124 01:07:20.182747 30286 replica.cpp:321] Persisted replica status to 
> VOTING
> I1124 

[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.

2015-12-01 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034974#comment-15034974
 ] 

Anand Mazumdar commented on MESOS-4029:
---

The culprit is this: 
https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L260 
We pass the {{Callbacks}} mock object by reference and not by value. Since we 
do an {{async}}, the call is queued on another thread, but nothing ensures that 
it is invoked before the object is destroyed. Hence, we might invoke the 
{{received}} callback even after the original {{Callbacks}} object is destroyed.
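
A minimal, self-contained illustration of that lifetime bug, using 
{{std::async}} in place of libprocess's {{async}} (the mechanics are the same; 
this is not the actual scheduler code):

{code}
#include <future>
#include <iostream>

// Toy stand-in for the Callbacks mock object.
struct Callbacks
{
  void received(int event) { std::cout << "event " << event << '\n'; }
};

int main()
{
  std::future<void> pending;
  {
    Callbacks callbacks;

    // BAD (what the bug amounts to): the queued call holds a reference to
    // 'callbacks', but nothing guarantees it runs before 'callbacks' is
    // destroyed at the end of this scope.
    pending = std::async(std::launch::async,
                         [&callbacks] { callbacks.received(1); });
  } // 'callbacks' may already be gone when the task runs: use-after-free.
  pending.wait();

  // SAFER: capture by value so the queued task owns a copy of the state it
  // needs, independent of the caller's lifetime.
  Callbacks callbacks;
  std::async(std::launch::async,
             [callbacks]() mutable { callbacks.received(2); }).wait();

  return 0;
}
{code}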

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> 

[jira] [Created] (MESOS-4026) RegistryClientTest.SimpleRegistryPuller is flaky

2015-11-30 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4026:
-

 Summary: RegistryClientTest.SimpleRegistryPuller is flaky
 Key: MESOS-4026
 URL: https://issues.apache.org/jira/browse/MESOS-4026
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


From ASF CI:
https://builds.apache.org/job/Mesos/1289/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/console

{code}
[ RUN  ] RegistryClientTest.SimpleRegistryPuller
I1127 02:51:40.235900   362 registry_client.cpp:511] Response status for url 
'https://localhost:57828/v2/library/busybox/manifests/latest': 401 Unauthorized
I1127 02:51:40.249766   360 registry_client.cpp:511] Response status for url 
'https://localhost:57828/v2/library/busybox/manifests/latest': 200 OK
I1127 02:51:40.251137   361 registry_puller.cpp:195] Downloading layer 
'1ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea' for image 
'busybox:latest'
I1127 02:51:40.258514   354 registry_client.cpp:511] Response status for url 
'https://localhost:57828/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4':
 307 Temporary Redirect
I1127 02:51:40.264171   367 libevent_ssl_socket.cpp:1023] Socket error: 
Connection reset by peer
../../src/tests/containerizer/provisioner_docker_tests.cpp:1210: Failure
(socket).failure(): Failed accept: connection error: Connection reset by peer
[  FAILED  ] RegistryClientTest.SimpleRegistryPuller (349 ms)
{code}

Logs from a previous run that passed:
{code}
[ RUN  ] RegistryClientTest.SimpleRegistryPuller
I1126 18:49:05.306396   349 registry_client.cpp:511] Response status for url 
'https://localhost:53492/v2/library/busybox/manifests/latest': 401 Unauthorized
I1126 18:49:05.321362   347 registry_client.cpp:511] Response status for url 
'https://localhost:53492/v2/library/busybox/manifests/latest': 200 OK
I1126 18:49:05.322720   352 registry_puller.cpp:195] Downloading layer 
'1ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea' for image 
'busybox:latest'
I1126 18:49:05.331317   350 registry_client.cpp:511] Response status for url 
'https://localhost:53492/v2/library/busybox/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4':
 307 Temporary Redirect
I1126 18:49:05.370625   352 registry_client.cpp:511] Response status for url 
'https://127.0.0.1:53492/': 200 OK
I1126 18:49:05.372102   355 registry_puller.cpp:294] Untarring layer 
'1ce2e90b0bc7224de3db1f0d646fe8e2c4dd37f1793928287f6074bc451a57ea' downloaded 
from registry to directory 'output_dir'
[   OK ] RegistryClientTest.SimpleRegistryPuller (353 ms)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.

2015-11-30 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4029:
--
  Sprint: Mesosphere Sprint 23
Story Points: 2
  Labels: flaky flaky-test mesosphere  (was: flaky flaky-test)

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 2
> 

[jira] [Commented] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky

2015-11-30 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033049#comment-15033049
 ] 

Anand Mazumdar commented on MESOS-3773:
---

Re-opening:
https://builds.apache.org/job/Mesos/1296/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console

{code}
[ RUN  ] RegistryClientTest.SimpleGetBlob
2015-12-01 
02:33:02,566:31760(0x2b94b0200700):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:41913] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I1201 02:33:02.603672 31781 registry_client.cpp:511] Response status for url 
'https://localhost:41372/v2/library/blob/blobs/digest': 401 Unauthorized
I1201 02:33:02.621564 31786 registry_client.cpp:511] Response status for url 
'https://localhost:41372/v2/library/blob/blobs/digest': 307 Temporary Redirect
I1201 02:33:02.628077 31791 registry_client.cpp:511] Response status for url 
'https://127.0.0.1:41372/': 200 OK
../../src/tests/containerizer/provisioner_docker_tests.cpp:939: Failure
Value of: blobResponse
  Actual: "2015-12-29 03:49:20.648481088+00:00"
Expected: blob.get()
Which is: 
"\x15\x3\x3\00A\x9E\xEE\vrz\xDA\xC6$z\xE6\xEC\b\f8\xCB\x93\xD9\xA3\xEFv\x9E\xEA\x99\xEB\x1F\x9C:Ic#8C\x1\xC4\xF3\xC3\xCB\xB1\x17\xBE\x87\x1B/\xE7y->2015-12-29
 03:49:20.648481088+00:00"
[  FAILED  ] RegistryClientTest.SimpleGetBlob (385 ms)
{code}

> RegistryClientTest.SimpleGetBlob is flaky
> -
>
> Key: MESOS-3773
> URL: https://issues.apache.org/jira/browse/MESOS-3773
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Joseph Wu
>Assignee: Jojy Varghese
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times.  This was 
> encountered on OSX.
> {code:title=Repro}
> bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" 
> --gtest_repeat=10 --gtest_break_on_failure
> {code}
> {code:title=Example Failure}
> [ RUN  ] RegistryClientTest.SimpleGetBlob
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure
> Value of: blobResponse
>   Actual: "2015-10-20 20:58:59.579393024+00:00"
> Expected: blob.get()
> Which is: 
> "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8
>  \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 
> 20:58:59.579393024+00:00"
> *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are 
> using GNU date ***
> PC: @0x103144ddc testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: ***
> @ 0x7fff8c58af1a _sigtramp
> @ 0x7fff8386e187 malloc
> @0x1031445b7 testing::internal::AssertHelper::operator=()
> @0x1030d32e0 
> mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody()
> @0x1030d3562 
> mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody()
> @0x1031ac8f3 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x103192f87 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x1031533f5 testing::Test::Run()
> @0x10315493b testing::TestInfo::Run()
> @0x1031555f7 testing::TestCase::Run()
> @0x103163df3 testing::internal::UnitTestImpl::RunAllTests()
> @0x1031af8c3 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x103195397 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x1031639f2 testing::UnitTest::Run()
> @0x1025abd41 RUN_ALL_TESTS()
> @0x1025a8089 main
> @ 0x7fff86b155c9 start
> {code}
> {code:title=Less common failure}
> [ RUN  ] RegistryClientTest.SimpleGetBlob
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure
> (socket).failure(): Failed accept: connection error: 
> error::lib(0):func(0):reason(0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4014) Introduce delete/remove endpoint for quota

2015-11-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4014:
--
Summary: Introduce delete/remove endpoint for quota  (was: Introduce 
DELETE/remove endpoint for quota)

> Introduce delete/remove endpoint for quota
> --
>
> Key: MESOS-4014
> URL: https://issues.apache.org/jira/browse/MESOS-4014
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> This endpoint is for removing quotas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4014) Introduce delete/remove endpoint for quota

2015-11-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4014:
--
Description: This endpoint is for removing quotas via the DELETE method.  
(was: This endpoint is for removing quotas.)

> Introduce delete/remove endpoint for quota
> --
>
> Key: MESOS-4014
> URL: https://issues.apache.org/jira/browse/MESOS-4014
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> This endpoint is for removing quotas via the DELETE method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4002) ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky

2015-11-24 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4002:
-

 Summary: 
ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky
 Key: MESOS-4002
 URL: https://issues.apache.org/jira/browse/MESOS-4002
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Showed up on ASF CI (the test kept looping and ultimately failed the build
after 300 minutes):
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1269/changes

{code}
[ RUN  ] ReservationEndpointsTest.UnreserveAvailableAndOfferedResources
I1124 01:07:20.050729 30260 leveldb.cpp:174] Opened db in 107.434842ms
I1124 01:07:20.099630 30260 leveldb.cpp:181] Compacted db in 48.82312ms
I1124 01:07:20.099722 30260 leveldb.cpp:196] Created db iterator in 29905ns
I1124 01:07:20.099738 30260 leveldb.cpp:202] Seeked to beginning of db in 3145ns
I1124 01:07:20.099750 30260 leveldb.cpp:271] Iterated through 0 keys in the db 
in 279ns
I1124 01:07:20.099804 30260 replica.cpp:778] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 01:07:20.100637 30292 recover.cpp:447] Starting replica recovery
I1124 01:07:20.100934 30292 recover.cpp:473] Replica is in EMPTY status
I1124 01:07:20.103240 30288 replica.cpp:674] Replica in EMPTY status received a 
broadcasted recover request from (6305)@172.17.18.107:37993
I1124 01:07:20.103672 30292 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I1124 01:07:20.104142 30292 recover.cpp:564] Updating replica status to STARTING
I1124 01:07:20.114534 30284 master.cpp:365] Master 
ad27bc60-16d1-4239-9a65-235a991f9600 (9f2f81738d5e) started on 
172.17.18.107:37993
I1124 01:07:20.114558 30284 master.cpp:367] Flags at startup: --acls="" 
--allocation_interval="1000secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/I60I5f/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" --roles="role" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
--work_dir="/tmp/I60I5f/master" --zk_session_timeout="10secs"
I1124 01:07:20.114809 30284 master.cpp:412] Master only allowing authenticated 
frameworks to register
I1124 01:07:20.114820 30284 master.cpp:417] Master only allowing authenticated 
slaves to register
I1124 01:07:20.114825 30284 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/I60I5f/credentials'
I1124 01:07:20.115067 30284 master.cpp:456] Using default 'crammd5' 
authenticator
I1124 01:07:20.115320 30284 master.cpp:493] Authorization enabled
I1124 01:07:20.115792 30285 hierarchical.cpp:162] Initialized hierarchical 
allocator process
I1124 01:07:20.115855 30285 whitelist_watcher.cpp:77] No whitelist given
I1124 01:07:20.118755 30285 master.cpp:1625] The newly elected leader is 
master@172.17.18.107:37993 with id ad27bc60-16d1-4239-9a65-235a991f9600
I1124 01:07:20.118788 30285 master.cpp:1638] Elected as the leading master!
I1124 01:07:20.118809 30285 master.cpp:1383] Recovering from registrar
I1124 01:07:20.119078 30285 registrar.cpp:307] Recovering registrar
I1124 01:07:20.143256 30292 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 38.787419ms
I1124 01:07:20.143347 30292 replica.cpp:321] Persisted replica status to 
STARTING
I1124 01:07:20.143717 30292 recover.cpp:473] Replica is in STARTING status
I1124 01:07:20.145454 30286 replica.cpp:674] Replica in STARTING status 
received a broadcasted recover request from (6307)@172.17.18.107:37993
I1124 01:07:20.145979 30292 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I1124 01:07:20.146654 30292 recover.cpp:564] Updating replica status to VOTING
I1124 01:07:20.182672 30286 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 35.422256ms
I1124 01:07:20.182747 30286 replica.cpp:321] Persisted replica status to VOTING
I1124 01:07:20.182929 30286 recover.cpp:578] Successfully joined the Paxos group
I1124 01:07:20.183115 30286 recover.cpp:462] Recover process terminated
I1124 01:07:20.183831 30286 log.cpp:659] Attempting to start the writer
I1124 01:07:20.185907 30285 replica.cpp:494] Replica received implicit promise 
request from (6308)@172.17.18.107:37993 with proposal 1
I1124 01:07:20.225256 30285 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 39.291288ms
I1124 01:07:20.225344 30285 

[jira] [Commented] (MESOS-4002) ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky

2015-11-24 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025269#comment-15025269
 ] 

Anand Mazumdar commented on MESOS-4002:
---

Partial fix: https://reviews.apache.org/r/40667/

This still does not explain why the test kept looping for 300 minutes. There
might be a future that was never ready on which we invoked {{.get()}}, blocking
forever, or a deadlock somewhere else. I will keep digging for the root cause
and watching future failures on the ASF CI for more clues.
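To illustrate the blocking-{{.get()}} hazard suspected above, a minimal sketch
(the endpoint call and {{master}} UPID are hypothetical stand-ins;
{{AWAIT_READY}} comes from libprocess's gtest helpers):

{code}
#include <process/future.hpp>
#include <process/gtest.hpp>
#include <process/http.hpp>
#include <process/pid.hpp>

TEST(ExampleTest, BoundedWait)
{
  process::UPID master;  // stands in for the master's endpoint actor

  process::Future<process::http::Response> response =
    process::http::get(master, "unreserve");

  // Bounded wait: fails the test once the await timeout expires.
  AWAIT_READY(response);

  // By contrast, response.get() would block indefinitely if the future
  // never transitioned -- one way a test can spin for 300 minutes.
}
{code}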

> ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky
> ---
>
> Key: MESOS-4002
> URL: https://issues.apache.org/jira/browse/MESOS-4002
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: flaky-test, mesosphere, reservations
>
> Showed up on ASF CI (the test kept looping and ultimately failed the build
> after 300 minutes):
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1269/changes
> {code}
> [ RUN  ] ReservationEndpointsTest.UnreserveAvailableAndOfferedResources
> I1124 01:07:20.050729 30260 leveldb.cpp:174] Opened db in 107.434842ms
> I1124 01:07:20.099630 30260 leveldb.cpp:181] Compacted db in 48.82312ms
> I1124 01:07:20.099722 30260 leveldb.cpp:196] Created db iterator in 29905ns
> I1124 01:07:20.099738 30260 leveldb.cpp:202] Seeked to beginning of db in 
> 3145ns
> I1124 01:07:20.099750 30260 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 279ns
> I1124 01:07:20.099804 30260 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 01:07:20.100637 30292 recover.cpp:447] Starting replica recovery
> I1124 01:07:20.100934 30292 recover.cpp:473] Replica is in EMPTY status
> I1124 01:07:20.103240 30288 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (6305)@172.17.18.107:37993
> I1124 01:07:20.103672 30292 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1124 01:07:20.104142 30292 recover.cpp:564] Updating replica status to 
> STARTING
> I1124 01:07:20.114534 30284 master.cpp:365] Master 
> ad27bc60-16d1-4239-9a65-235a991f9600 (9f2f81738d5e) started on 
> 172.17.18.107:37993
> I1124 01:07:20.114558 30284 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1000secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/I60I5f/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" --roles="role" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/I60I5f/master" --zk_session_timeout="10secs"
> I1124 01:07:20.114809 30284 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1124 01:07:20.114820 30284 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1124 01:07:20.114825 30284 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/I60I5f/credentials'
> I1124 01:07:20.115067 30284 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1124 01:07:20.115320 30284 master.cpp:493] Authorization enabled
> I1124 01:07:20.115792 30285 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1124 01:07:20.115855 30285 whitelist_watcher.cpp:77] No whitelist given
> I1124 01:07:20.118755 30285 master.cpp:1625] The newly elected leader is 
> master@172.17.18.107:37993 with id ad27bc60-16d1-4239-9a65-235a991f9600
> I1124 01:07:20.118788 30285 master.cpp:1638] Elected as the leading master!
> I1124 01:07:20.118809 30285 master.cpp:1383] Recovering from registrar
> I1124 01:07:20.119078 30285 registrar.cpp:307] Recovering registrar
> I1124 01:07:20.143256 30292 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38.787419ms
> I1124 01:07:20.143347 30292 replica.cpp:321] Persisted replica status to 
> STARTING
> I1124 01:07:20.143717 30292 recover.cpp:473] Replica is in STARTING status
> I1124 01:07:20.145454 30286 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (6307)@172.17.18.107:37993
> I1124 01:07:20.145979 30292 recover.cpp:193] Received a recover response from 

[jira] [Assigned] (MESOS-4002) ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky

2015-11-24 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-4002:
-

Assignee: Anand Mazumdar

> ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky
> ---
>
> Key: MESOS-4002
> URL: https://issues.apache.org/jira/browse/MESOS-4002
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: flaky-test, mesosphere, reservations
>
> Showed up on ASF CI (the test kept looping and ultimately failed the build
> after 300 minutes):
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1269/changes
> {code}
> [ RUN  ] ReservationEndpointsTest.UnreserveAvailableAndOfferedResources
> I1124 01:07:20.050729 30260 leveldb.cpp:174] Opened db in 107.434842ms
> I1124 01:07:20.099630 30260 leveldb.cpp:181] Compacted db in 48.82312ms
> I1124 01:07:20.099722 30260 leveldb.cpp:196] Created db iterator in 29905ns
> I1124 01:07:20.099738 30260 leveldb.cpp:202] Seeked to beginning of db in 
> 3145ns
> I1124 01:07:20.099750 30260 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 279ns
> I1124 01:07:20.099804 30260 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 01:07:20.100637 30292 recover.cpp:447] Starting replica recovery
> I1124 01:07:20.100934 30292 recover.cpp:473] Replica is in EMPTY status
> I1124 01:07:20.103240 30288 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (6305)@172.17.18.107:37993
> I1124 01:07:20.103672 30292 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1124 01:07:20.104142 30292 recover.cpp:564] Updating replica status to 
> STARTING
> I1124 01:07:20.114534 30284 master.cpp:365] Master 
> ad27bc60-16d1-4239-9a65-235a991f9600 (9f2f81738d5e) started on 
> 172.17.18.107:37993
> I1124 01:07:20.114558 30284 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1000secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/I60I5f/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" --roles="role" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/I60I5f/master" --zk_session_timeout="10secs"
> I1124 01:07:20.114809 30284 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1124 01:07:20.114820 30284 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1124 01:07:20.114825 30284 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/I60I5f/credentials'
> I1124 01:07:20.115067 30284 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1124 01:07:20.115320 30284 master.cpp:493] Authorization enabled
> I1124 01:07:20.115792 30285 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1124 01:07:20.115855 30285 whitelist_watcher.cpp:77] No whitelist given
> I1124 01:07:20.118755 30285 master.cpp:1625] The newly elected leader is 
> master@172.17.18.107:37993 with id ad27bc60-16d1-4239-9a65-235a991f9600
> I1124 01:07:20.118788 30285 master.cpp:1638] Elected as the leading master!
> I1124 01:07:20.118809 30285 master.cpp:1383] Recovering from registrar
> I1124 01:07:20.119078 30285 registrar.cpp:307] Recovering registrar
> I1124 01:07:20.143256 30292 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38.787419ms
> I1124 01:07:20.143347 30292 replica.cpp:321] Persisted replica status to 
> STARTING
> I1124 01:07:20.143717 30292 recover.cpp:473] Replica is in STARTING status
> I1124 01:07:20.145454 30286 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (6307)@172.17.18.107:37993
> I1124 01:07:20.145979 30292 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I1124 01:07:20.146654 30292 recover.cpp:564] Updating replica status to VOTING
> I1124 01:07:20.182672 30286 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 35.422256ms
> I1124 01:07:20.182747 30286 replica.cpp:321] Persisted replica status to 
> VOTING
> I1124 01:07:20.182929 30286 

[jira] [Updated] (MESOS-4002) ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky

2015-11-24 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4002:
--
Shepherd: Michael Park

> ReservationEndpointsTest.UnreserveAvailableAndOfferedResources is flaky
> ---
>
> Key: MESOS-4002
> URL: https://issues.apache.org/jira/browse/MESOS-4002
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere, reservations
>
> Showed up on ASF CI (the test kept looping and ultimately failed the build
> after 300 minutes):
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1269/changes
> {code}
> [ RUN  ] ReservationEndpointsTest.UnreserveAvailableAndOfferedResources
> I1124 01:07:20.050729 30260 leveldb.cpp:174] Opened db in 107.434842ms
> I1124 01:07:20.099630 30260 leveldb.cpp:181] Compacted db in 48.82312ms
> I1124 01:07:20.099722 30260 leveldb.cpp:196] Created db iterator in 29905ns
> I1124 01:07:20.099738 30260 leveldb.cpp:202] Seeked to beginning of db in 
> 3145ns
> I1124 01:07:20.099750 30260 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 279ns
> I1124 01:07:20.099804 30260 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 01:07:20.100637 30292 recover.cpp:447] Starting replica recovery
> I1124 01:07:20.100934 30292 recover.cpp:473] Replica is in EMPTY status
> I1124 01:07:20.103240 30288 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (6305)@172.17.18.107:37993
> I1124 01:07:20.103672 30292 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1124 01:07:20.104142 30292 recover.cpp:564] Updating replica status to 
> STARTING
> I1124 01:07:20.114534 30284 master.cpp:365] Master 
> ad27bc60-16d1-4239-9a65-235a991f9600 (9f2f81738d5e) started on 
> 172.17.18.107:37993
> I1124 01:07:20.114558 30284 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1000secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/I60I5f/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" --roles="role" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/I60I5f/master" --zk_session_timeout="10secs"
> I1124 01:07:20.114809 30284 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1124 01:07:20.114820 30284 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1124 01:07:20.114825 30284 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/I60I5f/credentials'
> I1124 01:07:20.115067 30284 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1124 01:07:20.115320 30284 master.cpp:493] Authorization enabled
> I1124 01:07:20.115792 30285 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1124 01:07:20.115855 30285 whitelist_watcher.cpp:77] No whitelist given
> I1124 01:07:20.118755 30285 master.cpp:1625] The newly elected leader is 
> master@172.17.18.107:37993 with id ad27bc60-16d1-4239-9a65-235a991f9600
> I1124 01:07:20.118788 30285 master.cpp:1638] Elected as the leading master!
> I1124 01:07:20.118809 30285 master.cpp:1383] Recovering from registrar
> I1124 01:07:20.119078 30285 registrar.cpp:307] Recovering registrar
> I1124 01:07:20.143256 30292 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38.787419ms
> I1124 01:07:20.143347 30292 replica.cpp:321] Persisted replica status to 
> STARTING
> I1124 01:07:20.143717 30292 recover.cpp:473] Replica is in STARTING status
> I1124 01:07:20.145454 30286 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (6307)@172.17.18.107:37993
> I1124 01:07:20.145979 30292 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I1124 01:07:20.146654 30292 recover.cpp:564] Updating replica status to VOTING
> I1124 01:07:20.182672 30286 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 35.422256ms
> I1124 01:07:20.182747 30286 replica.cpp:321] Persisted replica status to 
> VOTING
> I1124 01:07:20.182929 30286 recover.cpp:578] Successfully joined the Paxos 
> 

[jira] [Updated] (MESOS-3996) libprocess: document when, why defer() is necessary

2015-11-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3996:
--
Description: 
Current rules around this are pretty confusing and undocumented, as evidenced 
by some recent bugs in this area.

Some example snippets in the mesos source code that were a result of this 
confusion and are indeed bugs:

1. 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_client.cpp#L754
{code}
return doHttpGet(blobURL, None(), true, true, None())
  .then([this, blobURLPath, digest, filePath](
      const http::Response& response) -> Future<size_t> {
    Try<int> fd = os::open(
        filePath.value,
        O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
        S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
{code}
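To make the hazard concrete, a minimal sketch (hypothetical process and
members) of the pattern: without {{defer}}, the continuation runs on whatever
execution context satisfies the future, so it can race with the process's own
handlers; wrapping it in {{defer(self(), ...)}} re-dispatches it onto the
process.

{code}
#include <process/defer.hpp>
#include <process/future.hpp>
#include <process/process.hpp>

#include <stout/nothing.hpp>

using namespace process;

class ExampleProcess : public Process<ExampleProcess>
{
public:
  Future<Nothing> update()
  {
    // Unsafe: '.then([this](int v) { value = v; ... })' without defer()
    // may execute on an arbitrary context when compute() completes.

    // Safe: defer() dispatches the continuation back to this process,
    // serializing it with every other handler of the process.
    return compute()
      .then(defer(self(), [this](int v) {
        value = v;
        return Nothing();
      }));
  }

private:
  Future<int> compute() { return 42; }  // some asynchronous work

  int value = 0;
};
{code}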


  was:Current rules around this are pretty confusing and undocumented, as 
evidenced by some recent bugs in this area.


> libprocess: document when, why defer() is necessary
> ---
>
> Key: MESOS-3996
> URL: https://issues.apache.org/jira/browse/MESOS-3996
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Neil Conway
>Priority: Minor
>  Labels: documentation, libprocess, mesosphere
>
> Current rules around this are pretty confusing and undocumented, as evidenced 
> by some recent bugs in this area.
> Some example snippets in the mesos source code that were a result of this 
> confusion and are indeed bugs:
> 1. 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_client.cpp#L754
> {code}
> return doHttpGet(blobURL, None(), true, true, None())
>   .then([this, blobURLPath, digest, filePath](
>       const http::Response& response) -> Future<size_t> {
>     Try<int> fd = os::open(
>         filePath.value,
>         O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
>         S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022040#comment-15022040
 ] 

Anand Mazumdar commented on MESOS-3940:
---

[~neilc] Wouldn't it be a good idea to wait for MESOS-3233? Once that is
implemented, all you need to do is remove the boilerplate code inside the
handler function that extracts the {{Authorization}} header itself.
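For reference, the kind of boilerplate being referred to looks roughly like the
sketch below (names are illustrative, not the actual handler code): each
endpoint hand-rolls extraction of HTTP Basic credentials from the
{{Authorization}} header, which MESOS-3233 would centralize.

{code}
#include <string>
#include <vector>

#include <mesos/mesos.hpp>  // mesos::Credential

#include <process/http.hpp>

#include <stout/base64.hpp>
#include <stout/none.hpp>
#include <stout/option.hpp>
#include <stout/strings.hpp>
#include <stout/try.hpp>

Option<mesos::Credential> extractCredential(
    const process::http::Request& request)
{
  Option<std::string> header = request.headers.get("Authorization");
  if (header.isNone()) {
    return None();
  }

  // Expected form: "Basic base64(principal:secret)".
  std::vector<std::string> parts = strings::tokenize(header.get(), " ");
  if (parts.size() != 2 || parts[0] != "Basic") {
    return None();
  }

  Try<std::string> decoded = base64::decode(parts[1]);
  if (decoded.isError()) {
    return None();
  }

  std::vector<std::string> pair = strings::split(decoded.get(), ":", 2);
  if (pair.size() != 2) {
    return None();
  }

  mesos::Credential credential;
  credential.set_principal(pair[0]);
  credential.set_secret(pair[1]);
  return credential;
}
{code}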

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3940) /reserve and /unreserve should be permissive under a master without authentication.

2015-11-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3940:
--
Shepherd: Michael Park

> /reserve and /unreserve should be permissive under a master without 
> authentication.
> ---
>
> Key: MESOS-3940
> URL: https://issues.apache.org/jira/browse/MESOS-3940
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: authentication, mesosphere, reservations
>
> Currently, the {{/reserve}} and {{/unreserve}} endpoints do not work without 
> authentication enabled on the master. When authentication is disabled on the 
> master, these endpoints should just be permissive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023186#comment-15023186
 ] 

Anand Mazumdar commented on MESOS-3851:
---

This should not be a blocker for 0.26. As [~bmahler] pointed out earlier in the
thread, this race happens because we do a {{send}} without a {{link}}, so the
behavior has existed for quite some time; [~tnachen]'s changes to the command
executor merely surfaced it.
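One way to make the executor robust to that reordering, sketched below with
hypothetical names (buffering, not necessarily the committed fix), is to queue
launches that arrive before registration and replay them once registered:

{code}
#include <queue>

#include <mesos/mesos.hpp>  // mesos::TaskInfo, mesos::ExecutorInfo

#include <stout/option.hpp>

class LaunchBuffer
{
public:
  // Invoked from the ExecutorRegisteredMessage handler.
  void registered(const mesos::ExecutorInfo& info)
  {
    executorInfo = info;

    // Replay any launches that raced ahead of registration.
    while (!pending.empty()) {
      run(pending.front());
      pending.pop();
    }
  }

  // Invoked from the RunTaskMessage handler; tolerates arriving first.
  void launch(const mesos::TaskInfo& task)
  {
    if (executorInfo.isNone()) {
      pending.push(task);  // instead of tripping CHECK_SOME(executorInfo)
      return;
    }
    run(task);
  }

private:
  void run(const mesos::TaskInfo& task) { /* actual launch elided */ }

  Option<mesos::ExecutorInfo> executorInfo;
  std::queue<mesos::TaskInfo> pending;
};
{code}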

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900, i.e. updating the CommandExecutor to
> support rootfs, some tests show frequent crashes due to assert violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race between the executor receiving a 
> {{RunTaskMessage}} before {{ExecutorRegisteredMessage}} leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> 

[jira] [Updated] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-23 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3851:
--
Shepherd: Vinod Kone

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Post https://reviews.apache.org/r/38900, i.e. updating the CommandExecutor to
> support rootfs, some tests show frequent crashes due to assert violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
> @ 0x7f9f5a7db359  google::LogMessage::SendToLog()
> @ 0x7f9f5a7dad6a  google::LogMessage::Flush()
> @ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  
> mesos::internal::CommandExecutorProcess::launchTask()
> @   0x4b3dd7  
> _ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
> @   0x4c470c  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f9f5a761b1b  std::function<>::operator()()
> @ 0x7f9f5a749935  process::ProcessBase::visit()
> @ 0x7f9f5a74d700  process::DispatchEvent::visit()
> @   0x48e004  process::ProcessBase::serve()
> @ 0x7f9f5a745d21  process::ProcessManager::resume()
> @ 0x7f9f5a742f52  
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x7f9f5a74cf2c  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x7f9f5a74cedc  
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x7f9f5a74ce6e  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x7f9f5a74cdc5  
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x7f9f5a74cd5e  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x7f9f5624f1e0  (unknown)
> @ 0x7f9f564a8df5  start_thread
> @ 0x7f9f559b71ad  __clone
> I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
> '6553a617-6b4a-418d-9759-5681f45ff854' has exited
> I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
> '6553a617-6b4a-418d-9759-5681f45ff854'
> I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
> 'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
> {code}
> The reason seems to be a race between the executor receiving a 
> {{RunTaskMessage}} before {{ExecutorRegisteredMessage}} leading to the 
> {{CHECK_SOME(executorInfo)}} failure.
> Link to complete log: 
> https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535
> Another related failure from {{ExamplesTest.PersistentVolumeFramework}}
> {code}
> @ 0x7f4f71529cbd  google::LogMessage::SendToLog()
> I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
> ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
> ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
> @ 0x7f4f715296ce  google::LogMessage::Flush()
> @ 0x7f4f7152c402  google::LogMessageFatal::~LogMessageFatal()
> @   0x48d00a  _CheckFatal::~_CheckFatal()
> @   0x49c99d  

[jira] [Updated] (MESOS-3976) C++ HTTP Scheduler Library does not work with SSL enabled

2015-11-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3976:
--
Description: 
The C++ HTTP scheduler library does not work against Mesos when SSL is enabled 
(without downgrade).

The fix should be simple:
* The library should detect if SSL is enabled.
* If SSL is enabled, connections should be made with HTTPS instead of HTTP (see
the sketch below).
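A minimal sketch of the intended behavior ({{sslEnabled}} is a placeholder for
however the library ends up detecting SSL, not an existing API):

{code}
#include <string>

// Choose the scheme for scheduler connections based on SSL detection.
std::string schedulerUrl(const std::string& host, int port, bool sslEnabled)
{
  const std::string scheme = sslEnabled ? "https" : "http";
  return scheme + "://" + host + ":" + std::to_string(port) +
         "/api/v1/scheduler";
}
{code}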

  was:
The HTTP scheduler library does not work against Mesos when SSL is enabled 
(without downgrade).

The fix should be simple:
* The library should detect if SSL is enabled.
* If SSL is enabled, connections should be made with HTTPS instead of HTTP.


> C++ HTTP Scheduler Library does not work with SSL enabled
> -
>
> Key: MESOS-3976
> URL: https://issues.apache.org/jira/browse/MESOS-3976
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Reporter: Joseph Wu
>Assignee: Anand Mazumdar
>  Labels: mesosphere, security
>
> The C++ HTTP scheduler library does not work against Mesos when SSL is 
> enabled (without downgrade).
> The fix should be simple:
> * The library should detect if SSL is enabled.
> * If SSL is enabled, connections should be made with HTTPS instead of HTTP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3976) C++ HTTP Scheduler Library does not work with SSL enabled

2015-11-20 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3976:
--
Summary: C++ HTTP Scheduler Library does not work with SSL enabled  (was: 
HTTP Scheduler Library does not work with SSL enabled)

> C++ HTTP Scheduler Library does not work with SSL enabled
> -
>
> Key: MESOS-3976
> URL: https://issues.apache.org/jira/browse/MESOS-3976
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Reporter: Joseph Wu
>Assignee: Anand Mazumdar
>  Labels: mesosphere, security
>
> The HTTP scheduler library does not work against Mesos when SSL is enabled 
> (without downgrade).
> The fix should be simple:
> * The library should detect if SSL is enabled.
> * If SSL is enabled, connections should be made with HTTPS instead of HTTP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3476) Refactor Status Update method on Agent to handle HTTP based Executors

2015-11-19 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-3476:
-

Assignee: Anand Mazumdar  (was: Isabel Jimenez)

> Refactor Status Update method on Agent to handle HTTP based Executors
> -
>
> Key: MESOS-3476
> URL: https://issues.apache.org/jira/browse/MESOS-3476
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, status updates that the slave sends to itself (e.g. from
> {{runTask}} and {{killTask}}) and status updates from executors are both
> handled by the {{Slave::statusUpdate}} method on the slave. The signature of
> the method is {{void Slave::statusUpdate(StatusUpdate update, const UPID&
> pid)}}.
> We need another overload that can also handle HTTP-based executors and that
> the existing PID-based function can call into. The signature of the new
> function could be:
> {{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}}
> HTTP executors would reach this new function via {{src/slave/http.cpp}}.
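A sketch of the proposed pair of overloads, following the signatures in the
description ({{getExecutor}} is a hypothetical lookup standing in for however
the slave resolves the executor):

{code}
// PID-based (driver) executors funnel into the common overload.
void Slave::statusUpdate(StatusUpdate update, const UPID& pid)
{
  Executor* executor = getExecutor(update);  // hypothetical lookup
  statusUpdate(update, executor);
}

// Shared handling for PID-based and HTTP-based executors alike;
// HTTP executors reach this directly via src/slave/http.cpp.
void Slave::statusUpdate(StatusUpdate update, Executor* executor)
{
  // ... common status update handling ...
}
{code}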



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2015-11-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3793:
--
Target Version/s: 0.26.0

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---
>
> Key: MESOS-3793
> URL: https://issues.apache.org/jira/browse/MESOS-3793
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Debian GNU/Linux 8 docker machine
>Reporter: Matthias Veit
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> We updated the Mesos version to 0.25.0 in our Marathon Docker image, which
> runs our integration tests. We use {{mesos local}} for those tests. This
> fails with the following message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 
> 1287ns
> I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the 
> db in ns
> I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to 
> STARTING
> I1022 18:42:26.855655   138 master.cpp:376] Master 
> a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 
> 172.17.0.14:5050
> I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated 
> slaves to register
> I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' 
> authenticator
> W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, 
> authentication requests will be refused
> I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.888185ms
> I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to 
> STARTING
> I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 371121ns
> I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to 
> VOTING
> I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos 
> group
> I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is 
> master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading master!
> I1022 18:42:26.862511   137 master.cpp:1376] Recovering from registrar
> I1022 18:42:26.862560   137 registrar.cpp:309] Recovering registrar
> Failed to create a containerizer: Could not create MesosContainerizer: Failed 
> to create launcher: Failed to create Linux launcher: Failed to mount cgroups 
> 

[jira] [Commented] (MESOS-3809) Expose advertise_ip and advertise_port as command line options in mesos slave

2015-11-18 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012426#comment-15012426
 ] 

Anand Mazumdar commented on MESOS-3809:
---

[~tnachen] agreed to shepherd this.

> Expose advertise_ip and advertise_port as command line options in mesos slave
> -
>
> Key: MESOS-3809
> URL: https://issues.apache.org/jira/browse/MESOS-3809
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.25.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: mesosphere
>
> advertise_ip and advertise_port are exposed as mesos master command line args
> (MESOS-809). But the following use case makes them candidates for adding as
> command line args in the mesos slave as well.
> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang  wrote:
> It works! Thanks a lot.
> From: haosdent 
> Reply-To: "u...@mesos.apache.org" 
> Date: Wednesday, October 28, 2015, 10:23 AM
> To: user 
> Subject: Re: How to tell master which ip to connect.
> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and
> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang  wrote:
> Hi teams:
> My scenario is like this:
> My master nodes were deployed in AWS. My slaves are in Azure, so they
> communicate via public IP.
> I ran into trouble when the slaves try to register with the master.
> The slaves can get the master's public IP address and can send a register
> request, but they can only send their private IP to the master (because they
> don't know their public IP, they cannot bind a public IP via the --ip flag),
> so the master can't connect to the slaves. How can the slave tell the master
> which IP the master should connect to? (I can't find any flags like
> --advertise_ip in the master.)
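For illustration, mirroring the master's existing flags from MESOS-809, the
slave-side additions could look like the sketch below (help text and placement
are illustrative, not the actual patch):

{code}
// In the slave's flag definitions (illustrative sketch):
add(&Flags::advertise_ip,
    "advertise_ip",
    "IP address advertised to reach this slave.\n"
    "The slave does not bind to this IP address.\n"
    "However, this IP address may be used to access this slave.");

add(&Flags::advertise_port,
    "advertise_port",
    "Port advertised to reach this slave (along with advertise_ip).\n"
    "The slave does not bind to this port.");
{code}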



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3809) Expose advertise_ip and advertise_port as command line options in mesos slave

2015-11-18 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3809:
--
Shepherd: Timothy Chen

> Expose advertise_ip and advertise_port as command line options in mesos slave
> -
>
> Key: MESOS-3809
> URL: https://issues.apache.org/jira/browse/MESOS-3809
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.25.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: mesosphere
>
> advertise_ip and advertise_port are exposed as mesos master command line args
> (MESOS-809). But the following use case makes them candidates for adding as
> command line args in the mesos slave as well.
> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang  wrote:
> It works! Thanks a lot.
> From: haosdent 
> Reply-To: "u...@mesos.apache.org" 
> Date: Wednesday, October 28, 2015, 10:23 AM
> To: user 
> Subject: Re: How to tell master which ip to connect.
> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and
> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang  wrote:
> Hi teams:
> My scenario is like this:
> My master nodes were deployed in AWS. My slaves are in Azure, so they
> communicate via public IP.
> I ran into trouble when the slaves try to register with the master.
> The slaves can get the master's public IP address and can send a register
> request, but they can only send their private IP to the master (because they
> don't know their public IP, they cannot bind a public IP via the --ip flag),
> so the master can't connect to the slaves. How can the slave tell the master
> which IP the master should connect to? (I can't find any flags like
> --advertise_ip in the master.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3923) Implement AuthN handling for HTTP Scheduler API

2015-11-13 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3923:
--
Summary: Implement AuthN handling for HTTP Scheduler API  (was: Implement 
Authorization handling for HTTP Scheduler API)

> Implement AuthN handling for HTTP Scheduler API
> ---
>
> Key: MESOS-3923
> URL: https://issues.apache.org/jira/browse/MESOS-3923
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API, master
>Affects Versions: 0.25.0
>Reporter: Ben Whitehead
>  Labels: mesosphere
>
> If authorization features are enabled on a master, frameworks attempting to
> use the HTTP API can't register.
> {code}
> $ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
> --pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
> Accept:application/x-protobuf Content-Type:application/x-protobuf
> POST /api/v1/scheduler HTTP/1.1
> Connection: keep-alive
> Content-Type: application/x-protobuf
> Accept-Encoding: gzip, deflate
> Accept: application/x-protobuf
> Content-Length: 126
> User-Agent: HTTPie/0.9.0
> Host: localhost:5050
> Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==
> +-+
> | NOTE: binary data not shown in terminal |
> +-+
> HTTP/1.1 401 Unauthorized
> Date: Fri, 13 Nov 2015 20:00:45 GMT
> WWW-authenticate: Basic realm="Mesos master"
> Content-Length: 65
> HTTP schedulers are not supported when authentication is required
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3923) Implement AuthN handling for HTTP Scheduler API

2015-11-13 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3923:
--
Description: 
If authentication(AuthN) is enabled on a master, frameworks attempting to use 
the HTTP Scheduler API can't register.

{code}
$ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
--pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
Accept:application/x-protobuf Content-Type:application/x-protobuf
POST /api/v1/scheduler HTTP/1.1
Connection: keep-alive
Content-Type: application/x-protobuf
Accept-Encoding: gzip, deflate
Accept: application/x-protobuf
Content-Length: 126
User-Agent: HTTPie/0.9.0
Host: localhost:5050
Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==



+-+
| NOTE: binary data not shown in terminal |
+-+

HTTP/1.1 401 Unauthorized
Date: Fri, 13 Nov 2015 20:00:45 GMT
WWW-authenticate: Basic realm="Mesos master"
Content-Length: 65

HTTP schedulers are not supported when authentication is required
{code}

Authorization(AuthZ) is already supported for HTTP based frameworks.
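As an aside, the {{Authorization}} header in the trace is plain HTTP Basic
auth; {{dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==}} is just
base64("verification:password1"). A sketch of building the header value with
stout:

{code}
#include <string>

#include <stout/base64.hpp>

// Build the value for the "Authorization" request header.
std::string basicAuthHeader(
    const std::string& principal,
    const std::string& secret)
{
  return "Basic " + base64::encode(principal + ":" + secret);
}
{code}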

  was:
If authorization features are enabled on a master, frameworks attempting to use
the HTTP API can't register.

{code}
$ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
--pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
Accept:application/x-protobuf Content-Type:application/x-protobuf
POST /api/v1/scheduler HTTP/1.1
Connection: keep-alive
Content-Type: application/x-protobuf
Accept-Encoding: gzip, deflate
Accept: application/x-protobuf
Content-Length: 126
User-Agent: HTTPie/0.9.0
Host: localhost:5050
Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==



+-+
| NOTE: binary data not shown in terminal |
+-+

HTTP/1.1 401 Unauthorized
Date: Fri, 13 Nov 2015 20:00:45 GMT
WWW-authenticate: Basic realm="Mesos master"
Content-Length: 65

HTTP schedulers are not supported when authentication is required
{code}


> Implement AuthN handling for HTTP Scheduler API
> ---
>
> Key: MESOS-3923
> URL: https://issues.apache.org/jira/browse/MESOS-3923
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API, master
>Affects Versions: 0.25.0
>Reporter: Ben Whitehead
>  Labels: mesosphere
>
> If authentication(AuthN) is enabled on a master, frameworks attempting to use 
> the HTTP Scheduler API can't register.
> {code}
> $ cat /tmp/subscribe-943257503176798091.bin | http --print=HhBb --stream 
> --pretty=colors --auth verification:password1 POST :5050/api/v1/scheduler 
> Accept:application/x-protobuf Content-Type:application/x-protobuf
> POST /api/v1/scheduler HTTP/1.1
> Connection: keep-alive
> Content-Type: application/x-protobuf
> Accept-Encoding: gzip, deflate
> Accept: application/x-protobuf
> Content-Length: 126
> User-Agent: HTTPie/0.9.0
> Host: localhost:5050
> Authorization: Basic dmVyaWZpY2F0aW9uOnBhc3N3b3JkMQ==
> +-+
> | NOTE: binary data not shown in terminal |
> +-+
> HTTP/1.1 401 Unauthorized
> Date: Fri, 13 Nov 2015 20:00:45 GMT
> WWW-authenticate: Basic realm="Mesos master"
> Content-Length: 65
> HTTP schedulers are not supported when authentication is required
> {code}
> Authorization(AuthZ) is already supported for HTTP based frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3916) MasterMaintenanceTest.InverseOffersFilters is flaky

2015-11-13 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3916:
--
Description: 
Verbose Logs:
{code}
[ RUN  ] MasterMaintenanceTest.InverseOffersFilters
I1113 16:43:58.486469  8728 leveldb.cpp:176] Opened db in 2.360405ms
I1113 16:43:58.486935  8728 leveldb.cpp:183] Compacted db in 407105ns
I1113 16:43:58.486995  8728 leveldb.cpp:198] Created db iterator in 16221ns
I1113 16:43:58.487030  8728 leveldb.cpp:204] Seeked to beginning of db in 
10935ns
I1113 16:43:58.487046  8728 leveldb.cpp:273] Iterated through 0 keys in the db 
in 999ns
I1113 16:43:58.487090  8728 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1113 16:43:58.487735  8747 recover.cpp:449] Starting replica recovery
I1113 16:43:58.488047  8747 recover.cpp:475] Replica is in EMPTY status
I1113 16:43:58.488977  8745 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (58)@10.0.2.15:45384
I1113 16:43:58.489452  8746 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1113 16:43:58.489712  8747 recover.cpp:566] Updating replica status to STARTING
I1113 16:43:58.490706  8742 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 745443ns
I1113 16:43:58.490739  8742 replica.cpp:323] Persisted replica status to 
STARTING
I1113 16:43:58.490859  8742 recover.cpp:475] Replica is in STARTING status
I1113 16:43:58.491786  8747 replica.cpp:676] Replica in STARTING status 
received a broadcasted recover request from (59)@10.0.2.15:45384
I1113 16:43:58.492542  8749 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1113 16:43:58.493221  8743 recover.cpp:566] Updating replica status to VOTING
I1113 16:43:58.493710  8743 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 331874ns
I1113 16:43:58.493767  8743 replica.cpp:323] Persisted replica status to VOTING
I1113 16:43:58.493868  8743 recover.cpp:580] Successfully joined the Paxos group
I1113 16:43:58.494119  8743 recover.cpp:464] Recover process terminated
I1113 16:43:58.504369  8749 master.cpp:367] Master 
d59449fc-5462-43c5-b935-e05563fdd4b6 (vagrant-ubuntu-wily-64) started on 
10.0.2.15:45384
I1113 16:43:58.504438  8749 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/ZB7csS/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ZB7csS/master" 
--zk_session_timeout="10secs"
I1113 16:43:58.504717  8749 master.cpp:416] Master allowing unauthenticated 
frameworks to register
I1113 16:43:58.504889  8749 master.cpp:419] Master only allowing authenticated 
slaves to register
I1113 16:43:58.504922  8749 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/ZB7csS/credentials'
I1113 16:43:58.505497  8749 master.cpp:458] Using default 'crammd5' 
authenticator
I1113 16:43:58.505759  8749 master.cpp:495] Authorization enabled
I1113 16:43:58.507638  8746 master.cpp:1606] The newly elected leader is 
master@10.0.2.15:45384 with id d59449fc-5462-43c5-b935-e05563fdd4b6
I1113 16:43:58.507693  8746 master.cpp:1619] Elected as the leading master!
I1113 16:43:58.507720  8746 master.cpp:1379] Recovering from registrar
I1113 16:43:58.507946  8749 registrar.cpp:309] Recovering registrar
I1113 16:43:58.508561  8749 log.cpp:661] Attempting to start the writer
I1113 16:43:58.510282  8747 replica.cpp:496] Replica received implicit promise 
request from (60)@10.0.2.15:45384 with proposal 1
I1113 16:43:58.510867  8747 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 475696ns
I1113 16:43:58.510946  8747 replica.cpp:345] Persisted promised to 1
I1113 16:43:58.511912  8745 coordinator.cpp:240] Coordinator attempting to fill 
missing positions
I1113 16:43:58.513030  8749 replica.cpp:391] Replica received explicit promise 
request from (61)@10.0.2.15:45384 for position 0 with proposal 2
I1113 16:43:58.513819  8749 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 739171ns
I1113 16:43:58.513867  8749 replica.cpp:715] Persisted action at 0
I1113 16:43:58.522002  8745 replica.cpp:540] Replica received write request for 
position 0 from (62)@10.0.2.15:45384
I1113 16:43:58.522114  8745 leveldb.cpp:438] Reading 

[jira] [Updated] (MESOS-3916) MasterMaintenanceTest.InverseOffersFilters is flaky

2015-11-13 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3916:
--
Environment: Ubuntu Wily 64 bit

> MasterMaintenanceTest.InverseOffersFilters is flaky
> ---
>
> Key: MESOS-3916
> URL: https://issues.apache.org/jira/browse/MESOS-3916
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu Wily 64 bit
>Reporter: Neil Conway
>  Labels: maintenance, mesosphere, tech-debt
> Attachments: wily_maintenance_test_verbose.txt
>
>
> Seems to fail about 10% of the time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3916) MasterMaintenanceTest.InverseOffersFilters is flaky

2015-11-13 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3916:
--
Summary: MasterMaintenanceTest.InverseOffersFilters is flaky  (was: Flakey 
test on Ubuntu Wily: MasterMaintenanceTest.InverseOffersFilters)

> MasterMaintenanceTest.InverseOffersFilters is flaky
> ---
>
> Key: MESOS-3916
> URL: https://issues.apache.org/jira/browse/MESOS-3916
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: maintenance, mesosphere, tech-debt
> Attachments: wily_maintenance_test_verbose.txt
>
>
> Seems to fail about 10% of the time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3916) MasterMaintenanceTest.InverseOffersFilters is flaky

2015-11-13 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3916:
--
Labels: flaky-test maintenance mesosphere  (was: maintenance mesosphere 
tech-debt)

> MasterMaintenanceTest.InverseOffersFilters is flaky
> ---
>
> Key: MESOS-3916
> URL: https://issues.apache.org/jira/browse/MESOS-3916
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu Wily 64 bit
>Reporter: Neil Conway
>  Labels: flaky-test, maintenance, mesosphere
> Attachments: wily_maintenance_test_verbose.txt
>
>
> Verbose Logs:
> {code}
> [ RUN  ] MasterMaintenanceTest.InverseOffersFilters
> I1113 16:43:58.486469  8728 leveldb.cpp:176] Opened db in 2.360405ms
> I1113 16:43:58.486935  8728 leveldb.cpp:183] Compacted db in 407105ns
> I1113 16:43:58.486995  8728 leveldb.cpp:198] Created db iterator in 16221ns
> I1113 16:43:58.487030  8728 leveldb.cpp:204] Seeked to beginning of db in 
> 10935ns
> I1113 16:43:58.487046  8728 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 999ns
> I1113 16:43:58.487090  8728 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1113 16:43:58.487735  8747 recover.cpp:449] Starting replica recovery
> I1113 16:43:58.488047  8747 recover.cpp:475] Replica is in EMPTY status
> I1113 16:43:58.488977  8745 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (58)@10.0.2.15:45384
> I1113 16:43:58.489452  8746 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1113 16:43:58.489712  8747 recover.cpp:566] Updating replica status to 
> STARTING
> I1113 16:43:58.490706  8742 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 745443ns
> I1113 16:43:58.490739  8742 replica.cpp:323] Persisted replica status to 
> STARTING
> I1113 16:43:58.490859  8742 recover.cpp:475] Replica is in STARTING status
> I1113 16:43:58.491786  8747 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (59)@10.0.2.15:45384
> I1113 16:43:58.492542  8749 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1113 16:43:58.493221  8743 recover.cpp:566] Updating replica status to VOTING
> I1113 16:43:58.493710  8743 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 331874ns
> I1113 16:43:58.493767  8743 replica.cpp:323] Persisted replica status to 
> VOTING
> I1113 16:43:58.493868  8743 recover.cpp:580] Successfully joined the Paxos 
> group
> I1113 16:43:58.494119  8743 recover.cpp:464] Recover process terminated
> I1113 16:43:58.504369  8749 master.cpp:367] Master 
> d59449fc-5462-43c5-b935-e05563fdd4b6 (vagrant-ubuntu-wily-64) started on 
> 10.0.2.15:45384
> I1113 16:43:58.504438  8749 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ZB7csS/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" 
> --registry_strict="true" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/ZB7csS/master" 
> --zk_session_timeout="10secs"
> I1113 16:43:58.504717  8749 master.cpp:416] Master allowing unauthenticated 
> frameworks to register
> I1113 16:43:58.504889  8749 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1113 16:43:58.504922  8749 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ZB7csS/credentials'
> I1113 16:43:58.505497  8749 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1113 16:43:58.505759  8749 master.cpp:495] Authorization enabled
> I1113 16:43:58.507638  8746 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:45384 with id d59449fc-5462-43c5-b935-e05563fdd4b6
> I1113 16:43:58.507693  8746 master.cpp:1619] Elected as the leading master!
> I1113 16:43:58.507720  8746 master.cpp:1379] Recovering from registrar
> I1113 16:43:58.507946  8749 registrar.cpp:309] Recovering registrar
> I1113 16:43:58.508561  8749 log.cpp:661] Attempting to start the writer
> I1113 16:43:58.510282  8747 replica.cpp:496] Replica received implicit 
> promise request from (60)@10.0.2.15:45384 with proposal 1
> I1113 16:43:58.510867  8747 leveldb.cpp:306] Persisting metadata 

[jira] [Commented] (MESOS-3151) Reservation Test failed

2015-11-11 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001485#comment-15001485
 ] 

Anand Mazumdar commented on MESOS-3151:
---

[~jihun] Can you also add the verbose test logs to the JIRA description?

> Reservation Test failed
> ---
>
> Key: MESOS-3151
> URL: https://issues.apache.org/jira/browse/MESOS-3151
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 14.04.2
> OpenJDK 1.7.0_79
> IBM JDK 1.7.0 SR8
>Reporter: Jihun Kang
>Assignee: Jihun Kang
>
> Here are the details.
> {noformat}
> [ RUN  ] 
> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes
> ../../src/tests/reservation_tests.cpp:1055: Failure
> Value of: Resources(offer.resources()).contains(unreserved + unreservedDisk)
>   Actual: false
> Expected: true
> 2015-07-27 
> 17:31:16,280:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:19,617:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:22,951:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:26,288:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/reservation_tests.cpp:1076: Failure
> Failed to wait 15secs for message1
> *** Aborted at 1437985889 (unix time) try "date -d @1437985889" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A()))...
> Expected args: message matcher (8-byte object , 
> 1-byte object <61>, 1-byte object )
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A()))...
> Expected args: message matcher (8-byte object , 
> 1-byte object <61>, 1-byte object )
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] 
> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (15354 
> ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3151) ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes is flaky

2015-11-11 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001485#comment-15001485
 ] 

Anand Mazumdar edited comment on MESOS-3151 at 11/12/15 2:01 AM:
-

[~jihun] Thanks for reporting this. 

- Can you also add the verbose test logs to the JIRA description?
- Can you also find a shepherd for this?

[~mcypark] [~jieyu] Would you be willing to shepherd this?


was (Author: anandmazumdar):
[~jihun] Can you also add the verbose test logs to the JIRA description?

> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes is flaky
> -
>
> Key: MESOS-3151
> URL: https://issues.apache.org/jira/browse/MESOS-3151
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 14.04.2
> OpenJDK 1.7.0_79
> IBM JDK 1.7.0 SR8
>Reporter: Jihun Kang
>Assignee: Jihun Kang
>
> Here are the details.
> {noformat}
> [ RUN  ] 
> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes
> ../../src/tests/reservation_tests.cpp:1055: Failure
> Value of: Resources(offer.resources()).contains(unreserved + unreservedDisk)
>   Actual: false
> Expected: true
> 2015-07-27 
> 17:31:16,280:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:19,617:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:22,951:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:26,288:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/reservation_tests.cpp:1076: Failure
> Failed to wait 15secs for message1
> *** Aborted at 1437985889 (unix time) try "date -d @1437985889" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A()))...
> Expected args: message matcher (8-byte object , 
> 1-byte object <61>, 1-byte object )
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A()))...
> Expected args: message matcher (8-byte object , 
> 1-byte object <61>, 1-byte object )
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] 
> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (15354 
> ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3151) ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes is flaky

2015-11-11 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3151:
--
Target Version/s: 0.27.0
 Summary: 
ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes is flaky  
(was: Reservation Test failed)

> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes is flaky
> -
>
> Key: MESOS-3151
> URL: https://issues.apache.org/jira/browse/MESOS-3151
> Project: Mesos
>  Issue Type: Bug
> Environment: Ubuntu 14.04.2
> OpenJDK 1.7.0_79
> IBM JDK 1.7.0 SR8
>Reporter: Jihun Kang
>Assignee: Jihun Kang
>
> Here are the details.
> {noformat}
> [ RUN  ] 
> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes
> ../../src/tests/reservation_tests.cpp:1055: Failure
> Value of: Resources(offer.resources()).contains(unreserved + unreservedDisk)
>   Actual: false
> Expected: true
> 2015-07-27 
> 17:31:16,280:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:19,617:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:22,951:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2015-07-27 
> 17:31:26,288:9247(0x2ae20f41d700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:33182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../src/tests/reservation_tests.cpp:1076: Failure
> Failed to wait 15secs for message1
> *** Aborted at 1437985889 (unix time) try "date -d @1437985889" if you are 
> using GNU date ***
> PC: @0x0 (unknown)
> ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A()))...
> Expected args: message matcher (8-byte object , 
> 1-byte object <61>, 1-byte object )
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> ../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
> Actual function call count doesn't match EXPECT_CALL(filter->mock, 
> filter(testing::A()))...
> Expected args: message matcher (8-byte object , 
> 1-byte object <61>, 1-byte object )
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] 
> ReservationTest.CompatibleCheckpointedResourcesWithPersistentVolumes (15354 
> ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3476) Refactor Status Update method on Agent to handle HTTP based Executors

2015-11-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3476:
--
Target Version/s: 0.27.0

> Refactor Status Update method on Agent to handle HTTP based Executors
> -
>
> Key: MESOS-3476
> URL: https://issues.apache.org/jira/browse/MESOS-3476
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Isabel Jimenez
>  Labels: mesosphere
>
> Currently, status updates sent from the slave to itself (e.g. in {{runTask}} 
> and {{killTask}}), as well as status updates from executors, are handled by 
> the {{Slave::statusUpdate}} method on the Slave. The signature of the method 
> is {{void Slave::statusUpdate(StatusUpdate update, const UPID& pid)}}. 
> We need to create another overload that can also handle HTTP based 
> executors, and which the existing PID based function can call into. The 
> signature of the new function could be:
> {{void Slave::statusUpdate(StatusUpdate update, Executor* executor)}}
> The HTTP Executor would also call into this new function via 
> {{src/slave/http.cpp}}.
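
A minimal sketch of how the two overloads might fit together; the stub types 
here stand in for the real Mesos classes, and the bodies are illustrative only:

{code}
// Illustrative sketch only; stub types stand in for the real Mesos ones.
struct StatusUpdate {};
struct UPID {};
struct Executor {};

struct Slave
{
  // Existing entry point used by PID (driver) based executors.
  void statusUpdate(StatusUpdate update, const UPID& pid)
  {
    Executor* executor = nullptr;  // would be looked up from 'pid'
    statusUpdate(update, executor);
  }

  // Proposed overload shared by PID and HTTP based executors;
  // src/slave/http.cpp would call into this directly.
  void statusUpdate(StatusUpdate update, Executor* executor)
  {
    // ... common status update handling ...
  }
};
{code}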



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3515) Support Subscribe Call for HTTP based Executors

2015-11-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3515:
--
Target Version/s: 0.27.0

> Support Subscribe Call for HTTP based Executors
> ---
>
> Key: MESOS-3515
> URL: https://issues.apache.org/jira/browse/MESOS-3515
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> We need to add a {{subscribe(...)}} method in {{src/slave/slave.cpp}} to 
> introduce the ability for HTTP based executors to subscribe and then receive 
> events on the persistent HTTP connection. Most of the functionality needed 
> would be similar to {{Master::subscribe}} in {{src/master/master.cpp}}.
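
A rough sketch of the shape such a method could take, with stub types standing 
in for the agent's real {{HttpConnection}} and {{Executor}} (illustrative only, 
not the final signature):

{code}
// Illustrative sketch; the real method would live in src/slave/slave.cpp
// and mirror Master::subscribe.
struct HttpConnection {};

struct Executor
{
  HttpConnection http;  // persistent connection used to push events
};

void subscribe(const HttpConnection& connection, Executor* executor)
{
  // Hold on to the streaming connection so the agent can push events
  // (e.g. launch, kill, acknowledgements) to the executor over HTTP.
  executor->http = connection;

  // ... send a SUBSCRIBED event and replay any queued task launches ...
}
{code}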



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3550) Create a Executor Library based on the new Executor HTTP API

2015-11-07 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3550:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Create a Executor Library based on the new Executor HTTP API
> 
>
> Key: MESOS-3550
> URL: https://issues.apache.org/jira/browse/MESOS-3550
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Similar to the Scheduler Library {{src/scheduler/scheduler.cpp}}, we would 
> need an Executor Library that speaks the new Executor HTTP API. 
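
A hedged sketch of what the library surface might look like, mirroring the 
scheduler library's callback style (all names are illustrative, not a settled 
API):

{code}
#include <functional>

// Stub event/call types; the real ones come from the executor protos.
struct Event {};
struct Call {};

// Illustrative interface only.
class ExecutorLibrary
{
public:
  ExecutorLibrary(
      std::function<void()> connected,        // agent connection established
      std::function<void()> disconnected,     // agent connection lost
      std::function<void(Event)> received)    // SUBSCRIBED, LAUNCH, KILL, ...
    : connected_(connected),
      disconnected_(disconnected),
      received_(received) {}

  void send(const Call& call)
  {
    // ... POST the call to the agent's executor API endpoint ...
  }

private:
  std::function<void()> connected_;
  std::function<void()> disconnected_;
  std::function<void(Event)> received_;
};
{code}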



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2831) FetcherCacheTest.SimpleEviction is flaky

2015-11-07 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995535#comment-14995535
 ] 

Anand Mazumdar commented on MESOS-2831:
---

This is still flaky, though the root cause might be different; I am not sure. 
[~bernd-mesos] Should I open a new JIRA or re-open this?

Logs from an ASF CI run:

https://builds.apache.org/job/Mesos/1193/COMPILER=gcc,CONFIGURATION=--verbose,OS=centos:7,label_exp=docker%7C%7CHadoop/console

{code}
[ RUN  ] FetcherCacheTest.SimpleEviction
I1107 19:36:43.040343 30634 leveldb.cpp:176] Opened db in 5.759408ms
I1107 19:36:43.044983 30634 leveldb.cpp:183] Compacted db in 4.530307ms
I1107 19:36:43.045104 30634 leveldb.cpp:198] Created db iterator in 31039ns
I1107 19:36:43.045132 30634 leveldb.cpp:204] Seeked to beginning of db in 3710ns
I1107 19:36:43.045143 30634 leveldb.cpp:273] Iterated through 0 keys in the db 
in 345ns
I1107 19:36:43.045212 30634 replica.cpp:780] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1107 19:36:43.048341 30655 recover.cpp:449] Starting replica recovery
I1107 19:36:43.048702 30655 recover.cpp:475] Replica is in EMPTY status
I1107 19:36:43.050097 30665 replica.cpp:676] Replica in EMPTY status received a 
broadcasted recover request from (1366)@172.17.5.200:41587
I1107 19:36:43.050868 30655 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1107 19:36:43.051520 30655 recover.cpp:566] Updating replica status to STARTING
I1107 19:36:43.052522 30655 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 670849ns
I1107 19:36:43.052634 30655 replica.cpp:323] Persisted replica status to 
STARTING
I1107 19:36:43.053145 30664 recover.cpp:475] Replica is in STARTING status
I1107 19:36:43.054857 30655 replica.cpp:676] Replica in STARTING status 
received a broadcasted recover request from (1367)@172.17.5.200:41587
I1107 19:36:43.056457 30655 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1107 19:36:43.057020 30655 recover.cpp:566] Updating replica status to VOTING
I1107 19:36:43.058007 30664 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 774859ns
I1107 19:36:43.058068 30664 replica.cpp:323] Persisted replica status to VOTING
I1107 19:36:43.058660 30664 recover.cpp:580] Successfully joined the Paxos group
I1107 19:36:43.058848 30664 recover.cpp:464] Recover process terminated
I1107 19:36:43.059594 30666 master.cpp:367] Master 
7d94c7fb-8950-4bcf-80c1-46112292dcd6 (095b2d77f516) started on 
172.17.5.200:41587
I1107 19:36:43.059622 30666 master.cpp:369] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/dpiizb/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
--work_dir="/tmp/dpiizb/master" --zk_session_timeout="10secs"
I1107 19:36:43.060155 30666 master.cpp:414] Master only allowing authenticated 
frameworks to register
I1107 19:36:43.060170 30666 master.cpp:419] Master only allowing authenticated 
slaves to register
I1107 19:36:43.060180 30666 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/dpiizb/credentials'
I1107 19:36:43.060549 30666 master.cpp:458] Using default 'crammd5' 
authenticator
I1107 19:36:43.060745 30666 master.cpp:495] Authorization enabled
I1107 19:36:43.061125 30658 hierarchical.cpp:140] Initialized hierarchical 
allocator process
I1107 19:36:43.062026 30667 whitelist_watcher.cpp:79] No whitelist given
I1107 19:36:43.064836 30666 master.cpp:1606] The newly elected leader is 
master@172.17.5.200:41587 with id 7d94c7fb-8950-4bcf-80c1-46112292dcd6
I1107 19:36:43.064914 30666 master.cpp:1619] Elected as the leading master!
I1107 19:36:43.064952 30666 master.cpp:1379] Recovering from registrar
I1107 19:36:43.065999 30655 registrar.cpp:309] Recovering registrar
I1107 19:36:43.067461 30662 log.cpp:661] Attempting to start the writer
I1107 19:36:43.069129 30662 replica.cpp:496] Replica received implicit promise 
request from (1368)@172.17.5.200:41587 with proposal 1
I1107 19:36:43.069782 30662 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 478241ns
I1107 19:36:43.069875 30662 replica.cpp:345] Persisted promised to 1
I1107 19:36:43.071180 30662 coordinator.cpp:240] Coordinator attempting to fill 
missing positions
I1107 

[jira] [Commented] (MESOS-2831) FetcherCacheTest.SimpleEviction is flaky

2015-11-07 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995537#comment-14995537
 ] 

Anand Mazumdar commented on MESOS-2831:
---

Never mind, this looks related to some recent changes in the command executor. 
Let me dig more into the root cause and file a separate JIRA for it.

> FetcherCacheTest.SimpleEviction is flaky
> 
>
> Key: MESOS-2831
> URL: https://issues.apache.org/jira/browse/MESOS-2831
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.23.0
>Reporter: Vinod Kone
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> Saw this when reviewbot was testing an unrelated review 
> https://reviews.apache.org/r/35119/
> {code}
> [ RUN  ] FetcherCacheTest.SimpleEviction
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: resourceOffers(0x5365320, @0x2b7bef9f1b20 { 128-byte 
> object  00-00 C0-75 00-18 7C-2B 00-00 60-76 00-18 7C-2B 00-00 00-77 00-18 7C-2B 00-00 
> 40-3A 00-18 7C-2B 00-00 04-00 00-00 04-00 00-00 04-00 00-00 7C-2B 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 0F-00 
> 00-00> })
> Stack trace:
> F0607 21:19:23.181392  4246 fetcher_cache_tests.cpp:354] CHECK_READY(offers): 
> is PENDING Failed to wait for resource offers
> *** Check failure stack trace: ***
> @ 0x2b7be56c5972  google::LogMessage::Fail()
> @ 0x2b7be56c58be  google::LogMessage::SendToLog()
> @ 0x2b7be56c52c0  google::LogMessage::Flush()
> @ 0x2b7be56c81d4  google::LogMessageFatal::~LogMessageFatal()
> @   0x97d182  _CheckFatal::~_CheckFatal()
> @   0xb58a28  
> mesos::internal::tests::FetcherCacheTest::launchTask()
> @   0xb65b50  
> mesos::internal::tests::FetcherCacheTest_SimpleEviction_Test::TestBody()
> @  0x11923b7  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x118d5b4  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x1175975  testing::Test::Run()
> @  0x1176098  testing::TestInfo::Run()
> @  0x1176620  testing::TestCase::Run()
> @  0x117b2ea  testing::internal::UnitTestImpl::RunAllTests()
> @  0x1193229  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x118e2a5  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x117a1f6  testing::UnitTest::Run()
> @   0xcc832b  main
> @ 0x2b7be7d46ec5  (unknown)
> @   0x872379  (unknown)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-07 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3851:
-

 Summary: Investigate recent crashes in Command Executor
 Key: MESOS-3851
 URL: https://issues.apache.org/jira/browse/MESOS-3851
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Anand Mazumdar
Priority: Blocker


After https://reviews.apache.org/r/38900 landed, i.e. updating the 
CommandExecutor to support rootfs, some tests show frequent crashes due to 
assertion violations.

{{FetcherCacheTest.SimpleEviction}} failed with the following log:

{code}
I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to executor 
''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
executor(1)@172.17.5.200:33871'
I1107 19:36:46.363682  1236 exec.cpp:297] Executor asked to run task '3'
IStarting task 3
1107 19:36:46.363988  1236 exec.cpp:306] Executor::launchTask took 153278ns
F1107 19:36:46.364225  1237 executor.cpp:184] CHECK_SOME(executorInfo): is NONE 
*** Check failure stack trace: ***
I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
@ 0x7f9f5a7db3fa  google::LogMessage::Fail()
I1107 19:36:46.394081  1245 exec.cpp:222] Executor::registered took 395411ns
@ 0x7f9f5a7db359  google::LogMessage::SendToLog()
@ 0x7f9f5a7dad6a  google::LogMessage::Flush()
@ 0x7f9f5a7dda9e  google::LogMessageFatal::~LogMessageFatal()
@   0x48d00a  _CheckFatal::~_CheckFatal()
@   0x49c99d  mesos::internal::CommandExecutorProcess::launchTask()
@   0x4b3dd7  
_ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
@   0x4c470c  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f9f5a761b1b  std::function<>::operator()()
@ 0x7f9f5a749935  process::ProcessBase::visit()
@ 0x7f9f5a74d700  process::DispatchEvent::visit()
@   0x48e004  process::ProcessBase::serve()
@ 0x7f9f5a745d21  process::ProcessManager::resume()
@ 0x7f9f5a742f52  
_ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
@ 0x7f9f5a74cf2c  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
@ 0x7f9f5a74cedc  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
@ 0x7f9f5a74ce6e  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@ 0x7f9f5a74cdc5  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
@ 0x7f9f5a74cd5e  
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
@ 0x7f9f5624f1e0  (unknown)
@ 0x7f9f564a8df5  start_thread
@ 0x7f9f559b71ad  __clone
I1107 19:36:46.551370 30656 containerizer.cpp:1257] Executor for container 
'6553a617-6b4a-418d-9759-5681f45ff854' has exited
I1107 19:36:46.551429 30656 containerizer.cpp:1074] Destroying container 
'6553a617-6b4a-418d-9759-5681f45ff854'
I1107 19:36:46.553869 30656 containerizer.cpp:1257] Executor for container 
'd2c1f924-c92a-453e-82b1-c294d09c4873' has exited
{code}

The reason seems to be a race in which the executor receives a 
{{RunTaskMessage}} before the {{ExecutorRegisteredMessage}}, leading to the 
{{CHECK_SOME(executorInfo)}} failure.
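
A minimal sketch of the race and one possible guard, using illustrative names 
(the actual executor uses {{Option<ExecutorInfo>}} and {{CHECK_SOME}}, which 
aborts instead of deferring):

{code}
#include <optional>
#include <vector>

struct TaskInfo {};
struct ExecutorInfo {};

// Illustrative only; not the actual CommandExecutorProcess code.
struct CommandExecutor
{
  std::optional<ExecutorInfo> executorInfo;  // set on registration
  std::vector<TaskInfo> queuedTasks;

  void launchTask(const TaskInfo& task)
  {
    if (!executorInfo.has_value()) {
      // RunTaskMessage arrived before ExecutorRegisteredMessage:
      // defer the launch instead of CHECK-ing and crashing.
      queuedTasks.push_back(task);
      return;
    }
    // ... normal launch path ...
  }
};
{code}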

Link to complete log: 
https://issues.apache.org/jira/browse/MESOS-2831?focusedCommentId=14995535=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14995535

Another related failure, from {{ExamplesTest.PersistentVolumeFramework}}:

{code}
@ 0x7f4f71529cbd  google::LogMessage::SendToLog()
I1107 13:15:09.949987 31573 slave.cpp:2337] Status update manager successfully 
handled status update acknowledgement (UUID: 
721c7316-5580-4636-a83a-098e3bd4ed1f) for task 
ad90531f-d3d8-43f6-96f2-c81c4548a12d of framework 
ac4ea54a-7d19-4e41-9ee3-1a761f8e5b0f-
@ 0x7f4f715296ce  google::LogMessage::Flush()
@ 0x7f4f7152c402  google::LogMessageFatal::~LogMessageFatal()
@   0x48d00a  _CheckFatal::~_CheckFatal()
@   0x49c99d  mesos::internal::CommandExecutorProcess::launchTask()
@   0x4b3dd7  

[jira] [Updated] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-11-06 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3339:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, our testing infrastructure does not have a mechanism for 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream. We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> filter a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library and Maintenance Primitives tests. 
> - The solution can be as trivial as moving this helper function to a common 
> test header.
> - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs 
> via {{DROP_CALLS}}.
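
For the second option, a hedged sketch of what a shared, parameterized helper 
might look like (assuming gmock's {{ACTION_P2}} and the same {{Event}} type as 
above; {{EnqueueFiltered}} is an illustrative name):

{code}
// Illustrative only: generalizes the Enqueue action above so tests can
// drop events of any one type, not just HEARTBEAT.
ACTION_P2(EnqueueFiltered, queue, dropType)
{
  std::queue<Event> events = arg0;
  while (!events.empty()) {
    if (events.front().type() == dropType) {
      VLOG(1) << "Ignoring filtered event";
    } else {
      queue->put(events.front());
    }
    events.pop();
  }
}
{code}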



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3339) Implement filtering mechanism for (Scheduler API Events) Testing

2015-11-06 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993492#comment-14993492
 ] 

Anand Mazumdar commented on MESOS-3339:
---

Yep, Sure.

> Implement filtering mechanism for (Scheduler API Events) Testing
> 
>
> Key: MESOS-3339
> URL: https://issues.apache.org/jira/browse/MESOS-3339
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, our testing infrastructure does not have a mechanism for 
> filtering/dropping HTTP events of a particular type from the Scheduler API 
> response stream. We need a {{DROP_HTTP_CALLS}} abstraction that can help us 
> filter a particular event type.
> {code}
> // Enqueues all received events into a libprocess queue.
> ACTION_P(Enqueue, queue)
> {
>   std::queue<Event> events = arg0;
>   while (!events.empty()) {
> // Note that we currently drop HEARTBEATs because most of these tests
> // are not designed to deal with heartbeats.
> // TODO(vinod): Implement DROP_HTTP_CALLS that can filter heartbeats.
> if (events.front().type() == Event::HEARTBEAT) {
>   VLOG(1) << "Ignoring HEARTBEAT event";
> } else {
>   queue->put(events.front());
> }
> events.pop();
>   }
> }
> {code}
> This helper code is currently duplicated in at least two places: the 
> Scheduler Library and Maintenance Primitives tests. 
> - The solution can be as trivial as moving this helper function to a common 
> test header.
> - Implement a {{DROP_HTTP_CALLS}} similar to what we do for other protobufs 
> via {{DROP_CALLS}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3583) Introduce sessions in HTTP Scheduler API Subscribed Responses

2015-11-05 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3583:
--
Target Version/s: 0.27.0  (was: 0.26.0)

> Introduce sessions in HTTP Scheduler API Subscribed Responses
> -
>
> Key: MESOS-3583
> URL: https://issues.apache.org/jira/browse/MESOS-3583
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere, tech-debt
>
> Currently, the HTTP Scheduler API has no concept of sessions, i.e. a 
> {{SessionID}} or a {{TokenID}}. This would be useful in some failure 
> scenarios. As of now, if a framework fails over and then subscribes again 
> with the same {{FrameworkID}} and the {{force}} option set, the Mesos master 
> would subscribe it.
> If the previous instance of the framework/scheduler then tries to send a 
> Call, e.g. {{Call::KILL}}, with the same previous {{FrameworkID}} set, it 
> would still be accepted by the master, leading to erroneously killing a task.
> This is possible because we currently have no way of distinguishing 
> connections. This used to work in the previous driver implementation because 
> the master also performed a {{UPID}} check to verify that they matched, and 
> only then allowed the call.
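
A small sketch of the session idea (the map and function names are 
illustrative, not proposed Mesos APIs):

{code}
#include <map>
#include <string>

// FrameworkID -> session token handed out in the SUBSCRIBED response.
std::map<std::string, std::string> sessions;

void onSubscribed(const std::string& frameworkId, const std::string& token)
{
  sessions[frameworkId] = token;  // the latest subscriber owns the session
}

bool allowCall(const std::string& frameworkId, const std::string& token)
{
  // A failed-over scheduler instance holding a stale token is rejected,
  // unlike today where the FrameworkID alone suffices.
  auto it = sessions.find(frameworkId);
  return it != sessions.end() && it->second == token;
}
{code}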



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2015-11-04 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3832:
--
Shepherd: Vinod Kone
Target Version/s: 0.26.0
  Labels: newbie  (was: )
 Description: 
The documentation for the Scheduler HTTP API says:

{quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
Redirect” will be received with the “Location” header pointing to the leading 
master.{quote}
While the redirect functionality has been implemented, it was not actually used 
in the handler for the HTTP API.

A probable fix could be:
- Check if the current master is the leading master.
- If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}

  was:
The documentation for the HTTP API says:

{quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
Redirect” will be received with the “Location” header pointing to the leading 
master.{quote}
While the redirect functionality has been implemented, it was not actually used 
in the handler for the HTTP API.

 Summary: Scheduler HTTP API does not redirect to leading master  
(was: HTTP API does not redirect to leading master)

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}
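
A hedged sketch of that fix with stub types (the real handler returns a 
{{process::Future<process::http::Response>}} and would call the existing 
{{redirect}} helper in {{src/master/http.cpp}}):

{code}
#include <string>

// Stub types for illustration only.
struct Request { std::string url; };
struct Response { int code; std::string location; };

Response schedulerHandler(
    const Request& request,
    bool isLeadingMaster,
    const std::string& leaderUrl)
{
  if (!isLeadingMaster) {
    // Non-leading master: 307 Temporary Redirect with the Location
    // header pointing at the leading master, as documented.
    return Response{307, leaderUrl};
  }
  // ... handle the scheduler call on the leading master ...
  return Response{202, ""};
}
{code}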



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3833:
-

 Summary: /help endpoints do not work for nested paths
 Key: MESOS-3833
 URL: https://issues.apache.org/jira/browse/MESOS-3833
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Reporter: Anand Mazumdar
Priority: Minor


Mesos displays the list of all supported endpoints starting at a given path 
prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.

It seems that the {{help}} functionality is broken for URLs with nested 
paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is:
{quote}
Malformed URL, expecting '/help/id/name/'
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3550) Create a Executor Library based on the new Executor HTTP API

2015-11-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-3550:
-

Assignee: Anand Mazumdar

> Create a Executor Library based on the new Executor HTTP API
> 
>
> Key: MESOS-3550
> URL: https://issues.apache.org/jira/browse/MESOS-3550
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Similar to the Scheduler Library {{src/scheduler/scheduler.cpp}}, we would 
> need an Executor Library that speaks the new Executor HTTP API. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3550) Create a Executor Library based on the new Executor HTTP API

2015-11-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3550:
--
  Sprint: Mesosphere Sprint 21
Story Points: 5

> Create a Executor Library based on the new Executor HTTP API
> 
>
> Key: MESOS-3550
> URL: https://issues.apache.org/jira/browse/MESOS-3550
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> Similar to the Scheduler Library {{src/scheduler/scheduler.cpp}}, we would 
> need an Executor Library that speaks the new Executor HTTP API. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3819) Add documentation explaining "roles"

2015-11-02 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3819:
--
  Sprint: Mesosphere Sprint 21
Story Points: 2

> Add documentation explaining "roles"
> 
>
> Key: MESOS-3819
> URL: https://issues.apache.org/jira/browse/MESOS-3819
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere
>
> Docs currently talk about resources and static/dynamic reservations, but 
> don't explain what a "role" is to begin with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3807) RegistryClientTest.SimpleGetManifest is flaky

2015-10-26 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3807:
-

 Summary: RegistryClientTest.SimpleGetManifest is flaky
 Key: MESOS-3807
 URL: https://issues.apache.org/jira/browse/MESOS-3807
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Anand Mazumdar


From ASF CI:
https://builds.apache.org/job/Mesos/976/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/console

{code}
[ RUN  ] RegistryClientTest.SimpleGetManifest
I1026 18:02:45.320374 31975 registry_client.cpp:264] Response status: 401 
Unauthorized
I1026 18:02:45.323772 31982 libevent_ssl_socket.cpp:1025] Socket error: 
Connection reset by peer
../../src/tests/containerizer/provisioner_docker_tests.cpp:718: Failure
(socket).failure(): Failed accept: connection error: Connection reset by peer
[  FAILED  ] RegistryClientTest.SimpleGetManifest (13 ms)
{code}

Logs from a good run:
{code}
[ RUN  ] RegistryClientTest.SimpleGetManifest
I1025 15:35:36.248955 31970 registry_client.cpp:264] Response status: 401 
Unauthorized
I1025 15:35:36.267873 31979 registry_client.cpp:264] Response status: 200 OK
[   OK ] RegistryClientTest.SimpleGetManifest (32 ms)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3480) Refactor Executor struct in Slave to handle HTTP based executors

2015-10-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971385#comment-14971385
 ] 

Anand Mazumdar commented on MESOS-3480:
---

{code}
commit e1b0e125723dd6f144aa733961c490c1f0e1ef17
Author: Anand Mazumdar 
Date:   Thu Oct 22 23:13:51 2015 -0700

Added HttpConnection to the Executor struct in the Agent.

This lays an initial part of the groundwork needed to
support executors using the HTTP API in the Agent.

Review: https://reviews.apache.org/r/38874

{code}

{code}
commit 02c7d93ceefce19743b0e043ead62fb02a160dbd
Author: Anand Mazumdar 
Date:   Thu Oct 22 18:25:55 2015 -0700

Added output operator for Executor struct in agent.

Review: https://reviews.apache.org/r/39569
{code}

> Refactor Executor struct in Slave to handle HTTP based executors
> 
>
> Key: MESOS-3480
> URL: https://issues.apache.org/jira/browse/MESOS-3480
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> Currently, the {{struct Executor}} in the slave only supports executors 
> connected via message passing (driver). We should refactor it to add support 
> for HTTP based executors, similar to what was done for the Scheduler API's 
> {{struct Framework}} in {{src/master/master.hpp}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING

2015-10-23 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971791#comment-14971791
 ] 

Anand Mazumdar commented on MESOS-3766:
---

I can take this up.

> Can not kill task in Status STAGING
> ---
>
> Key: MESOS-3766
> URL: https://issues.apache.org/jira/browse/MESOS-3766
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Affects Versions: 0.25.0
> Environment: OSX 
>Reporter: Matthias Veit
>Assignee: Niklas Quarfot Nielsen
> Attachments: master.log.zip, slave.log.zip
>
>
> I have created a simple Marathon application with an instance count of 100 
> (100 tasks) and a simple sleep command. Before all tasks were running, I 
> killed all tasks. This operation was successful except for 2 tasks. These 2 
> tasks are in state STAGING (according to the Mesos UI). Marathon has been 
> trying to kill those tasks every 5 seconds (for over an hour now), 
> unsuccessfully.
> I picked one task and grepped the slave log:
> {noformat}
> I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour
> I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task 
> 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80
> I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container 
> '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor 
> 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr
> I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing 
> executor's forked pid 37096 to 
> '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks
> I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor 
> 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000
> I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task 
> 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor 
> 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame
> I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
> I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task 
> app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework 
> 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-
