Re: Tasks always lost

2014-07-01 Thread qingyang li
Here is the slave log:

E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor container
for executor 20140616-104524-1694607552-5050-26919-1 of framework
20140702-102939-1694607552-5050-14846-: Not monitored

E0702 11:35:08.869998 17840 slave.cpp:2310] Container
'af557235-2d5f-4062-aaf3-a747cb3cd0d1' for executor
'20140616-104524-1694607552-5050-26919-1' of framework
'20140702-113428-1694607552-5050-17766-' failed to start: Failed to
fetch URIs for container 'af557235-2d5f-4062-aaf3-a747cb3cd0d1': exit
status 32512
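
A note on the `exit status 32512` in the log above: the value looks like a raw wait()-style status rather than a plain exit code. Assuming the common Linux layout (terminating signal in the low 7 bits, exit code in the next byte), it decodes to exit code 127, which POSIX shells conventionally return for "command not found". That would suggest, though nothing in this thread confirms it, that a command the fetcher tried to run was not found on the slave. The helper name `decode_wait_status` is made up for illustration.

```python
def decode_wait_status(status):
    """Split a raw wait()-style status into (exit_code, signal).

    Assumes the common Linux layout: the low 7 bits hold the terminating
    signal (0 for a normal exit) and bits 8-15 hold the exit code.
    Stopped/continued states are not handled in this sketch.
    """
    signal = status & 0x7F
    if signal != 0:
        return None, signal
    exit_code = (status >> 8) & 0xFF
    return exit_code, None


# The status reported in the slave log.
exit_code, signal = decode_wait_status(32512)
print(exit_code, signal)
```

If the decoded code really is 127, checking the PATH seen by the slave process would be a reasonable first step.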



2014-07-01 16:24 GMT+08:00 qingyang li :

> I am using Mesos 0.19 and Spark 0.9.0. The Mesos cluster is started, but when
> I use spark-shell to submit a job, the tasks are always lost. Here is the
> log:
> --
> 14/07/01 16:24:27 INFO DAGScheduler: Host gained which was in lost list
> earlier: bigdata005
> 14/07/01 16:24:27 INFO TaskSetManager: Starting task 0.0:1 as TID 4042 on
> executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
> 14/07/01 16:24:27 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes
> in 0 ms
> 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for
> 20140616-104524-1694607552-5050-26919-1 from TaskSet 0.0
> 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4041 (task 0.0:0)
> 14/07/01 16:24:28 INFO DAGScheduler: Executor lost:
> 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
> 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor
> 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
> 14/07/01 16:24:28 INFO BlockManagerMaster: Removed
> 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
> 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for
> 20140616-143932-1694607552-5050-4080-2 from TaskSet 0.0
> 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4042 (task 0.0:1)
> 14/07/01 16:24:28 INFO DAGScheduler: Executor lost:
> 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
> 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor
> 20140616-143932-1694607552-5050-4080-2 from BlockManagerMaster.
> 14/07/01 16:24:28 INFO BlockManagerMaster: Removed
> 20140616-143932-1694607552-5050-4080-2 successfully in removeExecutor
> 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list
> earlier: bigdata005
> 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list
> earlier: bigdata001
> 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:1 as TID 4043 on
> executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
> 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes
> in 0 ms
> 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:0 as TID 4044 on
> executor 20140616-104524-1694607552-5050-26919-1: bigdata001 (PROCESS_LOCAL)
> 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:0 as 1570 bytes
> in 0 ms
>
>
> It seems someone else has also encountered this problem:
>
> http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3c201305161047069952...@nfs.iscas.ac.cn%3E
>


Re: Review Request 22832: HTTP Authenticated '/shutdown' endpoint

2014-07-01 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22832/#review47177
---


Patch looks great!

Reviews applied: [22832]

All tests passed.

- Mesos ReviewBot


On July 1, 2014, 10:46 p.m., Isabel Jimenez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22832/
> ---
> 
> (Updated July 1, 2014, 10:46 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Dominic Hamon, and Till 
> Toenshoff.
> 
> 
> Bugs: MESOS-1390
> https://issues.apache.org/jira/browse/MESOS-1390
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> HTTP Authenticated '/shutdown' endpoint for shutting down a running framework
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 12d84bf 
>   src/master/http.cpp 5d86976 
>   src/master/master.hpp 5fef354 
>   src/master/master.cpp 474014b 
>   src/slave/http.cpp cd7f692 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/shutdown_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22832/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Isabel Jimenez
> 
>



Jenkins build is back to normal : Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME #1950

2014-07-01 Thread Apache Jenkins Server
See 




Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerizer's update completes.

2014-07-01 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review47168
---


Patch looks great!

Reviews applied: [22313]

All tests passed.

- Mesos ReviewBot


On July 1, 2014, 8:33 p.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> ---
> 
> (Updated July 1, 2014, 8:33 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-886
> https://issues.apache.org/jira/browse/MESOS-886
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added __runTask() to wait for the completion of containerizer->update() and 
> check the result before sending RunTaskMessage.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/slave_tests.cpp 371a5b8 
> 
> Diff: https://reviews.apache.org/r/22313/diff/
> 
> 
> Testing
> ---
> 
> SlaveTest, CancelTaskIfContainerizerFails
> 
> This tests that if containerizer->update() returns a Failure, the task
> won't be launched and the scheduler will get TASK_LOST.
> 
> make check
> 
> 
> File Attachments
> 
> 
> framework will exit
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Jenkins build is back to normal : mesos-reviewbot #1058

2014-07-01 Thread Apache Jenkins Server
See 



Review Request 23224: Refactored the python bindings into multiple modules.

2014-07-01 Thread Thomas Rampelberg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23224/
---

Review request for mesos.


Bugs: MESOS-857
https://issues.apache.org/jira/browse/MESOS-857


Repository: mesos-git


Description
---

The existing module has been split into three separate ones:

- mesos.api - This contains the stub implementations for the Executor/Scheduler
- mesos.native - The old _mesos module.
- mesos.protocol - All protobufs.

There is also a base metapackage `mesos` that allows a potential `pip install 
mesos` to correctly install everything required. While mesos.api and 
mesos.protocol can now be uploaded to the cheeseshop, mesos.native has not 
changed and will need some more work first.


Diffs
-

  Makefile.am b91d8cf011832e6e91b16f03a2d80fbb601eba8f 
  configure.ac e7472081339fc9c773eb2cf2d5f15dc459ac378d 
  mpi/mpiexec-mesos.in da0733fc29f97e67385cab55d60d4e2afd76aba9 
  mpi/mpiexec-mesos.py 0ab50167eaa43f9d69f37b7c10e26fa7a7d9f250 
  src/Makefile.am 918b0d04a5de69a9213e3d31c8f9424756e4ade5 
  src/examples/python/test-containerizer.in 
569519b3b9755959f9bf931d3c81be9a00b64bc9 
  src/examples/python/test-executor.in 7e8875f0fd74dc9f9207986864edbce588ec3fb8 
  src/examples/python/test-framework.in 
c4683b97b87ba8753e842b0c75cc3d65140a5cf7 
  src/examples/python/test_containerizer.py 
c65d891539bcee775741626596997afe8471c930 
  src/examples/python/test_executor.py 065b50a6146cb39a82024d82c20cf89f940a9e57 
  src/examples/python/test_framework.py 
fce090fe542e3863770d7daea3d8764da1d8d5df 
  src/python/api/src/mesos/__init__.py PRE-CREATION 
  src/python/native/mesos_executor_driver_impl.hpp  
  src/python/native/mesos_executor_driver_impl.cpp  
  src/python/native/mesos_scheduler_driver_impl.hpp  
  src/python/native/mesos_scheduler_driver_impl.cpp  
  src/python/native/module.hpp  
  src/python/native/module.cpp b94712681e6f0e9bf5dfdafa10621d1df82dc367 
  src/python/native/proxy_executor.hpp  
  src/python/native/proxy_executor.cpp  
  src/python/native/proxy_scheduler.hpp  
  src/python/native/proxy_scheduler.cpp  
  src/python/native/src/mesos/__init__.py PRE-CREATION 
  src/python/native/src/mesos/native/__init__.py PRE-CREATION 
  src/python/protocol/src/mesos/__init__.py PRE-CREATION 
  src/python/setup.py.in b996dfef5c7a6d330991522bf0047ed3cac6760d 
  src/python/src/mesos.py 0152ab456f072f8d4a1c4ab19fe74e181eadbd05 
  src/python/src/mesos/__init__.py PRE-CREATION 

Diff: https://reviews.apache.org/r/23224/diff/


Testing
---


Thanks,

Thomas Rampelberg



Re: Review Request 23214: PortMapping: allow containers to recover even when they were not managed by Network Isolator previously.

2014-07-01 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23214/#review47162
---



src/slave/containerizer/isolators/network/port_mapping.cpp


Remove ':'



src/slave/containerizer/isolators/network/port_mapping.cpp


Remove ':' and use 'pid' (instead of 'PID')



src/slave/containerizer/isolators/network/port_mapping.cpp


network isolator



src/slave/containerizer/isolators/network/port_mapping.cpp


Remove ':'



src/tests/port_mapping_tests.cpp


What if network isolation is not available (e.g., libnl is too old)? Should 
we check for that?



src/tests/port_mapping_tests.cpp


Why not move this down? You can do:

slave::Flags flags2 = CreateSlaveFlags();



src/tests/port_mapping_tests.cpp


What if cgroups is not available when testing (e.g., on Mac)?



src/tests/port_mapping_tests.cpp


Probably say: this is to verify the case where one task is running w/o 
network isolator and another task is running w/ network isolator.


- Jie Yu


On July 1, 2014, 9:22 p.m., Chi Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23214/
> ---
> 
> (Updated July 1, 2014, 9:22 p.m.)
> 
> 
> Review request for mesos, Ian Downes, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-1557
> https://issues.apache.org/jira/browse/MESOS-1557
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> This eases deployment where a slave could be upgraded to use Network Isolator
> without removing all the existing tasks.
> 
> - Added a new test.
> - Moved all portmapping tests to a new file.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am e3ff6d7 
>   src/slave/containerizer/isolators/network/port_mapping.cpp a326653 
>   src/tests/isolator_tests.cpp 4650f97 
>   src/tests/port_mapping_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/23214/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Chi Zhang
> 
>



Re: Review Request 23221: PortMappingMesosTest: added a test to ensure that all configurations are cleaned up for an orphan container

2014-07-01 Thread Chi Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23221/
---

(Updated July 2, 2014, 12:56 a.m.)


Review request for mesos, Ian Downes, Jie Yu, and Vinod Kone.


Changes
---

added dependency.


Repository: mesos-git


Description
---

see summary.


Diffs
-

  src/slave/containerizer/isolators/network/port_mapping.hpp ac3bee3 
  src/slave/containerizer/isolators/network/port_mapping.cpp a326653 
  src/tests/port_mapping_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/23221/diff/


Testing
---

ran tests.


Thanks,

Chi Zhang



Review Request 23221: PortMappingMesosTest: added a test to ensure that all configurations are cleaned up for an orphan container

2014-07-01 Thread Chi Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23221/
---

Review request for mesos, Ian Downes, Jie Yu, and Vinod Kone.


Repository: mesos-git


Description
---

see summary.


Diffs
-

  src/slave/containerizer/isolators/network/port_mapping.hpp ac3bee3 
  src/slave/containerizer/isolators/network/port_mapping.cpp a326653 
  src/tests/port_mapping_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/23221/diff/


Testing
---

ran tests.


Thanks,

Chi Zhang



Review Request 23220: Fixed and renamed AllocatorZooKeeper tests.

2014-07-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23220/
---

Review request for mesos and Jiang Yan Xu.


Bugs: MESOS-1536
https://issues.apache.org/jira/browse/MESOS-1536


Repository: mesos-git


Description
---

These tests don't need ZK. It unnecessarily slows these tests down and also 
makes it hard to finely control the timing of slave/framework re-registrations. 
So I made them AllocatorTests.

Also, fixed the timing issue described in the bug.


Diffs
-

  src/Makefile.am e3ff6d71d9324ea8376c14fae056568452f22bdc 
  src/tests/allocator_tests.cpp 42cb3455a82090ce85fe6dd4382146bcdb17f651 
  src/tests/allocator_zookeeper_tests.cpp 
091fb08f712104c721da0935f0600972f6d2bc71 

Diff: https://reviews.apache.org/r/23220/diff/


Testing
---

make check


Thanks,

Vinod Kone



Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME #2225

2014-07-01 Thread Apache Jenkins Server
See 
<https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/2225/changes>

Changes:

[vinodkone] Added the low level scheduler example using libprocess.

[adam] Fixed flaky OrphanTasks test.

[yan] Made sure Cluster::Masters::start() returns when the Master is ready for 
incoming requests. i.e., after it has finished executing _recover().

--
[...truncated 145540 lines...]
I0701 23:29:44.128330  7446 hierarchical_allocator_process.hpp:588] Framework 
20140701-214948-1032504131-40759-7416- filtered slave 
20140701-214948-1032504131-40759-7416-1 for 1secs
I0701 23:29:44.641430  7444 monitor.cpp:140] Failed to collect resource usage 
for container 'a62a341d-5a30-4729-a356-b952e3de4346' for executor 'default' of 
framework '20140701-214948-1032504131-40759-7416-': Unknown container: 
a62a341d-5a30-4729-a356-b952e3de4346
I0701 23:29:44.718559  7447 monitor.cpp:140] Failed to collect resource usage 
for container 'b6c9857b-de87-4bc0-9e2f-9eed997f3d85' for executor 'default' of 
framework '20140701-214948-1032504131-40759-7416-': Unknown container: 
b6c9857b-de87-4bc0-9e2f-9eed997f3d85
I0701 23:29:45.127030  7441 hierarchical_allocator_process.hpp:833] Filtered 
cpus(*):8; mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-0 for framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:45.127105  7441 hierarchical_allocator_process.hpp:833] Filtered 
cpus(*):8; mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-1 for framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:45.127157  7441 hierarchical_allocator_process.hpp:833] Filtered 
cpus(*):8; mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-2 for framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:45.127185  7441 hierarchical_allocator_process.hpp:686] Performed 
allocation for 3 slaves in 234681ns
I0701 23:29:45.641852  7443 monitor.cpp:140] Failed to collect resource usage 
for container 'a62a341d-5a30-4729-a356-b952e3de4346' for executor 'default' of 
framework '20140701-214948-1032504131-40759-7416-': Unknown container: 
a62a341d-5a30-4729-a356-b952e3de4346
I0701 23:29:45.719053  7446 monitor.cpp:140] Failed to collect resource usage 
for container 'b6c9857b-de87-4bc0-9e2f-9eed997f3d85' for executor 'default' of 
framework '20140701-214948-1032504131-40759-7416-': Unknown container: 
b6c9857b-de87-4bc0-9e2f-9eed997f3d85
I0701 23:29:46.127477  7441 hierarchical_allocator_process.hpp:750] Offering 
cpus(*):8; mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-0 to framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:46.127588  7441 hierarchical_allocator_process.hpp:750] Offering 
cpus(*):8; mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-1 to framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:46.127679  7441 hierarchical_allocator_process.hpp:750] Offering 
cpus(*):8; mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-2 to framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:46.127882  7441 hierarchical_allocator_process.hpp:686] Performed 
allocation for 3 slaves in 516450ns
I0701 23:29:46.127938  7442 master.hpp:794] Adding offer 
20140701-214948-1032504131-40759-7416-8985 with resources cpus(*):8; 
mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-2 (juno.apache.org)
I0701 23:29:46.128057  7442 master.hpp:794] Adding offer 
20140701-214948-1032504131-40759-7416-8986 with resources cpus(*):8; 
mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-0 (juno.apache.org)
I0701 23:29:46.128151  7442 master.hpp:794] Adding offer 
20140701-214948-1032504131-40759-7416-8987 with resources cpus(*):8; 
mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-1 (juno.apache.org)
I0701 23:29:46.128204  7442 master.cpp:3449] Sending 3 offers to framework 
20140701-214948-1032504131-40759-7416-
I0701 23:29:46.128942  7444 sched.cpp:546] Scheduler::resourceOffers took 
494524ns
I0701 23:29:46.129137  7443 master.hpp:804] Removing offer 
20140701-214948-1032504131-40759-7416-8985 with resources cpus(*):8; 
mem(*):15025; disk(*):23038; ports(*):[31000-32000] on slave 
20140701-214948-1032504131-40759-7416-2 (juno.apache.org)
I0701 23:29:46.129215  7443 master.cpp:2128] Processing reply for offers: [ 
20140701-214948-1032504131-40759-7416-8985 ] on slave 
20140701-214948-1032504131-40759-7416-2 at slave(3)@67.195.138.61:40759 
(juno.apache.org) for fr

Review Request 23216: Added the low level scheduler example using pthread.

2014-07-01 Thread Zuyu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23216/
---

Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Repository: mesos-git


Description
---

Added the low level scheduler example using pthread.


Diffs
-

  src/Makefile.am e3ff6d71d9324ea8376c14fae056568452f22bdc 
  src/examples/low_level_scheduler_pthread.cpp PRE-CREATION 
  src/tests/examples_tests.cpp 2b554d72f0058a68f589719373f3d3e37a3a7ba3 
  src/tests/low_level_scheduler_pthread_test.sh PRE-CREATION 

Diff: https://reviews.apache.org/r/23216/diff/


Testing
---

[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from ExamplesTest
[ RUN  ] ExamplesTest.LowLevelSchedulerPthread
[   OK ] ExamplesTest.LowLevelSchedulerPthread (1655 ms)
[--] 1 test from ExamplesTest (1655 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (1669 ms total)
[  PASSED  ] 1 test.


Thanks,

Zuyu Zhang



Re: Review Request 22832: HTTP Authenticated '/shutdown' endpoint

2014-07-01 Thread Dominic Hamon

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22832/#review47156
---



src/Makefile.am


nit: watch the tabs/spaces here :)



src/master/http.cpp


would you mind changing the indent here to match the lines above?



src/master/http.cpp


how long do you expect this call to take? i.e., will the client time out
waiting for a response? should this dispatch the request to the master instead
and return Accepted()?
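
The second option in the question above (hand the work off and answer right away) can be pictured with a small sketch. This is a toy Python model, not the Mesos or libprocess API: `make_shutdown_handler`, `worker`, and the framework ID string are hypothetical names, and a queue plus a worker thread stand in for dispatching to the master.

```python
import queue
import threading

HTTP_ACCEPTED = 202  # "request accepted, processing not yet complete"


def make_shutdown_handler(work_queue):
    """Return a handler that enqueues the request and answers immediately."""
    def handle(framework_id):
        work_queue.put(framework_id)  # hand off; do not wait for the result
        return HTTP_ACCEPTED
    return handle


def worker(work_queue, shut_down):
    """Drain the queue until the None sentinel arrives."""
    while True:
        framework_id = work_queue.get()
        if framework_id is None:
            break
        shut_down.append(framework_id)  # stands in for the slow shutdown work


requests = queue.Queue()
done = []
thread = threading.Thread(target=worker, args=(requests, done))
thread.start()

handle = make_shutdown_handler(requests)
status = handle("framework-1")  # returns without waiting for the shutdown

requests.put(None)  # stop the worker
thread.join()
print(status, done)
```

The trade-off is that the client learns only that the request was accepted, not whether the shutdown eventually succeeded.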



src/slave/slave.hpp


is this change just for symmetry with Master::Http?



src/tests/shutdown_tests.cpp


our test methods usually have capital initials:

TEST_F(ShutdownTest, ShutdownEndpoint)



src/tests/shutdown_tests.cpp


i think each of these test cases should be a different test.



src/tests/shutdown_tests.cpp


you can also use OK().status (which is more descriptive)



src/tests/shutdown_tests.cpp


also test with authorization header and bad credentials?


- Dominic Hamon


On July 1, 2014, 3:46 p.m., Isabel Jimenez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22832/
> ---
> 
> (Updated July 1, 2014, 3:46 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Dominic Hamon, and Till 
> Toenshoff.
> 
> 
> Bugs: MESOS-1390
> https://issues.apache.org/jira/browse/MESOS-1390
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> HTTP Authenticated '/shutdown' endpoint for shutting down a running framework
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 12d84bf 
>   src/master/http.cpp 5d86976 
>   src/master/master.hpp 5fef354 
>   src/master/master.cpp 474014b 
>   src/slave/http.cpp cd7f692 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/shutdown_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22832/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Isabel Jimenez
> 
>



Re: Review Request 22832: HTTP Authenticated '/shutdown' endpoint

2014-07-01 Thread Isabel Jimenez


> On June 24, 2014, 10:03 a.m., Adam B wrote:
> > src/master/master.hpp, line 422
> > 
> >
> > No more const. :(
> > Can you give us a quick explanation why? (Master::removeFramework() is 
> > not const?)

No, removeFramework() is not const :(


- Isabel


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22832/#review46506
---


On July 1, 2014, 10:46 p.m., Isabel Jimenez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22832/
> ---
> 
> (Updated July 1, 2014, 10:46 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Dominic Hamon, and Till 
> Toenshoff.
> 
> 
> Bugs: MESOS-1390
> https://issues.apache.org/jira/browse/MESOS-1390
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> HTTP Authenticated '/shutdown' endpoint for shutting down a running framework
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 12d84bf 
>   src/master/http.cpp 5d86976 
>   src/master/master.hpp 5fef354 
>   src/master/master.cpp 474014b 
>   src/slave/http.cpp cd7f692 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/shutdown_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22832/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Isabel Jimenez
> 
>



Re: Review Request 22832: HTTP Authenticated '/shutdown' endpoint

2014-07-01 Thread Isabel Jimenez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22832/
---

(Updated July 1, 2014, 10:46 p.m.)


Review request for mesos, Adam B, Benjamin Hindman, Dominic Hamon, and Till 
Toenshoff.


Changes
---

Changes after Benh's and Adam's comments


Bugs: MESOS-1390
https://issues.apache.org/jira/browse/MESOS-1390


Repository: mesos-git


Description
---

HTTP Authenticated '/shutdown' endpoint for shutting down a running framework


Diffs (updated)
-

  src/Makefile.am 12d84bf 
  src/master/http.cpp 5d86976 
  src/master/master.hpp 5fef354 
  src/master/master.cpp 474014b 
  src/slave/http.cpp cd7f692 
  src/slave/slave.hpp 605ee4a 
  src/slave/slave.cpp f42ab60 
  src/tests/shutdown_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/22832/diff/


Testing
---

make check


Thanks,

Isabel Jimenez



Re: Review Request 22123: Failover boolean to prevent using large timeout values

2014-07-01 Thread Isabel Jimenez


> On June 26, 2014, 8:39 a.m., Adam B wrote:
> > include/mesos/mesos.proto, line 128
> > 
> >
> > Please add some documentation to the FrameworkInfo comment that 
> > explains what a value of failover=true means and when it should be used.

There is a comment about this on
https://issues.apache.org/jira/browse/MESOS-1118. Do you have something in mind?


- Isabel


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22123/#review46725
---


On June 2, 2014, 4:14 a.m., Isabel Jimenez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22123/
> ---
> 
> (Updated June 2, 2014, 4:14 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Dominic Hamon, and Till Toenshoff.
> 
> 
> Bugs: MESOS-1118
> https://issues.apache.org/jira/browse/MESOS-1118
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> I think the name of the boolean is a bit confusing; I could change it to
> 'nofailover', which I think is clearer.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 82388e1 
>   src/master/master.cpp 766a0e3 
> 
> Diff: https://reviews.apache.org/r/22123/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Isabel Jimenez
> 
>



Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerizer's update completes.

2014-07-01 Thread Yifan Gu


> On June 23, 2014, 6:50 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1185
> > 
> >
> > Also, what about launching the tasks after updating resources in 
> > registerExecutor()?
> 
> Yifan Gu wrote:
> Sounds good. Should I do it in this patch or open a new JIRA?
> 
> Vinod Kone wrote:
> this one is fine. in fact i would prefer to do the registerExecutor() one first
> because that is going to affect all executors. the one in launchTask() when
> executor is RUNNING can be done later (maybe in a subsequent patch) because
> that only affects executors that have multiple tasks.
> 
> Yifan Gu wrote:
> Ok. Will do, I just reverted this patch.

And I found we need to do this too in reregisterExecutor()...


- Yifan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review46429
---


On July 1, 2014, 8:33 p.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> ---
> 
> (Updated July 1, 2014, 8:33 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-886
> https://issues.apache.org/jira/browse/MESOS-886
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added __runTask() to wait for the completion of containerizer->update() and 
> check the result before sending RunTaskMessage.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/slave_tests.cpp 371a5b8 
> 
> Diff: https://reviews.apache.org/r/22313/diff/
> 
> 
> Testing
> ---
> 
> SlaveTest, CancelTaskIfContainerizerFails
> 
> This tests that if containerizer->update() returns a Failure, the task
> won't be launched and the scheduler will get TASK_LOST.
> 
> make check
> 
> 
> File Attachments
> 
> 
> framework will exit
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Review Request 23214: PortMapping: allow containers to recover even when they were not managed by Network Isolator previously.

2014-07-01 Thread Chi Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23214/
---

Review request for mesos, Ian Downes, Jie Yu, and Vinod Kone.


Bugs: MESOS-1557
https://issues.apache.org/jira/browse/MESOS-1557


Repository: mesos-git


Description
---

This eases deployment where a slave could be upgraded to use Network Isolator
without removing all the existing tasks.

- Added a new test.
- Moved all portmapping tests to a new file.


Diffs
-

  src/Makefile.am e3ff6d7 
  src/slave/containerizer/isolators/network/port_mapping.cpp a326653 
  src/tests/isolator_tests.cpp 4650f97 
  src/tests/port_mapping_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/23214/diff/


Testing
---


Thanks,

Chi Zhang



Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerizer's update completes.

2014-07-01 Thread Yifan Gu


> On June 23, 2014, 6:50 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1185
> > 
> >
> > Also, what about launching the tasks after updating resources in 
> > registerExecutor()?
> 
> Yifan Gu wrote:
> Sounds good. Should I do it in this patch or open a new JIRA?
> 
> Vinod Kone wrote:
> this one is fine. in fact i would prefer to do the registerExecutor() one first
> because that is going to affect all executors. the one in launchTask() when
> executor is RUNNING can be done later (maybe in a subsequent patch) because
> that only affects executors that have multiple tasks.

Ok. Will do, I just reverted this patch.


- Yifan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review46429
---


On July 1, 2014, 8:33 p.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> ---
> 
> (Updated July 1, 2014, 8:33 p.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-886
> https://issues.apache.org/jira/browse/MESOS-886
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added __runTask() to wait for the completion of containerizer->update() and 
> check the result before sending RunTaskMessage.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 605ee4a 
>   src/slave/slave.cpp f42ab60 
>   src/tests/slave_tests.cpp 371a5b8 
> 
> Diff: https://reviews.apache.org/r/22313/diff/
> 
> 
> Testing
> ---
> 
> SlaveTest, CancelTaskIfContainerizerFails
> 
> This tests that if containerizer->update() returns a Failure, the task
> won't be launched and the scheduler will get TASK_LOST.
> 
> make check
> 
> 
> File Attachments
> 
> 
> framework will exit
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerizer's update completes.

2014-07-01 Thread Yifan Gu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/
---

(Updated July 1, 2014, 8:33 p.m.)


Review request for mesos, Ian Downes and Vinod Kone.


Changes
---

Reverted.


Bugs: MESOS-886
https://issues.apache.org/jira/browse/MESOS-886


Repository: mesos-git


Description
---

Added __runTask() to wait for the completion of containerizer->update() and 
check the result before sending RunTaskMessage.
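
The ordering described above (wait for the update, check its result, and only then send the task) can be sketched as follows. This is a toy Python model, not the Mesos C++ patch: `ok_update`, `failing_update`, and `run_task` are hypothetical names, a `concurrent.futures` future stands in for the future returned by containerizer->update(), and the sketch blocks on the result for simplicity where the real code presumably chains a continuation.

```python
from concurrent.futures import ThreadPoolExecutor


def ok_update():
    """Hypothetical stand-in for a containerizer update that succeeds."""
    return None


def failing_update():
    """Hypothetical stand-in for a containerizer update that fails."""
    raise RuntimeError("update failed")


def run_task(update, task_id, sent):
    """Forward task_id to the executor only if update() finished without error."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(update)
        try:
            future.result()  # block until the update finishes; re-raises its error
        except RuntimeError:
            return "TASK_LOST"  # report the loss; the task is never forwarded
    sent.append(task_id)  # stands in for sending RunTaskMessage
    return "TASK_SENT"


sent = []
print(run_task(ok_update, "t1", sent))
print(run_task(failing_update, "t2", sent))
print(sent)
```

Only the task whose update succeeded ends up in `sent`; the other is reported as lost without ever reaching the executor.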


Diffs (updated)
-

  src/slave/slave.hpp 605ee4a 
  src/slave/slave.cpp f42ab60 
  src/tests/slave_tests.cpp 371a5b8 

Diff: https://reviews.apache.org/r/22313/diff/


Testing
---

SlaveTest, CancelTaskIfContainerizerFails

This tests that if containerizer->update() returns a Failure, the task
won't be launched and the scheduler will get TASK_LOST.

make check


File Attachments


framework will exit
  
https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
log
  
https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log


Thanks,

Yifan Gu



Jenkins build is back to normal : mesos-reviewbot #1056

2014-07-01 Thread Apache Jenkins Server
See 



Re: Review Request 22857: Made sure Cluster::Masters::start() returns when the Master is ready for incoming requests. i.e., after it has finished executing _recover().

2014-07-01 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22857/#review47135
---


Patch looks great!

Reviews applied: [22857]

All tests passed.

- Mesos ReviewBot


On July 1, 2014, 6:52 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22857/
> ---
> 
> (Updated July 1, 2014, 6:52 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/tests/cluster.hpp 1c96ee7d27d1f2f277bc4617f9e17092460d7191 
> 
> Diff: https://reviews.apache.org/r/22857/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerizer's update completes.

2014-07-01 Thread Vinod Kone


> On June 23, 2014, 6:50 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1185
> > 
> >
> > Also, what about launching the tasks after updating resources in 
> > registerExecutor()?
> 
> Yifan Gu wrote:
> Sounds good. Should I do it in this patch or open a new JIRA?

This one is fine. In fact, I would prefer to do the registerExecutor() one 
first because that is going to affect all executors. The one in launchTask() 
when the executor is RUNNING can be done later (maybe in a subsequent patch) 
because that only affects executors that have multiple tasks.


- Vinod


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review46429
---


On June 19, 2014, 1:30 a.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> ---
> 
> (Updated June 19, 2014, 1:30 a.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-886
> https://issues.apache.org/jira/browse/MESOS-886
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added __runTask() to wait for the completion of containerizer->update() and 
> check the result before sending RunTaskMessage.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 34687e5 
>   src/slave/slave.cpp 643c088 
>   src/tests/slave_tests.cpp 2c8f183 
> 
> Diff: https://reviews.apache.org/r/22313/diff/
> 
> 
> Testing
> ---
> 
> SlaveTest, CancelTaskIfContainerizerFails
> 
> This tests that if containerizer->update() returns a Failure, the task 
> won't be launched and the scheduler will get TASK_LOST.
> 
> make check
> 
> 
> File Attachments
> 
> 
> framework will exit
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Jenkins build is back to normal : Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME #2224

2014-07-01 Thread Apache Jenkins Server
See 




Re: Review Request 22857: Made sure Cluster::Masters::start() returns when the Master is ready for incoming requests. i.e., after it has finished executing _recover().

2014-07-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22857/#review47127
---

Ship it!

- Vinod Kone


On July 1, 2014, 6:52 p.m., Jiang Yan Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22857/
> ---
> 
> (Updated July 1, 2014, 6:52 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary.
> 
> 
> Diffs
> -
> 
>   src/tests/cluster.hpp 1c96ee7d27d1f2f277bc4617f9e17092460d7191 
> 
> Diff: https://reviews.apache.org/r/22857/diff/
> 
> 
> Testing
> ---
> 
> make check.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>



Re: Review Request 22313: MESOS-886: Prevented slave from launching tasks before containerizer's update completes.

2014-07-01 Thread Yifan Gu


> On June 23, 2014, 6:50 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 1185
> > 
> >
> > Also, what about launching the tasks after updating resources in 
> > registerExecutor()?

Sounds good. Should I do it in this patch or open a new JIRA?


- Yifan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22313/#review46429
---


On June 19, 2014, 1:30 a.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22313/
> ---
> 
> (Updated June 19, 2014, 1:30 a.m.)
> 
> 
> Review request for mesos, Ian Downes and Vinod Kone.
> 
> 
> Bugs: MESOS-886
> https://issues.apache.org/jira/browse/MESOS-886
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added __runTask() to wait for the completion of containerizer->update() and 
> check the result before sending RunTaskMessage.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 34687e5 
>   src/slave/slave.cpp 643c088 
>   src/tests/slave_tests.cpp 2c8f183 
> 
> Diff: https://reviews.apache.org/r/22313/diff/
> 
> 
> Testing
> ---
> 
> SlaveTest, CancelTaskIfContainerizerFails
> 
> This tests that if containerizer->update() returns a Failure, the task 
> won't be launched and the scheduler will get TASK_LOST.
> 
> make check
> 
> 
> File Attachments
> 
> 
> framework will exit
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/fbe73273-7aa9-4faa-b1c5-003ab03042a9__issue-886.diff
> log
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/18/84d801a0-5c2a-4bb9-901b-e1962031461c__log
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Re: Review Request 22857: Made sure Cluster::Masters::start() returns when the Master is ready for incoming requests. i.e., after it has finished executing _recover().

2014-07-01 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22857/
---

(Updated July 1, 2014, 11:52 a.m.)


Review request for mesos and Vinod Kone.


Changes
---

Add reviewers


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/tests/cluster.hpp 1c96ee7d27d1f2f277bc4617f9e17092460d7191 

Diff: https://reviews.apache.org/r/22857/diff/


Testing (updated)
---

make check.


Thanks,

Jiang Yan Xu



Re: Review Request 23203: Fixed building of python egg on osx.

2014-07-01 Thread Thomas Rampelberg


> On July 1, 2014, 5:03 p.m., Benjamin Hindman wrote:
> > configure.ac, line 380
> > 
> >
> > Are you sure that this is still necessary after 
> > https://github.com/apache/mesos/commit/294337466ccdbbc933c17712d9ca6877def3f83e?
> > 
> > I think it would be nice to make what I did even more explicit though 
> > (right now it works because the Python checks are below the clang checks).

Nope, not necessary at all. My bad! Thanks for the heads up =)


- Thomas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23203/#review47096
---


On July 1, 2014, 4:54 p.m., Thomas Rampelberg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23203/
> ---
> 
> (Updated July 1, 2014, 4:54 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1468
> https://issues.apache.org/jira/browse/MESOS-1468
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Fixed building of python egg on osx.
> 
> 
> Diffs
> -
> 
>   configure.ac e7472081339fc9c773eb2cf2d5f15dc459ac378d 
> 
> Diff: https://reviews.apache.org/r/23203/diff/
> 
> 
> Testing
> ---
> 
> Successfully built the python egg on OSX.
> 
> 
> Thanks,
> 
> Thomas Rampelberg
> 
>



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Yifan Gu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/
---

(Updated July 1, 2014, 6:23 p.m.)


Review request for mesos, Adam B and Vinod Kone.


Bugs: MESOS-1543
https://issues.apache.org/jira/browse/MESOS-1543


Repository: mesos-git


Description
---

Added Clock::settle after StartMaster() to ensure the master finishes executing 
_recover().
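
(Illustrative sketch, not taken from the patch.) The flakiness described here is a race between the test body and recovery work that is still queued when the master's start call returns. The toy model below shows why settling the queue first makes the outcome deterministic. EventLoop, Master and settle are hypothetical names for this sketch only; they are not the libprocess or Mesos test APIs.

```python
from collections import deque


class EventLoop:
    """Toy single-threaded event queue (a stand-in, not libprocess)."""

    def __init__(self):
        self.pending = deque()

    def dispatch(self, fn):
        # Queue work to run later instead of running it right away.
        self.pending.append(fn)

    def settle(self):
        # Run queued events until none are left, including any that
        # are queued while draining.
        while self.pending:
            self.pending.popleft()()


class Master:
    def __init__(self, loop):
        self.loop = loop
        self.recovered = False

    def start(self):
        # Recovery is asynchronous: start() returns before it has run.
        self.loop.dispatch(self._recover)

    def _recover(self):
        self.recovered = True


loop = EventLoop()
master = Master(loop)
master.start()
before = master.recovered  # recovery is still queued at this point
loop.settle()
after = master.recovered   # recovery has run once the queue is drained
```

A test that inspects the master between start() and settle() sees the unrecovered state, which is the kind of timing dependence the added settle call is meant to remove.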


Diffs (updated)
-

  src/tests/master_tests.cpp 4dc28f2 

Diff: https://reviews.apache.org/r/23001/diff/


Testing
---

/bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure --gtest_filter=*OrphanTask*

Succeeded 2000+ times.


Thanks,

Yifan Gu



Re: Review Request 22539: Add the low level scheduler example using libprocess

2014-07-01 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22539/#review47116
---

Ship it!

- Vinod Kone


On July 1, 2014, 12:58 a.m., Zuyu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22539/
> ---
> 
> (Updated July 1, 2014, 12:58 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Add the low level scheduler example using libprocess
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 918b0d04a5de69a9213e3d31c8f9424756e4ade5 
>   src/examples/low_level_scheduler_libprocess.cpp PRE-CREATION 
>   src/tests/examples_tests.cpp 34f0233aca3433faba7528ac8c354100b8d3a4f7 
>   src/tests/low_level_scheduler_libprocess_test.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22539/diff/
> 
> 
> Testing
> ---
> 
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from ExamplesTest
> [ RUN  ] ExamplesTest.LowLevelSchedulerLibprocess
> [   OK ] ExamplesTest.LowLevelSchedulerLibprocess (1670 ms)
> [--] 1 test from ExamplesTest (1670 ms total)
> 
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1684 ms total)
> [  PASSED  ] 1 test.
> 
> 
> Thanks,
> 
> Zuyu Zhang
> 
>



Re: Review Request 23187: Fixed a regression in ExecutorInfoChecker to allow an executor with the same id but different executor info as long as it is on a different slave.

2014-07-01 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23187/#review47109
---

Ship it!

- Jiang Yan Xu


On June 30, 2014, 5:21 p.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23187/
> ---
> 
> (Updated June 30, 2014, 5:21 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1555
> https://issues.apache.org/jira/browse/MESOS-1555
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary and bug for details.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 474014b24982449f2f64c2e8d835c25a28cbcbb8 
>   src/tests/master_authorization_tests.cpp 
> 478041cdea533e548ca92c4b8e8c793554855969 
>   src/tests/resource_offers_tests.cpp 
> 3ec688abea1557a17526589b31f58ad6e53abec4 
> 
> Diff: https://reviews.apache.org/r/23187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> make GTEST_FILTER="*PendingExecutorInfoDiffersOnDifferentSlaves*" 
> GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 23203: Fixed building of python egg on osx.

2014-07-01 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23203/#review47096
---



configure.ac


Are you sure that this is still necessary after 
https://github.com/apache/mesos/commit/294337466ccdbbc933c17712d9ca6877def3f83e?

I think it would be nice to make what I did even more explicit though 
(right now it works because the Python checks are below the clang checks).


- Benjamin Hindman


On July 1, 2014, 4:54 p.m., Thomas Rampelberg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23203/
> ---
> 
> (Updated July 1, 2014, 4:54 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1468
> https://issues.apache.org/jira/browse/MESOS-1468
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Fixed building of python egg on osx.
> 
> 
> Diffs
> -
> 
>   configure.ac e7472081339fc9c773eb2cf2d5f15dc459ac378d 
> 
> Diff: https://reviews.apache.org/r/23203/diff/
> 
> 
> Testing
> ---
> 
> Successfully built the python egg on OSX.
> 
> 
> Thanks,
> 
> Thomas Rampelberg
> 
>



Review Request 23203: Fixed building of python egg on osx.

2014-07-01 Thread Thomas Rampelberg

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23203/
---

Review request for mesos and Niklas Nielsen.


Bugs: MESOS-1468
https://issues.apache.org/jira/browse/MESOS-1468


Repository: mesos-git


Description
---

Fixed building of python egg on osx.


Diffs
-

  configure.ac e7472081339fc9c773eb2cf2d5f15dc459ac378d 

Diff: https://reviews.apache.org/r/23203/diff/


Testing
---

Successfully built the python egg on OSX.


Thanks,

Thomas Rampelberg



Re: Review Request 23187: Fixed a regression in ExecutorInfoChecker to allow an executor with the same id but different executor info as long as it is on a different slave.

2014-07-01 Thread Jie Yu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23187/#review47076
---

Ship it!


LGTM

- Jie Yu


On July 1, 2014, 12:21 a.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23187/
> ---
> 
> (Updated July 1, 2014, 12:21 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1555
> https://issues.apache.org/jira/browse/MESOS-1555
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> See summary and bug for details.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 474014b24982449f2f64c2e8d835c25a28cbcbb8 
>   src/tests/master_authorization_tests.cpp 
> 478041cdea533e548ca92c4b8e8c793554855969 
>   src/tests/resource_offers_tests.cpp 
> 3ec688abea1557a17526589b31f58ad6e53abec4 
> 
> Diff: https://reviews.apache.org/r/23187/diff/
> 
> 
> Testing
> ---
> 
> make check
> 
> make GTEST_FILTER="*PendingExecutorInfoDiffersOnDifferentSlaves*" 
> GTEST_REPEAT=1000 GTEST_BREAK_ON_FAILURE
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 22754: First part of the deactivate slaves mechanism - HTTP deactivate endpoint

2014-07-01 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22754/#review47067
---


Patch looks great!

Reviews applied: [22754]

All tests passed.

- Mesos ReviewBot


On July 1, 2014, 2:53 p.m., Alexandra Sava wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22754/
> ---
> 
> (Updated July 1, 2014, 2:53 p.m.)
> 
> 
> Review request for mesos and Ben Mahler.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> This is the first part of the 'deactivate slave' mechanism.
> 
> This part creates an HTTP endpoint that accepts HTTP POST requests. Each 
> request contains a JSON object with a list of the slave ids that operators 
> want to deactivate. The list is passed on to the master, which marks those 
> slaves as 'deactivated' in the allocator. As a result, the master will no 
> longer send resource offers for the deactivated slaves.
> 
> The TODO for the second part is to make the list of deactivated slaves 
> persistent, by storing it into the Registry.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5d869767cd3ed48aae1e702e8d014a37ef371123 
>   src/master/master.hpp 5fef35406c2ce2ad11e030aa7752eb691aab5857 
>   src/master/master.cpp 474014b24982449f2f64c2e8d835c25a28cbcbb8 
> 
> Diff: https://reviews.apache.org/r/22754/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Alexandra Sava
> 
>



Re: Review Request 22754: First part of the deactivate slaves mechanism - HTTP deactivate endpoint

2014-07-01 Thread Alexandra Sava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22754/
---

(Updated July 1, 2014, 2:53 p.m.)


Review request for mesos and Ben Mahler.


Repository: mesos-git


Description (updated)
---

This is the first part of the 'deactivate slave' mechanism.

This part creates an HTTP endpoint that accepts HTTP POST requests. Each 
request contains a JSON object with a list of the slave ids that operators want 
to deactivate. The list is passed on to the master, which marks those slaves as 
'deactivated' in the allocator. As a result, the master will no longer send 
resource offers for the deactivated slaves.

The TODO for the second part is to make the list of deactivated slaves 
persistent, by storing it into the Registry.
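
(Illustrative sketch, not taken from the patch.) The flow described above can be modelled as: parse the POST body, hand the ids to the allocator, and skip deactivated slaves when making offers. The request shape used here, a JSON object with a "slaves" key, is an assumption made for this example; the description only says the object contains a list of slave ids. The Allocator class and its methods are likewise hypothetical.

```python
import json


def parse_deactivate_request(body):
    """Extract slave ids from a POST body.

    Assumes a body like {"slaves": ["id-1", "id-2"]}; the "slaves"
    key is hypothetical.
    """
    payload = json.loads(body)
    ids = payload.get("slaves") if isinstance(payload, dict) else None
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError("expected a JSON object with a list of slave ids")
    return ids


class Allocator:
    """Toy allocator that only tracks which slaves are deactivated."""

    def __init__(self):
        self.deactivated = set()

    def deactivate(self, slave_ids):
        self.deactivated.update(slave_ids)

    def offerable(self, slave_ids):
        # Deactivated slaves are left out of future resource offers.
        return [s for s in slave_ids if s not in self.deactivated]


allocator = Allocator()
allocator.deactivate(parse_deactivate_request('{"slaves": ["slave-2"]}'))
remaining = allocator.offerable(["slave-1", "slave-2", "slave-3"])
```

Because the set lives only in memory in this sketch, it is lost on restart, which is the gap the second part (storing the list in the Registry) is meant to close.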


Diffs (updated)
-

  src/master/http.cpp 5d869767cd3ed48aae1e702e8d014a37ef371123 
  src/master/master.hpp 5fef35406c2ce2ad11e030aa7752eb691aab5857 
  src/master/master.cpp 474014b24982449f2f64c2e8d835c25a28cbcbb8 

Diff: https://reviews.apache.org/r/22754/diff/


Testing
---


Thanks,

Alexandra Sava



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/#review47048
---


Patch looks great!

Reviews applied: [23001]

All tests passed.

- Mesos ReviewBot


On July 1, 2014, 8:35 a.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23001/
> ---
> 
> (Updated July 1, 2014, 8:35 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-1543
> https://issues.apache.org/jira/browse/MESOS-1543
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added Clock::settle after StartMaster() to ensure the master finishes 
> executing _recover().
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 4dc28f2 
> 
> Diff: https://reviews.apache.org/r/23001/diff/
> 
> 
> Testing
> ---
> 
> /bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure 
> --gtest_filter=*OrphanTask*
> 
> Succeeded 2000+ times.
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Adam B


> On July 1, 2014, 1:41 a.m., Adam B wrote:
> > Ship It!

(pending actually enabling the test by removing "DISABLED_")


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/#review47045
---


On July 1, 2014, 1:35 a.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23001/
> ---
> 
> (Updated July 1, 2014, 1:35 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-1543
> https://issues.apache.org/jira/browse/MESOS-1543
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added Clock::settle after StartMaster() to ensure the master finishes 
> executing _recover().
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 4dc28f2 
> 
> Diff: https://reviews.apache.org/r/23001/diff/
> 
> 
> Testing
> ---
> 
> /bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure 
> --gtest_filter=*OrphanTask*
> 
> Succeeded 2000+ times.
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/#review47046
---



src/tests/master_tests.cpp


Make sure to remove "DISABLED_" (and retest)


- Adam B


On July 1, 2014, 1:35 a.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23001/
> ---
> 
> (Updated July 1, 2014, 1:35 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-1543
> https://issues.apache.org/jira/browse/MESOS-1543
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added Clock::settle after StartMaster() to ensure the master finishes 
> executing _recover().
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 4dc28f2 
> 
> Diff: https://reviews.apache.org/r/23001/diff/
> 
> 
> Testing
> ---
> 
> /bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure 
> --gtest_filter=*OrphanTask*
> 
> Succeeded 2000+ times.
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Adam B

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/#review47045
---

Ship it!

- Adam B


On July 1, 2014, 1:35 a.m., Yifan Gu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23001/
> ---
> 
> (Updated July 1, 2014, 1:35 a.m.)
> 
> 
> Review request for mesos, Adam B and Vinod Kone.
> 
> 
> Bugs: MESOS-1543
> https://issues.apache.org/jira/browse/MESOS-1543
> 
> 
> Repository: mesos-git
> 
> 
> Description
> ---
> 
> Added Clock::settle after StartMaster() to ensure the master finishes 
> executing _recover().
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 4dc28f2 
> 
> Diff: https://reviews.apache.org/r/23001/diff/
> 
> 
> Testing
> ---
> 
> /bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure 
> --gtest_filter=*OrphanTask*
> 
> Succeeded 2000+ times.
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Yifan Gu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/
---

(Updated July 1, 2014, 8:35 a.m.)


Review request for mesos, Adam B and Vinod Kone.


Bugs: MESOS-1543
https://issues.apache.org/jira/browse/MESOS-1543


Repository: mesos-git


Description
---

Added Clock::settle after StartMaster() to ensure the master finishes executing 
_recover().


Diffs (updated)
-

  src/tests/master_tests.cpp 4dc28f2 

Diff: https://reviews.apache.org/r/23001/diff/


Testing
---

/bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure --gtest_filter=*OrphanTask*

Succeeded 2000+ times.


Thanks,

Yifan Gu



Re: Review Request 23001: Fixed flaky OrphanTasks test.

2014-07-01 Thread Yifan Gu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/
---

(Updated July 1, 2014, 8:34 a.m.)


Review request for mesos, Adam B and Vinod Kone.


Summary (updated)
-

Fixed flaky OrphanTasks test.


Bugs: MESOS-1543
https://issues.apache.org/jira/browse/MESOS-1543


Repository: mesos-git


Description
---

Added Clock::settle after StartMaster() to ensure the master finishes executing 
_recover().


Diffs (updated)
-

  src/tests/master_tests.cpp 4dc28f2 

Diff: https://reviews.apache.org/r/23001/diff/


Testing
---

/bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure --gtest_filter=*OrphanTask*

Succeeded 2000+ times.


Thanks,

Yifan Gu



Re: Review Request 23001: Fix race condition in OrphanTasks test

2014-07-01 Thread Yifan Gu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23001/
---

(Updated July 1, 2014, 8:32 a.m.)


Review request for mesos, Adam B and Vinod Kone.


Bugs: MESOS-1543
https://issues.apache.org/jira/browse/MESOS-1543


Repository: mesos-git


Description (updated)
---

Added Clock::settle after StartMaster() to ensure the master finishes executing 
_recover().


Diffs (updated)
-

  src/tests/master_tests.cpp 4dc28f2 

Diff: https://reviews.apache.org/r/23001/diff/


Testing (updated)
---

/bin/mesos-tests.sh --gtest_repeat=-1 --gtest_break_on_failure --gtest_filter=*OrphanTask*

Succeeded 2000+ times.


Thanks,

Yifan Gu



Tasks always lost

2014-07-01 Thread qingyang li
I am using Mesos 0.19 and Spark 0.9.0. The Mesos cluster is started, but when I
use spark-shell to submit a job, the tasks are always lost. Here is the
log:
--
14/07/01 16:24:27 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
14/07/01 16:24:27 INFO TaskSetManager: Starting task 0.0:1 as TID 4042 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
14/07/01 16:24:27 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-104524-1694607552-5050-26919-1 from TaskSet 0.0
14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4041 (task 0.0:0)
14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-143932-1694607552-5050-4080-2 from TaskSet 0.0
14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4042 (task 0.0:1)
14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-143932-1694607552-5050-4080-2 from BlockManagerMaster.
14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-143932-1694607552-5050-4080-2 successfully in removeExecutor
14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata001
14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:1 as TID 4043 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:0 as TID 4044 on executor 20140616-104524-1694607552-5050-26919-1: bigdata001 (PROCESS_LOCAL)
14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:0 as 1570 bytes in 0 ms


It seems someone else has also encountered this problem:
http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%3c201305161047069952...@nfs.iscas.ac.cn%3E