Build failed in Jenkins: mesos-reviewbot #1534

2014-09-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/mesos-reviewbot/1534/changes

Changes:

[bmahler] Fixed the flaky FaultToleranceTest.ReconcilePendingTasks.

--
[...truncated 5696 lines...]
Removing aclocal.m4
Removing ar-lib
Removing autom4te.cache/
Removing bin/gdb-mesos-local.sh
Removing bin/gdb-mesos-master.sh
Removing bin/gdb-mesos-slave.sh
Removing bin/gdb-mesos-tests.sh
Removing bin/lldb-mesos-local.sh
Removing bin/lldb-mesos-master.sh
Removing bin/lldb-mesos-slave.sh
Removing bin/lldb-mesos-tests.sh
Removing bin/mesos-local-flags.sh
Removing bin/mesos-local.sh
Removing bin/mesos-master-flags.sh
Removing bin/mesos-master.sh
Removing bin/mesos-slave-flags.sh
Removing bin/mesos-slave.sh
Removing bin/mesos-tests-flags.sh
Removing bin/mesos-tests.sh
Removing bin/mesos.sh
Removing bin/valgrind-mesos-local.sh
Removing bin/valgrind-mesos-master.sh
Removing bin/valgrind-mesos-slave.sh
Removing bin/valgrind-mesos-tests.sh
Removing compile
Removing config.guess
Removing config.log
Removing config.lt
Removing config.status
Removing config.sub
Removing configure
Removing depcomp
Removing ec2/Makefile
Removing ec2/Makefile.in
Removing include/mesos/mesos.hpp
Removing install-sh
Removing libtool
Removing ltmain.sh
Removing m4/libtool.m4
Removing m4/ltoptions.m4
Removing m4/ltsugar.m4
Removing m4/ltversion.m4
Removing m4/lt~obsolete.m4
Removing mesos-0.21.0.tar.gz
Removing mesos.pc
Removing missing
Removing mpi/mpiexec-mesos
Removing src/.deps/
Removing src/Makefile
Removing src/Makefile.in
Removing src/authorizer/.deps/
Removing src/cli/.deps/
Removing src/common/.deps/
Removing src/containerizer/
Removing src/deploy/mesos-daemon.sh
Removing src/deploy/mesos-start-cluster.sh
Removing src/deploy/mesos-start-masters.sh
Removing src/deploy/mesos-start-slaves.sh
Removing src/deploy/mesos-stop-cluster.sh
Removing src/deploy/mesos-stop-masters.sh
Removing src/deploy/mesos-stop-slaves.sh
Removing src/docker/.deps/
Removing src/examples/.deps/
Removing src/examples/java/test-exception-framework
Removing src/examples/java/test-executor
Removing src/examples/java/test-framework
Removing src/examples/java/test-log
Removing src/examples/java/test-multiple-executors-framework
Removing src/examples/python/test-containerizer
Removing src/examples/python/test-executor
Removing src/examples/python/test-framework
Removing src/exec/.deps/
Removing src/files/.deps/
Removing src/health-check/.deps/
Removing src/java/generated/org/apache/mesos/MesosNativeLibrary.java
Removing src/java/jni/.deps/
Removing src/java/mesos.pom
Removing src/jvm/.deps/
Removing src/jvm/org/apache/.deps/
Removing src/launcher/.deps/
Removing src/linux/.deps/
Removing src/linux/routing/.deps/
Removing src/linux/routing/filter/.deps/
Removing src/linux/routing/link/.deps/
Removing src/linux/routing/queueing/.deps/
Removing src/local/.deps/
Removing src/log/.deps/
Removing src/log/tool/.deps/
Removing src/logging/.deps/
Removing src/master/.deps/
Removing src/messages/.deps/
Removing src/python/interface/setup.py
Removing src/python/native/ext_modules.py
Removing src/python/native/setup.py
Removing src/python/setup.py
Removing src/sasl/.deps/
Removing src/sched/.deps/
Removing src/scheduler/.deps/
Removing src/slave/.deps/
Removing src/slave/containerizer/.deps/
Removing src/slave/containerizer/isolators/cgroups/.deps/
Removing src/slave/containerizer/isolators/network/.deps/
Removing src/slave/containerizer/mesos/.deps/
Removing src/state/.deps/
Removing src/tests/.deps/
Removing src/tests/common/.deps/
Removing src/usage/.deps/
Removing src/zookeeper/.deps/
+ git reset --hard HEAD
HEAD is now at 9b2bbba Fixed the flaky FaultToleranceTest.ReconcilePendingTasks.
+ date
Sat Sep 13 04:31:02 UTC 2014
+ ./support/verify-reviews.py mesos-review mesos-review42 1
Build timed out (after 180 minutes). Marking the build as failed.
Build was aborted
Checking if review: 25250 needs verification
Latest diff timestamp: 2014-09-07 18:33:13
Latest review timestamp: 2014-09-07 19:41:08
Checking if review: 24984 needs verification
Latest diff timestamp: 2014-09-04 09:27:05
Latest review timestamp: 2014-09-05 01:45:02
Checking if review: 25191 needs verification
Skipping blocking review 25191
Checking if review: 25448 needs verification
Latest diff timestamp: 2014-09-08 17:26:38
Latest review timestamp: 2014-09-08 22:36:18
Checking if review: 24264 needs verification
Latest diff timestamp: 2014-09-02 21:24:08
Latest review timestamp: 2014-09-03 13:39:00
Checking if review: 24776 needs verification
Latest diff timestamp: 2014-08-16 22:23:18
Latest review timestamp: 2014-08-17 01:31:38
Checking if review: 25487 needs verification
Latest diff timestamp: 2014-09-10 17:59:55
Latest review timestamp: 2014-09-11 04:17:45
Checking if review: 25511 needs verification
Latest diff timestamp: 2014-09-10 17:59:59
Latest review timestamp: 2014-09-11 05:27:04
Checking if review: 25403 needs verification
Latest diff timestamp: 2014-09-11 00:40:06
Latest review 
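
The log above pairs each review's latest diff timestamp with its latest review timestamp. A minimal sketch of the comparison this suggests, assuming (the log does not confirm it) that a review needs verification only when its newest diff is more recent than the last review posted. The function name is hypothetical and is not taken from support/verify-reviews.py:

```python
from datetime import datetime

# Timestamp layout as it appears in the log lines above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def needs_verification(latest_diff: str, latest_review: str) -> bool:
    """Hypothetical rule: True when the diff is newer than the last review."""
    diff_time = datetime.strptime(latest_diff, TIMESTAMP_FORMAT)
    review_time = datetime.strptime(latest_review, TIMESTAMP_FORMAT)
    return diff_time > review_time
```

Under this assumed rule, review 25250 (diff at 18:33:13, review at 19:41:08 on the same day) would not need another verification run.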

Jenkins build is back to normal : mesos-reviewbot #1535

2014-09-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/mesos-reviewbot/1535/



Re: Review Request 25569: Only perform docker validation once for tests

2014-09-13 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25569/#review53262
---


Patch looks great!

Reviews applied: [25569]

All tests passed.

- Mesos ReviewBot


On Sept. 12, 2014, 4:31 a.m., Timothy Chen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25569/
 ---
 
 (Updated Sept. 12, 2014, 4:31 a.m.)
 
 
 Review request for mesos and Ben Mahler.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Only perform docker validation once for tests
 
 
 Diffs
 -
 
   src/tests/environment.cpp 2274251aaf653d83c2d03ef2186763978067a747 
 
 Diff: https://reviews.apache.org/r/25569/diff/
 
 
 Testing
 ---
 
 make check
 
 
 Thanks,
 
 Timothy Chen
 




Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2373

2014-09-13 Thread Apache Jenkins Server
 07:19:54.294569  5764 slave.cpp:289] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0913 07:19:54.294627  5764 slave.cpp:317] Slave hostname: pomona.apache.org
I0913 07:19:54.294639  5764 slave.cpp:318] Slave checkpoint: false
I0913 07:19:54.295058  5764 state.cpp:33] Recovering state from 
'/tmp/GarbageCollectorIntegrationTest_DiskUsage_1giBe1/meta'
I0913 07:19:54.295155  5764 status_update_manager.cpp:193] Recovering status 
update manager
I0913 07:19:54.295239  5764 slave.cpp:3219] Finished recovery
I0913 07:19:54.295469  5764 slave.cpp:600] New master detected at 
master@67.195.81.187:57627
I0913 07:19:54.295492  5764 slave.cpp:674] Authenticating with master 
master@67.195.81.187:57627
I0913 07:19:54.295533  5764 slave.cpp:647] Detecting new master
I0913 07:19:54.295568  5764 status_update_manager.cpp:167] New master detected 
at master@67.195.81.187:57627
I0913 07:19:54.295603  5764 authenticatee.hpp:128] Creating new client SASL 
connection
I0913 07:19:54.295697  5764 master.cpp:3653] Authenticating 
slave(115)@67.195.81.187:57627
I0913 07:19:54.295781  5764 authenticator.hpp:156] Creating new server SASL 
connection
I0913 07:19:54.295841  5764 authenticatee.hpp:219] Received SASL authentication 
mechanisms: CRAM-MD5
I0913 07:19:54.295860  5764 authenticatee.hpp:245] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0913 07:19:54.295886  5764 authenticator.hpp:262] Received SASL authentication 
start
I0913 07:19:54.295920  5764 authenticator.hpp:384] Authentication requires more 
steps
I0913 07:19:54.295948  5764 authenticatee.hpp:265] Received SASL authentication 
step
I0913 07:19:54.295987  5764 authenticator.hpp:290] Received SASL authentication 
step
I0913 07:19:54.296005  5764 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 
'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false 
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0913 07:19:54.296015  5764 auxprop.cpp:153] Looking up auxiliary property 
'*userPassword'
I0913 07:19:54.296030  5764 auxprop.cpp:153] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I0913 07:19:54.296042  5764 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'pomona.apache.org' server FQDN: 
'pomona.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false 
SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0913 07:19:54.296051  5764 auxprop.cpp:103] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I0913 07:19:54.296058  5764 auxprop.cpp:103] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0913 07:19:54.296073  5764 authenticator.hpp:376] Authentication success
I0913 07:19:54.296103  5764 authenticatee.hpp:305] Authentication success
I0913 07:19:54.296128  5764 master.cpp:3693] Successfully authenticated 
principal 'test-principal' at slave(115)@67.195.81.187:57627
I0913 07:19:54.296193  5764 slave.cpp:731] Successfully authenticated with 
master master@67.195.81.187:57627
I0913 07:19:54.296231  5764 slave.cpp:994] Will retry registration in 
19.491161ms if necessary
I0913 07:19:54.296299  5764 master.cpp:2843] Registering slave at 
slave(115)@67.195.81.187:57627 (pomona.apache.org) with id 
20140913-071954-3142697795-57627-5741-0
I0913 07:19:54.296406  5764 registrar.cpp:422] Attempting to update the 
'registry'
I0913 07:19:54.297817  5764 log.cpp:680] Attempting to append 332 bytes to the 
log
I0913 07:19:54.297863  5764 coordinator.cpp:340] Coordinator attempting to 
write APPEND action at position 3
I0913 07:19:54.298079  5764 replica.cpp:508] Replica received write request for 
position 3
I0913 07:19:54.298280  5764 leveldb.cpp:343] Persisting action (351 bytes) to 
leveldb took 180777ns
I0913 07:19:54.298295  5764 replica.cpp:676] Persisted action at 3
I0913 07:19:54.298491  5764 replica.cpp:655] Replica received learned notice 
for position 3
I0913 07:19:54.298826  5764 leveldb.cpp:343] Persisting action (353 bytes) to 
leveldb took 318226ns
I0913 07:19:54.298843  5764 replica.cpp:676] Persisted action at 3
I0913 07:19:54.298853  5764 replica.cpp:661] Replica learned APPEND action at 
position 3
I0913 07:19:54.299161  5764 registrar.cpp:479] Successfully updated 'registry'
I0913 07:19:54.299239  5764 log.cpp:699] Attempting to truncate the log to 3
I0913 07:19:54.299294  5764 master.cpp:2883] Registered slave 
20140913-071954-3142697795-57627-5741-0 at slave(115)@67.195.81.187:57627 
(pomona.apache.org)
I0913 07:19:54.299310  5764 master.cpp:4126] Adding slave 
20140913-071954-3142697795-57627-5741-0 at slave(115)@67.195.81.187:57627 
(pomona.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; 
ports(*):[31000-32000]
I0913 07:19:54.299408  5764 coordinator.cpp:340] Coordinator attempting to 
write TRUNCATE action at position 4
I0913 07:19:54.299609  5764 slave.cpp:765] Registered with master 
master@67.195.81.187:57627; given slave ID

Re: Review Request 25597: Added a version checker class to stout.

2014-09-13 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25597/#review53263
---


Bad patch!

Reviews applied: [25597]

Failed command: ./support/mesos-style.py

Error:
 Checking 506 files using filter 
--filter=-,+build/class,+build/deprecated,+build/endif_comment,+readability/todo,+readability/namespace,+runtime/vlog,+whitespace/blank_line,+whitespace/comma,+whitespace/end_of_line,+whitespace/ending_newline,+whitespace/forcolon,+whitespace/indent,+whitespace/line_length,+whitespace/tab,+whitespace/todo
3rdparty/libprocess/3rdparty/stout/include/stout/version.hpp:27:  public: 
should not be indented inside class Version  [whitespace/indent] [3]
3rdparty/libprocess/3rdparty/stout/include/stout/version.hpp:85:  private: 
should not be indented inside class Version  [whitespace/indent] [3]
Total errors found: 2
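
Both errors above flag an access specifier that is indented inside a class. A rough sketch of such a check, assuming a plain regular expression over source lines; it is an illustration only, not the logic of support/mesos-style.py or cpplint, and the function name is hypothetical:

```python
import re
from typing import List, Tuple

# An access specifier preceded by at least one space or tab.
INDENTED_ACCESS_SPECIFIER = re.compile(r"^[ \t]+(public|protected|private):")


def find_indented_access_specifiers(source: str) -> List[Tuple[int, str]]:
    """Return (line number, specifier) for every indented access specifier."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = INDENTED_ACCESS_SPECIFIER.match(line)
        if match:
            findings.append((number, match.group(1)))
    return findings
```

Moving `public:` and `private:` to the start of the line in version.hpp is the kind of change that would clear both reports.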

- Mesos ReviewBot


On Sept. 13, 2014, 12:14 a.m., Kapil Arya wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25597/
 ---
 
 (Updated Sept. 13, 2014, 12:14 a.m.)
 
 
 Review request for mesos, Adam B and Niklas Nielsen.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Currently there is no facility in Mesos for checking compatibility of various 
 Mesos components that could have been built at different times with 
 potentially different Mesos versions.  This requirement is especially 
 important for doing various compatibility checks between Mesos and Mesos 
 modules (WIP).
 
 - Features major, minor, and patch numbers.
 - Convenience functions for comparing two versions.
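 
 A minimal sketch of the two features listed above, written in Python for brevity rather than as the C++ header under review. The class and method names are assumptions and do not reflect the contents of stout/version.hpp:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """Hypothetical version triple; fields compare in declaration order."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Parse a 'major.minor.patch' string such as '0.21.0'."""
        parts = text.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected major.minor.patch, got {text!r}")
        major, minor, patch = (int(part) for part in parts)
        return cls(major, minor, patch)
```

 With this sketch, `Version.parse("0.19.1") < Version.parse("0.21.0")` compares the major number first, then minor, then patch.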
 
 
 Diffs
 -
 
   3rdparty/libprocess/3rdparty/Makefile.am 
 db9766d70adb9076946cd2b467c55636fe5f7235 
   3rdparty/libprocess/3rdparty/stout/Makefile.am 
 b6464de53c3873ecd0b62a08ca9aac530043ffb9 
   3rdparty/libprocess/3rdparty/stout/include/Makefile.am 
 6fa5b741bdd7f089ba93bf6fea43b9f39f8f0edb 
   3rdparty/libprocess/3rdparty/stout/include/stout/version.hpp PRE-CREATION 
   3rdparty/libprocess/3rdparty/stout/tests/version_tests.cpp PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/25597/diff/
 
 
 Testing
 ---
 
 Added a stout test and ran make check
 
 
 Thanks,
 
 Kapil Arya
 




Build failed in Jenkins: mesos-reviewbot #1537

2014-09-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/mesos-reviewbot/1537/

--
[...truncated 5393 lines...]
rm -f slave/containerizer/isolators/network/.dirstamp
rm -f sched/*.o
rm -f slave/containerizer/mesos/.deps/.dirstamp
rm -f sched/*.lo
rm -f slave/containerizer/mesos/.dirstamp
rm -f state/.deps/.dirstamp
rm -f scheduler/*.o
rm -f state/.dirstamp
rm -f scheduler/*.lo
rm -f tests/.deps/.dirstamp
rm -f slave/*.o
rm -f tests/.dirstamp
rm -f slave/*.lo
rm -f tests/common/.deps/.dirstamp
rm -f slave/containerizer/*.o
rm -f tests/common/.dirstamp
rm -f usage/.deps/.dirstamp
rm -f usage/.dirstamp
rm -f zookeeper/.deps/.dirstamp
rm -f slave/containerizer/*.lo
rm -f zookeeper/.dirstamp
rm -f slave/containerizer/isolators/cgroups/*.o
rm -f slave/containerizer/isolators/cgroups/*.lo
rm -f slave/containerizer/isolators/network/*.o
rm -f slave/containerizer/isolators/network/*.lo
rm -f slave/containerizer/mesos/*.o
rm -f slave/containerizer/mesos/*.lo
rm -f state/*.o
rm -f state/*.lo
rm -f tests/*.o
rm -rf authorizer/.libs authorizer/_libs
rm -rf common/.libs common/_libs
rm -rf containerizer/.libs containerizer/_libs
rm -rf docker/.libs docker/_libs
rm -rf exec/.libs exec/_libs
rm -rf files/.libs files/_libs
rm -rf java/jni/.libs java/jni/_libs
rm -rf jvm/.libs jvm/_libs
rm -rf jvm/org/apache/.libs jvm/org/apache/_libs
rm -rf linux/.libs linux/_libs
rm -rf linux/routing/.libs linux/routing/_libs
rm -rf linux/routing/filter/.libs linux/routing/filter/_libs
rm -rf linux/routing/link/.libs linux/routing/link/_libs
rm -rf linux/routing/queueing/.libs linux/routing/queueing/_libs
rm -rf local/.libs local/_libs
rm -rf log/.libs log/_libs
rm -rf log/tool/.libs log/tool/_libs
rm -rf logging/.libs logging/_libs
rm -rf master/.libs master/_libs
rm -rf messages/.libs messages/_libs
rm -rf sasl/.libs sasl/_libs
rm -rf sched/.libs sched/_libs
rm -rf scheduler/.libs scheduler/_libs
rm -rf slave/.libs slave/_libs
rm -rf slave/containerizer/.libs slave/containerizer/_libs
rm -rf slave/containerizer/isolators/cgroups/.libs 
slave/containerizer/isolators/cgroups/_libs
rm -rf slave/containerizer/isolators/network/.libs 
slave/containerizer/isolators/network/_libs
rm -rf slave/containerizer/mesos/.libs slave/containerizer/mesos/_libs
rm -f tests/common/*.o
rm -f usage/*.o
rm -rf state/.libs state/_libs
rm -f usage/*.lo
rm -f zookeeper/*.o
rm -rf usage/.libs usage/_libs
rm -rf zookeeper/.libs zookeeper/_libs
rm -f zookeeper/*.lo
rm -rf ./.deps authorizer/.deps cli/.deps common/.deps containerizer/.deps 
docker/.deps examples/.deps exec/.deps files/.deps health-check/.deps 
java/jni/.deps jvm/.deps jvm/org/apache/.deps launcher/.deps linux/.deps 
linux/routing/.deps linux/routing/filter/.deps linux/routing/link/.deps 
linux/routing/queueing/.deps local/.deps log/.deps log/tool/.deps logging/.deps 
master/.deps messages/.deps sasl/.deps sched/.deps scheduler/.deps slave/.deps 
slave/containerizer/.deps slave/containerizer/isolators/cgroups/.deps 
slave/containerizer/isolators/network/.deps slave/containerizer/mesos/.deps 
state/.deps tests/.deps tests/common/.deps usage/.deps zookeeper/.deps
rm -f Makefile
make[2]: Leaving directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/src'
Making distclean in ec2
make[2]: Entering directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2'
rm -rf .libs _libs
rm -f *.lo
test -z  || rm -f 
test . = ../../ec2 || test -z  || rm -f 
rm -f Makefile
make[2]: Leaving directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2'
rm -f config.status config.cache config.log configure.lineno 
config.status.lineno
rm -f Makefile
make[1]: Leaving directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build'
if test -d mesos-0.21.0; then find mesos-0.21.0 -type d ! -perm -200 -exec 
chmod u+w {} ';' && rm -rf mesos-0.21.0 || { sleep 5 && rm -rf 
mesos-0.21.0; }; else :; fi
==
mesos-0.21.0 archives ready for distribution: 
mesos-0.21.0.tar.gz
==

real 96m11.007s
user 144m13.981s
sys 8m19.950s
+ chmod -R +w 3rdparty CHANGELOG Doxyfile LICENSE Makefile Makefile.am 
Makefile.in NOTICE README.md aclocal.m4 ar-lib autom4te.cache bin bootstrap 
compile config.guess config.log config.lt config.status config.sub configure 
configure.ac depcomp docs ec2 frameworks include install-sh libtool ltmain.sh 
m4 mesos-0.21.0.tar.gz mesos.pc mesos.pc.in missing mpi src support
+ git clean -fdx
Removing .libs/
Removing 3rdparty/Makefile
Removing 3rdparty/Makefile.in
Removing 3rdparty/libprocess/.deps/
Removing 3rdparty/libprocess/3rdparty/.deps/
Removing 3rdparty/libprocess/3rdparty/Makefile
Removing 3rdparty/libprocess/3rdparty/Makefile.in
Removing 3rdparty/libprocess/3rdparty/gmock_sources.cc
Removing 3rdparty/libprocess/3rdparty/stout/Makefile
Removing 3rdparty/libprocess/3rdparty/stout/Makefile.in
Removing 

Re: Review Request 25035: Fix for MESOS-1688

2014-09-13 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 13, 2014, 6:56 p.m.)


Review request for mesos and Vinod Kone.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Description
---

As explained in JIRA MESOS-1688, some schedulers allocate memory only for the 
executor and not for tasks. In this case, tasks are allocated CPU resources only.
Such a scheduler is not offered any idle CPUs if the slave has used up nearly 
all of its memory.
This can easily lead to a deadlock (in the application, not in Mesos).

Simple example:
1. Scheduler allocates all memory of a slave for an executor.
2. Scheduler launches a task for this executor (allocating 1 CPU).
3. Task finishes: 1 CPU, 0 MB memory allocatable.
4. No offers are made, as no memory is left. Scheduler will wait for offers 
forever. Deadlock in the application.
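
The four steps above can be traced with a small simulation. It assumes an allocator that offers a slave's resources only when both CPU and memory are free. The slave size, the thresholds, and the function names are illustrative and are not taken from the Mesos source:

```python
from typing import List

MIN_CPUS = 0.1    # illustrative threshold, not necessarily the Mesos value
MIN_MEM_MB = 32   # illustrative threshold, not necessarily the Mesos value


def offers_made(free_cpus: float, free_mem_mb: float) -> bool:
    """Assumed pre-patch rule: offer only if CPU and memory are both free."""
    return free_cpus >= MIN_CPUS and free_mem_mb >= MIN_MEM_MB


def trace_example() -> List[bool]:
    """Replay steps 1-4 on a hypothetical slave with 1 CPU and 1024 MB."""
    free_cpus, free_mem_mb = 1.0, 1024.0
    history = [offers_made(free_cpus, free_mem_mb)]  # idle slave, before step 1
    free_mem_mb -= 1024.0  # step 1: the executor takes all memory
    free_cpus -= 1.0       # step 2: the task takes 1 CPU
    free_cpus += 1.0       # step 3: the task finishes, 1 CPU and 0 MB are free
    history.append(offers_made(free_cpus, free_mem_mb))  # step 4
    return history
```

In this sketch the idle slave is offered, but after step 3 the free CPU is never offered again, which is the stall described in step 4.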

To fix this problem, offers must be made if CPU resources are allocatable, 
without considering allocatable memory.


Diffs (updated)
-

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Testing
---

Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested 
running multiple parallel Spark jobs in fine-grained mode to saturate 
allocatable memory. The jobs now run fine. With unpatched Mesos, this load 
always caused a deadlock in all Spark jobs within one minute.


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-13 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 13, 2014, 7:10 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Improved the readability of the patch in Resources::find().


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Description
---

As explained in JIRA MESOS-1688, some schedulers allocate memory only for the 
executor and not for tasks. In this case, tasks are allocated CPU resources only.
Such a scheduler is not offered any idle CPUs if the slave has used up nearly 
all of its memory.
This can easily lead to a deadlock (in the application, not in Mesos).

Simple example:
1. Scheduler allocates all memory of a slave for an executor.
2. Scheduler launches a task for this executor (allocating 1 CPU).
3. Task finishes: 1 CPU, 0 MB memory allocatable.
4. No offers are made, as no memory is left. Scheduler will wait for offers 
forever. Deadlock in the application.

To fix this problem, offers must be made if CPU resources are allocatable, 
without considering allocatable memory.
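
A sketch of the proposed change in offer policy, assuming the allocator decides with a single predicate over a slave's free resources. The function names and threshold values are hypothetical, and the actual patch in hierarchical_allocator_process.hpp may differ:

```python
MIN_CPUS = 0.1    # illustrative threshold, not necessarily the Mesos value
MIN_MEM_MB = 32   # illustrative threshold, not necessarily the Mesos value


def allocatable_before(free_cpus: float, free_mem_mb: float) -> bool:
    """Assumed current rule: both CPU and memory must be available."""
    return free_cpus >= MIN_CPUS and free_mem_mb >= MIN_MEM_MB


def allocatable_after(free_cpus: float, free_mem_mb: float) -> bool:
    """Proposed rule as described: free CPU alone is enough for an offer."""
    # free_mem_mb is deliberately ignored, mirroring the description.
    return free_cpus >= MIN_CPUS
```

For a slave with 1 CPU and 0 MB free, the first predicate withholds the offer and the second makes it, so a scheduler that only needs CPU for its tasks can proceed.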


Diffs (updated)
-

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Testing
---

Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested 
running multiple parallel Spark jobs in fine-grained mode to saturate 
allocatable memory. The jobs now run fine. With unpatched Mesos, this load 
always caused a deadlock in all Spark jobs within one minute.


Thanks,

Martin Weindel



Re: Dynamic Resource Roles

2014-09-13 Thread Tom Arnfeld
Awesome! That's great to hear. Let me know if there's anything I can help
with.

I couldn't find a current JIRA issue for it (I came across
https://issues.apache.org/jira/browse/MESOS-505, but it seems very old),
so I've created https://issues.apache.org/jira/browse/MESOS-1791.

On 10 September 2014 18:03, Adam Bordelon a...@mesosphere.io wrote:

 BenH has been calling these master reservations (globally control
 reservations across all slaves through the master) and offer reservations
 (I don't care which nodes it's on, as long as I get X cpu and Y RAM, or Z
 sets of {X,Y}), and they're definitely on the roadmap.

 On Wed, Sep 10, 2014 at 9:05 AM, Tom Arnfeld t...@duedil.com wrote:

  That's very cool, thanks.
 
  On Wed, Sep 10, 2014 at 4:59 PM, Timothy Chen tnac...@gmail.com wrote:
 
   Hi Tom,
   Reservations are definitely something we've discussed, and they will be
   addressed in the near future.
   Tim
   On Sep 10, 2014, at 7:49 AM, Tom Arnfeld t...@duedil.com wrote:
  
   Hey everyone,
  
   Just a quick question. Has there ever been any discussion around dynamic
   roles?
  
   What I mean by this: currently, if I want to guarantee 1 core and 10 GB of
   RAM to a specific type of framework (or role), I need to do this at the
   slave level. This means that if I only want to guarantee a small amount of
   resources, I could do this on one slave. If that slave dies, that resource
   is no longer available.
  
   It would be interesting to see the master (DRF scheduler) capable of
   reserving a minimum amount of resources for offering only to frameworks of
   a certain role, such that I can guarantee R amount of resources on N slaves
   across the cluster as a whole.
  
   Tom.
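
To illustrate the difference raised in this thread, a minimal sketch comparing a guarantee pinned to one slave with a hypothetical cluster-wide guarantee. The data model and function names are invented for illustration and do not correspond to any Mesos API:

```python
from typing import Dict


def slave_level_guarantee(live_cpus: Dict[str, float],
                          reserved_on: str,
                          cpus: float) -> bool:
    """The guarantee holds only while the one reserving slave is alive."""
    return live_cpus.get(reserved_on, 0.0) >= cpus


def cluster_level_guarantee(live_cpus: Dict[str, float], cpus: float) -> bool:
    """The guarantee holds while the live slaves together can supply it."""
    return sum(live_cpus.values()) >= cpus
```

If the reserving slave is removed from `live_cpus`, the first check fails while the second still succeeds as long as other slaves have capacity.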
 



Jenkins build is back to normal : mesos-reviewbot #1538

2014-09-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/mesos-reviewbot/1538/



Re: Review Request 25525: MESOS-1739: Allow slave reconfiguration on restart

2014-09-13 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25525/#review53271
---


Patch looks great!

Reviews applied: [25261, 25525]

All tests passed.

- Mesos ReviewBot


On Sept. 13, 2014, 12:33 a.m., Cody Maloney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25525/
 ---
 
 (Updated Sept. 13, 2014, 12:33 a.m.)
 
 
 Review request for mesos, Adam B, Benjamin Hindman, Patrick Reilly, and Vinod 
 Kone.
 
 
 Bugs: MESOS-1739
 https://issues.apache.org/jira/browse/MESOS-1739
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Allows attributes and resources to be set to a superset of what they were 
 previously on a slave restart.
 
 Incorporates all comments from: 
 https://issues.apache.org/jira/browse/MESOS-1739
 and the former review request:
 https://reviews.apache.org/r/25111/
 
 
 Diffs
 -
 
   src/Makefile.am 9b973e5 
   src/common/attributes.hpp 0a043d5 
   src/common/attributes.cpp aab114e 
   src/common/slaveinfo_utils.hpp PRE-CREATION 
   src/common/slaveinfo_utils.cpp PRE-CREATION 
   src/master/master.hpp b492600 
   src/master/master.cpp d5db24e 
   src/slave/slave.cpp 1b3dc73 
   src/tests/attributes_tests.cpp 240a8ca 
   src/tests/slave_tests.cpp 69be28f 
 
 Diff: https://reviews.apache.org/r/25525/diff/
 
 
 Testing
 ---
 
 make check on localhost
 
 
 Thanks,
 
 Cody Maloney
 




Build failed in Jenkins: mesos-reviewbot #1539

2014-09-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/mesos-reviewbot/1539/

--
[...truncated 5546 lines...]
rm -f slave/containerizer/mesos/.deps/.dirstamp
rm -f slave/containerizer/mesos/.dirstamp
rm -f state/.deps/.dirstamp
rm -f state/.dirstamp
rm -f tests/.deps/.dirstamp
rm -f sasl/*.lo
rm -f tests/.dirstamp
rm -f sched/*.o
rm -f tests/common/.deps/.dirstamp
rm -f tests/common/.dirstamp
rm -f usage/.deps/.dirstamp
rm -f sched/*.lo
rm -f usage/.dirstamp
rm -f scheduler/*.o
rm -f zookeeper/.deps/.dirstamp
rm -f scheduler/*.lo
rm -f zookeeper/.dirstamp
rm -f slave/*.o
rm -f slave/*.lo
rm -f slave/containerizer/*.o
rm -f slave/containerizer/*.lo
rm -f slave/containerizer/isolators/cgroups/*.o
rm -f slave/containerizer/isolators/cgroups/*.lo
rm -f slave/containerizer/isolators/network/*.o
rm -f slave/containerizer/isolators/network/*.lo
rm -f slave/containerizer/mesos/*.o
rm -f slave/containerizer/mesos/*.lo
rm -f state/*.o
rm -f state/*.lo
rm -f tests/*.o
rm -rf authorizer/.libs authorizer/_libs
rm -rf common/.libs common/_libs
rm -rf containerizer/.libs containerizer/_libs
rm -rf docker/.libs docker/_libs
rm -rf exec/.libs exec/_libs
rm -rf files/.libs files/_libs
rm -rf java/jni/.libs java/jni/_libs
rm -rf jvm/.libs jvm/_libs
rm -rf jvm/org/apache/.libs jvm/org/apache/_libs
rm -rf linux/.libs linux/_libs
rm -rf linux/routing/.libs linux/routing/_libs
rm -rf linux/routing/filter/.libs linux/routing/filter/_libs
rm -rf linux/routing/link/.libs linux/routing/link/_libs
rm -rf linux/routing/queueing/.libs linux/routing/queueing/_libs
rm -rf local/.libs local/_libs
rm -rf log/.libs log/_libs
rm -rf log/tool/.libs log/tool/_libs
rm -rf logging/.libs logging/_libs
rm -rf master/.libs master/_libs
rm -rf messages/.libs messages/_libs
rm -rf sasl/.libs sasl/_libs
rm -rf sched/.libs sched/_libs
rm -f tests/common/*.o
rm -f usage/*.o
rm -f usage/*.lo
rm -f zookeeper/*.o
rm -rf scheduler/.libs scheduler/_libs
rm -f zookeeper/*.lo
rm -rf slave/.libs slave/_libs
rm -rf slave/containerizer/.libs slave/containerizer/_libs
rm -rf slave/containerizer/isolators/cgroups/.libs 
slave/containerizer/isolators/cgroups/_libs
rm -rf slave/containerizer/isolators/network/.libs 
slave/containerizer/isolators/network/_libs
rm -rf slave/containerizer/mesos/.libs slave/containerizer/mesos/_libs
rm -rf state/.libs state/_libs
rm -rf usage/.libs usage/_libs
rm -rf zookeeper/.libs zookeeper/_libs
rm -rf ./.deps authorizer/.deps cli/.deps common/.deps containerizer/.deps 
docker/.deps examples/.deps exec/.deps files/.deps health-check/.deps 
java/jni/.deps jvm/.deps jvm/org/apache/.deps launcher/.deps linux/.deps 
linux/routing/.deps linux/routing/filter/.deps linux/routing/link/.deps 
linux/routing/queueing/.deps local/.deps log/.deps log/tool/.deps logging/.deps 
master/.deps messages/.deps sasl/.deps sched/.deps scheduler/.deps slave/.deps 
slave/containerizer/.deps slave/containerizer/isolators/cgroups/.deps 
slave/containerizer/isolators/network/.deps slave/containerizer/mesos/.deps 
state/.deps tests/.deps tests/common/.deps usage/.deps zookeeper/.deps
rm -f Makefile
make[2]: Leaving directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/src'
Making distclean in ec2
make[2]: Entering directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2'
rm -rf .libs _libs
rm -f *.lo
test -z  || rm -f 
test . = ../../ec2 || test -z  || rm -f 
rm -f Makefile
make[2]: Leaving directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2'
rm -f config.status config.cache config.log configure.lineno 
config.status.lineno
rm -f Makefile
make[1]: Leaving directory 
`https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build'
if test -d mesos-0.21.0; then find mesos-0.21.0 -type d ! -perm -200 -exec 
chmod u+w {} ';' && rm -rf mesos-0.21.0 || { sleep 5 && rm -rf 
mesos-0.21.0; }; else :; fi
==
mesos-0.21.0 archives ready for distribution: 
mesos-0.21.0.tar.gz
==

real 116m54.263s
user 144m54.581s
sys 8m44.686s
+ chmod -R +w 3rdparty CHANGELOG Doxyfile LICENSE Makefile Makefile.am 
Makefile.in NOTICE README.md aclocal.m4 ar-lib autom4te.cache bin bootstrap 
compile config.guess config.log config.lt config.status config.sub configure 
configure.ac depcomp docs ec2 frameworks include install-sh libtool ltmain.sh 
m4 mesos-0.21.0.tar.gz mesos.pc mesos.pc.in missing mpi src support
+ git clean -fdx
Removing .libs/
Removing 3rdparty/Makefile
Removing 3rdparty/Makefile.in
Removing 3rdparty/libprocess/.deps/
Removing 3rdparty/libprocess/3rdparty/.deps/
Removing 3rdparty/libprocess/3rdparty/Makefile
Removing 3rdparty/libprocess/3rdparty/Makefile.in
Removing 3rdparty/libprocess/3rdparty/gmock_sources.cc
Removing 3rdparty/libprocess/3rdparty/stout/Makefile
Removing 3rdparty/libprocess/3rdparty/stout/Makefile.in
Removing 3rdparty/libprocess/3rdparty/stout/aclocal.m4