Build failed in Jenkins: mesos-reviewbot #1507
See https://builds.apache.org/job/mesos-reviewbot/1507/

[...truncated 5668 lines...]

make[1]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build'
== mesos-0.21.0 archives ready for distribution: mesos-0.21.0.tar.gz ==
real 88m4.316s
user 143m6.603s
sys 7m53.711s
+ chmod -R +w [checkout tree]
+ git clean -fdx
[long list of generated build files removed by `git clean -fdx` (Makefiles, .deps/, libtool/autoconf outputs, deploy and gdb/lldb/valgrind wrapper scripts, mesos-0.21.0.tar.gz, example binaries); truncated]
Build failed in Jenkins: mesos-reviewbot #1508
See https://builds.apache.org/job/mesos-reviewbot/1508/changes

Changes:
[niklas] Fixed line comments end punctuation in Mesos source.
[niklas] Fixed line comments end punctuation in stout.
[niklas] Fixed line comments end punctuation in libprocess.
[dlester] Adds Qubit to PoweredByMesos list.

[...truncated 5704 lines...]

make[1]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build'
== mesos-0.21.0 archives ready for distribution: mesos-0.21.0.tar.gz ==
real 111m33.909s
user 141m57.055s
sys 7m58.774s
+ chmod -R +w [checkout tree]
+ git clean -fdx
[distclean and `git clean -fdx` output removing generated build files under src/, ec2/, and 3rdparty/; truncated]
Build failed in Jenkins: mesos-reviewbot #1509
See https://builds.apache.org/job/mesos-reviewbot/1509/

[...truncated 5591 lines...]

make[1]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build'
== mesos-0.21.0 archives ready for distribution: mesos-0.21.0.tar.gz ==
real 76m42.235s
user 142m15.842s
sys 7m52.073s
+ chmod -R +w [checkout tree]
+ git clean -fdx
[distclean and `git clean -fdx` output removing generated build files; truncated]
Dynamic Resource Roles
Hey everyone, Just a quick question: has there ever been any discussion around dynamic roles? What I mean by this: currently, if I want to guarantee 1 core and 10 GB of RAM to a specific type of framework (or role), I need to do this at the slave level. This means that if I only want to guarantee a small amount of resources, I could do this on one slave. If that slave dies, that resource is no longer available. It would be interesting to see the master (DRF scheduler) capable of reserving a minimum amount of resources for offering only to frameworks of a certain role, such that I can guarantee R amount of resources on N slaves across the cluster as a whole. Tom.
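For context, the slave-level guarantee described above is expressed with the slave's --resources flag, which accepts role-qualified amounts. The role name, sizes, master address, and the unreserved remainder below are illustrative, not a recommended configuration:

```shell
# Statically reserve 1 CPU and 10 GB of RAM on this one slave for
# frameworks registered under the (illustrative) role "prod"; the rest
# stays in the default '*' pool. Because the reservation is tied to this
# slave, it disappears if the slave dies -- the limitation raised above.
mesos-slave --master=zk://localhost:2181/mesos \
  --resources='cpus(prod):1;mem(prod):10240;cpus(*):7;mem(*):22528'
```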
Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2358
Just flying cover, but my change is unrelated to the issue. However, I see this issue quite often as well. Cheers, Tim

----- Original Message -----
From: Yan Xu y...@jxu.me
To: dev@mesos.apache.org
Cc: Vinod Kone vinodk...@gmail.com
Sent: Tuesday, September 9, 2014 4:50:06 PM
Subject: Re: Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui #2358

this is https://issues.apache.org/jira/browse/MESOS-1766

-- Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Fri, Sep 5, 2014 at 8:53 AM, Apache Jenkins Server jenk...@builds.apache.org wrote:

See https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2358/changes

Changes:
[tstclair] Minor update to include package config file

[...truncated 57484 lines...]

[verbose test log truncated: replicated log APPEND/TRUNCATE actions persisted to leveldb, registrar recovery, and a successful CRAM-MD5 SASL authentication handshake for principal 'test-principal']
Re: Dynamic Resource Roles
Hi Tom, Reservations are definitely something we've discussed and they will be addressed in the near future. Tim

On Sep 10, 2014, at 7:49 AM, Tom Arnfeld t...@duedil.com wrote: [snip]
Re: Dynamic Resource Roles
That's very cool, thanks.

On Wed, Sep 10, 2014 at 4:59 PM, Timothy Chen tnac...@gmail.com wrote: [snip]
Re: Review Request 25487: Increased session timeouts for ZooKeeper related tests.
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25487/#review52884

src/tests/zookeeper.cpp (https://reviews.apache.org/r/25487/#comment92073): Seconds(10)?

- Dominic Hamon

On Sept. 9, 2014, 10:57 p.m., Jiang Yan Xu wrote:

(Updated Sept. 9, 2014, 10:57 p.m.)

Review request for mesos and Ben Mahler.
Bugs: MESOS-1676 (https://issues.apache.org/jira/browse/MESOS-1676)
Repository: mesos-git

Description:
- On slower machines the ZooKeeper C client sometimes times out where we aren't expecting it to, because either the test server or the client is too slow to respond. Increasing this value helps mitigate the problem.
- The effect of server->shutdownNetwork() is immediate, so this won't prolong the tests so long as they don't wait for session expiration without clock advances, which I have checked and there is none.

Diffs:
- src/tests/master_contender_detector_tests.cpp 9ac59aa446a132e734238e0e55801117c4ef31b4
- src/tests/zookeeper.cpp e45f956e1486e952a4efeb123e15568518fb53fe

Diff: https://reviews.apache.org/r/25487/diff/

Testing: make check.

Thanks, Jiang Yan Xu
Review Request 25508: Fix git clean -xdf skipping leveldb
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/

Review request for mesos, Jie Yu and Vinod Kone.
Bugs: MESOS-1764 (https://issues.apache.org/jira/browse/MESOS-1764)
Repository: mesos-git

Description: Very minor change to allow `git clean -xdf` to remove the leveldb directory.

Diffs:
- 3rdparty/Makefile.am 7cf0c88

Diff: https://reviews.apache.org/r/25508/diff/

Testing: make check

Thanks, Timothy St. Clair
Re: Review Request 25434: Propagate slave shutdown grace period to Executor and CommandExecutor.
On Sept. 9, 2014, 5:50 p.m., Benjamin Hindman wrote:

src/slave/constants.hpp, line 53 (https://reviews.apache.org/r/25434/diff/2/?file=683947#file683947line53): What is the 'base executor' versus the 'command executor'?

Alexander Rukletsov wrote: We have Executor (lives in src/exec/exec.cpp) and CommandExecutor aka mesos-executor (lives in src/launcher/executor.cpp). I find "executor" too vague and use "base executor" to stress that I mean the one that lives in exec.cpp. Is there a convention about naming these folks?

Benjamin Hindman wrote: Ah, I see. Well, CommandExecutor is just an instance of an executor and actually uses the code from exec.cpp just like all current executors do (that use libmesos). So there aren't actually two executors (base and command), just one, and they all use exec.cpp (if they use libmesos). Does that make sense?

Alexander Rukletsov wrote: Sorry, I was inexact in my comment. Indeed, there is only one executor, but two libprocess processes (where almost all of the work is done). Here is what we have:

Executor - ExecutorProcess (I call them both "base executor", though "base executor process" is more correct)
CommandExecutor - CommandExecutorProcess

The OS process where CommandExecutor lives also instantiates the driver; together it looks like this:

    MesosExecutorDriver *---> ExecutorProcess
            |
            V
    CommandExecutor *---> CommandExecutorProcess
            |
            V
          task

My aim was to explain that there is a wrapper around the CommandExecutorProcess which has its own shutdown period. For simplicity I called this wrapper (which is ExecutorProcess and a bit of MesosExecutorDriver) the "base executor". However, it looks like my terminology is not good enough and maybe even misleading. What terms would you suggest, Ben?

IMHO we should just use the terms ExecutorProcess, MesosExecutorDriver, etc. If you want to alias them within that comment then I'd suggest defining the alias (as you've done for me here) and then using that alias in the comment. That being said, my hunch is that you'll get more mileage just using the class names. This will also make renaming/refactoring easier, as code searches will grab these comments too.

- Benjamin

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25434/#review52750

On Sept. 9, 2014, 12:54 p.m., Alexander Rukletsov wrote:

(Updated Sept. 9, 2014, 12:54 p.m.)

Review request for mesos, Niklas Nielsen, Till Toenshoff, and Timothy St. Clair.
Bugs: MESOS-1571 (https://issues.apache.org/jira/browse/MESOS-1571)
Repository: mesos-git

Description: The slave's configurable executor_shutdown_grace_period flag is propagated to Executor and CommandExecutor through an environment variable. The shutdown timeout in Executor and the signal escalation timeout in CommandExecutor are now dependent on this flag. Each nested timeout is somewhat shorter than the parent one.

Diffs:
- src/exec/exec.cpp 36d1778
- src/launcher/executor.cpp 12ac14b
- src/slave/constants.hpp 9030871
- src/slave/constants.cpp e1da5c0
- src/slave/containerizer/containerizer.hpp 8a66412
- src/slave/containerizer/containerizer.cpp 0254679
- src/slave/containerizer/docker.cpp 0febbac
- src/slave/containerizer/external_containerizer.cpp efbc68f
- src/slave/containerizer/mesos/containerizer.cpp 9d08329
- src/slave/flags.hpp 21e0021
- src/tests/containerizer.cpp a17e1e0

Diff: https://reviews.apache.org/r/25434/diff/

Testing: make check (OS X 10.9.4; Ubuntu 14.04 amd64)

Thanks, Alexander Rukletsov
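The "each nested timeout is somewhat shorter than the parent one" scheme from the review description can be sketched roughly as follows. The variable names and the one-second decrement are illustrative assumptions, not the actual Mesos names or values:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the slave's shutdown grace period is handed down
# (in Mesos, via an environment variable), and each nested component
# budgets a somewhat shorter timeout so it can finish before its parent
# gives up on it. Names and the one-second step are illustrative only.

slave_grace=${SLAVE_SHUTDOWN_GRACE_SECS:-5}

# The executor's shutdown timeout is shorter than the slave's deadline...
executor_grace=$(( slave_grace > 1 ? slave_grace - 1 : slave_grace ))

# ...and the signal escalation (SIGTERM -> SIGKILL) timeout nests
# inside the executor's budget, shorter still.
escalation_grace=$(( executor_grace > 1 ? executor_grace - 1 : executor_grace ))

echo "slave=${slave_grace}s executor=${executor_grace}s escalation=${escalation_grace}s"
```

With the default of 5 seconds this yields a 4-second executor timeout and a 3-second escalation timeout, so the SIGKILL fires before the executor deadline, which fires before the slave's.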
Re: Dynamic Resource Roles
BenH has been calling these "master reservations" (globally controlled reservations across all slaves through the master) and "offer reservations" (I don't care which nodes they're on, as long as I get X CPU and Y RAM, or Z sets of {X, Y}), and they're definitely on the roadmap.

On Wed, Sep 10, 2014 at 9:05 AM, Tom Arnfeld t...@duedil.com wrote: [snip]
Re: Review Request 25434: Propagate slave shutdown grace period to Executor and CommandExecutor.
On Sept. 9, 2014, 5:50 p.m., Benjamin Hindman wrote: [snip: the 'base executor' vs. 'command executor' naming discussion, quoted in full]

Benjamin Hindman wrote: [snip: suggestion to just use the class names ExecutorProcess, MesosExecutorDriver, etc.]

Ok, agreed.

- Alexander

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25434/#review52750

On Sept. 9, 2014, 12:54 p.m., Alexander Rukletsov wrote: [snip: review request 25434 description, diffs, and testing, quoted in full]
Re: Review Request 25508: Fix git clean -xdf skipping leveldb
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/#review52894 --- 3rdparty/Makefile.am https://reviews.apache.org/r/25508/#comment92088 didn't realize that the leveldb we bundle has git files in it! isn't the proper fix here to bundle a proper 'dist'ribution of leveldb instead of its git tree? - Vinod Kone On Sept. 10, 2014, 4:30 p.m., Timothy St. Clair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/ --- (Updated Sept. 10, 2014, 4:30 p.m.) Review request for mesos, Jie Yu and Vinod Kone. Bugs: MESOS-1764 https://issues.apache.org/jira/browse/MESOS-1764 Repository: mesos-git Description --- Very minor change to allow git clean -xdf to remove the leveldb directory Diffs - 3rdparty/Makefile.am 7cf0c88 Diff: https://reviews.apache.org/r/25508/diff/ Testing --- make check Thanks, Timothy St. Clair
Re: Review Request 25508: Fix git clean -xdf skipping leveldb
On Sept. 10, 2014, 5:26 p.m., Vinod Kone wrote: 3rdparty/Makefile.am, line 84 https://reviews.apache.org/r/25508/diff/1/?file=684613#file684613line84 didn't realize that the leveldb we bundle has git files in it! isn't the proper fix here to bundle a proper 'dist'ribution of leveldb instead of its git tree? You're probably right. I didn't want to rethunk a tarball though. - Timothy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/#review52894 --- On Sept. 10, 2014, 4:30 p.m., Timothy St. Clair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/ --- (Updated Sept. 10, 2014, 4:30 p.m.) Review request for mesos, Jie Yu and Vinod Kone. Bugs: MESOS-1764 https://issues.apache.org/jira/browse/MESOS-1764 Repository: mesos-git Description --- Very minor change to allow git clean -xdf to remove the leveldb directory Diffs - 3rdparty/Makefile.am 7cf0c88 Diff: https://reviews.apache.org/r/25508/diff/ Testing --- make check Thanks, Timothy St. Clair
Re: Review Request 25439: Fix protobuf detection on systems with Python 3 as default
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25439/#review52898 --- Ship it! Ship It! - Timothy St. Clair On Sept. 9, 2014, 2:49 p.m., Kamil Domanski wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25439/ --- (Updated Sept. 9, 2014, 2:49 p.m.) Review request for mesos and Timothy St. Clair. Bugs: MESOS-1774 https://issues.apache.org/jira/browse/MESOS-1774 Repository: mesos-git Description --- MESOS-1774 Diffs - m4/ac_python_module.m4 8360b65434e3c1912e2b8670f70e4130352a3c92 Diff: https://reviews.apache.org/r/25439/diff/ Testing --- ./configure --disable-bundled Thanks, Kamil Domanski
Re: Review Request 25487: Increased session timeouts for ZooKeeper related tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25487/ --- (Updated Sept. 10, 2014, 11 a.m.) Review request for mesos and Ben Mahler. Changes --- Minor fix per Dominic's review. Bugs: MESOS-1676 https://issues.apache.org/jira/browse/MESOS-1676 Repository: mesos-git Description --- - On slower machines the ZooKeeper C client sometimes times out when we aren't expecting it to, because either the test server or the client is too slow to respond. Increasing this value helps mitigate the problem. - The effect of server->shutdownNetwork() is immediate, so this won't prolong the tests as long as they don't wait for session expiration without clock advances, which I have checked and there are none. Diffs (updated) - src/tests/master_contender_detector_tests.cpp 9ac59aa446a132e734238e0e55801117c4ef31b4 src/tests/zookeeper.cpp e45f956e1486e952a4efeb123e15568518fb53fe Diff: https://reviews.apache.org/r/25487/diff/ Testing --- make check. Thanks, Jiang Yan Xu
Review Request 25511: Pulled the log line in ZooKeeperTestServer::shutdownNetwork() to above the shutdown call.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25511/ --- Review request for mesos and Ben Mahler. Repository: mesos-git Description --- - When debugging ZooKeeper-related tests it's often useful to know when the test is about to shut down the ZK server, in order to reason about the order of events. Otherwise client disconnections are often logged before this shutdown line, which can be confusing. Diffs - src/tests/zookeeper_test_server.cpp a8c9b1cd8a546abdeb4d89a8fe9ebc3b3d577665 Diff: https://reviews.apache.org/r/25511/diff/ Testing --- make check. Thanks, Jiang Yan Xu
Jenkins build is back to normal : mesos-reviewbot #1510
See https://builds.apache.org/job/mesos-reviewbot/1510/
Re: Review Request 25508: Fix git clean -xdf skipping leveldb
On Sept. 10, 2014, 5:26 p.m., Vinod Kone wrote: 3rdparty/Makefile.am, line 84 https://reviews.apache.org/r/25508/diff/1/?file=684613#file684613line84 didn't realize that the leveldb we bundle has git files in it! isn't the proper fix here to bundle a proper 'dist'ribution of leveldb instead of its git tree? Timothy St. Clair wrote: You're probably right. I didn't want to rethunk a tarball though. can you try with replacing the bundled leveldb.tar.gz with this? git archive -o leveldb.tar.gz --prefix=leveldb/ HEAD (# run this from unbundled leveldb clone, e.g., mesos/build/3rdparty/leveldb) once you confirm that works, just have this patch be a replacement of the .tar.gz. sounds good? - Vinod --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/#review52894 --- On Sept. 10, 2014, 4:30 p.m., Timothy St. Clair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/ --- (Updated Sept. 10, 2014, 4:30 p.m.) Review request for mesos, Jie Yu and Vinod Kone. Bugs: MESOS-1764 https://issues.apache.org/jira/browse/MESOS-1764 Repository: mesos-git Description --- Very minor change to allow git clean -xdf to remove the leveldb directory Diffs - 3rdparty/Makefile.am 7cf0c88 Diff: https://reviews.apache.org/r/25508/diff/ Testing --- make check Thanks, Timothy St. Clair
Re: Review Request 25508: Fix git clean -xdf skipping leveldb
On Sept. 10, 2014, 5:26 p.m., Vinod Kone wrote: 3rdparty/Makefile.am, line 84 https://reviews.apache.org/r/25508/diff/1/?file=684613#file684613line84 didn't realize that the leveldb we bundle has git files in it! isn't the proper fix here to bundle a proper 'dist'ribution of leveldb instead of its git tree? Timothy St. Clair wrote: You're probably right. I didn't want to rethunk a tarball though. Vinod Kone wrote: can you try with replacing the bundled leveldb.tar.gz with this? git archive -o leveldb.tar.gz --prefix=leveldb/ HEAD (# run this from unbundled leveldb clone, e.g., mesos/build/3rdparty/leveldb) once you confirm that works, just have this patch be a replacement of the .tar.gz. sounds good? So if you navigate into the leveldb folder and run:

    git archive --format=tar.gz --prefix=leveldb/ -o leveldb.tar.gz origin/master
    mv -f leveldb.tar.gz ../

It works, but it produces a large binary diff. How do you want to handle this? - Timothy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/#review52894 --- On Sept. 10, 2014, 4:30 p.m., Timothy St. Clair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/ --- (Updated Sept. 10, 2014, 4:30 p.m.) Review request for mesos, Jie Yu and Vinod Kone. Bugs: MESOS-1764 https://issues.apache.org/jira/browse/MESOS-1764 Repository: mesos-git Description --- Very minor change to allow git clean -xdf to remove the leveldb directory Diffs - 3rdparty/Makefile.am 7cf0c88 Diff: https://reviews.apache.org/r/25508/diff/ Testing --- make check Thanks, Timothy St. Clair
Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/ --- Review request for mesos, Chi Zhang, Vinod Kone, and Cong Wang. Repository: mesos-git Description --- See summary. Since we are not forwarding IPv6 packets, it doesn't make sense to enable IPv6. By disabling IPv6, we avoid spamming the kernel log with warnings about duplicate IPv6 addresses, since all veths have the same MAC address. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp 938782ae2ab1da34eb316381131e9bfcb7c810d1 Diff: https://reviews.apache.org/r/25512/diff/ Testing --- sudo make check Thanks, Jie Yu
Re: Mesos Driver aborted silently?
My guess is that your driver threw an exception while handling the offerRescinded() callback, which was detected by the JNI binding (IIRC Mantis is a JVM framework?), causing it to abort the driver. Note that when a driver aborts, it will send a DeactivateFrameworkMessage to the master, causing the master to deactivate the framework (but still keep its tasks alive until the framework failover timeout). Having said that, your point regarding the scheduler not being able to detect that the driver is aborted until it makes *another* driver call is true. The driver doesn't call the error() callback when aborted for a couple of reasons: 1) abort() can be called by the scheduler itself, so it doesn't make much sense to send an error() callback, and 2) if abort() is caused by a JVM exception, the scheduler probably already knows of it (I'm guessing this wasn't the case for Mantis?). Perhaps these semantics are worth reconsidering.

On Tue, Sep 9, 2014 at 3:14 PM, Sharma Podila spod...@netflix.com wrote: We had this problem show up yesterday, just one time, that I don't understand. Would appreciate any help. This is the sequence of events, as far as I can tell:

From the framework's perspective:
F1: framework got an offer from a host that it decided it will not use, so it declined it
F2: got a scheduler callback about the offer being rescinded (I believe the same host that I just declined; the host was terminated by a separate decom process)
F3: calling the Mesos driver to kill a task shows driver status as DRIVER_ABORTED. However, there was no scheduler callback to reflect this. Wouldn't the scheduler be told about the driver being aborted via one of disconnected(), error(), etc.?

From the Mesos master's perspective:
M1: failed to validate offer (must be in response to F1)
M2: deactivating framework

I am thinking that F1 was initiated by the framework before that slave went down. But the slave went down and the offer was rescinded in Mesos before F1 was received by the Mesos master, which resulted in M1.
Which should be OK, I'd imagine. But here are two things I can't understand:

1. Why was the framework deactivated? I looked in the Mesos logs and only found the below lines of interest.
2. Why was the framework not notified about being deactivated, even though using the driver shows status as DRIVER_ABORTED?
2.1 Are frameworks required to periodically check the status of the driver via mechanisms other than the scheduler callback? If so, what are they?

As I said, this happened only once and is likely a race condition of sorts. I can't reproduce it. This sequence of events happens routinely, but this error happened only once. It is nasty since the framework then just sits there with no offers and therefore no tasks get scheduled. We're on Mesos 0.18.0 (if this is specifically addressed in 0.19 or 0.20, that'd be good to know). I remember there was a reference to a problem caused when the created Mesos driver gets GC'ed. However, our driver reference never goes out of scope. I have the following relevant logs from the framework and the Mesos master. The timestamps in the logs are from the same clock (on the same machine). From MantisMaster: 2014-09-08 20:08:46,263 WARN Thread-42 MesosSchedulerCallbackHandler - Declining offer from host 10.200.13.87 due to missing attribute value for EC2_AMI_ID - expecting [ami-5e6bc836] got [ami-28d47740] 2014-09-08 20:08:46,271 WARN Thread-58 MesosSchedulerCallbackHandler - Offer rescinded: offerID=20140908-195444-2298791946-7103-5698-5 .
2014-09-08 20:11:31,322 INFO pool-27-thread-1 VirtualMachineMasterServiceMesosImpl - Calling mesos to kill outliers-5-worker-0-7 2014-09-08 20:11:31,322 INFO pool-27-thread-1 VirtualMachineMasterServiceMesosImpl - Kill status = DRIVER_ABORTED From Mesos-Master: W0908 20:08:46.277575 5791 master.cpp:1556] Failed to validate offer 20140908-195444-2298791946-7103-5698-5 : Offer 20140908-195444-2298791946-7103-5698-5 is no longer valid I0908 20:08:46.277721 5791 master.cpp:1079] Deactivating framework MantisFramework I0908 20:08:46.278017 5789 hierarchical_allocator_process.hpp:408] Deactivated framework MantisFramework
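The abort semantics discussed in this thread can be modeled with a minimal sketch. The Status values mirror mesos::Status, but FakeDriver and its methods are illustrative stand-ins, not libmesos or Mantis code:

```cpp
#include <cassert>
#include <string>

// Minimal model of the semantics above: an abort triggered inside a
// callback is only observed by the scheduler on its *next* driver call.
enum Status { DRIVER_RUNNING, DRIVER_ABORTED };

struct FakeDriver {
  Status status = DRIVER_RUNNING;

  // An exception escaping a scheduler callback makes the (JNI) binding
  // abort the driver; no error() callback is delivered at that point.
  void offerRescinded(bool callbackThrew) {
    if (callbackThrew) {
      status = DRIVER_ABORTED;
    }
  }

  // Every driver call returns the driver's current status, so the
  // scheduler first learns of an earlier abort when it makes another
  // call, e.g. killTask().
  Status killTask(const std::string& /* taskId */) {
    return status;  // An aborted driver silently drops the request.
  }
};
```

This matches the observed sequence: the abort happens around the offerRescinded() handling, and the framework only sees DRIVER_ABORTED when it later calls killTask().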
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52913 --- Ship it! Have you confirmed/tested that this is safe? - Vinod Kone On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/ --- (Updated Sept. 10, 2014, 6:26 p.m.) Review request for mesos, Chi Zhang, Vinod Kone, and Cong Wang. Repository: mesos-git Description --- See summary. Since we are not forwarding IPv6 packets, it doesn't make sense to enable ipv6. By disabling IPv6, we won't get spamming kernel log warning duplicated IPv6 addresses since all veth have the same mac. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp 938782ae2ab1da34eb316381131e9bfcb7c810d1 Diff: https://reviews.apache.org/r/25512/diff/ Testing --- sudo make check Thanks, Jie Yu
Re: Review Request 25261: Check for variadic template and default/deleted function support
On Sept. 2, 2014, 7:50 p.m., Michael Park wrote: Just something to note here, there's a bug in earlier GCC versions where the access control of `= default`ed functions isn't enforced correctly. e.g.

```
class Foo
{
private:
  Foo() = default;
};

class Bar
{
private:
  Bar() {}
};

int main()
{
  Foo foo;   // Foo::Foo() is private but not enforced.
  // Bar bar;  // error: 'Bar::Bar()' is private.
}
```

The above code snippet compiles fine with GCC 4.6. Does it work correctly for gcc-4.4? If yes, it should be fine. - Vinod --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25261/#review52067 --- On Sept. 2, 2014, 5:57 p.m., Dominic Hamon wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25261/ --- (Updated Sept. 2, 2014, 5:57 p.m.) Review request for mesos and Benjamin Hindman. Bugs: MESOS-1752 and MESOS-1753 https://issues.apache.org/jira/browse/MESOS-1752 https://issues.apache.org/jira/browse/MESOS-1753 Repository: mesos-git Description --- Add C++11 language features to the m4 macro that checks for C++11 support. Diffs - m4/ax_cxx_compile_stdcxx_11.m4 07b298f151094e818287f741b3e0efd28374e82b Diff: https://reviews.apache.org/r/25261/diff/ Testing --- built with g++-4.4, the minimum compiler we support. Thanks, Dominic Hamon
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52918 --- Ship it! Maybe also check whether /proc/sys/net/ipv6/conf/all/disable_ipv6 exists in the child script, since you did so outside? - Cong Wang On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/ --- (Updated Sept. 10, 2014, 6:26 p.m.) Review request for mesos, Chi Zhang, Vinod Kone, and Cong Wang. Repository: mesos-git Description --- See summary. Since we are not forwarding IPv6 packets, it doesn't make sense to enable ipv6. By disabling IPv6, we won't get spamming kernel log warning duplicated IPv6 addresses since all veth have the same mac. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp 938782ae2ab1da34eb316381131e9bfcb7c810d1 Diff: https://reviews.apache.org/r/25512/diff/ Testing --- sudo make check Thanks, Jie Yu
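The existence check suggested here could look roughly like the sketch below. The conf directory is a parameter purely so the logic can be exercised outside /proc; the real isolator would target /proc/sys/net/ipv6/conf/all/disable_ipv6, and this is an illustration, not the actual port_mapping code:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Only touch the ipv6 sysctl when it exists: kernels built without
// IPv6 support do not expose the knob at all.
bool disableIPv6(const std::string& confDir) {
  const std::string path = confDir + "/disable_ipv6";

  // Existence check: opening for read fails if the knob is absent.
  std::ifstream probe(path.c_str());
  if (!probe.good()) {
    return false;  // IPv6 not compiled into this kernel; nothing to do.
  }
  probe.close();

  // Writing "1" disables IPv6 for the interfaces covered by confDir.
  std::ofstream knob(path.c_str());
  knob << "1";
  return knob.good();
}
```

Performing the same check in the child script (as suggested) avoids failing container launches on IPv6-less kernels.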
Re: Review Request 25508: Fix git clean -xdf skipping leveldb
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/#review52919 --- Ship it! So for completeness: the actual diff is not posted here b/c it is a binary blob (re: comments), but it was the result of the following:

    cd 3rdparty
    tar -xzf leveldb.tar.gz
    cd leveldb
    git archive --format=tar.gz --prefix=leveldb/ -o leveldb.tar.gz origin/master
    mv leveldb.tar.gz ../
    cd ..
    rm -rf leveldb

Then proceed to test the make mechanics as before. git clean -xdf now removes the leveldb subdir. - Timothy St. Clair On Sept. 10, 2014, 4:30 p.m., Timothy St. Clair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25508/ --- (Updated Sept. 10, 2014, 4:30 p.m.) Review request for mesos, Jie Yu and Vinod Kone. Bugs: MESOS-1764 https://issues.apache.org/jira/browse/MESOS-1764 Repository: mesos-git Description --- Very minor change to allow git clean -xdf to remove the leveldb directory Diffs - 3rdparty/Makefile.am 7cf0c88 Diff: https://reviews.apache.org/r/25508/diff/ Testing --- make check Thanks, Timothy St. Clair
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52920 --- Does this mean that users that open sockets (without specifying a family) will only get a v4 socket? What happens if they try to open a v6 socket? - Ian Downes On Sept. 10, 2014, 11:26 a.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/ --- (Updated Sept. 10, 2014, 11:26 a.m.) Review request for mesos, Chi Zhang, Vinod Kone, and Cong Wang. Repository: mesos-git Description --- See summary. Since we are not forwarding IPv6 packets, it doesn't make sense to enable ipv6. By disabling IPv6, we won't get spamming kernel log warning duplicated IPv6 addresses since all veth have the same mac. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp 938782ae2ab1da34eb316381131e9bfcb7c810d1 Diff: https://reviews.apache.org/r/25512/diff/ Testing --- sudo make check Thanks, Jie Yu
Build failed in Jenkins: Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME #2096
hostname: penates.apache.org I0910 19:26:48.788871 16941 slave.cpp:316] Slave checkpoint: false I0910 19:26:48.789297 16945 state.cpp:33] Recovering state from '/tmp/GarbageCollectorIntegrationTest_DiskUsage_rWhuy4/meta' I0910 19:26:48.789433 16945 status_update_manager.cpp:193] Recovering status update manager I0910 19:26:48.789624 16939 slave.cpp:3202] Finished recovery I0910 19:26:48.789911 16937 slave.cpp:598] New master detected at master@67.195.81.186:41538 I0910 19:26:48.789952 16937 slave.cpp:672] Authenticating with master master@67.195.81.186:41538 I0910 19:26:48.789994 16948 status_update_manager.cpp:167] New master detected at master@67.195.81.186:41538 I0910 19:26:48.790019 16937 slave.cpp:645] Detecting new master I0910 19:26:48.790046 16936 authenticatee.hpp:128] Creating new client SASL connection I0910 19:26:48.922570 16936 master.cpp:3653] Authenticating slave(206)@67.195.81.186:41538 I0910 19:26:48.922710 16934 authenticator.hpp:156] Creating new server SASL connection I0910 19:26:48.922807 16934 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 I0910 19:26:48.922827 16934 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' I0910 19:26:48.922914 16940 authenticator.hpp:262] Received SASL authentication start I0910 19:26:48.922978 16940 authenticator.hpp:384] Authentication requires more steps I0910 19:26:48.923027 16940 authenticatee.hpp:265] Received SASL authentication step I0910 19:26:48.923100 16948 authenticator.hpp:290] Received SASL authentication step I0910 19:26:48.923125 16948 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'penates.apache.org' server FQDN: 'penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I0910 19:26:48.923135 16948 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I0910 19:26:48.923147 16948 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' 
I0910 19:26:48.923159 16948 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'penates.apache.org' server FQDN: 'penates.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I0910 19:26:48.923168 16948 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I0910 19:26:48.923177 16948 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I0910 19:26:48.923192 16948 authenticator.hpp:376] Authentication success I0910 19:26:48.923270 16947 authenticatee.hpp:305] Authentication success I0910 19:26:48.923288 16937 master.cpp:3693] Successfully authenticated principal 'test-principal' at slave(206)@67.195.81.186:41538 I0910 19:26:48.923444 16947 slave.cpp:729] Successfully authenticated with master master@67.195.81.186:41538 I0910 19:26:48.923501 16947 slave.cpp:980] Will retry registration in 7.844963ms if necessary I0910 19:26:48.923569 16946 master.cpp:2843] Registering slave at slave(206)@67.195.81.186:41538 (penates.apache.org) with id 20140910-192648-3125920579-41538-16920-0 I0910 19:26:48.923704 16937 registrar.cpp:422] Attempting to update the 'registry' I0910 19:26:48.925449 16943 log.cpp:680] Attempting to append 337 bytes to the log I0910 19:26:48.925525 16940 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3 I0910 19:26:48.925945 16942 replica.cpp:508] Replica received write request for position 3 I0910 19:26:48.926174 16942 leveldb.cpp:343] Persisting action (356 bytes) to leveldb took 207163ns I0910 19:26:48.926193 16942 replica.cpp:676] Persisted action at 3 I0910 19:26:48.926488 16939 replica.cpp:655] Replica received learned notice for position 3 I0910 19:26:48.926950 16939 leveldb.cpp:343] Persisting action (358 bytes) to leveldb took 437632ns I0910 19:26:48.926970 16939 replica.cpp:676] Persisted action at 3 I0910 19:26:48.926980 16939 
replica.cpp:661] Replica learned APPEND action at position 3 I0910 19:26:48.927336 16949 registrar.cpp:479] Successfully updated 'registry' I0910 19:26:48.927433 16935 log.cpp:699] Attempting to truncate the log to 3 I0910 19:26:48.927454 16948 master.cpp:2883] Registered slave 20140910-192648-3125920579-41538-16920-0 at slave(206)@67.195.81.186:41538 (penates.apache.org) I0910 19:26:48.927476 16948 master.cpp:4126] Adding slave 20140910-192648-3125920579-41538-16920-0 at slave(206)@67.195.81.186:41538 (penates.apache.org) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] I0910 19:26:48.927518 16947 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4 I0910 19:26:48.927639 16940 slave.cpp:763] Registered with master master@67.195.81.186:41538; given slave ID 20140910-192648-3125920579-41538-16920-0 I0910 19:26:48.927705 16940 slave.cpp:2329] Received ping from slave-observer(184)@67.195.81.186:41538 I0910 19:26
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52923 --- Ship it! Agreed; check to make sure this works in dev clusters and that the kernel warning messages go away, if that hasn't been done already. - Chi Zhang On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/ --- (Updated Sept. 10, 2014, 6:26 p.m.) Review request for mesos, Chi Zhang, Vinod Kone, and Cong Wang. Repository: mesos-git Description --- See summary. Since we are not forwarding IPv6 packets, it doesn't make sense to enable ipv6. By disabling IPv6, we won't get spamming kernel log warning duplicated IPv6 addresses since all veth have the same mac. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp 938782ae2ab1da34eb316381131e9bfcb7c810d1 Diff: https://reviews.apache.org/r/25512/diff/ Testing --- sudo make check Thanks, Jie Yu
Review Request 25516: Fixed authorization tests to properly deal with registration retries.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25516/ --- Review request for mesos and Jiang Yan Xu. Bugs: MESOS-1760 and MESOS-1766 https://issues.apache.org/jira/browse/MESOS-1760 https://issues.apache.org/jira/browse/MESOS-1766 Repository: mesos-git Description --- Since the authorization tests do not control the retry behavior of the scheduler driver, it is possible for the driver to retry registrations and thus 'register_framework' authorizations. The MockAuthorizer needs to account for this by allowing all subsequent authorization attempts. Diffs - src/tests/master_authorization_tests.cpp b9aa7bf4f53e414d84f8cf4e020a645db8e5d855 Diff: https://reviews.apache.org/r/25516/diff/ Testing --- make check Thanks, Vinod Kone
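The idea behind the fix can be illustrated without gmock. In gmock terms it corresponds to following a WillOnce(...) expectation with WillRepeatedly(...); FakeAuthorizer below is a hypothetical stand-in for the MockAuthorizer mentioned in the description:

```cpp
#include <cassert>

// Because the scheduler driver may retry registration, the master can
// ask to authorize 'register_framework' more than once. A test
// authorizer must therefore allow every subsequent attempt instead of
// expecting exactly one call.
struct FakeAuthorizer {
  int attempts = 0;

  bool authorizeRegisterFramework() {
    ++attempts;
    return true;  // Allow the first and all retried attempts alike.
  }
};
```

With a strict single-call expectation, a slow test machine that triggers a registration retry would fail the test spuriously, which is exactly the flakiness this patch addresses.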
Build failed in Jenkins: mesos-reviewbot #1511
See https://builds.apache.org/job/mesos-reviewbot/1511/changes Changes: [tstclair] Fix protobuf detection on systems with Python 3 as default (part2) -- [...truncated 5420 lines...] rm -f slave/containerizer/mesos/.dirstamp rm -f sasl/*.o rm -f state/.deps/.dirstamp rm -f sasl/*.lo rm -f state/.dirstamp rm -f sched/*.o rm -f tests/.deps/.dirstamp rm -f sched/*.lo rm -f tests/.dirstamp rm -f scheduler/*.o rm -f tests/common/.deps/.dirstamp rm -f scheduler/*.lo rm -f tests/common/.dirstamp rm -f slave/*.o rm -f usage/.deps/.dirstamp rm -f usage/.dirstamp rm -f zookeeper/.deps/.dirstamp rm -f zookeeper/.dirstamp rm -f slave/*.lo rm -f slave/containerizer/*.o rm -f slave/containerizer/*.lo rm -f slave/containerizer/isolators/cgroups/*.o rm -f slave/containerizer/isolators/cgroups/*.lo rm -f slave/containerizer/isolators/network/*.o rm -f slave/containerizer/isolators/network/*.lo rm -f slave/containerizer/mesos/*.o rm -f slave/containerizer/mesos/*.lo rm -f state/*.o rm -f state/*.lo rm -f tests/*.o rm -f tests/common/*.o rm -f usage/*.o rm -f usage/*.lo rm -f zookeeper/*.o rm -f zookeeper/*.lo rm -rf authorizer/.libs authorizer/_libs rm -rf common/.libs common/_libs rm -rf containerizer/.libs containerizer/_libs rm -rf docker/.libs docker/_libs rm -rf exec/.libs exec/_libs rm -rf files/.libs files/_libs rm -rf java/jni/.libs java/jni/_libs rm -rf jvm/.libs jvm/_libs rm -rf jvm/org/apache/.libs jvm/org/apache/_libs rm -rf linux/.libs linux/_libs rm -rf linux/routing/.libs linux/routing/_libs rm -rf linux/routing/filter/.libs linux/routing/filter/_libs rm -rf linux/routing/link/.libs linux/routing/link/_libs rm -rf linux/routing/queueing/.libs linux/routing/queueing/_libs rm -rf local/.libs local/_libs rm -rf log/.libs log/_libs rm -rf log/tool/.libs log/tool/_libs rm -rf logging/.libs logging/_libs rm -rf master/.libs master/_libs rm -rf messages/.libs messages/_libs rm -rf sasl/.libs sasl/_libs rm -rf sched/.libs sched/_libs rm -rf scheduler/.libs scheduler/_libs rm 
-rf slave/.libs slave/_libs rm -rf slave/containerizer/.libs slave/containerizer/_libs rm -rf slave/containerizer/isolators/cgroups/.libs slave/containerizer/isolators/cgroups/_libs rm -rf slave/containerizer/isolators/network/.libs slave/containerizer/isolators/network/_libs rm -rf slave/containerizer/mesos/.libs slave/containerizer/mesos/_libs rm -rf state/.libs state/_libs rm -rf usage/.libs usage/_libs rm -rf zookeeper/.libs zookeeper/_libs rm -rf ./.deps authorizer/.deps cli/.deps common/.deps containerizer/.deps docker/.deps examples/.deps exec/.deps files/.deps health-check/.deps java/jni/.deps jvm/.deps jvm/org/apache/.deps launcher/.deps linux/.deps linux/routing/.deps linux/routing/filter/.deps linux/routing/link/.deps linux/routing/queueing/.deps local/.deps log/.deps log/tool/.deps logging/.deps master/.deps messages/.deps sasl/.deps sched/.deps scheduler/.deps slave/.deps slave/containerizer/.deps slave/containerizer/isolators/cgroups/.deps slave/containerizer/isolators/network/.deps slave/containerizer/mesos/.deps state/.deps tests/.deps tests/common/.deps usage/.deps zookeeper/.deps rm -f Makefile make[2]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/src' Making distclean in ec2 make[2]: Entering directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2' rm -rf .libs _libs rm -f *.lo test -z || rm -f test . = ../../ec2 || test -z || rm -f rm -f Makefile make[2]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2' rm -f config.status config.cache config.log configure.lineno config.status.lineno rm -f Makefile make[1]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build' if test -d mesos-0.21.0; then find mesos-0.21.0 -type d ! 
-perm -200 -exec chmod u+w {} ';' rm -rf mesos-0.21.0 || { sleep 5 rm -rf mesos-0.21.0; }; else :; fi == mesos-0.21.0 archives ready for distribution: mesos-0.21.0.tar.gz == real73m32.497s user142m31.486s sys 7m54.482s + chmod -R +w 3rdparty CHANGELOG Doxyfile LICENSE Makefile Makefile.am Makefile.in NOTICE README.md aclocal.m4 ar-lib autom4te.cache bin bootstrap compile config.guess config.log config.lt config.status config.sub configure configure.ac depcomp docs ec2 frameworks include install-sh libtool ltmain.sh m4 mesos-0.21.0.tar.gz mesos.pc mesos.pc.in missing mpi src support + git clean -fdx Removing .libs/ Removing 3rdparty/Makefile Removing 3rdparty/Makefile.in Removing 3rdparty/libprocess/.deps/ Removing 3rdparty/libprocess/3rdparty/.deps/ Removing 3rdparty/libprocess/3rdparty/Makefile Removing 3rdparty/libprocess/3rdparty/Makefile.in Removing 3rdparty/libprocess/3rdparty/gmock_sources.cc Removing 3rdparty/libprocess/3rdparty/stout/Makefile Removing
Jenkins build is back to normal : Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME #2097
See https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-In-Src-Set-JAVA_HOME/2097/changes
Design doc for updating FrameworkInfo
Hi folks, We have a design doc up (attached to MESOS-1784 https://issues.apache.org/jira/browse/MESOS-1784) for properly updating the FrameworkInfo. The basic idea is to give frameworks the ability to update any fields of their FrameworkInfo (e.g., 'user', 'failover_timeout') without having to restart masters/slaves/tasks/executors. Feel free to provide feedback on the doc or the ticket. Thanks, Vinod
Re: Review Request 25035: Fix for MESOS-1688
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/ --- (Updated Sept. 10, 2014, 10 p.m.) Review request for mesos and Vinod Kone. Changes --- fixed review issues Bugs: MESOS-1688 https://issues.apache.org/jira/browse/MESOS-1688 Repository: mesos-git Description --- As already explained in JIRA MESOS-1688, there are schedulers that allocate memory only for the executor and not for tasks. For tasks, only CPU resources are allocated in this case. Such a scheduler does not get offered any idle CPUs if the slave has nearly used up all its memory. This can easily lead to a deadlock (in the application, not in Mesos). Simple example:

1. Scheduler allocates all memory of a slave for an executor.
2. Scheduler launches a task for this executor (allocating 1 CPU).
3. Task finishes: 1 CPU, 0 MB memory allocatable.
4. No offers are made, as no memory is left. Scheduler will wait for offers forever. Deadlock in the application.

To fix this problem, offers must be made if CPU resources are allocatable, without considering allocatable memory. Diffs (updated) - src/common/resources.cpp edf36b1 src/master/constants.cpp faa1503 src/master/hierarchical_allocator_process.hpp 34f8cd6 src/master/master.cpp 18464ba src/tests/allocator_tests.cpp 774528a Diff: https://reviews.apache.org/r/25035/diff/ Testing --- Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested running multiple parallel Spark jobs in fine-grained mode to saturate allocatable memory. The jobs run fine now. This load always caused a deadlock in all Spark jobs within one minute with the unpatched Mesos. Thanks, Martin Weindel
Re: Review Request 25035: Fix for MESOS-1688
On Sept. 9, 2014, 7:10 p.m., Vinod Kone wrote: src/master/master.cpp, line 1901 https://reviews.apache.org/r/25035/diff/4/?file=682182#file682182line1901 I like these warnings. Are you planning to get this into 0.20.1 or 0.21.0? If the former, can you add this to the list of deprecations in the CHANGELOG. Would be nice to see this in 0.20.1. But it is not clear to me how to update the CHANGELOG. There is no section for upcoming releases. - Martin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/#review52763 --- On Sept. 10, 2014, 10 p.m., Martin Weindel wrote: [...]
Re: Review Request 25516: Fixed authorization tests to properly deal with registration retries.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25516/#review52960 --- Ship it! Ship It! - Jiang Yan Xu On Sept. 10, 2014, 12:55 p.m., Vinod Kone wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25516/ --- (Updated Sept. 10, 2014, 12:55 p.m.) Review request for mesos and Jiang Yan Xu. Bugs: MESOS-1760 and MESOS-1766 https://issues.apache.org/jira/browse/MESOS-1760 https://issues.apache.org/jira/browse/MESOS-1766 Repository: mesos-git Description --- Since the authorization tests do not control the retry behavior of the scheduler driver, it is possible for the driver to retry registrations and thus 'register_framework' authorizations. The MockAuthorizer needs to account for this by allowing all subsequent authorization attempts. Diffs - src/tests/master_authorization_tests.cpp b9aa7bf4f53e414d84f8cf4e020a645db8e5d855 Diff: https://reviews.apache.org/r/25516/diff/ Testing --- make check Thanks, Vinod Kone
Completed tasks remain in TASK_RUNNING when framework is disconnected
Hi guys, We have run into a problem that causes tasks which complete while a framework is disconnected (and has a failover time) to remain in a running state, even though the tasks actually finish. Here is a test framework we have been able to reproduce the issue with: https://gist.github.com/nqn/9b9b1de9123a6e836f54 It launches many short-lived tasks (1 second sleep), and when killing the framework instance, the master reports the tasks as running even after several minutes: http://cl.ly/image/2R3719461e0t/Screen%20Shot%202014-09-10%20at%203.19.39%20PM.png When clicking on one of the slaves where, for example, task 49 runs, the slave knows that it completed: http://cl.ly/image/2P410L3m1O1N/Screen%20Shot%202014-09-10%20at%203.21.29%20PM.png The tasks only finish when the framework connects again (which it may never do). This is on Mesos 0.20.0, but also applies to HEAD (as of today). Do you guys have any insights into what may be going on here? Is this by design or a bug? Thanks, Niklas
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
On Sept. 10, 2014, 6:48 p.m., Vinod Kone wrote: Have you confirmed/tested that this is safe? Tested. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52913 --- On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/ --- (Updated Sept. 10, 2014, 6:26 p.m.) Review request for mesos, Chi Zhang, Vinod Kone, and Cong Wang. Repository: mesos-git Description --- See summary. Since we are not forwarding IPv6 packets, it doesn't make sense to enable IPv6. By disabling IPv6, we avoid the kernel log being spammed with warnings about duplicate IPv6 addresses, since all the veths have the same MAC. Diffs - src/slave/containerizer/isolators/network/port_mapping.cpp 938782ae2ab1da34eb316381131e9bfcb7c810d1 Diff: https://reviews.apache.org/r/25512/diff/ Testing --- sudo make check Thanks, Jie Yu
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
On Sept. 10, 2014, 7:06 p.m., Cong Wang wrote: Maybe check if /proc/sys/net/ipv6/conf/all/disable_ipv6 exists in the child script too, since you did outside? It's OK; if the proc file does not exist, it'll be a no-op, as we don't use set -e. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52918 --- On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: [...]
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
On Sept. 10, 2014, 7:11 p.m., Ian Downes wrote: Does this mean that users that open sockets (without specifying) will only get a v4 socket? What happens if they try to open a v6 socket? Tested. If they open a v6 socket, IPv4 will be used for communication (unless they use IPV6_ONLY). - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52920 --- On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: [...]
Re: Review Request 25512: Made sure IPv6 is disabled for port mapping network isolator.
On Sept. 10, 2014, 7:40 p.m., Chi Zhang wrote: Agreed; let's check to make sure this works in the dev clusters and that the kernel warning messages go away, if that hasn't been done. The kernel log no longer has that warning after this change. - Jie --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25512/#review52923 --- On Sept. 10, 2014, 6:26 p.m., Jie Yu wrote: [...]
Re: Completed tasks remain in TASK_RUNNING when framework is disconnected
Here is the log of a mesos-local instance where I reproduced it: https://gist.github.com/nqn/f7ee20601199d70787c0 (Here tasks 10 to 19 are stuck in the running state). There is a lot of output, so here is a filtered log for task 10: https://gist.github.com/nqn/a53e5ea05c5e41cd5a7d At first glance, it looks like the task can't be found when trying to forward the finish update, because the running update never got acknowledged before the framework disconnected. I may be missing something here. Niklas On 10 September 2014 16:09, Niklas Nielsen nik...@mesosphere.io wrote: [...]
Build failed in Jenkins: mesos-reviewbot #1512
See https://builds.apache.org/job/mesos-reviewbot/1512/changes Changes: [bmahler] Send pending tasks during re-registration. [bmahler] Made the GarbageCollector injectable into the Slave. [bmahler] Added a test for sending pending tasks during re-registration. [tstclair] Fix git clean -xdf skipping leveldb, removes internal .git dirs -- [...truncated 5607 lines of `make distclean` and `git clean -fdx` output; build time: real 74m57.207s, user 144m19.003s, sys 7m59.166s...]
Re: Completed tasks remain in TASK_RUNNING when framework is disconnected
What you observed is expected, because of the way the slave (specifically, the status update manager) operates. The status update manager only sends the next update for a task once the previous update (if one exists) has been acked. In your case, since TASK_RUNNING was not acked by the framework, the master doesn't know about the TASK_FINISHED update that is queued up by the status update manager. If the framework never comes back, i.e., the failover timeout elapses, the master shuts down the framework, which releases those resources. On Wed, Sep 10, 2014 at 4:43 PM, Niklas Nielsen nik...@mesosphere.io wrote: [...]
Review Request 25523: Add Docker pull to docker abstraction
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25523/ --- Review request for mesos and Benjamin Hindman. Repository: mesos-git Description --- Add Docker pull to docker abstraction Diffs - src/docker/docker.hpp e7adedb93272209231a3a9aefecfd6ccc7802ff5 src/docker/docker.cpp af51ac9058382aede61b09e06e312ad2ce6de03e src/slave/containerizer/docker.cpp 0febbac5df4126f6c8d9a06dd0ba1668d041b34a src/tests/docker_tests.cpp 826a8c1ef1b3089d416e5775fa2cf4e5cb0c26d1 Diff: https://reviews.apache.org/r/25523/diff/ Testing --- make check Thanks, Timothy Chen
Re: Review Request 25403: Override entrypoint when shell enabled in Docker
On Sept. 9, 2014, 6:50 p.m., Benjamin Hindman wrote: src/docker/docker.cpp, line 337 https://reviews.apache.org/r/25403/diff/1/?file=680701#file680701line337 Why not move this up above as well? The Docker CLI --entrypoint only allows you to put in a single string, but we actually need an array of entrypoint entries (which is what docker inspect returns). I tried --entrypoint=/bin/sh -c on the CLI and it immediately failed. Therefore, I have to run this on the CLI: docker run --entrypoint=/bin/sh busybox -c ls - Timothy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25403/#review52767 --- On Sept. 5, 2014, 10:13 p.m., Timothy Chen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25403/ --- (Updated Sept. 5, 2014, 10:13 p.m.) Review request for mesos, Benjamin Hindman and Jie Yu. Bugs: MESOS-1770 https://issues.apache.org/jira/browse/MESOS-1770 Repository: mesos-git Description --- Override entrypoint when shell enabled in Docker Diffs - src/docker/docker.cpp af51ac9058382aede61b09e06e312ad2ce6de03e Diff: https://reviews.apache.org/r/25403/diff/ Testing --- make check Thanks, Timothy Chen
Re: Review Request 25403: Override entrypoint when shell enabled in Docker
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25403/ --- (Updated Sept. 11, 2014, 12:40 a.m.) Review request for mesos, Benjamin Hindman and Jie Yu. Bugs: MESOS-1770 https://issues.apache.org/jira/browse/MESOS-1770 Repository: mesos-git Description --- Review: https://reviews.apache.org/r/25403 Diffs - src/docker/docker.cpp af51ac9058382aede61b09e06e312ad2ce6de03e Diff: https://reviews.apache.org/r/25403/diff/ Testing --- make check Thanks, Timothy Chen
Re: Review Request 25403: Override entrypoint when shell enabled in Docker
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25403/ --- (Updated Sept. 11, 2014, 12:40 a.m.) Review request for mesos, Benjamin Hindman and Jie Yu. Bugs: MESOS-1770 https://issues.apache.org/jira/browse/MESOS-1770 Repository: mesos-git Description (updated) --- Review: https://reviews.apache.org/r/25403 Diffs (updated) - src/docker/docker.cpp af51ac9058382aede61b09e06e312ad2ce6de03e Diff: https://reviews.apache.org/r/25403/diff/ Testing --- make check Thanks, Timothy Chen
Re: Review Request 24776: Add docker containerizer destroy tests
On Sept. 9, 2014, 6:15 p.m., Benjamin Hindman wrote: Why did you need to mock DockerContainerizerProcess in order to write these tests? Couldn't you have just used the existing MockDockerContainerizer? I wanted to simulate having destroy called in a pull/fetching state, so I thought the only way to do so was to mock the process, since the callbacks are on DockerContainerizerProcess and not the Containerizer. With the fetch and pull callbacks blocked, I can call destroy in that state and verify it was able to destroy. - Timothy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24776/#review52758 --- On Aug. 16, 2014, 10:23 p.m., Timothy Chen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24776/ --- (Updated Aug. 16, 2014, 10:23 p.m.) Review request for mesos, Benjamin Hindman and Jie Yu. Repository: mesos-git Description --- Add docker containerizer destroy tests Diffs - src/slave/containerizer/docker.hpp fbbd45d77e5f2f74ca893552f85eb893b3dd948f src/slave/containerizer/docker.cpp fe5b29167811d4ac2fe29070c70a04f84093a6ff src/tests/docker_containerizer_tests.cpp 8654f9c787bd207f6a7b821651e0c083bea9dc8a Diff: https://reviews.apache.org/r/24776/diff/ Testing --- make check Thanks, Timothy Chen
Re: Completed tasks remain in TASK_RUNNING when framework is disconnected
The main reason is to keep the status update manager simple. Also, it is very easy to enforce the order of updates to the master/framework in this model. If we allow multiple updates for a task to be in flight, it's really hard (impossible?) to ensure that we are not delivering out-of-order updates, even in edge cases (failover, network partitions, etc.). On Wed, Sep 10, 2014 at 5:35 PM, Niklas Nielsen nik...@mesosphere.io wrote: Hey Vinod - thanks for chiming in! Is there a particular reason for only having one status in flight? Or to put it another way, isn't that too strict a behavior, given that the master state could present the most recent known state if the status update manager tried to send more than the front of the stream? Given very long timeouts, just waiting for those to disappear seems a bit tedious and hogs the cluster. Niklas On 10 September 2014 17:18, Vinod Kone vinodk...@gmail.com wrote: [...]
Re: Completed tasks remain in TASK_RUNNING when framework is disconnected
I agree with Niklas that if the executor has sent a terminal status update to the slave, then the task is done and the master should be able to recover those resources. Only sending the oldest status update to the master, especially in the case of framework failover, prevents these resources from being recovered in a timely manner. I see a couple of options for getting around this, each with its own disadvantages.

1) Send the entire status update stream to the master. Once the master sees the terminal status update, it will removeTask and recover the resources. Future resends of the update will be forwarded to the scheduler, but the master will ignore the subsequent updates (with a warning and invalid_update++ metrics) as far as its own state for the removed task is concerned. Disadvantage 1: Potentially sends a lot of status update messages until the scheduler reregisters and acknowledges the updates. Disadvantage 2: Updates could be sent to the scheduler out of order if some updates are dropped between the slave and master.

2) Send only the oldest status update to the master, but with an annotation of the final/terminal state of the task, if any. That way the master can call removeTask to update its internal state for the task (and update the UI) and recover the resources for the task. While the scheduler is still down, the oldest update will continue to be resent and forwarded, but the master will ignore the update (with a warning, as above) as far as its own internal state is concerned. When the scheduler reregisters, the update stream will be forwarded and acknowledged one at a time as before, guaranteeing status update ordering to the scheduler. Disadvantage 1: Seems a bit hacky to tack a terminal state onto a running update. Disadvantage 2: The state endpoint won't show all the status updates until the entire stream actually gets forwarded and acknowledged.

Thoughts?
On Wed, Sep 10, 2014 at 5:55 PM, Vinod Kone vinodk...@gmail.com wrote: [...]
Re: Review Request 25111: Added the concept of dynamically configurable slave attributes
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25111/ --- (Updated Sept. 11, 2014, 1:24 a.m.) Review request for mesos, Adam B and Benjamin Hindman. Changes --- Get the test closer to passing Bugs: MESOS-1739 https://issues.apache.org/jira/browse/MESOS-1739 Repository: mesos-git Description --- Add basic stub for dynamic slave attributes Diffs (updated) - src/Makefile.am 9b973e5 src/common/attributes.hpp 0a043d5 src/common/attributes.cpp aab114e src/common/slaveinfo_utils.hpp PRE-CREATION src/common/slaveinfo_utils.cpp PRE-CREATION src/master/master.hpp b492600 src/master/master.cpp d5db24e src/slave/slave.cpp 1b3dc73 src/tests/slave_tests.cpp 69be28f Diff: https://reviews.apache.org/r/25111/diff/ Testing --- This is currently a work in progress (WIP). Thanks, Patrick Reilly
Review Request 25525: MESOS-1739: Allow slave reconfiguration on restart
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25525/ --- Review request for mesos, Adam B, Benjamin Hindman, Patrick Reilly, and Vinod Kone. Bugs: MESOS-1739 https://issues.apache.org/jira/browse/MESOS-1739 Repository: mesos-git Description --- Allows attributes and resources to be set to a superset of what they were previously on a slave restart. Incorporates all comments from: https://issues.apache.org/jira/browse/MESOS-1739 and the former review request: https://reviews.apache.org/r/25111/ Diffs - src/Makefile.am 9b973e5 src/common/attributes.hpp 0a043d5 src/common/attributes.cpp aab114e src/common/slaveinfo_utils.hpp PRE-CREATION src/common/slaveinfo_utils.cpp PRE-CREATION src/master/master.hpp b492600 src/master/master.cpp d5db24e src/slave/slave.cpp 1b3dc73 src/tests/slave_tests.cpp 69be28f Diff: https://reviews.apache.org/r/25525/diff/ Testing --- make check on localhost Thanks, Cody Maloney
Build failed in Jenkins: mesos-reviewbot #1513
See https://builds.apache.org/job/mesos-reviewbot/1513/changes Changes: [adam] Fixed command executor path check [yujie.jay] Made sure IPv6 is disabled for port mapping network isolator. -- [...truncated 5561 lines...] rm -f slave/containerizer/.dirstamp rm -f slave/*.o rm -f slave/containerizer/isolators/cgroups/.deps/.dirstamp rm -f slave/containerizer/isolators/cgroups/.dirstamp rm -f slave/containerizer/isolators/network/.deps/.dirstamp rm -f slave/containerizer/isolators/network/.dirstamp rm -f slave/containerizer/mesos/.deps/.dirstamp rm -f slave/*.lo rm -f slave/containerizer/mesos/.dirstamp rm -f slave/containerizer/*.o rm -f state/.deps/.dirstamp rm -f state/.dirstamp rm -f tests/.deps/.dirstamp rm -f tests/.dirstamp rm -f tests/common/.deps/.dirstamp rm -f tests/common/.dirstamp rm -f slave/containerizer/*.lo rm -f usage/.deps/.dirstamp rm -f slave/containerizer/isolators/cgroups/*.o rm -f usage/.dirstamp rm -f zookeeper/.deps/.dirstamp rm -f zookeeper/.dirstamp rm -f slave/containerizer/isolators/cgroups/*.lo rm -f slave/containerizer/isolators/network/*.o rm -f slave/containerizer/isolators/network/*.lo rm -f slave/containerizer/mesos/*.o rm -f slave/containerizer/mesos/*.lo rm -f state/*.o rm -f state/*.lo rm -f tests/*.o rm -rf authorizer/.libs authorizer/_libs rm -rf common/.libs common/_libs rm -rf containerizer/.libs containerizer/_libs rm -rf docker/.libs docker/_libs rm -rf exec/.libs exec/_libs rm -rf files/.libs files/_libs rm -rf java/jni/.libs java/jni/_libs rm -rf jvm/.libs jvm/_libs rm -rf jvm/org/apache/.libs jvm/org/apache/_libs rm -rf linux/.libs linux/_libs rm -rf linux/routing/.libs linux/routing/_libs rm -rf linux/routing/filter/.libs linux/routing/filter/_libs rm -rf linux/routing/link/.libs linux/routing/link/_libs rm -rf linux/routing/queueing/.libs linux/routing/queueing/_libs rm -rf local/.libs local/_libs rm -rf log/.libs log/_libs rm -rf log/tool/.libs log/tool/_libs rm -rf logging/.libs logging/_libs rm -rf master/.libs 
master/_libs rm -rf messages/.libs messages/_libs rm -rf sasl/.libs sasl/_libs rm -rf sched/.libs sched/_libs rm -rf scheduler/.libs scheduler/_libs rm -rf slave/.libs slave/_libs rm -rf slave/containerizer/.libs slave/containerizer/_libs rm -rf slave/containerizer/isolators/cgroups/.libs slave/containerizer/isolators/cgroups/_libs rm -rf slave/containerizer/isolators/network/.libs slave/containerizer/isolators/network/_libs rm -rf slave/containerizer/mesos/.libs slave/containerizer/mesos/_libs rm -rf state/.libs state/_libs rm -rf usage/.libs usage/_libs rm -rf zookeeper/.libs zookeeper/_libs rm -f tests/common/*.o rm -f usage/*.o rm -f usage/*.lo rm -f zookeeper/*.o rm -f zookeeper/*.lo rm -rf ./.deps authorizer/.deps cli/.deps common/.deps containerizer/.deps docker/.deps examples/.deps exec/.deps files/.deps health-check/.deps java/jni/.deps jvm/.deps jvm/org/apache/.deps launcher/.deps linux/.deps linux/routing/.deps linux/routing/filter/.deps linux/routing/link/.deps linux/routing/queueing/.deps local/.deps log/.deps log/tool/.deps logging/.deps master/.deps messages/.deps sasl/.deps sched/.deps scheduler/.deps slave/.deps slave/containerizer/.deps slave/containerizer/isolators/cgroups/.deps slave/containerizer/isolators/network/.deps slave/containerizer/mesos/.deps state/.deps tests/.deps tests/common/.deps usage/.deps zookeeper/.deps rm -f Makefile make[2]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/src' Making distclean in ec2 make[2]: Entering directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2' rm -rf .libs _libs rm -f *.lo test -z || rm -f test . 
= ../../ec2 || test -z || rm -f rm -f Makefile make[2]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build/ec2' rm -f config.status config.cache config.log configure.lineno config.status.lineno rm -f Makefile make[1]: Leaving directory `https://builds.apache.org/job/mesos-reviewbot/ws/mesos-0.21.0/_build' if test -d mesos-0.21.0; then find mesos-0.21.0 -type d ! -perm -200 -exec chmod u+w {} ';' rm -rf mesos-0.21.0 || { sleep 5 rm -rf mesos-0.21.0; }; else :; fi == mesos-0.21.0 archives ready for distribution: mesos-0.21.0.tar.gz == real71m40.401s user143m18.903s sys 7m50.486s + chmod -R +w 3rdparty CHANGELOG Doxyfile LICENSE Makefile Makefile.am Makefile.in NOTICE README.md aclocal.m4 ar-lib autom4te.cache bin bootstrap compile config.guess config.log config.lt config.status config.sub configure configure.ac depcomp docs ec2 frameworks include install-sh libtool ltmain.sh m4 mesos-0.21.0.tar.gz mesos.pc mesos.pc.in missing mpi src support + git clean -fdx Removing .libs/ Removing 3rdparty/Makefile Removing 3rdparty/Makefile.in Removing 3rdparty/libprocess/.deps/ Removing 3rdparty/libprocess/3rdparty/.deps/ Removing
Review Request 25526: catch trailing spaces in style checker
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25526/ --- Review request for mesos, Benjamin Hindman and Vinod Kone. Bugs: MESOS-1779 https://issues.apache.org/jira/browse/MESOS-1779 Repository: mesos-git Description --- fixes MESOS-1779 Diffs - support/mesos-style.py d24cb11adc06bc0ebaaa206301616c8b597f09e8 Diff: https://reviews.apache.org/r/25526/diff/ Testing --- Thanks, Kamil Domanski
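The rule being enabled corresponds to cpplint's end-of-line whitespace check. A standalone sketch of what such a trailing-space detector does (the names are illustrative, not the actual mesos-style.py code):

```python
import re

TRAILING_WS = re.compile(r'[ \t]+$')

def find_trailing_whitespace(source):
    """Return the 1-based line numbers of lines that end in spaces
    or tabs, which a style checker would flag."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if TRAILING_WS.search(line)]
```

Running it over a small snippet flags only the offending lines, e.g. `find_trailing_whitespace("ok\nbad \n")` reports line 2.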
Jenkins build is back to normal : mesos-reviewbot #1514
See https://builds.apache.org/job/mesos-reviewbot/1514/changes
Re: Review Request 25487: Increased session timeouts for ZooKeeper related tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25487/#review52993 --- Patch looks great! Reviews applied: [25487] All tests passed. - Mesos ReviewBot On Sept. 10, 2014, 6 p.m., Jiang Yan Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25487/ --- (Updated Sept. 10, 2014, 6 p.m.) Review request for mesos and Ben Mahler. Bugs: MESOS-1676 https://issues.apache.org/jira/browse/MESOS-1676 Repository: mesos-git Description --- - On slower machines, the ZooKeeper C client sometimes times out when we aren't expecting it to, because either the test server or the client is too slow to respond. Increasing this value helps mitigate the problem. - The effect of server->shutdownNetwork() is immediate, so this won't prolong the tests as long as they don't wait for session expiration without clock advances, which I have checked and there are none. Diffs - src/tests/master_contender_detector_tests.cpp 9ac59aa446a132e734238e0e55801117c4ef31b4 src/tests/zookeeper.cpp e45f956e1486e952a4efeb123e15568518fb53fe Diff: https://reviews.apache.org/r/25487/diff/ Testing --- make check. Thanks, Jiang Yan Xu
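The rationale behind raising the timeout: a ZooKeeper session expires only if no heartbeat arrives within the session timeout, so a larger timeout gives a slow test machine more headroom before a spurious expiration. A toy model with a fake clock (all names hypothetical; this is not the ZooKeeper client API):

```python
class FakeClock:
    """Manually advanced clock, as used in deterministic tests."""
    def __init__(self):
        self.now = 0.0

    def advance(self, secs):
        self.now += secs


class Session:
    """Toy session: expires when no heartbeat arrives within `timeout`."""
    def __init__(self, clock, timeout):
        self.clock = clock
        self.timeout = timeout
        self.last_heartbeat = clock.now

    def heartbeat(self):
        self.last_heartbeat = self.clock.now

    def expired(self):
        return self.clock.now - self.last_heartbeat > self.timeout
```

With a 10-second timeout, a client that takes 11 seconds to respond expires; the same delay under a 30-second timeout does not, which is the mitigation the patch applies.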
Re: Review Request 25526: catch trailing spaces in style checker
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25526/#review52994 --- Ship it! Thank you sir! Didn't realize this rule already existed in cpplint. - Vinod Kone On Sept. 11, 2014, 3:36 a.m., Kamil Domanski wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25526/ --- (Updated Sept. 11, 2014, 3:36 a.m.) Review request for mesos, Benjamin Hindman and Vinod Kone. Bugs: MESOS-1779 https://issues.apache.org/jira/browse/MESOS-1779 Repository: mesos-git Description --- fixes MESOS-1779 Diffs - support/mesos-style.py d24cb11adc06bc0ebaaa206301616c8b597f09e8 Diff: https://reviews.apache.org/r/25526/diff/ Testing --- Thanks, Kamil Domanski
Re: Review Request 25487: Increased session timeouts for ZooKeeper related tests.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25487/#review52998 --- Ship it! Ship It! - Ben Mahler On Sept. 10, 2014, 6 p.m., Jiang Yan Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25487/ --- (Updated Sept. 10, 2014, 6 p.m.) Review request for mesos and Ben Mahler. Bugs: MESOS-1676 https://issues.apache.org/jira/browse/MESOS-1676 Repository: mesos-git Description --- - On slower machines, the ZooKeeper C client sometimes times out when we aren't expecting it to, because either the test server or the client is too slow to respond. Increasing this value helps mitigate the problem. - The effect of server->shutdownNetwork() is immediate, so this won't prolong the tests as long as they don't wait for session expiration without clock advances, which I have checked and there are none. Diffs - src/tests/master_contender_detector_tests.cpp 9ac59aa446a132e734238e0e55801117c4ef31b4 src/tests/zookeeper.cpp e45f956e1486e952a4efeb123e15568518fb53fe Diff: https://reviews.apache.org/r/25487/diff/ Testing --- make check. Thanks, Jiang Yan Xu
Re: Review Request 25511: Pulled the log line in ZooKeeperTestServer::shutdownNetwork() to above the shutdown call.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25511/#review52999 --- Ship it! Ship It! - Ben Mahler On Sept. 10, 2014, 6:02 p.m., Jiang Yan Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25511/ --- (Updated Sept. 10, 2014, 6:02 p.m.) Review request for mesos and Ben Mahler. Repository: mesos-git Description --- - When debugging ZooKeeper-related tests, it's often useful to know when the test is about to shut down the ZK server, in order to reason about the order of events. Otherwise, client disconnections are often logged before this shutdown line, which can be confusing. Diffs - src/tests/zookeeper_test_server.cpp a8c9b1cd8a546abdeb4d89a8fe9ebc3b3d577665 Diff: https://reviews.apache.org/r/25511/diff/ Testing --- make check. Thanks, Jiang Yan Xu
Re: Review Request 25035: Fix for MESOS-1688
On Sept. 9, 2014, 7:10 p.m., Vinod Kone wrote: src/master/master.cpp, line 1901 https://reviews.apache.org/r/25035/diff/4/?file=682182#file682182line1901 I like these warnings. Are you planning to get this into 0.20.1 or 0.21.0? If the former, can you add this to the list of deprecations in the CHANGELOG? Martin Weindel wrote: Would be nice to see this in 0.20.1. But it is not clear to me how to update the CHANGELOG. There is no section for upcoming releases. Just start one for 0.20.1 and add the deprecation. See how we did it for 0.20.0 and 0.19.1 for inspiration. As we get close to releasing 0.20.1, the release manager will make sure to update the CHANGELOG with the tickets and other info. - Vinod --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/#review52763 --- On Sept. 10, 2014, 10 p.m., Martin Weindel wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/ --- (Updated Sept. 10, 2014, 10 p.m.) Review request for mesos and Vinod Kone. Bugs: MESOS-1688 https://issues.apache.org/jira/browse/MESOS-1688 Repository: mesos-git Description --- As already explained in JIRA MESOS-1688, there are schedulers that allocate memory only for the executor and not for tasks; for tasks, only CPU resources are allocated in this case. Such a scheduler does not get offered any idle CPUs if the slave has nearly used up all its memory. This can easily lead to a deadlock (in the application, not in Mesos). Simple example: 1. Scheduler allocates all memory of a slave for an executor 2. Scheduler launches a task for this executor (allocating 1 CPU) 3. Task finishes: 1 CPU, 0 MB memory allocatable. 4. No offers are made, as no memory is left. The scheduler will wait for offers forever: a deadlock in the application.
To fix this problem, offers must be made if CPU resources are allocatable, without considering allocatable memory. Diffs - src/common/resources.cpp edf36b1 src/master/constants.cpp faa1503 src/master/hierarchical_allocator_process.hpp 34f8cd6 src/master/master.cpp 18464ba src/tests/allocator_tests.cpp 774528a Diff: https://reviews.apache.org/r/25035/diff/ Testing --- Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested running multiple parallel Spark jobs in fine-grained mode to saturate allocatable memory. The jobs run fine now. This load always caused a deadlock in all Spark jobs within one minute with unpatched Mesos. Thanks, Martin Weindel
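The fix described amounts to treating each resource dimension independently when deciding whether a slave's remaining resources are worth offering. A hedged sketch of the before/after decision, with illustrative thresholds rather than Mesos's actual allocator constants:

```python
MIN_CPUS = 0.01   # illustrative minimum, not Mesos's actual constant
MIN_MEM_MB = 32   # illustrative minimum

def allocatable_old(cpus, mem_mb):
    # Sketch of the old behavior: a slave's leftovers were only offered
    # when memory cleared its minimum, so free CPUs were withheld
    # whenever memory was exhausted (the reported deadlock).
    return cpus >= MIN_CPUS and mem_mb >= MIN_MEM_MB

def allocatable_fixed(cpus, mem_mb):
    # Sketch of the fix: offer if ANY resource clears its own minimum,
    # so idle CPUs are offered even with 0 MB of allocatable memory.
    return cpus >= MIN_CPUS or mem_mb >= MIN_MEM_MB
```

In the example from the description (1 CPU, 0 MB free after the task finishes), the old predicate suppresses the offer while the fixed one makes it, letting the scheduler launch its next CPU-only task.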
Re: Review Request 25511: Pulled the log line in ZooKeeperTestServer::shutdownNetwork() to above the shutdown call.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25511/#review53003 --- Patch looks great! Reviews applied: [25511] All tests passed. - Mesos ReviewBot On Sept. 10, 2014, 6:02 p.m., Jiang Yan Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25511/ --- (Updated Sept. 10, 2014, 6:02 p.m.) Review request for mesos and Ben Mahler. Repository: mesos-git Description --- - When debugging ZooKeeper-related tests, it's often useful to know when the test is about to shut down the ZK server, in order to reason about the order of events. Otherwise, client disconnections are often logged before this shutdown line, which can be confusing. Diffs - src/tests/zookeeper_test_server.cpp a8c9b1cd8a546abdeb4d89a8fe9ebc3b3d577665 Diff: https://reviews.apache.org/r/25511/diff/ Testing --- make check. Thanks, Jiang Yan Xu
Re: Review Request 25035: Fix for MESOS-1688
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/#review53002 --- src/common/resources.cpp https://reviews.apache.org/r/25035/#comment92333 I'm not sure what's happening here. Can you add a comment? src/master/master.cpp https://reviews.apache.org/r/25035/#comment92334 Add a TODO: TODO(martin): Return Error instead of logging a warning in 0.21.0. src/tests/allocator_tests.cpp https://reviews.apache.org/r/25035/#comment92336 s/with cpus only/using only cpus/ src/tests/allocator_tests.cpp https://reviews.apache.org/r/25035/#comment92335 s/tasks/task/ src/tests/allocator_tests.cpp https://reviews.apache.org/r/25035/#comment92337 s/with memory only/using only memory/ src/tests/allocator_tests.cpp https://reviews.apache.org/r/25035/#comment92338 s/mem/memory/ src/tests/allocator_tests.cpp https://reviews.apache.org/r/25035/#comment92339 s/tasks/task/ - Vinod Kone On Sept. 10, 2014, 10 p.m., Martin Weindel wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/ --- (Updated Sept. 10, 2014, 10 p.m.) Review request for mesos and Vinod Kone. Bugs: MESOS-1688 https://issues.apache.org/jira/browse/MESOS-1688 Repository: mesos-git Description --- As already explained in JIRA MESOS-1688, there are schedulers that allocate memory only for the executor and not for tasks; for tasks, only CPU resources are allocated in this case. Such a scheduler does not get offered any idle CPUs if the slave has nearly used up all its memory. This can easily lead to a deadlock (in the application, not in Mesos). Simple example: 1. Scheduler allocates all memory of a slave for an executor 2. Scheduler launches a task for this executor (allocating 1 CPU) 3. Task finishes: 1 CPU, 0 MB memory allocatable. 4. No offers are made, as no memory is left. The scheduler will wait for offers forever: a deadlock in the application.
To fix this problem, offers must be made if CPU resources are allocatable, without considering allocatable memory. Diffs - src/common/resources.cpp edf36b1 src/master/constants.cpp faa1503 src/master/hierarchical_allocator_process.hpp 34f8cd6 src/master/master.cpp 18464ba src/tests/allocator_tests.cpp 774528a Diff: https://reviews.apache.org/r/25035/diff/ Testing --- Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested running multiple parallel Spark jobs in fine-grained mode to saturate allocatable memory. The jobs run fine now. This load always caused a deadlock in all Spark jobs within one minute with unpatched Mesos. Thanks, Martin Weindel
Re: Review Request 25035: Fix for MESOS-1688
On Sept. 11, 2014, 5:35 a.m., Vinod Kone wrote: Can you also update the summary of the review to something more meaningful? We typically use the summary to generate the commit message. - Vinod --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/#review53002 --- On Sept. 10, 2014, 10 p.m., Martin Weindel wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25035/ --- (Updated Sept. 10, 2014, 10 p.m.) Review request for mesos and Vinod Kone. Bugs: MESOS-1688 https://issues.apache.org/jira/browse/MESOS-1688 Repository: mesos-git Description --- As already explained in JIRA MESOS-1688, there are schedulers that allocate memory only for the executor and not for tasks; for tasks, only CPU resources are allocated in this case. Such a scheduler does not get offered any idle CPUs if the slave has nearly used up all its memory. This can easily lead to a deadlock (in the application, not in Mesos). Simple example: 1. Scheduler allocates all memory of a slave for an executor 2. Scheduler launches a task for this executor (allocating 1 CPU) 3. Task finishes: 1 CPU, 0 MB memory allocatable. 4. No offers are made, as no memory is left. The scheduler will wait for offers forever: a deadlock in the application. To fix this problem, offers must be made if CPU resources are allocatable, without considering allocatable memory. Diffs - src/common/resources.cpp edf36b1 src/master/constants.cpp faa1503 src/master/hierarchical_allocator_process.hpp 34f8cd6 src/master/master.cpp 18464ba src/tests/allocator_tests.cpp 774528a Diff: https://reviews.apache.org/r/25035/diff/ Testing --- Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested running multiple parallel Spark jobs in fine-grained mode to saturate allocatable memory. The jobs run fine now. This load always caused a deadlock in all Spark jobs within one minute with unpatched Mesos. Thanks, Martin Weindel