[jira] [Created] (MESOS-1633) Create a static mesos library

2014-07-23 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-1633:
-

 Summary: Create a static mesos library
 Key: MESOS-1633
 URL: https://issues.apache.org/jira/browse/MESOS-1633
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone


Sometimes, framework writers (e.g., C++) would like to statically link libmesos 
into their scheduler/executor to tightly control the version of libmesos that 
they depend on. While they can bundle libmesos.so with the scheduler/executor, 
it is convenient to give them access to a static library (e.g., when they want 
to create static scheduler/executor libraries).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1633) Create a static mesos library

2014-07-23 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072131#comment-14072131
 ] 

Timothy St. Clair commented on MESOS-1633:
--

could be a config option, but not the default imho:  '--enable-shared=no'
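A hypothetical build sketch of that suggestion (standard autotools/libtool flags; this is not the current Mesos default, per the comment above):

```shell
# Sketch: build only static archives instead of libmesos.so.
# --enable-shared/--enable-static are standard libtool options.
./configure --enable-shared=no --enable-static=yes
make
# Frameworks would then link against the resulting libmesos.a
# instead of bundling libmesos.so.
```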

 Create a static mesos library
 -

 Key: MESOS-1633
 URL: https://issues.apache.org/jira/browse/MESOS-1633
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone

 Sometimes, framework writers (e.g., C++) would like to statically link 
 libmesos into their scheduler/executor to tightly control the version of 
 libmesos that they depend on. While they can bundle libmesos.so with the 
 scheduler/executor, it is convenient to give them access to a static library 
 (e.g., when they want to create static scheduler/executor libraries).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MESOS-186) Resource offers should be rescinded after some configurable timeout

2014-07-23 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen resolved MESOS-186.


Resolution: Fixed

 Resource offers should be rescinded after some configurable timeout
 ---

 Key: MESOS-186
 URL: https://issues.apache.org/jira/browse/MESOS-186
 Project: Mesos
  Issue Type: Improvement
  Components: framework
Reporter: Benjamin Hindman
Assignee: Timothy Chen

 Problem: a framework has a bug and holds on to resource offers by accident 
 for 24 hours.
 One suggestion: resource offers should be rescinded after some configurable 
 timeout.
 Possible issue: this might interfere with frameworks that are hoarding. But 
 one possible solution here is to add another API call which checks the status 
 of resource offers (i.e., remindAboutOffer).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1613) HealthCheckTest.ConsecutiveFailures is flaky

2014-07-23 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072336#comment-14072336
 ] 

Timothy Chen commented on MESOS-1613:
-

[~vi...@twitter.com] Vinod, I have a ReviewBoard request out already. I saw you 
commented earlier, but can you try it, or commit it if it looks good?

Thanks!

 HealthCheckTest.ConsecutiveFailures is flaky
 

 Key: MESOS-1613
 URL: https://issues.apache.org/jira/browse/MESOS-1613
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.0
 Environment: Ubuntu 10.04 GCC
Reporter: Vinod Kone
Assignee: Timothy Chen

 {code}
 [ RUN  ] HealthCheckTest.ConsecutiveFailures
 Using temporary directory '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV'
 I0717 04:39:59.288471  5009 leveldb.cpp:176] Opened db in 21.575631ms
 I0717 04:39:59.295274  5009 leveldb.cpp:183] Compacted db in 6.471982ms
 I0717 04:39:59.295552  5009 leveldb.cpp:198] Created db iterator in 16783ns
 I0717 04:39:59.296026  5009 leveldb.cpp:204] Seeked to beginning of db in 
 2125ns
 I0717 04:39:59.296257  5009 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 10747ns
 I0717 04:39:59.296584  5009 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0717 04:39:59.297322  5033 recover.cpp:425] Starting replica recovery
 I0717 04:39:59.297413  5033 recover.cpp:451] Replica is in EMPTY status
 I0717 04:39:59.297824  5033 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0717 04:39:59.297899  5033 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0717 04:39:59.297997  5033 recover.cpp:542] Updating replica status to 
 STARTING
 I0717 04:39:59.301985  5031 master.cpp:288] Master 
 20140717-043959-16842879-40280-5009 (lucid) started on 127.0.1.1:40280
 I0717 04:39:59.302026  5031 master.cpp:325] Master only allowing 
 authenticated frameworks to register
 I0717 04:39:59.302032  5031 master.cpp:330] Master only allowing 
 authenticated slaves to register
 I0717 04:39:59.302039  5031 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV/credentials'
 I0717 04:39:59.302283  5031 master.cpp:359] Authorization enabled
 I0717 04:39:59.302971  5031 hierarchical_allocator_process.hpp:301] 
 Initializing hierarchical allocator process with master : 
 master@127.0.1.1:40280
 I0717 04:39:59.303022  5031 master.cpp:122] No whitelist given. Advertising 
 offers for all slaves
 I0717 04:39:59.303390  5033 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 5.325097ms
 I0717 04:39:59.303419  5033 replica.cpp:320] Persisted replica status to 
 STARTING
 I0717 04:39:59.304076  5030 master.cpp:1128] The newly elected leader is 
 master@127.0.1.1:40280 with id 20140717-043959-16842879-40280-5009
 I0717 04:39:59.304095  5030 master.cpp:1141] Elected as the leading master!
 I0717 04:39:59.304102  5030 master.cpp:959] Recovering from registrar
 I0717 04:39:59.304182  5030 registrar.cpp:313] Recovering registrar
 I0717 04:39:59.304635  5033 recover.cpp:451] Replica is in STARTING status
 I0717 04:39:59.304962  5033 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0717 04:39:59.305026  5033 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0717 04:39:59.305130  5033 recover.cpp:542] Updating replica status to VOTING
 I0717 04:39:59.310416  5033 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 5.204157ms
 I0717 04:39:59.310459  5033 replica.cpp:320] Persisted replica status to 
 VOTING
 I0717 04:39:59.310534  5033 recover.cpp:556] Successfully joined the Paxos 
 group
 I0717 04:39:59.310607  5033 recover.cpp:440] Recover process terminated
 I0717 04:39:59.310773  5033 log.cpp:656] Attempting to start the writer
 I0717 04:39:59.311157  5033 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0717 04:39:59.313451  5033 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 2.271822ms
 I0717 04:39:59.313627  5033 replica.cpp:342] Persisted promised to 1
 I0717 04:39:59.318038  5031 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0717 04:39:59.318430  5031 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0717 04:39:59.323459  5031 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 5.004323ms
 I0717 04:39:59.323493  5031 replica.cpp:676] Persisted action at 0
 I0717 04:39:59.323799  5031 replica.cpp:508] Replica received write request 
 for position 0
 I0717 04:39:59.323837  5031 leveldb.cpp:438] Reading position from leveldb 
 took 21901ns
 I0717 04:39:59.329038  5031 leveldb.cpp:343] Persisting action 

[jira] [Commented] (MESOS-1632) Seg fault due to infinite recursion in RepeatedPtrField<Resource>

2014-07-23 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072547#comment-14072547
 ] 

Dominic Hamon commented on MESOS-1632:
--

I think this is due to an implicit conversion from 
{{RepeatedPtrField<Resource>}} to {{Resources}}. We should try to make that 
conversion explicit. The stream operators are already explicitly converting 
anyway.

 Seg fault due to infinite recursion in RepeatedPtrField<Resource>
 ---

 Key: MESOS-1632
 URL: https://issues.apache.org/jira/browse/MESOS-1632
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.20.0
Reporter: Yan Xu
Assignee: Isabel Jimenez

 {noformat:title=error}
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fffe7daa940 (LWP 40525)]
 0x74fb3b78 in _int_malloc () from /lib64/libc.so.6
 (gdb) bt
 #0  0x74fb3b78 in _int_malloc () from /lib64/libc.so.6
 #1  0x74fb609e in malloc () from /lib64/libc.so.6
 #2  0x755db25d in operator new(unsigned long) () from 
 /usr/lib64/libstdc++.so.6
 #3  0x755db379 in operator new[](unsigned long) () from 
 /usr/lib64/libstdc++.so.6
 #4  0x76f83c97 in 
 google::protobuf::internal::RepeatedPtrFieldBase::Reserve (this=0xd401b10, 
 new_size=<optimized out>) at google/protobuf/repeated_field.cc:51
 #5  0x76d9d526 in 
 MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler> 
 (other=..., this=0xd401b10) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:864
 #6  MergeFrom (other=..., this=0xd401b10) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:1091
 #7  mesos::Value_Ranges::MergeFrom (this=0xd401b00, from=...) at 
 mesos.pb.cc:7440
 #8  0x76da1e3d in mesos::Resource::MergeFrom 
 (this=this@entry=0xd401a90, from=...) at mesos.pb.cc:9196
 #9  0x004c5acc in Merge (to=<optimized out>, from=...) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:339
 #10 
 google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Resource>::TypeHandler> 
 (this=this@entry=0x7fffe75ab240, other=...)
 at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:866
 #11 0x004c5d56 in MergeFrom (other=..., this=0x7fffe75ab240) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:1091
 #12 Resources (_resources=..., this=0x7fffe75ab240) at 
 ../../include/mesos/resources.hpp:78
 #13 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #14 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251 
 #15 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #16 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #17 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #18 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #19 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #20 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #21 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #22 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #23 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #24 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #25 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #26 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #27 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #28 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #29 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #30 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #31 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #32 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #33 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #34 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #35 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 ...
 {noformat}
 {code:title=relevant code}
   /*implicit*/
   Resources(const google::protobuf::RepeatedPtrField<Resource>& _resources)
   {
 

[jira] [Commented] (MESOS-1632) Seg fault due to infinite recursion RepeatedPtrFieldResource

2014-07-23 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072607#comment-14072607
 ] 

Yan Xu commented on MESOS-1632:
---

Before this change, there was 

{noformat:title=}
std::ostream& operator << (
    std::ostream& stream,
    const Resources& resources)
{noformat}

preceding 

{noformat:title=}
inline std::ostream& operator << (
    std::ostream& stream,
    const google::protobuf::RepeatedPtrField<Resource>& resources)
{noformat}

and now it has moved to the .cpp.

I guess if you put the declaration of the former in the .hpp it should be fine?

 Seg fault due to infinite recursion in RepeatedPtrField<Resource>
 ---

 Key: MESOS-1632
 URL: https://issues.apache.org/jira/browse/MESOS-1632
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.20.0
Reporter: Yan Xu
Assignee: Isabel Jimenez

 {noformat:title=error}
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fffe7daa940 (LWP 40525)]
 0x74fb3b78 in _int_malloc () from /lib64/libc.so.6
 (gdb) bt
 #0  0x74fb3b78 in _int_malloc () from /lib64/libc.so.6
 #1  0x74fb609e in malloc () from /lib64/libc.so.6
 #2  0x755db25d in operator new(unsigned long) () from 
 /usr/lib64/libstdc++.so.6
 #3  0x755db379 in operator new[](unsigned long) () from 
 /usr/lib64/libstdc++.so.6
 #4  0x76f83c97 in 
 google::protobuf::internal::RepeatedPtrFieldBase::Reserve (this=0xd401b10, 
 new_size=<optimized out>) at google/protobuf/repeated_field.cc:51
 #5  0x76d9d526 in 
 MergeFrom<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler> 
 (other=..., this=0xd401b10) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:864
 #6  MergeFrom (other=..., this=0xd401b10) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:1091
 #7  mesos::Value_Ranges::MergeFrom (this=0xd401b00, from=...) at 
 mesos.pb.cc:7440
 #8  0x76da1e3d in mesos::Resource::MergeFrom 
 (this=this@entry=0xd401a90, from=...) at mesos.pb.cc:9196
 #9  0x004c5acc in Merge (to=<optimized out>, from=...) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:339
 #10 
 google::protobuf::internal::RepeatedPtrFieldBase::MergeFrom<google::protobuf::RepeatedPtrField<mesos::Resource>::TypeHandler> 
 (this=this@entry=0x7fffe75ab240, other=...)
 at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:866
 #11 0x004c5d56 in MergeFrom (other=..., this=0x7fffe75ab240) at 
 ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:1091
 #12 Resources (_resources=..., this=0x7fffe75ab240) at 
 ../../include/mesos/resources.hpp:78
 #13 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #14 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251 
 #15 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #16 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #17 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #18 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #19 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #20 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #21 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #22 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #23 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #24 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #25 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #26 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #27 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #28 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #29 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #30 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #31 mesos::operator<< (stream=..., resources=...) at 
 ../../include/mesos/resources.hpp:251
 #32 0x004c5e7a in operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #33 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #34 operator<< (resources=..., stream=...) at 
 ../../include/mesos/resources.hpp:251
 #35 operator<< (resources=..., stream=...) at 
 

[jira] [Comment Edited] (MESOS-1626) Add support for C++11 (+Boost) atomic to stout

2014-07-23 Thread Craig Hansen-Sturm (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072676#comment-14072676
 ] 

Craig Hansen-Sturm edited comment on MESOS-1626 at 7/24/14 2:23 AM:


Patches our subsetted version of boost-1.53.0 to add back support for 
boost::atomic 


was (Author: craig-mesos):
Patches boost-1.53.0 to add boost::atomic support

 Add support for C++11 (+Boost) atomic to stout
 

 Key: MESOS-1626
 URL: https://issues.apache.org/jira/browse/MESOS-1626
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Craig Hansen-Sturm
Assignee: Craig Hansen-Sturm
Priority: Minor
 Attachments: boost-1.53.0.patch


 Integrate c++11/atomic into libprocess/stout following the pattern introduced 
 by c++11/memory and c++11/lambda. The primary difference in this case is 
 that tr1 did not include support for atomic; it has to come from 
 boost/atomic, which was introduced in v1.53. This task will include a patch 
 to update boost as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MESOS-1529) Handle a network partition between Master and Slave

2014-07-23 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072537#comment-14072537
 ] 

Benjamin Mahler edited comment on MESOS-1529 at 7/24/14 2:55 AM:
-

For now we will proceed by adding a ping timeout on the slave to ensure that 
the slave re-registers when the master is no longer pinging it. This will 
resolve the case that motivated this ticket:

https://reviews.apache.org/r/23874/
https://reviews.apache.org/r/23875/
https://reviews.apache.org/r/23866/
https://reviews.apache.org/r/23867/
https://reviews.apache.org/r/23868/

I decided to punt on the failover timeout in the master in the first pass 
because it can be dangerous when ZooKeeper issues are preventing the slave from 
re-registering with the master; we do not want to remove a ton of slaves in 
this situation. Rather, when the slave is health checking correctly but does 
not re-register within a timeout, we could send a registration request from the 
master to the slave, telling the slave that it must re-register. This message 
could also be used when receiving status updates (or other messages) from 
slaves that are disconnected in the master.


was (Author: bmahler):
For now we will proceed by adding a ping timeout on the slave to ensure that 
the slave re-registers when the master is no longer pinging it. This will 
resolve the case that motivated this ticket:

https://reviews.apache.org/r/23866/
https://reviews.apache.org/r/23867/
https://reviews.apache.org/r/23868/

I decided to punt on the failover timeout in the master in the first pass 
because it can be dangerous when ZooKeeper issues are preventing the slave from 
re-registering with the master; we do not want to remove a ton of slaves in 
this situation. Rather, when the slave is health checking correctly but does 
not re-register within a timeout, we could send a registration request from the 
master to the slave, telling the slave that it must re-register. This message 
could also be used when receiving status updates (or other messages) from 
slaves that are disconnected in the master.

 Handle a network partition between Master and Slave
 ---

 Key: MESOS-1529
 URL: https://issues.apache.org/jira/browse/MESOS-1529
 Project: Mesos
  Issue Type: Bug
Reporter: Dominic Hamon
Assignee: Benjamin Mahler

 If a network partition occurs between a Master and Slave, the Master will 
 remove the Slave (as it fails health checks) and mark the tasks being run 
 there as LOST. However, the Slave is not aware that it has been removed, so 
 the tasks will continue to run.
 (To clarify a little bit: neither the master nor the slave receives an 
 'exited' event, indicating that the connection between the master and slave 
 is not closed.)
 There are at least two possible approaches to solving this issue:
 1. Introduce a health check from Slave to Master so they have a consistent 
 view of a network partition. We may still see this issue should a one-way 
 connection error occur.
 2. Be less aggressive about marking tasks and Slaves as lost. Wait until the 
 Slave reappears and reconcile then. We'd still need to mark Slaves and tasks 
 as potentially lost (zombie state) but maybe the Scheduler can make a more 
 intelligent decision.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MESOS-1626) Add support for C++11 (+Boost) atomic to stout

2014-07-23 Thread Craig Hansen-Sturm (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072676#comment-14072676
 ] 

Craig Hansen-Sturm edited comment on MESOS-1626 at 7/24/14 3:10 AM:


Patches our (subsetted) version of boost-1.53.0 to add support for 
boost::atomic 


was (Author: craig-mesos):
Patches our subsetted version of boost-1.53.0 to add back support for 
boost::atomic 

 Add support for C++11 (+Boost) atomic to stout
 

 Key: MESOS-1626
 URL: https://issues.apache.org/jira/browse/MESOS-1626
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Craig Hansen-Sturm
Assignee: Craig Hansen-Sturm
Priority: Minor
 Attachments: boost-1.53.0.patch


 Integrate c++11/atomic into libprocess/stout following the pattern introduced 
 by c++11/memory and c++11/lambda. The primary difference in this case is 
 that tr1 did not include support for atomic; it has to come from 
 boost/atomic, which was introduced in v1.53. This task will include a patch 
 to update boost as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MESOS-1635) zk flag fails when specifying a file and the

2014-07-23 Thread Ken Sipe (JIRA)
Ken Sipe created MESOS-1635:
---

 Summary: zk flag fails when specifying a file and the 
 Key: MESOS-1635
 URL: https://issues.apache.org/jira/browse/MESOS-1635
 Project: Mesos
  Issue Type: Bug
  Components: cli
Affects Versions: 0.19.1
 Environment: Linux ubuntu 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 
03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ken Sipe


The zk flag supports referencing a file. It works when the registry is in_memory; 
however, in a real environment it fails.

The following starts up just fine:
/usr/local/sbin/mesos-master --zk=file:///etc/mesos/zk --registry=in_memory

However, when the following is executed, it fails:
 /usr/local/sbin/mesos-master --zk=file:///etc/mesos/zk --quorum=1 
--work_dir=/tmp/mesos

It uses the same working format for the zk flag, but now we are using the 
replicated log. It fails with:
I0723 19:24:34.755506 39856 main.cpp:150] Build: 2014-07-18 18:50:58 by root
I0723 19:24:34.755580 39856 main.cpp:152] Version: 0.19.1
I0723 19:24:34.755591 39856 main.cpp:155] Git tag: 0.19.1
I0723 19:24:34.755601 39856 main.cpp:159] Git SHA: 
dc0b7bf2a1a7981079b33a16b689892f9cda0d8d
Error parsing ZooKeeper URL: Expecting 'zk://' at the beginning of the URL
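For context, the error suggests the parser expects the file referenced by {{--zk=file:///etc/mesos/zk}} to contain a single ZooKeeper URL on one line; a hypothetical example (hosts and path are placeholders, not from this report):

```
zk://10.0.0.1:2181,10.0.0.2:2181/mesos
```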



--
This message was sent by Atlassian JIRA
(v6.2#6252)