[jira] [Commented] (MESOS-1640) HealthCheckTests are flaky under gprof/gcov

2014-07-25 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074715#comment-14074715
 ] 

Timothy Chen commented on MESOS-1640:
-

Interesting, I thought I had covered all the flakiness in these tests :(
Is this only manifesting on the CI machines again? Also, how do you run
support/coverage.sh?

 HealthCheckTests are flaky under gprof/gcov
 ---

 Key: MESOS-1640
 URL: https://issues.apache.org/jira/browse/MESOS-1640
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Dominic Hamon
Priority: Minor

 When running {{./support/coverage.sh}} the {{HealthCheckTest}} fixture 
 exhibits multiple flakes:
 {noformat}
 [ RUN  ] HealthCheckTest.HealthyTask
 ../../src/tests/health_check_tests.cpp:165: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/health_check_tests.cpp:167: Failure
 Failed to wait 10secs for statusHealth
 ../../src/tests/health_check_tests.cpp:158: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] HealthCheckTest.HealthyTask (11854 ms)
 [ RUN  ] HealthCheckTest.EnvironmentSetup
 ../../src/tests/health_check_tests.cpp:314: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/health_check_tests.cpp:316: Failure
 Failed to wait 10secs for statusHealth
 ../../src/tests/health_check_tests.cpp:307: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] HealthCheckTest.EnvironmentSetup (12020 ms)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MESOS-1640) HealthCheckTests are flaky under gprof/gcov

2014-07-25 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-1640:
---

Assignee: Timothy Chen

 HealthCheckTests are flaky under gprof/gcov
 ---

 Key: MESOS-1640
 URL: https://issues.apache.org/jira/browse/MESOS-1640
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Dominic Hamon
Assignee: Timothy Chen
Priority: Minor



[jira] [Commented] (MESOS-1640) HealthCheckTests are flaky under gprof/gcov

2014-07-25 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074807#comment-14074807
 ] 

Dominic Hamon commented on MESOS-1640:
--

This is manifesting on my own dev machine but only under gcov.

If you have {{lcov}} installed, just run {{./support/coverage.sh}} from the
Mesos root. It only works with g++ (clang/python issue) and nukes your build.
Be warned ;)

There are other tests that fail, and I'll be adding more tickets.

 HealthCheckTests are flaky under gprof/gcov
 ---

 Key: MESOS-1640
 URL: https://issues.apache.org/jira/browse/MESOS-1640
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Dominic Hamon
Assignee: Timothy Chen
Priority: Minor



[jira] [Commented] (MESOS-1638) Create project website section to feature professional support

2014-07-25 Thread Dave Lester (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074825#comment-14074825
 ] 

Dave Lester commented on MESOS-1638:


Also mentioned on the dev@ mailing list:

* Otto Ops LLC

 Create project website section to feature professional support
 --

 Key: MESOS-1638
 URL: https://issues.apache.org/jira/browse/MESOS-1638
 Project: Mesos
  Issue Type: Improvement
  Components: project website
Reporter: Dave Lester
Assignee: Dave Lester

 As I suggested in a dev mailing list thread last September 
 http://markmail.org/thread/o3nlnihmwqtgsm7d, I think it'd be great for the 
 Mesos website/community page to include links to companies that provide Mesos 
 services and support. At the time, we heard from:
 * Grand Logic
 * Mesosphere
 * Big Data Open Source Security LLC
 Before I made such a change, I wanted to open this ticket to the
 community to see if there are other companies/individuals that could also be
 linked to. This info is also important to gather in order to comply with
 Apache's policy of vendor neutrality:
 http://mail-archives.apache.org/mod_mbox/cloudstack-marketing/201303.mbox/%3c5138b...@shanecurcuru.org%3E





[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule

2014-07-25 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074885#comment-14074885
 ] 

Niklas Quarfot Nielsen commented on MESOS-1384:
---

Here is a small write up on the proposed design:

h2. Motivation

The motivation behind Mesos modules is to make it possible to implement
abstract interfaces to internal Mesos components outside the Mesos binary, as
dynamically loadable libraries.
Instead of mapping each and every call to C++ mangled (or self-mangled)
symbols (see http://mentorembedded.github.io/cxx-abi/abi.html and
https://github.com/nqn/dylib), we can use the already existing vtable layout to
dispatch to implementations within a loaded library (just like our current use
of JNI).

{code}
                 Mesos                          +  External Library (.so / .dylib)
                                                |
  Pure virtual class                            |  Implementation
  +-----------------------------+               |  +-------------------------------------+
  | IsolatorModule              |               |  | IsolatorModuleImpl : IsolatorModule |
  |                             |               |  |                                     |
  | + Module metadata           |               |  | + Module metadata (Version: XXX)    |
  | + create(library)           |               |  |                                     |
  |                             |               |  |                                     |
  | +-------------------------+ |               |  | +---------------------------------+ |
  | | Isolator                | |               |  | | IsolatorImpl                    | |
  | |                         | |               |  | |                                 | |
  | +-------------------------+ |               |  | +---------------------------------+ |
  +-----------------------------+               |  +-------------------------------------+
                                                |
  Try<shared_ptr<Isolator>> isolator =          |  extern "C" {
    IsolatorModule::create("libxxx.so");        |    void* create_isolator_module(void*) {
  // Use isolator as if it were hosted by Mesos |      return new IsolatorModuleImpl;
                                                |    }
                                                |  }
                                                +
{code}

This design has been implemented as a proof-of-concept in this branch:  
https://github.com/nqn/mesos/tree/niklas/mesos_module

The sequence is as follows (taking IsolatorModule as an example):
1) Use stout's DynamicLibrary abstraction to get a library object.
2) Call the initialization method of the new, dedicated module shim
(IsolatorModule::create(), AllocatorModule::create(),
AuthenticatorModule::create(), ...), which:
  2.a) behind the scenes, locates the proper symbol in the library,
  2.b) calls it (with optional arguments as void*),
  2.c) verifies the result through metadata located between the start of the
isolator module object and the isolator offset,
  2.d) returns a Try of the isolator (the compiler rebases the pointer from the
isolator module to the isolator).
3) Use the returned isolator object as usual.
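The following C++ sketches the sequence above. It runs in a single binary and is not the proof-of-concept code. `ModuleMetadata`, `kModuleApiVersion`, the `name()` method and the `Factory` parameter are hypothetical. The symbol lookup from step 2.a is replaced by passing in the already-resolved factory pointer, and an exception takes the place of stout's `Try`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// ---- Shared header (included by Mesos and by the module library) ----

// Hypothetical, simplified stand-in for the Isolator interface.
class Isolator {
public:
  virtual ~Isolator() = default;
  virtual std::string name() const = 0;
};

// Hypothetical module metadata that create() checks (step 2.c).
struct ModuleMetadata {
  int version;
};

constexpr int kModuleApiVersion = 1;  // hypothetical API version

class IsolatorModule : public ModuleMetadata, public Isolator {
public:
  // Signature of the extern "C" factory exported by a library.
  using Factory = void* (*)(void*);

  explicit IsolatorModule(int apiVersion) : ModuleMetadata{apiVersion} {}

  // Steps 2.b to 2.d. In the proposal, `factory` is resolved by symbol
  // name from the loaded library (step 2.a); here it is passed in.
  static std::shared_ptr<Isolator> create(Factory factory,
                                          void* args = nullptr) {
    // The library hands back an IsolatorModule* that was erased to void*.
    auto* module = static_cast<IsolatorModule*>(factory(args));
    if (module == nullptr) {
      throw std::runtime_error("factory returned null");
    }
    if (module->version != kModuleApiVersion) {
      delete module;
      throw std::runtime_error("incompatible module version");
    }
    // Implicit upcast: the compiler adjusts the pointer from the
    // IsolatorModule object to its Isolator base subobject.
    return std::shared_ptr<Isolator>(module);
  }
};

// ---- Module library (would be built as libxxx.so / .dylib) ----

class IsolatorModuleImpl : public IsolatorModule {
public:
  IsolatorModuleImpl() : IsolatorModule(kModuleApiVersion) {}
  std::string name() const override { return "IsolatorImpl"; }
};

// Unmangled entry point. The upcast to IsolatorModule* happens *before*
// erasing to void*, so the static_cast in create() recovers the same type.
extern "C" void* create_isolator_module(void*) {
  return static_cast<IsolatorModule*>(new IsolatorModuleImpl);
}
```

Usage: `auto isolator = IsolatorModule::create(&create_isolator_module);` yields an `Isolator` that Mesos can call as if it were built in. Because the implementation inherits from both the metadata and the interface, one object carries both, which mirrors the multiple-inheritance layout in the drawing.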

h3. Notes to drawing

The definition of IsolatorModule is shared (included) by both Mesos and the
library implementer.
IsolatorModule::create() is static and knows which symbol to look for.
A standard naming scheme of create_X_module could be used, so that one library
can provide multiple module implementations.
create() is called init() in the proof-of-concept:
https://github.com/nqn/mesos/tree/niklas/mesos_module
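The create_X_module naming scheme can be illustrated with a small sketch. For illustration only, it models a loaded library as a hypothetical `SymbolTable` that maps exported names to factory pointers. In the proposal, this lookup would go through stout's DynamicLibrary. The helpers `factorySymbol` and `findFactory` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Signature shared by all create_X_module entry points.
using Factory = void* (*)(void*);

// Hypothetical stand-in for the C symbols exported by a loaded library.
using SymbolTable = std::map<std::string, Factory>;

// create_X_module naming scheme: "isolator" -> "create_isolator_module".
std::string factorySymbol(const std::string& kind) {
  return "create_" + kind + "_module";
}

// Returns the factory for `kind`, or nullptr if the library does not
// provide a module of that kind.
Factory findFactory(const SymbolTable& library, const std::string& kind) {
  auto it = library.find(factorySymbol(kind));
  return it == library.end() ? nullptr : it->second;
}
```

A single library that exports both `create_isolator_module` and `create_allocator_module` can therefore serve both module kinds. Each `X::create()` only needs to know its own `kind` string.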

h2. QA

h4. 1) Why do we return a void pointer across the library border?

We need to be able to locate the right symbol in the library to generate a
module object.
This means that we cannot (or should not) use mangled names, which a C++ type
such as a class would require if it were the return type.

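h4. 2) Why don't we cast the void pointer with dynamic_cast for type verification?

GCC doesn't support this on void pointers, unfortunately. No conforming compiler
does: the operand of a pointer dynamic_cast must point to a complete class type.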
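`dynamic_cast` cannot start from `void*`. A common workaround is to erase only a pointer to a known polymorphic base and recover it with `static_cast`. From that base, `dynamic_cast` can then verify the concrete type. The sketch below runs in a single binary and uses hypothetical types (`ModuleBase`, `IsolatorModuleImpl`, `OtherModule`) that do not come from the proposal. Whether `dynamic_cast` behaves the same across a real .so boundary depends on how RTTI symbols are exported, which this sketch does not exercise.

```cpp
#include <cassert>

// Hypothetical polymorphic base known to both sides of the library border.
struct ModuleBase {
  virtual ~ModuleBase() = default;
};

struct IsolatorModuleImpl : ModuleBase {};
struct OtherModule : ModuleBase {};

// Library side: upcast to the shared base *before* erasing to void*.
void* eraseModule(ModuleBase* module) {
  return module;
}

// Mesos side: static_cast back to exactly the type that was erased, then
// dynamic_cast (legal on a polymorphic base) checks the real type.
IsolatorModuleImpl* recoverIsolator(void* raw) {
  auto* base = static_cast<ModuleBase*>(raw);
  return dynamic_cast<IsolatorModuleImpl*>(base);
}
```

`recoverIsolator` returns nullptr when the erased object is some other module type. Type verification therefore remains possible, as long as every library erases the same base type.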

h4. 3) Why does create() / init() require a path to the library file instead of a DynamicLibrary type?

We want to be able to control the life-cycle of the loaded library separately 
from the life-cycle of a module object.
Multiple modules can be provided by one library file.
A short-cut could be to overload the create() / init() methods and add the path 
constructor as a 2nd option.

h4. 4) Why do we use multiple inheritance? How about having the isolator as a member variable instead?

We get an _IsolatorModule_ from the 

[jira] [Updated] (MESOS-1635) zk flag fails when specifying a file and the

2014-07-25 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1635:
---

Shepherd: Benjamin Mahler

 zk flag fails when specifying a file and the 
 -

 Key: MESOS-1635
 URL: https://issues.apache.org/jira/browse/MESOS-1635
 Project: Mesos
  Issue Type: Bug
  Components: cli
Affects Versions: 0.19.1
 Environment: Linux ubuntu 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 
 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ken Sipe

 The zk flag supports referencing a file. It works when the registry is
 in_memory; however, in a real environment it fails.
 The following starts up just fine:
 /usr/local/sbin/mesos-master --zk=file:///etc/mesos/zk --registry=in_memory
 However, when the following is executed, it fails:
 /usr/local/sbin/mesos-master --zk=file:///etc/mesos/zk --quorum=1 --work_dir=/tmp/mesos
 It uses the same working format for the zk flag, but now the replicated log is
 used. It fails with:
 I0723 19:24:34.755506 39856 main.cpp:150] Build: 2014-07-18 18:50:58 by root
 I0723 19:24:34.755580 39856 main.cpp:152] Version: 0.19.1
 I0723 19:24:34.755591 39856 main.cpp:155] Git tag: 0.19.1
 I0723 19:24:34.755601 39856 main.cpp:159] Git SHA: dc0b7bf2a1a7981079b33a16b689892f9cda0d8d
 Error parsing ZooKeeper URL: Expecting 'zk://' at the beginning of the URL
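 The error suggests that the replicated-log code path receives the raw file:// string rather than the file's contents. One possible shape of a fix is to resolve a file:// value before any ZooKeeper URL parsing, on every code path. This is a hypothetical sketch: resolveFlagValue and looksLikeZkUrl are illustrative helpers, not Mesos or stout APIs, and the real URL parser checks much more than the zk:// prefix.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: if `value` starts with "file://", return the contents
// of that file with trailing whitespace removed; otherwise return `value`.
std::string resolveFlagValue(const std::string& value) {
  const std::string prefix = "file://";
  if (value.compare(0, prefix.size(), prefix) != 0) {
    return value;
  }
  std::ifstream in(value.substr(prefix.size()));
  if (!in) {
    throw std::runtime_error("cannot read " + value);
  }
  std::stringstream buffer;
  buffer << in.rdbuf();
  std::string contents = buffer.str();
  // Drop trailing whitespace such as the final newline of the file.
  const auto end = contents.find_last_not_of(" \t\r\n");
  return end == std::string::npos ? "" : contents.substr(0, end + 1);
}

// Simplified version of the check behind
// "Expecting 'zk://' at the beginning of the URL".
bool looksLikeZkUrl(const std::string& url) {
  return url.compare(0, 5, "zk://") == 0;
}
```

 Passing "file:///etc/mesos/zk" straight to looksLikeZkUrl fails, which matches the reported error. Passing resolveFlagValue("file:///etc/mesos/zk") instead yields the zk:// URL stored in the file.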





[jira] [Created] (MESOS-1642) Slave should proceed with recovery if the old resources is a subset of the new resources.

2014-07-25 Thread Jie Yu (JIRA)
Jie Yu created MESOS-1642:
-

 Summary: Slave should proceed with recovery if the old resources 
is a subset of the new resources.
 Key: MESOS-1642
 URL: https://issues.apache.org/jira/browse/MESOS-1642
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu


This would simplify deployment a lot if we want to increase the slave resources
(or slave private resources) in the SlaveInfo. Currently, the slave simply flaps
if it finds that the old and new SlaveInfo are not equal.
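The proposed recovery rule can be sketched as a subset check over scalar resources. This is a hypothetical illustration: resources are modelled as a plain name-to-amount map, whereas Mesos' real Resources type also handles ranges, sets and roles. isSubset and shouldProceedWithRecovery are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: scalar resources only, e.g. {"cpus": 4, "mem": 8192}.
using ScalarResources = std::map<std::string, double>;

// True if every resource in `oldResources` is present in `newResources`
// with at least the same amount.
bool isSubset(const ScalarResources& oldResources,
              const ScalarResources& newResources) {
  for (const auto& entry : oldResources) {
    auto it = newResources.find(entry.first);
    if (it == newResources.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}

// Hypothetical recovery decision: proceed when resources only grew,
// instead of refusing whenever the old and new SlaveInfo differ.
bool shouldProceedWithRecovery(const ScalarResources& checkpointed,
                               const ScalarResources& current) {
  return isSubset(checkpointed, current);
}
```

Under this rule, adding CPUs or a new disk resource lets the slave recover, while shrinking any resource still blocks recovery. Running tasks could otherwise have been using resources that no longer exist.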





[jira] [Commented] (MESOS-1642) Slave should proceed with recovery if the old resources is a subset of the new resources.

2014-07-25 Thread Tobias Weingartner (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075020#comment-14075020
 ] 

Tobias Weingartner commented on MESOS-1642:
---

This is actually quite critical for managing a large cluster more dynamically.

Note that this may also be useful for interacting with inverse offers.

 Slave should proceed with recovery if the old resources is a subset of the 
 new resources.
 -

 Key: MESOS-1642
 URL: https://issues.apache.org/jira/browse/MESOS-1642
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu
