[jira] [Commented] (MESOS-1640) HealthCheckTests are flaky under gprof/gcov
[ https://issues.apache.org/jira/browse/MESOS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074715#comment-14074715 ]

Timothy Chen commented on MESOS-1640:
-------------------------------------

Interesting, I thought I covered all the flakiness of the tests :( Is this only manifested in the CI machines again? Also how do you run support/coverage.sh?

HealthCheckTests are flaky under gprof/gcov
-------------------------------------------

Key: MESOS-1640
URL: https://issues.apache.org/jira/browse/MESOS-1640
Project: Mesos
Issue Type: Bug
Components: test
Reporter: Dominic Hamon
Priority: Minor

When running {{/support/coverage.sh}} the {{HealthCheckTest}} fixture exhibits multiple flakes:

{noformat}
[ RUN      ] HealthCheckTest.HealthyTask
../../src/tests/health_check_tests.cpp:165: Failure
Value of: statusRunning.get().state()
  Actual: TASK_FAILED
Expected: TASK_RUNNING
../../src/tests/health_check_tests.cpp:167: Failure
Failed to wait 10secs for statusHealth
../../src/tests/health_check_tests.cpp:158: Failure
Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))...
         Expected: to be called twice
           Actual: called once - unsatisfied and active
[  FAILED  ] HealthCheckTest.HealthyTask (11854 ms)
[ RUN      ] HealthCheckTest.EnvironmentSetup
../../src/tests/health_check_tests.cpp:314: Failure
Value of: statusRunning.get().state()
  Actual: TASK_FAILED
Expected: TASK_RUNNING
../../src/tests/health_check_tests.cpp:316: Failure
Failed to wait 10secs for statusHealth
../../src/tests/health_check_tests.cpp:307: Failure
Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))...
         Expected: to be called twice
           Actual: called once - unsatisfied and active
[  FAILED  ] HealthCheckTest.EnvironmentSetup (12020 ms)
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MESOS-1640) HealthCheckTests are flaky under gprof/gcov
[ https://issues.apache.org/jira/browse/MESOS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Chen reassigned MESOS-1640:
----------------------------------

Assignee: Timothy Chen
[jira] [Commented] (MESOS-1640) HealthCheckTests are flaky under gprof/gcov
[ https://issues.apache.org/jira/browse/MESOS-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074807#comment-14074807 ]

Dominic Hamon commented on MESOS-1640:
--------------------------------------

This is manifesting on my own dev machine, but only under gcov.

If you have {{lcov}} installed, just run {{./support/coverage.sh}} from the Mesos root. It only works with g++ (clang/python issue) and nukes your build.

Be warned ;) There are other tests that fail and I'll be adding more tickets.
[jira] [Commented] (MESOS-1638) Create project website section to feature professional support
[ https://issues.apache.org/jira/browse/MESOS-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074825#comment-14074825 ]

Dave Lester commented on MESOS-1638:
------------------------------------

Also mentioned on the dev@ mailing list:
* Otto Ops LLC

Create project website section to feature professional support
--------------------------------------------------------------

Key: MESOS-1638
URL: https://issues.apache.org/jira/browse/MESOS-1638
Project: Mesos
Issue Type: Improvement
Components: project website
Reporter: Dave Lester
Assignee: Dave Lester

As I suggested in a dev mailing list thread last September http://markmail.org/thread/o3nlnihmwqtgsm7d, I think it'd be great for the Mesos website/community page to include links to companies that provide Mesos services and support. At the time, we heard from:
* Grand Logic
* Mesosphere
* Big Data Open Source Security LLC

Before I make such a change, I wanted to open this ticket to the community to see if there are other companies/individuals that could also be linked to. This info is also important to gather in order to be compliant with Apache's policy of vendor neutrality: http://mail-archives.apache.org/mod_mbox/cloudstack-marketing/201303.mbox/%3c5138b...@shanecurcuru.org%3E
[jira] [Commented] (MESOS-1384) Add support for loadable MesosModule
[ https://issues.apache.org/jira/browse/MESOS-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074885#comment-14074885 ]

Niklas Quarfot Nielsen commented on MESOS-1384:
-----------------------------------------------

Here is a small write-up on the proposed design:

h2. Motivation

The motivation behind Mesos modules is to make it possible to implement abstract interfaces to internal Mesos components outside the Mesos binary, as dynamically loadable libraries.

Instead of mapping each and every call to C++ mangled (or self-mangled) symbols (see http://mentorembedded.github.io/cxx-abi/abi.html and https://github.com/nqn/dylib), we can use the already existing vtable layout to dispatch to implementations within a loaded library (just like our current use of JNI).

{code}
                   Mesos                       +      External Library (.so / .dylib)
                                               |
                                               |              Implementation
            Pure virtual class                 |
  +----------------------------------+         |  +-------------------------------------+
  | IsolatorModule                   |         |  | IsolatorModuleImpl : IsolatorModule |
  |                                  |         |  |                                     |
  | + Module metadata                |         |  | + Module metadata (Version: XXX)    |
  | + create(library)                |         |  |                                     |
  | +------------------------------+ |         |  | +---------------------------------+ |
  | | Isolator                     | |         |  | | IsolatorImpl                    | |
  | |                              | |         |  | |                                 | |
  | +------------------------------+ |         |  | +---------------------------------+ |
  +----------------------------------+         |  +-------------------------------------+
                                               |
  Try<shared_ptr<Isolator> > isolator =        |  extern "C" {
    IsolatorModule::create(libxxx.so);         |    void* create_isolator_module(void*) {
  // Use isolator as if it was hosted by Mesos |      return new IsolatorModuleImpl;
                                               |    }
                                               |  }
                                               +
{code}

This design has been implemented as a proof of concept in this branch: https://github.com/nqn/mesos/tree/niklas/mesos_module

The sequence is as follows (taking IsolatorModule as an example):

1) Use stout's DynamicLibrary abstraction to get a library object.
2) Call the (new and dedicated) module shim's initialize method (IsolatorModule::create(), AllocatorModule::create(), AuthenticatorModule::create(), ...).
2.a) Behind the scenes, the create() method locates the proper symbol in the library,
2.b) calls it (with optional arguments as void*),
2.c) verifies it through metadata located between the isolator module object start and the isolator offset, and
2.d) returns a Try of the isolator (the compiler rebases the pointer from the isolator module to the isolator).
3) Use the returned isolator object as usual.

h3. Notes to drawing

The definition of IsolatorModule is shared (included) by both Mesos and the library implementer.
IsolatorModule::create() is static and knows which symbol to look for. A standard scheme of create_X_module could be used, so that one library can provide multiple module implementations.
create() is called init() in the proof of concept: https://github.com/nqn/mesos/tree/niklas/mesos_module

h2. QA

h4. 1) Why do we return a void pointer across the library border?

We need to be able to locate the right symbol in the library to generate a module object. This means that we can / should not use mangled names, which a C++ type such as a class would require if it were the return type.

h4. 2) Why don't we cast the void pointer with dynamic_cast for type verification?

Unfortunately, dynamic_cast cannot be applied to void pointers; GCC rejects it.

h4. 3) Why does create() / init() require a path to the library file instead of a DynamicLibrary type?

We want to be able to control the life cycle of the loaded library separately from the life cycle of a module object. Multiple modules can be provided by one library file. A shortcut could be to overload the create() / init() methods and add the path constructor as a second option.

h4. 4) Why do we use multiple inheritance? How about having the isolator as a member variable instead?

We get an _IsolatorModule_ from the
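The loading sequence in the write-up above (an unmangled factory symbol, a void pointer crossing the library border, a metadata check, then an upcast to the isolator) might look roughly like the sketch below. It is only an illustration under stated assumptions: every name except create_isolator_module is hypothetical, the factory is defined in the same translation unit instead of being looked up in a loaded library, and the version field stands in for whatever metadata the proof of concept actually verifies.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// All types below are illustrative stand-ins, not the actual Mesos interfaces.

// Internal component interface that the host already knows how to use.
struct Isolator {
  virtual ~Isolator() = default;
  virtual std::string name() const = 0;
};

// Metadata carried by every module so the host can verify what it loaded.
struct ModuleMetadata {
  std::uint32_t version;
};

constexpr std::uint32_t kHostModuleVersion = 1;  // assumed versioning scheme

// Shared definition, included by both the host and the library implementer.
struct IsolatorModule : ModuleMetadata, Isolator {
  explicit IsolatorModule(std::uint32_t v) : ModuleMetadata{v} {}
};

// ---- Library side (would live in the .so / .dylib) ----
struct IsolatorModuleImpl : IsolatorModule {
  IsolatorModuleImpl() : IsolatorModule(kHostModuleVersion) {}
  std::string name() const override { return "example-isolator"; }
};

// Unmangled entry point; only a void* crosses the library border.
extern "C" void* create_isolator_module(void* /*args*/) {
  // Convert to the shared base type first so the host's cast back is exact.
  return static_cast<IsolatorModule*>(new IsolatorModuleImpl);
}

// ---- Host side ----
using ModuleFactory = void* (*)(void*);

// `factory` stands in for the symbol a dynamic-library lookup would return.
// Returns nullptr if the factory fails or the metadata does not match.
std::shared_ptr<Isolator> createIsolator(ModuleFactory factory) {
  void* raw = factory(nullptr);
  if (raw == nullptr) return nullptr;

  std::shared_ptr<IsolatorModule> module(static_cast<IsolatorModule*>(raw));
  if (module->version != kHostModuleVersion) return nullptr;

  // Implicit upcast: the compiler adjusts the pointer to the Isolator part.
  return module;
}
```

A caller would pass the looked-up symbol, for example createIsolator(&create_isolator_module), and then use the returned pointer like any other Isolator. The sketch uses inheritance for the metadata only to mirror the drawing; a member-variable layout would work equally well here.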
[jira] [Updated] (MESOS-1635) zk flag fails when specifying a file and the
[ https://issues.apache.org/jira/browse/MESOS-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-1635:
----------------------------------

Shepherd: Benjamin Mahler

zk flag fails when specifying a file and the
--------------------------------------------

Key: MESOS-1635
URL: https://issues.apache.org/jira/browse/MESOS-1635
Project: Mesos
Issue Type: Bug
Components: cli
Affects Versions: 0.19.1
Environment: Linux ubuntu 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ken Sipe

The zk flag supports referencing a file. It works when the registry is in_memory; however, in a real environment it fails.

The following starts up just fine:
/usr/local/sbin/mesos-master --zk=file:///etc/mesos/zk --registry=in_memory

However, when the following is executed, it fails:
/usr/local/sbin/mesos-master --zk=file:///etc/mesos/zk --quorum=1 --work_dir=/tmp/mesos

It uses the same working format for the zk flag, but now we are using the replicated log. It fails with:
I0723 19:24:34.755506 39856 main.cpp:150] Build: 2014-07-18 18:50:58 by root
I0723 19:24:34.755580 39856 main.cpp:152] Version: 0.19.1
I0723 19:24:34.755591 39856 main.cpp:155] Git tag: 0.19.1
I0723 19:24:34.755601 39856 main.cpp:159] Git SHA: dc0b7bf2a1a7981079b33a16b689892f9cda0d8d
Error parsing ZooKeeper URL: Expecting 'zk://' at the beginning of the URL
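The report does not identify the cause, so the following is only a sketch of the behaviour the reporter expects: a flag value given as file://<path> is replaced by the file's contents before the zk:// scheme is checked. The helper resolveZkFlag and its injected readFile callback are hypothetical names used for illustration, not Mesos code.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper, not taken from Mesos: resolves a flag value that may
// be given as "file://<path>" into the text stored at that path, and only
// then checks for the "zk://" scheme. `readFile` is injected so the sketch
// does not depend on the real filesystem.
std::string resolveZkFlag(
    const std::string& value,
    const std::function<std::string(const std::string&)>& readFile) {
  const std::string filePrefix = "file://";
  const std::string zkPrefix = "zk://";

  std::string url = value;
  if (value.compare(0, filePrefix.size(), filePrefix) == 0) {
    url = readFile(value.substr(filePrefix.size()));

    // Drop trailing whitespace such as the newline at the end of the file.
    const std::size_t end = url.find_last_not_of(" \t\r\n");
    url = (end == std::string::npos) ? "" : url.substr(0, end + 1);
  }

  if (url.compare(0, zkPrefix.size(), zkPrefix) != 0) {
    throw std::invalid_argument(
        "Expecting 'zk://' at the beginning of the URL");
  }
  return url;
}
```

With this ordering, both --zk=file:///etc/mesos/zk and a literal zk:// value end up going through the same scheme check, regardless of which registry backend is selected.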
[jira] [Created] (MESOS-1642) Slave should proceed with recovery if the old resources is a subset of the new resources.
Jie Yu created MESOS-1642:
-----------------------------

Summary: Slave should proceed with recovery if the old resources is a subset of the new resources.
Key: MESOS-1642
URL: https://issues.apache.org/jira/browse/MESOS-1642
Project: Mesos
Issue Type: Improvement
Reporter: Jie Yu

This would simplify deployment a lot if we want to increase the slave resources (or slave private resources) in the SlaveInfo. Currently, the slave simply flaps if it finds that the old and new SlaveInfo are not equal.
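The proposed check (old resources being a subset of the new ones) could be sketched as follows. This is a minimal illustration that assumes scalar-only resources held in a plain map; ScalarResources and isSubset are hypothetical names and do not reflect how Mesos represents resources.

```cpp
#include <map>
#include <string>

// Hypothetical simplification: scalar resources only, keyed by name
// (for example "cpus" or "mem").
using ScalarResources = std::map<std::string, double>;

// True if every resource in `oldResources` is also present in `newResources`
// with at least the same amount, i.e. the slave only gained resources.
bool isSubset(const ScalarResources& oldResources,
              const ScalarResources& newResources) {
  for (const auto& entry : oldResources) {
    const auto it = newResources.find(entry.first);
    if (it == newResources.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}
```

Under this rule, recovery would proceed when isSubset(checkpointed, current) holds and would still be refused when a resource shrinks or disappears.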
[jira] [Commented] (MESOS-1642) Slave should proceed with recovery if the old resources is a subset of the new resources.
[ https://issues.apache.org/jira/browse/MESOS-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075020#comment-14075020 ]

Tobias Weingartner commented on MESOS-1642:
-------------------------------------------

This is actually quite critical to be able to manage a large cluster in a more dynamic way. Note, this may be useful to interact with inverse offers.