[jira] [Commented] (MESOS-3202) Avoid frameworks starving in DRF allocator.
[ https://issues.apache.org/jira/browse/MESOS-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654897#comment-14654897 ] Joerg Schad commented on MESOS-3202:

Short update: After various discussions with [~alex-mesos] and [~benjaminhindman] we hope to avoid such situations using Quota (MESOS-1791).

Avoid frameworks starving in DRF allocator. --- Key: MESOS-3202 URL: https://issues.apache.org/jira/browse/MESOS-3202 Project: Mesos Issue Type: Bug Reporter: Joerg Schad

We currently run into issues with the DRF allocator where frameworks do not receive offers (see https://github.com/mesosphere/marathon/issues/1931 for details). Imagine that we have 10 frameworks and unallocated resources from a single slave. The allocation interval is 1 sec, and refuse_seconds (i.e. the time for which a declined resource is filtered) is 3 sec across all frameworks.

The allocator offers the resources to framework 1 (according to DRF), which declines the offer immediately. In the next allocation interval, framework 1 is skipped due to the earlier decline, so framework 2 is offered the resources, which it also declines. The same happens in the next allocation interval with framework 3. In the following interval, refuse_seconds for framework 1 has elapsed, and since it still has the lowest DRF share it is offered the resources again, which it again declines. And the cycle begins again: framework 4 (which is actually waiting for this resource) is never offered it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
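The decline/filter cycle described above can be sketched with a small simulation (illustrative Python, not Mesos code; the 1-second allocation interval and 3-second refuse_seconds are taken from the description, and `accepting` marks the hypothetical framework that actually wants the resource):

```python
# Illustrative sketch of the starvation cycle: frameworks are visited in
# DRF-rank order; a declined offer filters that framework for
# REFUSE_SECONDS, so with a 1s interval and a 3s filter the first three
# ranks cycle forever and framework 4 is never reached.

REFUSE_SECONDS = 3
ALLOCATION_INTERVAL = 1

def simulate(num_frameworks, accepting, ticks):
    """Return the framework that finally receives the resource, or None."""
    filtered_until = {f: 0 for f in range(1, num_frameworks + 1)}
    for t in range(0, ticks, ALLOCATION_INTERVAL):
        # Offer to the lowest-share (lowest-ranked) framework not filtered.
        for f in range(1, num_frameworks + 1):  # f == DRF rank
            if filtered_until[f] > t:
                continue  # skipped: still filtered from an earlier decline
            if f == accepting:
                return f
            filtered_until[f] = t + REFUSE_SECONDS  # decline -> filter
            break  # only one framework gets this slave's resources per tick
    return None

# Framework 4 would accept, but ranks 1-3 keep re-declining first.
print(simulate(num_frameworks=10, accepting=4, ticks=60))  # -> None
```

Running this for any number of ticks never reaches framework 4, matching the cycle in the report; raising refuse_seconds (or using Quota, as the comment suggests) changes the outcome.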
[jira] [Commented] (MESOS-2916) Expose State API via HTTP
[ https://issues.apache.org/jira/browse/MESOS-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654984#comment-14654984 ] Tomás Senart commented on MESOS-2916: - This refers to the state abstraction. Expose State API via HTTP - Key: MESOS-2916 URL: https://issues.apache.org/jira/browse/MESOS-2916 Project: Mesos Issue Type: Story Reporter: Tomás Senart Labels: http The State API is a useful service for frameworks to use. It would make sense to have it available via the public HTTP API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3207) C++ style guide is not rendered correctly (code section syntax disregarded)
Bernd Mathiske created MESOS-3207: - Summary: C++ style guide is not rendered correctly (code section syntax disregarded) Key: MESOS-3207 URL: https://issues.apache.org/jira/browse/MESOS-3207 Project: Mesos Issue Type: Bug Components: project website, webui Affects Versions: 0.23.0 Reporter: Anand Mazumdar Assignee: Bernd Mathiske Priority: Minor Some paragraphs at the bottom of docs/mesos-c++-style-guide.md containing code sections are not rendered correctly by the web site generator. It looks fine in a github gist and apparently the syntax used is correct. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3207) C++ style guide is not rendered correctly (code section syntax disregarded)
[ https://issues.apache.org/jira/browse/MESOS-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655060#comment-14655060 ] haosdent commented on MESOS-3207: - {quote} Prefer `constexpr to `const` for all constant POD declarations, `constexpr` `char` arrays are preferred to `const` `string` literals. {quote} should be {quote} Prefer `constexpr` to `const` for all constant POD declarations, `constexpr` `char` arrays are preferred to `const` `string` literals. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3207) C++ style guide is not rendered correctly (code section syntax disregarded)
[ https://issues.apache.org/jira/browse/MESOS-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655061#comment-14655061 ] Bernd Mathiske commented on MESOS-3207: --- Thus I learned the hard way that a gist preview is not enough to find out whether a markdown file will eventually be rendered as expected. Better to generate the web site and observe it in dev mode (see the README.md in the web site repository), checking the real thing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3207) C++ style guide is not rendered correctly (code section syntax disregarded)
[ https://issues.apache.org/jira/browse/MESOS-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655063#comment-14655063 ] Bernd Mathiske commented on MESOS-3207: --- Ah, thx [~haosd...@gmail.com]! Maybe the missing backtick threw it off. In any case, our markdown style guide now recommends using tildes instead. That's what I will do in the fix as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3062) Add authorization for dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655124#comment-14655124 ] Michael Park commented on MESOS-3062:

Introduced ACL protobuf definitions for dynamic reservation: https://reviews.apache.org/r/37002/
Enabled the Authorizer to handle Reserve/Unreserve ACLs: https://reviews.apache.org/r/37110/
Added 'Master::authorize' for Reserve/Unreserve: https://reviews.apache.org/r/37125/
Added authorization for dynamic reservation master endpoints: https://reviews.apache.org/r/37126/
Added framework authorization for dynamic reservation: https://reviews.apache.org/r/37127/

Add authorization for dynamic reservation - Key: MESOS-3062 URL: https://issues.apache.org/jira/browse/MESOS-3062 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Labels: mesosphere

Dynamic reservations should be authorized with the {{principal}} of the reserving entity (framework or master). The idea is to introduce {{Reserve}} and {{Unreserve}} into the ACL.

{code}
message Reserve {
  // Subjects.
  required Entity principals = 1;

  // Objects. MVP: Only possible values = ANY, NONE.
  required Entity resources = 2;
}

message Unreserve {
  // Subjects.
  required Entity principals = 1;

  // Objects.
  required Entity reserver_principals = 2;
}
{code}

When a framework/operator reserves resources, reserve ACLs are checked to see if the framework ({{FrameworkInfo.principal}}) or the operator ({{Credential.user}}) is authorized to reserve the specified resources. If not authorized, the reserve operation is rejected.

When a framework/operator unreserves resources, unreserve ACLs are checked to see if the framework ({{FrameworkInfo.principal}}) or the operator ({{Credential.user}}) is authorized to unreserve the resources reserved by a framework or operator ({{Resource.ReservationInfo.principal}}). If not authorized, the unreserve operation is rejected.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
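A rough sketch of the intended Unreserve ACL semantics (hypothetical Python, not the actual Mesos Authorizer API; the `ANY`/`NONE` sentinels, `entity_matches`, and the deny-by-default fallback are illustrative assumptions):

```python
# Hypothetical sketch of the ACL check described above (not Mesos code).
# An Entity is ANY, NONE, or an explicit set of principal names.

ANY, NONE = object(), object()

def entity_matches(entity, value):
    if entity is ANY:
        return True
    if entity is NONE:
        return False
    return value in entity  # explicit set of principals

def authorized_to_unreserve(acls, principal, reserver_principal):
    """May `principal` unreserve resources reserved by `reserver_principal`?"""
    for acl in acls:  # acl = (principals, reserver_principals)
        if entity_matches(acl[0], principal):
            return entity_matches(acl[1], reserver_principal)
    return False  # no matching ACL: deny (an assumption of this sketch)

# Operators may unreserve anything; "marathon" only its own reservations.
acls = [({"ops"}, ANY), ({"marathon"}, {"marathon"})]
print(authorized_to_unreserve(acls, "marathon", "ops"))  # False
print(authorized_to_unreserve(acls, "ops", "marathon"))  # True
```

The Reserve check would work the same way with `resources` as the object instead of `reserver_principals`.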
[jira] [Updated] (MESOS-2073) Fetcher cache file verification, updating and invalidation
[ https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2073: -- Story Points: (was: 2) Target Version/s: (was: 0.24.0) Issue Type: Epic (was: Improvement)

After working out how to implement this, it turned out to be an epic by itself, so I have now removed it from under the fetcher cache epic.

Fetcher cache file verification, updating and invalidation -- Key: MESOS-2073 URL: https://issues.apache.org/jira/browse/MESOS-2073 Project: Mesos Issue Type: Epic Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Priority: Minor Labels: mesosphere Original Estimate: 96h Remaining Estimate: 96h

The other tickets in the fetcher cache epic do not necessitate a check sum (e.g. MD5, SHA*) for files cached by the fetcher. Whereas such a check sum could be used to verify whether the file arrived without unintended alterations, it can first and foremost be employed to detect and trigger updates. Scenario: If a URI is requested for fetching and the indicated download has the same check sum as the cached file, then the cache file will be used and the download forgone. If the check sum is different, then fetching proceeds and the cached file gets replaced. This capability will be indicated by an additional field in the URI protobuf. Details TBD, i.e. to be discussed in comments below. In addition to the above, even if the check sum is the same, we can support voluntary cache file invalidation: a fresh download can be requested, or the caching behavior can be revoked entirely.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-336) Mesos slave should cache executors
[ https://issues.apache.org/jira/browse/MESOS-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-336: - Comment: was deleted (was: By spinning off MESOS-2073 into its own epic, this one can now be regarded as completed. (That we are only using two-level hierarchies to break down tickets led to this repositioning.)) Mesos slave should cache executors -- Key: MESOS-336 URL: https://issues.apache.org/jira/browse/MESOS-336 Project: Mesos Issue Type: Epic Components: slave Reporter: brian wickman Assignee: Bernd Mathiske Labels: mesosphere Original Estimate: 672h Remaining Estimate: 672h The slave should be smarter about how it handles pulling down executors. In our environment, executors rarely change, but the slave will always pull them down from HDFS regardless. This puts undue stress on our HDFS clusters and is not resilient to reduced HDFS availability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3208) Fetch checksum files to inform fetcher cache use
Bernd Mathiske created MESOS-3208: - Summary: Fetch checksum files to inform fetcher cache use Key: MESOS-3208 URL: https://issues.apache.org/jira/browse/MESOS-3208 Project: Mesos Issue Type: Improvement Components: fetcher Reporter: Bernd Mathiske Assignee: Bernd Mathiske Priority: Minor

This is the first part of phase 1 as described in the comments for MESOS-2073. We add a field to CommandInfo::URI that contains the URI of a checksum file. When this file has new content, the contents of the associated value URI need to be refreshed in the fetcher cache. In this implementation step, we just add the above basic functionality (download, checksum comparison). In later steps, we will add more control flow to cover corner cases and thus make this feature more useful.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
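The basic control flow (fetch the small checksum file first, compare it to the last seen value, and re-download the value URI only on change) could look roughly like this (illustrative Python sketch, not fetcher code; `fetch`, `ChecksumCache`, and the in-memory cache layout are hypothetical):

```python
# Illustrative sketch of checksum-driven cache refresh (not Mesos code).
class ChecksumCache:
    def __init__(self, fetch):
        self.fetch = fetch       # fetch(uri) -> bytes
        self.checksums = {}      # value URI -> last seen checksum content
        self.files = {}          # value URI -> cached file content

    def get(self, uri, checksum_uri):
        # Always download the (small) checksum file.
        checksum = self.fetch(checksum_uri).strip()
        if self.checksums.get(uri) != checksum:
            # Checksum changed (or first fetch): refresh the cached file.
            self.files[uri] = self.fetch(uri)
            self.checksums[uri] = checksum
        return self.files[uri]
```

As long as the checksum file's content is unchanged, repeated `get` calls serve the cached bytes and the value URI is downloaded only once.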
[jira] [Updated] (MESOS-2073) Fetcher cache file verification, updating and invalidation
[ https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2073: -- Epic Name: Fetcher cache checksums -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-2073) Fetcher cache file verification, updating and invalidation
[ https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2073: -- Comment: was deleted (was: First patch in a series to implement phase 1: https://reviews.apache.org/r/37075/ ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2073) Fetcher cache file verification, updating and invalidation
[ https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653655#comment-14653655 ] Bernd Mathiske edited comment on MESOS-2073 at 8/5/15 11:36 AM: First patch in a series to implement phase 1: https://reviews.apache.org/r/37075/ was (Author: bernd-mesos): First patch in a series to implement phase 1: https://reviews.apache.org/r/37075/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3208) Fetch checksum files to inform fetcher cache use
[ https://issues.apache.org/jira/browse/MESOS-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655233#comment-14655233 ] Bernd Mathiske commented on MESOS-3208: --- First patch in a series to implement phase 1: https://reviews.apache.org/r/37075/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1457) Process IDs should be required to be human-readable
[ https://issues.apache.org/jira/browse/MESOS-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658324#comment-14658324 ] Till Toenshoff commented on MESOS-1457: --- Shepherd will get assigned shortly. Process IDs should be required to be human-readable Key: MESOS-1457 URL: https://issues.apache.org/jira/browse/MESOS-1457 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Dominic Hamon Assignee: Palak Choudhary Priority: Minor When debugging, it's very useful to understand which processes are getting timeslices. As such, the human-readable names that can be passed to {{ProcessBase}} are incredibly valuable, however they are currently optional. If the constructor of {{ProcessBase}} took a mandatory string, every process would get a human-readable name and debugging would be much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
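The proposal amounts to making the human-readable id a mandatory constructor argument. In Python terms it would look like this (a sketch of the intent only; libprocess is C++, and the counter-suffixed pid format is modeled on the `slave(1)@172.25.133.171:52576` ids visible in Mesos logs, not on the real implementation):

```python
# Sketch of the proposed API change (not libprocess code): every process
# must supply a human-readable id so that debugging output is meaningful.
import itertools

_counter = itertools.count(1)

class ProcessBase:
    def __init__(self, id):
        # Mandatory, non-empty id: no anonymous processes allowed.
        if not id:
            raise ValueError("every process needs a human-readable id")
        # Suffix a counter so ids stay unique across instances.
        self.pid = "%s(%d)" % (id, next(_counter))

print(ProcessBase("slave").pid)  # -> "slave(1)"
```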
[jira] [Updated] (MESOS-3199) Validate Quota Requests.
[ https://issues.apache.org/jira/browse/MESOS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3199: -- Shepherd: Bernd Mathiske Validate Quota Requests. Key: MESOS-3199 URL: https://issues.apache.org/jira/browse/MESOS-3199 Project: Mesos Issue Type: Task Reporter: Joerg Schad Assignee: Joerg Schad Labels: mesosphere We need to validate quota requests in terms of syntax correctness, update Master bookkeeping structures, and persist quota requests in the {{Registry}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658408#comment-14658408 ] Klaus Ma commented on MESOS-3063: - I have a draft example that reserves the resources; I'm thinking of un-reserving the resources after all tasks are done. Will update the code by the end of this week. Add an example framework using dynamic reservation -- Key: MESOS-3063 URL: https://issues.apache.org/jira/browse/MESOS-3063 Project: Mesos Issue Type: Task Reporter: Michael Park Assignee: Klaus Ma An example framework using dynamic reservation should be added to # test dynamic reservations further, and # serve as a reference for those who want to use the dynamic reservation feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3015) Add hooks for Slave exits
[ https://issues.apache.org/jira/browse/MESOS-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3015: -- Shepherd: Niklas Quarfot Nielsen Add hooks for Slave exits - Key: MESOS-3015 URL: https://issues.apache.org/jira/browse/MESOS-3015 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Kapil Arya Labels: mesosphere The hook will be triggered on slave exits. A master hook module can use this to do Slave-specific cleanups. In our particular use case, the hook would trigger cleanup of IPs assigned to the given Slave (see the [design doc | https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g/edit#]). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3092) Configure Jenkins to run Docker tests
[ https://issues.apache.org/jira/browse/MESOS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658410#comment-14658410 ] Niklas Quarfot Nielsen commented on MESOS-3092: --- [~vinodkone] Would you mind shepherding this? :) Configure Jenkins to run Docker tests - Key: MESOS-3092 URL: https://issues.apache.org/jira/browse/MESOS-3092 Project: Mesos Issue Type: Improvement Components: docker Reporter: Timothy Chen Assignee: Timothy Chen Labels: mesosphere Add a Jenkins job to run the Docker tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3021) Implement Docker Image Provisioner Reference Store
[ https://issues.apache.org/jira/browse/MESOS-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niklas Quarfot Nielsen updated MESOS-3021: -- Shepherd: Timothy Chen Implement Docker Image Provisioner Reference Store -- Key: MESOS-3021 URL: https://issues.apache.org/jira/browse/MESOS-3021 Project: Mesos Issue Type: Improvement Reporter: Lily Chen Assignee: Lily Chen Labels: mesosphere Create a comprehensive store to look up the image layer ID associated with an image and tag. Implement adding, removing, saving, and updating images and their associated tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1010) Python extension build is broken if gflags-dev is installed
[ https://issues.apache.org/jira/browse/MESOS-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658414#comment-14658414 ] Niklas Quarfot Nielsen commented on MESOS-1010: --- [~jvanremoortere] Would you mind shepherding this? :)

Python extension build is broken if gflags-dev is installed --- Key: MESOS-1010 URL: https://issues.apache.org/jira/browse/MESOS-1010 Project: Mesos Issue Type: Bug Components: build, python api Environment: Fedora 20, amd64. GCC: 4.8.2. Reporter: Nikita Vetoshkin Assignee: Greg Mann Labels: flaky-test, mesosphere

In my environment a mesos build from master results in a broken python api module {{_mesos.so}}:

{noformat}
nekto0n@ya-darkstar ~/workspace/mesos/src/python $ PYTHONPATH=build/lib.linux-x86_64-2.7/ python -c 'import _mesos'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/nekto0n/workspace/mesos/src/python/build/lib.linux-x86_64-2.7/_mesos.so: undefined symbol: _ZN6google14FlagRegistererC1EPKcS2_S2_S2_PvS3_
{noformat}

The unmangled version of the symbol looks like this:

{noformat}
google::FlagRegisterer::FlagRegisterer(char const*, char const*, char const*, char const*, void*, void*)
{noformat}

During the {{./configure}} step, {{glog}} finds the {{gflags}} development files and starts using them, thus *implicitly* adding a dependency on {{libgflags.so}}. This breaks the Python extension module and can perhaps break other mesos subsystems when moved to hosts without {{gflags}} installed. This task is done when the ExamplesTest.PythonFramework test passes on a system with gflags installed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658415#comment-14658415 ] Niklas Quarfot Nielsen commented on MESOS-830: -- [~jvanremoortere] Would you mind shepherding this? :) ExamplesTest.JavaFramework is flaky --- Key: MESOS-830 URL: https://issues.apache.org/jira/browse/MESOS-830 Project: Mesos Issue Type: Bug Components: test Reporter: Vinod Kone Assignee: Greg Mann Labels: flaky, mesosphere Identify the cause of the following test failure: [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_wSc7u8' Enabling authentication for the framework I1120 15:13:39.820032 1681264640 master.cpp:285] Master started on 172.25.133.171:52576 I1120 15:13:39.820180 1681264640 master.cpp:299] Master ID: 201311201513-2877626796-52576-3234 I1120 15:13:39.820194 1681264640 master.cpp:302] Master only allowing authenticated frameworks to register! I1120 15:13:39.821197 1679654912 slave.cpp:112] Slave started on 1)@172.25.133.171:52576 I1120 15:13:39.821795 1679654912 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.822855 1682337792 slave.cpp:112] Slave started on 2)@172.25.133.171:52576 I1120 15:13:39.823652 1682337792 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.825330 1679118336 master.cpp:744] The newly elected leader is master@172.25.133.171:52576 I1120 15:13:39.825445 1679118336 master.cpp:748] Elected as the leading master! 
I1120 15:13:39.825907 1681264640 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta' I1120 15:13:39.826127 1681264640 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.826331 1681801216 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.826738 1682874368 slave.cpp:2743] Finished recovery I1120 15:13:39.827747 1682337792 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/1/meta' I1120 15:13:39.827945 1680191488 slave.cpp:112] Slave started on 3)@172.25.133.171:52576 I1120 15:13:39.828415 1682337792 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.828608 1680728064 sched.cpp:260] Authenticating with master master@172.25.133.171:52576 I1120 15:13:39.828606 1680191488 slave.cpp:212] Slave resources: cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.828680 1682874368 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.828765 1682337792 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.829828 1680728064 sched.cpp:229] Detecting new master I1120 15:13:39.830288 1679654912 authenticatee.hpp:100] Initializing client SASL I1120 15:13:39.831635 1680191488 state.cpp:33] Recovering state from '/tmp/ExamplesTest_JavaFramework_wSc7u8/2/meta' I1120 15:13:39.831991 1679118336 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.832042 1682874368 slave.cpp:524] Detecting new master I1120 15:13:39.832314 1682337792 slave.cpp:2743] Finished recovery I1120 15:13:39.832309 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(1)@172.25.133.171:52576 I1120 15:13:39.832929 1680728064 status_update_manager.cpp:180] Recovering status update manager I1120 15:13:39.833371 1681801216 slave.cpp:497] New master detected at master@172.25.133.171:52576 I1120 15:13:39.833273 1681264640 master.cpp:2513] Adding slave 
201311201513-2877626796-52576-3234-0 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.833595 1680728064 process_isolator.cpp:317] Recovering isolator I1120 15:13:39.833859 1681801216 slave.cpp:524] Detecting new master I1120 15:13:39.833861 1682874368 status_update_manager.cpp:158] New master detected at master@172.25.133.171:52576 I1120 15:13:39.834092 1680191488 slave.cpp:542] Registered with master master@172.25.133.171:52576; given slave ID 201311201513-2877626796-52576-3234-0 I1120 15:13:39.834486 1681264640 master.cpp:1266] Attempting to register slave on vkone.local at slave(2)@172.25.133.171:52576 I1120 15:13:39.834549 1681264640 master.cpp:2513] Adding slave 201311201513-2877626796-52576-3234-1 at vkone.local with cpus(*):4; mem(*):7168; disk(*):481998; ports(*):[31000-32000] I1120 15:13:39.834750 1680191488 slave.cpp:555] Checkpointing SlaveInfo to '/tmp/ExamplesTest_JavaFramework_wSc7u8/0/meta/slaves/201311201513-2877626796-52576-3234-0/slave.info' I1120 15:13:39.834875 1682874368 hierarchical_allocator_process.hpp:445] Added slave
[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658434#comment-14658434 ] Till Toenshoff commented on MESOS-830: -- [~greggomann] I added some debug code into that macro which told me that pthread_rwlock_wrlock returned 22 (Invalid Argument) and from that I assumed that the mutex in question had gotten killed already.
[jira] [Created] (MESOS-3209) parameterize allocator benchmark by framework count
James Peach created MESOS-3209: -- Summary: parameterize allocator benchmark by framework count Key: MESOS-3209 URL: https://issues.apache.org/jira/browse/MESOS-3209 Project: Mesos Issue Type: Bug Components: test Reporter: James Peach Assignee: James Peach Priority: Minor In order to explore allocation performance with multiple frameworks, extend the {{HierarchicalAllocator_BENCHMARK_Test}} benchmark so it is parameterized by the framework count as well as the slave count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1838) Add documentation for Authentication
[ https://issues.apache.org/jira/browse/MESOS-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Anderegg reassigned MESOS-1838: --- Assignee: Tim Anderegg Add documentation for Authentication Key: MESOS-1838 URL: https://issues.apache.org/jira/browse/MESOS-1838 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Vinod Kone Assignee: Tim Anderegg We need some documentation about how to enable framework and slave authentication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3210) DiscoveryInfo is broken in state.json
Dr. Stefan Schimanski created MESOS-3210: Summary: DiscoveryInfo is broken in state.json Key: MESOS-3210 URL: https://issues.apache.org/jira/browse/MESOS-3210 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.23.0 Reporter: Dr. Stefan Schimanski The DiscoveryInfo field of a task in state.json is broken: the ports and labels fields are nested one level too deep. Got: {code} "discovery": { "name": "docker", "labels": { "labels": [ { "key": "canary", "value": "Mallorca" } ] }, "visibility": "CLUSTER", "ports": { "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } }, {code} Expected: {code} "discovery": { "name": "docker", "labels": [ { "key": "canary", "value": "Mallorca" } ], "visibility": "CLUSTER", "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] }, {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3210) DiscoveryInfo is broken in state.json
[ https://issues.apache.org/jira/browse/MESOS-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658498#comment-14658498 ] haosdent commented on MESOS-3210: - Because we use a general function to convert the discovery protobuf to JSON, {quote} "ports": { "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } {quote} matches the protobuf message structure. Suppose we added a notes field to ports in the protobuf; the structure would change to {quote} "ports": { "notes": "This is a note.", "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } {quote} which would not be compatible with your expected structure. So I think keeping the current structure is better, unless we change the DiscoveryInfo protobuf. DiscoveryInfo is broken in state.json - Key: MESOS-3210 URL: https://issues.apache.org/jira/browse/MESOS-3210 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.23.0 Reporter: Dr. Stefan Schimanski The DiscoveryInfo field of a task in state.json is broken: the ports and labels fields are nested one level too deep. Got: {code} "discovery": { "name": "docker", "labels": { "labels": [ { "key": "canary", "value": "Mallorca" } ] }, "visibility": "CLUSTER", "ports": { "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } }, {code} Expected: {code} "discovery": { "name": "docker", "labels": [ { "key": "canary", "value": "Mallorca" } ], "visibility": "CLUSTER", "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] }, {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3210) DiscoveryInfo is broken in state.json
[ https://issues.apache.org/jira/browse/MESOS-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658498#comment-14658498 ] haosdent edited comment on MESOS-3210 at 8/5/15 4:51 PM: - Because we use a general function to convert the discovery protobuf to JSON, {code} "ports": { "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } {code} matches the protobuf message structure. Suppose we added a notes field to ports in the protobuf; the structure would change to {code} "ports": { "notes": "This is a note.", "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } {code} which would not be compatible with your expected structure. So I think keeping the current structure is better, unless we change the DiscoveryInfo protobuf. was (Author: haosd...@gmail.com): Because we use a general function to convert the discovery protobuf to JSON, {quote} "ports": { "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } {quote} matches the protobuf message structure. Suppose we added a notes field to ports in the protobuf; the structure would change to {quote} "ports": { "notes": "This is a note.", "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } {quote} which would not be compatible with your expected structure. So I think keeping the current structure is better, unless we change the DiscoveryInfo protobuf. DiscoveryInfo is broken in state.json - Key: MESOS-3210 URL: https://issues.apache.org/jira/browse/MESOS-3210 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.23.0 Reporter: Dr. Stefan Schimanski The DiscoveryInfo field of a task in state.json is broken: the ports and labels fields are nested one level too deep. 
Got: {code} "discovery": { "name": "docker", "labels": { "labels": [ { "key": "canary", "value": "Mallorca" } ] }, "visibility": "CLUSTER", "ports": { "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] } }, {code} Expected: {code} "discovery": { "name": "docker", "labels": [ { "key": "canary", "value": "Mallorca" } ], "visibility": "CLUSTER", "ports": [ { "name": "health", "number": 1080, "protocol": "http" } ] }, {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3209) parameterize allocator benchmark by framework count
[ https://issues.apache.org/jira/browse/MESOS-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658562#comment-14658562 ] James Peach commented on MESOS-3209: https://reviews.apache.org/r/37133/ parameterize allocator benchmark by framework count --- Key: MESOS-3209 URL: https://issues.apache.org/jira/browse/MESOS-3209 Project: Mesos Issue Type: Bug Components: test Reporter: James Peach Assignee: James Peach Priority: Minor In order to explore allocation performance with multiple frameworks, extend the {{HierarchicalAllocator_BENCHMARK_Test}} benchmark so it is parameterized by the framework count as well as the slave count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3209) parameterize allocator benchmark by framework count
[ https://issues.apache.org/jira/browse/MESOS-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-3209: --- Shepherd: Benjamin Mahler parameterize allocator benchmark by framework count --- Key: MESOS-3209 URL: https://issues.apache.org/jira/browse/MESOS-3209 Project: Mesos Issue Type: Bug Components: test Reporter: James Peach Assignee: James Peach Priority: Minor In order to explore allocation performance with multiple frameworks, extend the {{HierarchicalAllocator_BENCHMARK_Test}} benchmark so it is parameterized by the framework count as well as the slave count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3092) Configure Jenkins to run Docker tests
[ https://issues.apache.org/jira/browse/MESOS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658586#comment-14658586 ] Vinod Kone commented on MESOS-3092: --- Yup. Happy to shepherd. I was going to delete https://builds.apache.org/user/vinodkone/my-views/view/Mesos/job/Mesos-Docker-Tests/ because I thought it was something I created for testing a while ago and forgot. Now, it looks like [~tnachen] created this? Tim, let's not point this job to builds@ until it has been baked and green for a while. Configure Jenkins to run Docker tests - Key: MESOS-3092 URL: https://issues.apache.org/jira/browse/MESOS-3092 Project: Mesos Issue Type: Improvement Components: docker Reporter: Timothy Chen Assignee: Timothy Chen Labels: mesosphere Add a Jenkins job to run the Docker tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3202) Avoid frameworks starving in DRF allocator.
[ https://issues.apache.org/jira/browse/MESOS-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658595#comment-14658595 ] Vinod Kone commented on MESOS-3202: --- Do you want to close this as a duplicate of MESOS-1791 then? Avoid frameworks starving in DRF allocator. --- Key: MESOS-3202 URL: https://issues.apache.org/jira/browse/MESOS-3202 Project: Mesos Issue Type: Bug Reporter: Joerg Schad We currently run into issues with the DRF scheduler that frameworks do not receive offers (see https://github.com/mesosphere/marathon/issues/1931 for details). Imagine that we have 10 frameworks and unallocated resources from a single slave. Allocation interval is 1 sec, and refuse_seconds (i.e. the time for which a declined resource is filtered) is 3 sec across all frameworks. Allocator offers resources to framework 1 (according to DRF) which declines the offer immediately. In the next allocation interval framework 1 is skipped due to the declined offer before. Hence the next framework 2 is offered the resources, which it also declines. The same procedure in the next allocation interval (with framework 3). In the next allocation interval the refuse_seconds for framework 1 are over, and as it still has the lowest DRF share it gets the resource offered again, which it again declines. And the cycle begins again Framework 4 (which is actually waiting for this resource) is never offered this resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2562) 0.24.0 release
[ https://issues.apache.org/jira/browse/MESOS-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2562: -- Description: The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). 0.24.0 release -- Key: MESOS-2562 URL: https://issues.apache.org/jira/browse/MESOS-2562 Project: Mesos Issue Type: Task Reporter: Kapil Arya Assignee: Vinod Kone The main feature of this release is going to be v1 (beta) release of the HTTP scheduler API (part of MESOS-2288 epic). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3142) As a Developer I want a better way to run shell commands
[ https://issues.apache.org/jira/browse/MESOS-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658663#comment-14658663 ] Marco Massenzio commented on MESOS-3142: Waiting on this to be available, as it greatly simplifies the code for this functionality. As a Developer I want a better way to run shell commands Key: MESOS-3142 URL: https://issues.apache.org/jira/browse/MESOS-3142 Project: Mesos Issue Type: Story Components: stout Affects Versions: 0.23.0 Reporter: Benjamin Hindman Assignee: Marco Massenzio Labels: mesosphere, tech-debt When reviewing the code in [r/36425|https://reviews.apache.org/r/36425/] [~benjaminhindman] noticed that there is a better abstraction that could be introduced for {{os::shell()}} that will simplify the caller's life. Instead of having to handle all possible outcomes, we propose to refactor {{os::shell()}} as follows: {code} /** * Returns the output from running the specified command with the shell. */ Try<std::string> shell(const std::string& command) { // Actually handle the WIFEXITED, WIFSIGNALED here! } {code} where the returned string is {{stdout}} and, should the program be signaled or exit with a non-zero exit code, we will simply return a {{Failure}} with an error message that will encapsulate both the returned/signaled state and, possibly, {{stderr}}. And some test-driven development: {code} EXPECT_ERROR(os::shell("false")); EXPECT_SOME(os::shell("true")); EXPECT_SOME_EQ("hello world", os::shell("echo hello world")); {code} Alternatively, the caller can ask to have {{stderr}} conflated with {{stdout}}: {code} Try<string> outAndErr = os::shell("myCmd --foo 2>&1"); {code} However, {{stderr}} will be ignored by default: {code} // We don't read standard error by default. EXPECT_SOME_EQ("", os::shell("echo hello world 1>&2")); // We don't even read stderr if something fails (to return in Try::error). 
Try<string> output = os::shell("echo hello world 1>&2 && false"); EXPECT_ERROR(output); EXPECT_FALSE(strings::contains(output.error(), "hello world")); {code} An analysis of existing usage shows that in almost all cases, the caller only cares {{if not error}}; in fact, the actual exit code is read only once, and even then in a test case. We believe this will simplify the API for the caller and will significantly reduce the length and complexity at the calling sites (6 LOC against the current 20+). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1013) ExamplesTest.JavaLog is flaky
[ https://issues.apache.org/jira/browse/MESOS-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-1013: --- Assignee: Greg Mann ExamplesTest.JavaLog is flaky - Key: MESOS-1013 URL: https://issues.apache.org/jira/browse/MESOS-1013 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.19.0 Reporter: Vinod Kone Assignee: Greg Mann Labels: flaky [ RUN ] ExamplesTest.JavaLog Using temporary directory '/tmp/ExamplesTest_JavaLog_WBWEb9' Feb 18, 2014 12:10:57 PM TestLog main INFO: Starting a local ZooKeeper server log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.ZooKeeperServer). log4j:WARN Please initialize the log4j system properly. Feb 18, 2014 12:10:57 PM TestLog main INFO: Initializing log /tmp/mesos-epljTr/log1 with /var/jenkins/workspace/mesos-fedora-19-clang/src/mesos-log WARNING: Logging before InitGoogleLogging() is written to STDERR I0218 12:10:58.107450 17404 process.cpp:1591] libprocess is initialized on 192.168.122.134:36627 for 8 cpus I0218 12:10:58.111640 17404 leveldb.cpp:166] Opened db in 3.145702ms I0218 12:10:58.113097 17404 leveldb.cpp:173] Compacted db in 770230ns I0218 12:10:58.113137 17404 leveldb.cpp:188] Created db iterator in 20506ns I0218 12:10:58.113152 17404 leveldb.cpp:194] Seeked to beginning of db in 12095ns I0218 12:10:58.113198 17404 leveldb.cpp:255] Iterated through 1 keys in the db in 43127ns I0218 12:10:58.113248 17404 replica.cpp:732] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@716: Client environment:host.name=fedora-19 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@724: Client 
environment:os.arch=3.12.9-201.fc19.x86_64 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Wed Jan 29 15:44:35 UTC 2014 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@733: Client environment:user.name=jenkins 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/ExamplesTest_JavaLog_WBWEb9 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=0:0:0:0:0:0:0:0:40410 sessionTimeout=3000 watcher=0x7f792228c440 sessionId=0 sessionPasswd=null context=0x13089c0 flags=0 2014-02-18 12:10:58,117:17397(0x7f7921407700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@716: Client environment:host.name=fedora-19 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@724: Client environment:os.arch=3.12.9-201.fc19.x86_64 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Wed Jan 29 15:44:35 UTC 2014 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@733: Client environment:user.name=jenkins 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/ExamplesTest_JavaLog_WBWEb9 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=0:0:0:0:0:0:0:0:40410 sessionTimeout=3000 watcher=0x7f792228c440 sessionId=0 sessionPasswd=null context=0x7f7904000e40 flags=0 I0218 12:10:58.119313 17452 log.cpp:222] Attempting to join replica to 
ZooKeeper group I0218 12:10:58.119781 17452 recover.cpp:103] Start recovering a replica I0218 12:10:58.119881 17452 recover.cpp:139] Replica is in VOTING status I0218 12:10:58.119923 17452 recover.cpp:117] Recover process terminated Feb 18, 2014 12:10:58 PM TestLog main INFO: Initializing log /tmp/mesos-epljTr/log2 with /var/jenkins/workspace/mesos-fedora-19-clang/src/mesos-log 2014-02-18 12:10:58,126:17397(0x7f78fcff9700):ZOO_INFO@check_events@1703: initiated connection to server [:::40410] 2014-02-18 12:10:58,131:17397(0x7f78fdffb700):ZOO_INFO@check_events@1703: initiated connection to server [:::40410] 2014-02-18 12:10:58,165:17397(0x7f78fcff9700):ZOO_INFO@check_events@1750: session establishment complete on server
[jira] [Commented] (MESOS-1013) ExamplesTest.JavaLog is flaky
[ https://issues.apache.org/jira/browse/MESOS-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658683#comment-14658683 ] Marco Massenzio commented on MESOS-1013: [~greggomann] can you please review this one and see whether it fits with the other ones you're looking at? It may well be that this no longer applies (it looks to be from quite some time ago and I don't remember seeing the {{JavaLog}} example framework). Maybe [~vinodkone] has more info. Thanks! ExamplesTest.JavaLog is flaky - Key: MESOS-1013 URL: https://issues.apache.org/jira/browse/MESOS-1013 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.19.0 Reporter: Vinod Kone Assignee: Greg Mann Labels: flaky [ RUN ] ExamplesTest.JavaLog Using temporary directory '/tmp/ExamplesTest_JavaLog_WBWEb9' Feb 18, 2014 12:10:57 PM TestLog main INFO: Starting a local ZooKeeper server log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.ZooKeeperServer). log4j:WARN Please initialize the log4j system properly. 
Feb 18, 2014 12:10:57 PM TestLog main INFO: Initializing log /tmp/mesos-epljTr/log1 with /var/jenkins/workspace/mesos-fedora-19-clang/src/mesos-log WARNING: Logging before InitGoogleLogging() is written to STDERR I0218 12:10:58.107450 17404 process.cpp:1591] libprocess is initialized on 192.168.122.134:36627 for 8 cpus I0218 12:10:58.111640 17404 leveldb.cpp:166] Opened db in 3.145702ms I0218 12:10:58.113097 17404 leveldb.cpp:173] Compacted db in 770230ns I0218 12:10:58.113137 17404 leveldb.cpp:188] Created db iterator in 20506ns I0218 12:10:58.113152 17404 leveldb.cpp:194] Seeked to beginning of db in 12095ns I0218 12:10:58.113198 17404 leveldb.cpp:255] Iterated through 1 keys in the db in 43127ns I0218 12:10:58.113248 17404 replica.cpp:732] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@716: Client environment:host.name=fedora-19 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@724: Client environment:os.arch=3.12.9-201.fc19.x86_64 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Wed Jan 29 15:44:35 UTC 2014 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@733: Client environment:user.name=jenkins 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/ExamplesTest_JavaLog_WBWEb9 2014-02-18 12:10:58,115:17397(0x7f79152d9700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=0:0:0:0:0:0:0:0:40410 sessionTimeout=3000 watcher=0x7f792228c440 sessionId=0 sessionPasswd=null context=0x13089c0 flags=0 
2014-02-18 12:10:58,117:17397(0x7f7921407700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@716: Client environment:host.name=fedora-19 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@723: Client environment:os.name=Linux 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@724: Client environment:os.arch=3.12.9-201.fc19.x86_64 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Wed Jan 29 15:44:35 UTC 2014 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@733: Client environment:user.name=jenkins 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@741: Client environment:user.home=/home/jenkins 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/ExamplesTest_JavaLog_WBWEb9 2014-02-18 12:10:58,118:17397(0x7f7921407700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=0:0:0:0:0:0:0:0:40410 sessionTimeout=3000 watcher=0x7f792228c440 sessionId=0 sessionPasswd=null context=0x7f7904000e40 flags=0 I0218 12:10:58.119313 17452 log.cpp:222] Attempting to join replica to ZooKeeper group I0218 12:10:58.119781 17452 recover.cpp:103] Start recovering a replica I0218 12:10:58.119881 17452 recover.cpp:139] Replica is in VOTING status I0218 12:10:58.119923 17452 recover.cpp:117] Recover process terminated Feb 18, 2014 12:10:58 PM TestLog main INFO: Initializing log /tmp/mesos-epljTr/log2 with /var/jenkins/workspace/mesos-fedora-19-clang/src/mesos-log 2014-02-18
[jira] [Commented] (MESOS-1201) Store IP addresses in host order
[ https://issues.apache.org/jira/browse/MESOS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658688#comment-14658688 ] Marco Massenzio commented on MESOS-1201: [~jieyu]: We have now updated {{MasterInfo}} to use an {{Address}} field instead of the raw {{ip}} int field (it's still there, for compatibility purposes, as it's declared as {{required}}). Internally, as mentioned, {{net::IP}} is consistent, so the need for this is greatly reduced. What do you think: is this still necessary? Or could we close it with a won't fix? Store IP addresses in host order Key: MESOS-1201 URL: https://issues.apache.org/jira/browse/MESOS-1201 Project: Mesos Issue Type: Bug Components: technical debt Reporter: Jie Yu Currently, in our code base, we store IP addresses in network order. For instance, in UPID. Ironically, we store ports in host order. This can cause some subtle bugs which will be very hard to debug. For example, we store the ip in MasterInfo. Say the IP address is: 01.02.03.04. Since we don't convert it into host order in our code, on x86 (little endian), its integer value will be 0x04030201. Now, we store it as an uint32 field in the MasterInfo protobuf. Protobuf serializes integers in little-endian format; since x86 is a little-endian machine, no conversion will take place. As a result, the value stored in the protobuf will be 0x04030201. Now, if a big-endian machine reads this protobuf, it will do the conversion. If it later interprets the ip from this integer, it will interpret it to be 04.03.02.01. So I plan to store all IP addresses in our code base in host order (which is the common practice). We may have some compatibility issues as we store MasterInfo in ZooKeeper for master detection and redirection. For example, what if the new code reads an old MasterInfo? What would happen? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-426) Python-based frameworks use old API and are broken
[ https://issues.apache.org/jira/browse/MESOS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-426: -- Assignee: David Greenberg Python-based frameworks use old API and are broken -- Key: MESOS-426 URL: https://issues.apache.org/jira/browse/MESOS-426 Project: Mesos Issue Type: Bug Components: framework, python api Affects Versions: 0.9.0 Reporter: David Greenberg Assignee: David Greenberg Attachments: mesos_changes.p1 If you try to use mesos-submit or torque with mesos 0.9.0+, you get exceptions due to API mismatches in these framework's expectations of the python API. Steps to reproduce: try running mesos-submit mymaster echo hi, note the stacktraces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-426) Python-based frameworks use old API and are broken
[ https://issues.apache.org/jira/browse/MESOS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658700#comment-14658700 ] Marco Massenzio edited comment on MESOS-426 at 8/5/15 7:05 PM: --- Looking at the review it would appear that it was committed by [~benjaminhindman] (almost two years ago?) so I'm closing this. If this is mistaken, please feel free to re-open and update. was (Author: marco-mesos): Looking at the review it would appear that it was committed by [~benjaminhindman] (a year ago?) so I'm closing this. If this is mistaken, please feel free to re-open and update. Python-based frameworks use old API and are broken -- Key: MESOS-426 URL: https://issues.apache.org/jira/browse/MESOS-426 Project: Mesos Issue Type: Bug Components: framework, python api Affects Versions: 0.9.0 Reporter: David Greenberg Assignee: David Greenberg Attachments: mesos_changes.p1 If you try to use mesos-submit or torque with mesos 0.9.0+, you get exceptions due to API mismatches in these framework's expectations of the python API. Steps to reproduce: try running mesos-submit mymaster echo hi, note the stacktraces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3069) Registry operations do not exist for manipulating maintenance schedules
[ https://issues.apache.org/jira/browse/MESOS-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658705#comment-14658705 ] Joseph Wu commented on MESOS-3069: -- Review: https://reviews.apache.org/r/37052/ Registry operations do not exist for manipulating maintenance schedules --- Key: MESOS-3069 URL: https://issues.apache.org/jira/browse/MESOS-3069 Project: Mesos Issue Type: Task Components: master, replicated log Reporter: Joseph Wu Assignee: Joseph Wu Labels: mesosphere In order to modify the maintenance schedule in the replicated registry, we will need Operations (src/master/registrar.hpp). The operations will likely correspond to the HTTP API: * UpdateMaintenance: Given a blob representing a maintenance schedule, write the blob to the registry. Possibly perform some verification on the blob. * UpdateSlaveMaintenanceStatus: Given a set of machines and a status (action), change the machines' status in the maintenance schedule. Possible test(s): * UpdateMaintenance: ** Add a schedule with 1 slave, 2+ slaves, and 0 slaves. ** Add multiple schedules (different intervals). ** Delete schedules (empty schedule). * UpdateSlaveMaintenanceStatus: ** Add schedule. ** Change a slave's status. ** Change a slave's status, given a slave that is not in the schedule (slave should be added to the schedule). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3212) As a Java developer I want a simple way to obtain information about Master from ZooKeeper
[ https://issues.apache.org/jira/browse/MESOS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3212: --- Sprint: (was: Mesosphere Sprint 16) As a Java developer I want a simple way to obtain information about Master from ZooKeeper - Key: MESOS-3212 URL: https://issues.apache.org/jira/browse/MESOS-3212 Project: Mesos Issue Type: Story Reporter: Marco Massenzio Assignee: Marco Massenzio Labels: mesosphere With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Python developers to retrieve info about the masters and the leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3212) As a Java developer I want a simple way to obtain information about Master from ZooKeeper
Marco Massenzio created MESOS-3212: -- Summary: As a Java developer I want a simple way to obtain information about Master from ZooKeeper Key: MESOS-3212 URL: https://issues.apache.org/jira/browse/MESOS-3212 Project: Mesos Issue Type: Story Reporter: Marco Massenzio Assignee: Marco Massenzio With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Python developers to retrieve info about the masters and the leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3211) As a Python developer I want a simple way to obtain information about Master from ZooKeeper
Marco Massenzio created MESOS-3211: -- Summary: As a Python developer I want a simple way to obtain information about Master from ZooKeeper Key: MESOS-3211 URL: https://issues.apache.org/jira/browse/MESOS-3211 Project: Mesos Issue Type: Story Reporter: Marco Massenzio Assignee: Marco Massenzio With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Python developers to retrieve info about the masters and the leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3212) As a Java developer I want a simple way to obtain information about Master from ZooKeeper
[ https://issues.apache.org/jira/browse/MESOS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3212: --- Description: With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Java Framework developers to retrieve info about the masters and the leader. (was: With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Python developers to retrieve info about the masters and the leader.) As a Java developer I want a simple way to obtain information about Master from ZooKeeper - Key: MESOS-3212 URL: https://issues.apache.org/jira/browse/MESOS-3212 Project: Mesos Issue Type: Story Reporter: Marco Massenzio Assignee: Marco Massenzio Labels: mesosphere With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Java Framework developers to retrieve info about the masters and the leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3211) As a Python developer I want a simple way to obtain information about Master from ZooKeeper
[ https://issues.apache.org/jira/browse/MESOS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658714#comment-14658714 ] Vinod Kone commented on MESOS-3211: --- Duplicate of MESOS-2912? As a Python developer I want a simple way to obtain information about Master from ZooKeeper --- Key: MESOS-3211 URL: https://issues.apache.org/jira/browse/MESOS-3211 Project: Mesos Issue Type: Story Reporter: Marco Massenzio Assignee: Marco Massenzio Labels: mesosphere With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Python developers to retrieve info about the masters and the leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3212) As a Java developer I want a simple way to obtain information about Master from ZooKeeper
[ https://issues.apache.org/jira/browse/MESOS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658716#comment-14658716 ] Vinod Kone commented on MESOS-3212: --- Dup of MESOS-2298 ? As a Java developer I want a simple way to obtain information about Master from ZooKeeper - Key: MESOS-3212 URL: https://issues.apache.org/jira/browse/MESOS-3212 Project: Mesos Issue Type: Story Reporter: Marco Massenzio Assignee: Marco Massenzio Labels: mesosphere With the new JSON {{MasterInfo}} published to ZK, we want to provide a simple library class for Java Framework developers to retrieve info about the masters and the leader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2834) Support different perf output formats
[ https://issues.apache.org/jira/browse/MESOS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658723#comment-14658723 ] Benjamin Mahler commented on MESOS-2834: Cleanup of existing subprocess usage: {noformat} commit e52f43fe7d3d0606b111b8a9c17212caecbde05e Author: Paul Brett pau...@twopensource.com Date: Wed Aug 5 11:47:08 2015 -0700 Cleanups to Subprocess usage in Linux perf sampling. Review: https://reviews.apache.org/r/37045 {noformat} Support different perf output formats - Key: MESOS-2834 URL: https://issues.apache.org/jira/browse/MESOS-2834 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Ian Downes Assignee: Paul Brett Labels: twitter The output format of perf changed in 3.14 (inserting an additional field) and again in 4.1 (appending additional fields). See kernel commits: 410136f5dd96b6013fe6d1011b523b1c247e1ccb d73515c03c6a2706e088094ff6095a3abefd398b Update the perf::parse() function to understand all these formats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3201) Libev handle_async can deadlock with run_in_event_loop
[ https://issues.apache.org/jira/browse/MESOS-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658802#comment-14658802 ] Joris Van Remoortere commented on MESOS-3201: - Indeed it is. If you run {{valgrind --tool=helgrind --num-callers=60 ./tests --gtest_filter=IOTest.Read --gtest_repeat=10}} you will likely find this (as well as other locking order violations) in the output: {code} ==2083== Thread #10: lock order 0xC911BD8 before 0xC9110C0 violated ==2083== ==2083== Observed (incorrect) order is: acquisition of lock at 0xC9110C0 ==2083==at 0x4C33596: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==2083==by 0x72F4C4: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748) ==2083==by 0x753F74: std::mutex::lock() (mutex:135) ==2083==by 0x753F58: Synchronizedstd::mutex synchronizestd::mutex(std::mutex*)::{lambda(std::mutex*)#1}::operator()(std::mutex*) const (in /mesos/build/3rdparty/libprocess/tests) ==2083==by 0x753F37: Synchronizedstd::mutex synchronizestd::mutex(std::mutex*)::{lambda(std::mutex*)#1}::__invoke(std::mutex*) (synchronized.hpp:58) ==2083==by 0x753DA8: Synchronizedstd::mutex::Synchronized(std::mutex*, void (*)(std::mutex*), void (*)(std::mutex*)) (synchronized.hpp:35) ==2083==by 0x753C7B: Synchronizedstd::mutex synchronizestd::mutex(std::mutex*) (synchronized.hpp:56) ==2083==by 0x7E3C31: process::handle_async(ev_loop*, ev_async*, int) (libev.cpp:48) ==2083==by 0x827EF4: ev_invoke_pending (ev.c:2994) ==2083==by 0x828A72: ev_run (ev.c:3394) ==2083==by 0x7E41BA: ev_loop(ev_loop*, int) (ev.h:826) ==2083==by 0x7E4133: process::EventLoop::run() (libev.cpp:135) ==2083==by 0x7B31AE: void std::_Bind_simplevoid (*())()::_M_invoke(std::_Index_tuple) (in /mesos/build/3rdparty/libprocess/tests) ==2083==by 0x7B3184: std::_Bind_simplevoid (*())()::operator()() (in /mesos/build/3rdparty/libprocess/tests) ==2083==by 0x7B315B: std::thread::_Implstd::_Bind_simplevoid (*())() ::_M_run() (in 
/mesos/build/3rdparty/libprocess/tests) ==2083==by 0x6371E2F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20) ==2083==by 0x4C31FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==2083==by 0x4E456A9: start_thread (pthread_create.c:333) ==2083==by 0x68E2EEC: clone (clone.S:109) ==2083== ==2083== followed by a later acquisition of lock at 0xC911BD8 ==2083==at 0x4C33596: pthread_mutex_lock (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so) ==2083==by 0x52DA84: __gthread_mutex_lock(pthread_mutex_t*) (gthr-default.h:748) ==2083==by 0x52DA54: __gthread_recursive_mutex_lock(pthread_mutex_t*) (gthr-default.h:810) ==2083==by 0x5AFE04: std::recursive_mutex::lock() (mutex:176) ==2083==by 0x5AFDE8: Synchronizedstd::recursive_mutex synchronizestd::recursive_mutex(std::recursive_mutex*)::{lambda(std::recursive_mutex*)#1}::operator()(std::recursive_mutex*) const (in /mesos/build/3rdparty/libprocess/tests) ==2083==by 0x5AFDC7: Synchronizedstd::recursive_mutex synchronizestd::recursive_mutex(std::recursive_mutex*)::{lambda(std::recursive_mutex*)#1}::__invoke(std::recursive_mutex*) (synchronized.hpp:58) ==2083==by 0x5AFC2E: Synchronizedstd::recursive_mutex::Synchronized(std::recursive_mutex*, void (*)(std::recursive_mutex*), void (*)(std::recursive_mutex*)) (synchronized.hpp:35) ==2083==by 0x5AA70B: Synchronizedstd::recursive_mutex synchronizestd::recursive_mutex(std::recursive_mutex*) (synchronized.hpp:56) ==2083==by 0x75A52C: process::ProcessManager::use(process::UPID const) (process.cpp:2136) ==2083==by 0x7694D8: process::ProcessManager::terminate(process::UPID const, bool, process::ProcessBase*) (process.cpp:2613) ==2083==by 0x76BF0A: process::terminate(process::UPID const, bool) (process.cpp:3147) ==2083==by 0x72C98C: process::Latch::trigger() (latch.cpp:53) ==2083==by 0x41F394: process::internal::awaited(process::Ownedprocess::Latch) (future.hpp:1001) ==2083==by 0x48F903: void std::_Bindvoid 
(*(process::Ownedprocess::Latch))(process::Ownedprocess::Latch)::__callvoid, process::Futureunsigned long const, 0ul(std::tupleprocess::Futureunsigned long const, std::_Index_tuple0ul) (functional:1263) ==2083==by 0x48F88C: void std::_Bindvoid (*(process::Ownedprocess::Latch))(process::Ownedprocess::Latch)::operator()process::Futureunsigned long const, void(process::Futureunsigned long const) (in /mesos/build/3rdparty/libprocess/tests) ==2083==by 0x48F841: std::_Function_handlervoid (process::Futureunsigned long const), std::_Bindvoid (*(process::Ownedprocess::Latch))(process::Ownedprocess::Latch) ::_M_invoke(std::_Any_data const, process::Futureunsigned long const) (functional:2039) ==2083==by 0x7249D7: std::functionvoid (process::Futureunsigned long const)::operator()(process::Futureunsigned long const)
[jira] [Updated] (MESOS-3166) Design doc for docker image registry client
[ https://issues.apache.org/jira/browse/MESOS-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-3166: - Summary: Design doc for docker image registry client (was: Design doc for docker image registry authenticator) Design doc for docker image registry client --- Key: MESOS-3166 URL: https://issues.apache.org/jira/browse/MESOS-3166 Project: Mesos Issue Type: Bug Components: containerization Environment: linux Reporter: Jojy Varghese Assignee: Jojy Varghese Labels: mesosphere Create design document for the docker registry Authenticator component so that we have a baseline for the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3213) Design doc for docker registry token manager
Jojy Varghese created MESOS-3213: Summary: Design doc for docker registry token manager Key: MESOS-3213 URL: https://issues.apache.org/jira/browse/MESOS-3213 Project: Mesos Issue Type: Task Components: containerization, docker Environment: linux Reporter: Jojy Varghese Create a design document describing the component and the interaction between the Docker Registry Client and a remote Docker Registry for token-based authorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3213) Design doc for docker registry token manager
[ https://issues.apache.org/jira/browse/MESOS-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jojy Varghese updated MESOS-3213: - Sprint: Mesosphere Sprint 16 Design doc for docker registry token manager Key: MESOS-3213 URL: https://issues.apache.org/jira/browse/MESOS-3213 Project: Mesos Issue Type: Task Components: containerization, docker Environment: linux Reporter: Jojy Varghese Create a design document describing the component and the interaction between the Docker Registry Client and a remote Docker Registry for token-based authorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3205) No need to checkpoint container root filesystem path.
[ https://issues.apache.org/jira/browse/MESOS-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-3205: - Assignee: Jie Yu No need to checkpoint container root filesystem path. - Key: MESOS-3205 URL: https://issues.apache.org/jira/browse/MESOS-3205 Project: Mesos Issue Type: Task Reporter: Jie Yu Assignee: Jie Yu Given the design discussed in [MESOS-3004|https://issues.apache.org/jira/browse/MESOS-3004], one container might have multiple provisioned root filesystems. Only checkpointing the root filesystem for ContainerInfo::image does not make sense. Also, we realized that checkpointing container root filesystem path is not necessary because each provisioner should be able to destroy root filesystems for a given container based on a canonical directory layout (e.g., appc_rootfs_dir/container_id/xxx). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3214) Replace boost foreach with range-based for
[ https://issues.apache.org/jira/browse/MESOS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659336#comment-14659336 ] Klaus Ma commented on MESOS-3214: - +1 for option 1; it forces all contributors to use range-based for. Option 2 is safer for production, but foreach and range-based for may end up mixed in the code by contributors without this background. Replace boost foreach with range-based for -- Key: MESOS-3214 URL: https://issues.apache.org/jira/browse/MESOS-3214 Project: Mesos Issue Type: Task Components: stout Reporter: Michael Park Labels: mesosphere It's desirable to replace the boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. One way to accomplish this is to replace the existing {{foreach (const Elem elem, container)}} pattern with {{for (const Elem elem : container)}}. We could support {{foreachkey}} and {{foreachvalue}} semantics via adaptors {{keys}} and {{values}} which would be used like this: {{for (const Key key : keys(container))}}, {{for (const Value value : values(container))}}. This leaves {{foreachpair}} which cannot be used with {{for}}. I think it would be desirable to support {{foreachpair}} for cases where the implicit unpacking is useful. Another approach is to keep {{foreach}}, {{foreachpair}}, {{foreachkey}} and {{foreachvalue}}, but simply implement them based on range-based {{for}}. For example, {{#define foreach(elem, container) for (elem : container)}}. While the consistency in the names is desirable, but unnecessary indirection of the macro definition is not. It's unclear to me which approach we would favor in Mesos, so please share your thoughts and preferences. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3214) Replace boost foreach with range-based for
[ https://issues.apache.org/jira/browse/MESOS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-3214: Description: Replace boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. The existing {{foreach (const Elem elem, container)}} pattern can be replaced with {{for (const Elem elem : container)}}. {{foreachpair}}, {{foreachkey}} and {{foreachvalue}} will still be supported for cases where the implicit unpacking is useful. The implementation of {{foreachpair}} can be simplified with the use of range-based for within, {{foreachkey}} and {{foreachvalue}} will be exactly as is except it can use {{std::ignore}} instead of the hand-rolled version. was: Replace boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. The existing {{foreach (const Elem elem, container)}} can be replaced with {{for (const Elem elem : container)}}. {{foreachpair}}, {{foreachkey}} and {{foreachvalue}} will still be supported for cases where the implicit unpacking is useful. The implementation of {{foreachpair}} can be simplified with the use of range-based for within, {{foreachkey}} and {{foreachvalue}} will be exactly as is except it can use {{std::ignore}} instead of the hand-rolled version. Replace boost foreach with range-based for -- Key: MESOS-3214 URL: https://issues.apache.org/jira/browse/MESOS-3214 Project: Mesos Issue Type: Task Components: stout Reporter: Michael Park Labels: mesosphere Replace boost {{foreach}} macro with the C++11 range-based {{for}}. 
This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. The existing {{foreach (const Elem elem, container)}} pattern can be replaced with {{for (const Elem elem : container)}}. {{foreachpair}}, {{foreachkey}} and {{foreachvalue}} will still be supported for cases where the implicit unpacking is useful. The implementation of {{foreachpair}} can be simplified with the use of range-based for within, {{foreachkey}} and {{foreachvalue}} will be exactly as is except it can use {{std::ignore}} instead of the hand-rolled version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3215) CgroupsAnyHierarchyWithPerfEventTest failing on Ubuntu 14.04
Artem Harutyunyan created MESOS-3215: Summary: CgroupsAnyHierarchyWithPerfEventTest failing on Ubuntu 14.04 Key: MESOS-3215 URL: https://issues.apache.org/jira/browse/MESOS-3215 Project: Mesos Issue Type: Bug Reporter: Artem Harutyunyan [ RUN ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf ../../src/tests/containerizer/cgroups_tests.cpp:172: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy ../../src/tests/containerizer/cgroups_tests.cpp:190: Failure (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy [ FAILED ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf (9 ms) [--] 1 test from CgroupsAnyHierarchyWithPerfEventTest (9 ms total) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2402) MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky
[ https://issues.apache.org/jira/browse/MESOS-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659529#comment-14659529 ] billow commented on MESOS-2402: --- I encountered this problem two days ago. MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky - Key: MESOS-2402 URL: https://issues.apache.org/jira/browse/MESOS-2402 Project: Mesos Issue Type: Bug Affects Versions: 0.23.0 Reporter: Vinod Kone Assignee: Vinod Kone Fix For: 0.23.0 Failed to os::execvpe in childMain. Never seen this one before. {code} [ RUN ] MesosContainerizerDestroyTest.LauncherDestroyFailure Using temporary directory '/tmp/MesosContainerizerDestroyTest_LauncherDestroyFailure_QpjQEn' I0224 18:55:49.326912 21391 containerizer.cpp:461] Starting container 'test_container' for executor 'executor' of framework '' I0224 18:55:49.332252 21391 launcher.cpp:130] Forked child with pid '23496' for container 'test_container' ABORT: (src/subprocess.cpp:165): Failed to os::execvpe in childMain *** Aborted at 1424832949 (unix time) try date -d @1424832949 if you are using GNU date *** PC: @ 0x2b178c5db0d5 (unknown) I0224 18:55:49.340955 21392 process.cpp:2117] Dropped / Lost event for PID: scheduler-509d37ac-296f-4429-b101-af433c1800e9@127.0.1.1:39647 I0224 18:55:49.342300 21386 containerizer.cpp:911] Destroying container 'test_container' *** SIGABRT (@0x3e85bc8) received by PID 23496 (TID 0x2b178f9f0700) from PID 23496; stack trace: *** @ 0x2b178c397cb0 (unknown) @ 0x2b178c5db0d5 (unknown) @ 0x2b178c5de83b (unknown) @ 0x87a945 _Abort() @ 0x2b1789f610b9 process::childMain() I0224 18:55:49.391793 21386 containerizer.cpp:1120] Executor for container 'test_container' has exited I0224 18:55:49.400478 21391 process.cpp:2770] Handling HTTP event for process 'metrics' with path: '/metrics/snapshot' tests/containerizer_tests.cpp:485: Failure Value of: metrics.values[containerizer/mesos/container_destroy_errors] Actual: 16-byte object 02-00 00-00 17-2B 00-00 E0-86
0E-04 00-00 00-00 Expected: 1u Which is: 1 [ FAILED ] MesosContainerizerDestroyTest.LauncherDestroyFailure (89 ms) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3214) Replace boost foreach with range-based for
[ https://issues.apache.org/jira/browse/MESOS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-3214: Description: It's desirable to replace the boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. One way to accomplish this is to replace the existing {{foreach (const Elem elem, container)}} pattern with {{for (const Elem elem : container)}}. We could support {{foreachkey}} and {{foreachvalue}} semantics via adaptors {{keys}} and {{values}} which would be used like this: {{for (const Key key : keys(container))}}, {{for (const Value value : values(container))}}. This leaves {{foreachpair}} which cannot be used with {{for}}. I think it would be desirable to support {{foreachpair}} for cases where the implicit unpacking is useful. Another approach is to keep {{foreach}}, {{foreachpair}}, {{foreachkey}} and {{foreachvalue}}, but simply implement them based on range-based {{for}}. For example, {{#define foreach(elem, container) for (elem : container)}}. While the consistency in the names is desirable, but unnecessary indirection of the macro definition is not. It's unclear to me which approach we would favor in Mesos, so please share your thoughts and preferences. was: Replace boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. The existing {{foreach (const Elem elem, container)}} pattern can be replaced with {{for (const Elem elem : container)}}. {{foreachpair}}, {{foreachkey}} and {{foreachvalue}} will still be supported for cases where the implicit unpacking is useful. 
The implementation of {{foreachpair}} can be simplified with the use of range-based for within, {{foreachkey}} and {{foreachvalue}} will be exactly as is except it can use {{std::ignore}} instead of the hand-rolled version. Replace boost foreach with range-based for -- Key: MESOS-3214 URL: https://issues.apache.org/jira/browse/MESOS-3214 Project: Mesos Issue Type: Task Components: stout Reporter: Michael Park Labels: mesosphere It's desirable to replace the boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. One way to accomplish this is to replace the existing {{foreach (const Elem elem, container)}} pattern with {{for (const Elem elem : container)}}. We could support {{foreachkey}} and {{foreachvalue}} semantics via adaptors {{keys}} and {{values}} which would be used like this: {{for (const Key key : keys(container))}}, {{for (const Value value : values(container))}}. This leaves {{foreachpair}} which cannot be used with {{for}}. I think it would be desirable to support {{foreachpair}} for cases where the implicit unpacking is useful. Another approach is to keep {{foreach}}, {{foreachpair}}, {{foreachkey}} and {{foreachvalue}}, but simply implement them based on range-based {{for}}. For example, {{#define foreach(elem, container) for (elem : container)}}. While the consistency in the names is desirable, but unnecessary indirection of the macro definition is not. It's unclear to me which approach we would favor in Mesos, so please share your thoughts and preferences. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3214) Replace boost foreach with range-based for
Michael Park created MESOS-3214: --- Summary: Replace boost foreach with range-based for Key: MESOS-3214 URL: https://issues.apache.org/jira/browse/MESOS-3214 Project: Mesos Issue Type: Task Components: stout Reporter: Michael Park Replace boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. The existing {{foreach (const Elem elem, container)}} can be replaced with {{for (const Elem elem : container)}}. {{foreachpair}}, {{foreachkey}} and {{foreachvalue}} will still be supported for cases where the implicit unpacking is useful. The implementation of {{foreachpair}} can be simplified with the use of range-based for within, {{foreachkey}} and {{foreachvalue}} will be exactly as is except it can use {{std::ignore}} instead of the hand-rolled version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3214) Replace boost foreach with range-based for
[ https://issues.apache.org/jira/browse/MESOS-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659340#comment-14659340 ] Benjamin Mahler commented on MESOS-3214: As an iterative first step, how about doing your second suggestion in order to remove the boost header dependency? FWICT foreach.hpp is a pretty expensive header for compilation? Replace boost foreach with range-based for -- Key: MESOS-3214 URL: https://issues.apache.org/jira/browse/MESOS-3214 Project: Mesos Issue Type: Task Components: stout Reporter: Michael Park Labels: mesosphere It's desirable to replace the boost {{foreach}} macro with the C++11 range-based {{for}}. This will help avoid some of the pitfalls of boost {{foreach}} such as dealing with types with commas in them, as well as improving compiler diagnostics by avoiding the macro expansion. One way to accomplish this is to replace the existing {{foreach (const Elem elem, container)}} pattern with {{for (const Elem elem : container)}}. We could support {{foreachkey}} and {{foreachvalue}} semantics via adaptors {{keys}} and {{values}} which would be used like this: {{for (const Key key : keys(container))}}, {{for (const Value value : values(container))}}. This leaves {{foreachpair}} which cannot be used with {{for}}. I think it would be desirable to support {{foreachpair}} for cases where the implicit unpacking is useful. Another approach is to keep {{foreach}}, {{foreachpair}}, {{foreachkey}} and {{foreachvalue}}, but simply implement them based on range-based {{for}}. For example, {{#define foreach(elem, container) for (elem : container)}}. While the consistency in the names is desirable, but unnecessary indirection of the macro definition is not. It's unclear to me which approach we would favor in Mesos, so please share your thoughts and preferences. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1535) Pyspark on Mesos scheduler error
[ https://issues.apache.org/jira/browse/MESOS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659221#comment-14659221 ] Timothy Chen commented on MESOS-1535: - kill task is now supported in the Mesos scheduler. Pyspark on Mesos scheduler error Key: MESOS-1535 URL: https://issues.apache.org/jira/browse/MESOS-1535 Project: Mesos Issue Type: Bug Affects Versions: 0.18.0, 0.18.1 Environment: Running a Mesos on a cluster of Centos 6.5 machines. 180 GB memory. Reporter: Ajay Viswanathan Labels: pyspark This is an error that I get while running fine-grained PySpark on the mesos cluster. This comes after running some 200-1000 tasks generally. Pyspark code: while True: sc.parallelize(range(10)).map(lambda n : n*2).collect() Error log: (In console) ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext Traceback (most recent call last): File stdin, line 2, in module File .../spark-1.0.0/python/pyspark/rdd.py, line 583, in collect bytesInJava = self._jrdd.collect().iterator() File .../spark-1.0.0/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py, line 537, File .../spark-1.0.0/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py, line 300, in py4j.protocol.Py4JJavaError: An error occurred while calling o847.collect. 
org.apache.spark.SparkException: Job 75 cancelled as part of cancellation of all jobs at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$fail at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:998) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGS at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGSche at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGSche at akka.actor.SupervisorStrategy.handleFailure(FaultHandling.scala:295) at akka.actor.dungeon.FaultHandling$class.handleFailure(FaultHandling.scala:253) at akka.actor.ActorCell.handleFailure(ActorCell.scala:338) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:423) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262) at akka.dispatch.Mailbox.run(Mailbox.scala:218) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.s at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 14/06/24 02:58:19 ERROR OneForOneStrategy: java.lang.UnsupportedOperationException at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32) at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedule at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1. at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1. at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1. at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedul at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedul at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGSchedu at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGSchedu at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGSchedu at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$fail at