[jira] [Created] (MESOS-5284) LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor is flaky.

2016-04-25 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5284:
---

 Summary: 
LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor is flaky.
 Key: MESOS-5284
 URL: https://issues.apache.org/jira/browse/MESOS-5284
 Project: Mesos
  Issue Type: Bug
  Components: tests
 Environment: CentOS 7 with SSL
Reporter: Gilbert Song


Observed on the internal Mesosphere CI:
{code}
[18:03:58] : [Step 10/10] [ RUN  ] 
LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystemCommandExecutor
[18:03:58]W: [Step 10/10] I0425 18:03:58.584962   642 cluster.cpp:149] 
Creating default 'local' authorizer
[18:03:58]W: [Step 10/10] I0425 18:03:58.597232   642 leveldb.cpp:174] 
Opened db in 12.195009ms
[18:03:58]W: [Step 10/10] I0425 18:03:58.598534   642 leveldb.cpp:181] 
Compacted db in 1.266907ms
[18:03:58]W: [Step 10/10] I0425 18:03:58.598558   642 leveldb.cpp:196] 
Created db iterator in 5704ns
[18:03:58]W: [Step 10/10] I0425 18:03:58.598565   642 leveldb.cpp:202] 
Seeked to beginning of db in 703ns
[18:03:58]W: [Step 10/10] I0425 18:03:58.598570   642 leveldb.cpp:271] 
Iterated through 0 keys in the db in 272ns
[18:03:58]W: [Step 10/10] I0425 18:03:58.598585   642 replica.cpp:779] 
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[18:03:58]W: [Step 10/10] I0425 18:03:58.598815   663 recover.cpp:447] 
Starting replica recovery
[18:03:58]W: [Step 10/10] I0425 18:03:58.598927   663 recover.cpp:473] 
Replica is in EMPTY status
[18:03:58]W: [Step 10/10] I0425 18:03:58.599241   663 replica.cpp:673] 
Replica in EMPTY status received a broadcasted recover request from 
(17941)@172.30.2.229:48705
[18:03:58]W: [Step 10/10] I0425 18:03:58.599323   663 recover.cpp:193] 
Received a recover response from a replica in EMPTY status
[18:03:58]W: [Step 10/10] I0425 18:03:58.599472   657 recover.cpp:564] 
Updating replica status to STARTING
[18:03:58]W: [Step 10/10] I0425 18:03:58.600092   661 master.cpp:382] 
Master 7e239aa6-d964-4f11-95a8-ba808ad23f4e (ip-172-30-2-229.mesosphere.io) 
started on 172.30.2.229:48705
[18:03:58]W: [Step 10/10] I0425 18:03:58.600105   661 master.cpp:384] Flags 
at startup: --acls="" --allocation_interval="1secs" 
--allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" 
--authenticate_http_frameworks="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/XRr1Iz/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/XRr1Iz/master" 
--zk_session_timeout="10secs"
[18:03:58]W: [Step 10/10] I0425 18:03:58.600225   661 master.cpp:433] 
Master only allowing authenticated frameworks to register
[18:03:58]W: [Step 10/10] I0425 18:03:58.600231   661 master.cpp:439] 
Master only allowing authenticated agents to register
[18:03:58]W: [Step 10/10] I0425 18:03:58.600234   661 master.cpp:445] 
Master only allowing authenticated HTTP frameworks to register
[18:03:58]W: [Step 10/10] I0425 18:03:58.600239   661 credentials.hpp:37] 
Loading credentials for authentication from '/tmp/XRr1Iz/credentials'
[18:03:58]W: [Step 10/10] I0425 18:03:58.600371   661 master.cpp:489] Using 
default 'crammd5' authenticator
[18:03:58]W: [Step 10/10] I0425 18:03:58.600410   661 master.cpp:560] Using 
default 'basic' HTTP authenticator
[18:03:58]W: [Step 10/10] I0425 18:03:58.600461   661 master.cpp:640] Using 
default 'basic' HTTP framework authenticator
[18:03:58]W: [Step 10/10] I0425 18:03:58.600525   661 master.cpp:687] 
Authorization enabled
[18:03:58]W: [Step 10/10] I0425 18:03:58.600590   656 
whitelist_watcher.cpp:77] No whitelist given
[18:03:58]W: [Step 10/10] I0425 18:03:58.600610   660 hierarchical.cpp:142] 
Initialized hierarchical allocator process
[18:03:58]W: [Step 10/10] I0425 18:03:58.600883   659 leveldb.cpp:304] 
Persisting metadata (8 bytes) to leveldb took 1.350635ms
[18:03:58]W: [Step 10/10] I0425 18:03:58.600904   659 replica.cpp:320] 
Persisted replica status to STARTING
[18:03:58]W: [Step 10/10] I0425 18:03:58.601029   659 recover.cpp:473] 
Replica is in STARTING status
[18:03:58]W: [Step 10/10] I0425 18:03:58.601161   657 master.cpp:1932] 

[jira] [Created] (MESOS-5283) LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox is flaky.

2016-04-25 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5283:
---

 Summary: LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox is 
flaky.
 Key: MESOS-5283
 URL: https://issues.apache.org/jira/browse/MESOS-5283
 Project: Mesos
  Issue Type: Bug
  Components: tests
 Environment: CentOS without SSL
Reporter: Gilbert Song


Observed on the internal Mesosphere CI:
{code}
[23:10:03] : [Step 10/10] [ RUN  ] 
LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox
[23:10:05]W: [Step 10/10] I0425 23:10:05.061769 32151 linux.cpp:81] Making 
'/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC'
 a shared mount
[23:10:05]W: [Step 10/10] I0425 23:10:05.074729 32151 
linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
[23:10:05]W: [Step 10/10] I0425 23:10:05.075099 32170 
containerizer.cpp:703] Starting container 
'14a48c04-9157-4796-8743-f37ad5da05d1' for executor 'test_executor' of 
framework ''
[23:10:05]W: [Step 10/10] I0425 23:10:05.075275 32168 provisioner.cpp:285] 
Provisioning image rootfs 
'/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/provisioner/containers/14a48c04-9157-4796-8743-f37ad5da05d1/backends/copy/rootfses/f47a2ca8-b8bc-4bfa-bfaf-788c2eb33b49'
 for container 14a48c04-9157-4796-8743-f37ad5da05d1
[23:10:05]W: [Step 10/10] I0425 23:10:05.075589 32166 copy.cpp:128] Copying 
layer path '/tmp/uK310o/test_image' to rootfs 
'/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/provisioner/containers/14a48c04-9157-4796-8743-f37ad5da05d1/backends/copy/rootfses/f47a2ca8-b8bc-4bfa-bfaf-788c2eb33b49'
[23:10:09]W: [Step 10/10] I0425 23:10:09.184612 32165 linux.cpp:355] Bind 
mounting work directory from '/tmp/uK310o/sandbox' to 
'/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/provisioner/containers/14a48c04-9157-4796-8743-f37ad5da05d1/backends/copy/rootfses/f47a2ca8-b8bc-4bfa-bfaf-788c2eb33b49/mnt/mesos/sandbox'
 for container 14a48c04-9157-4796-8743-f37ad5da05d1
[23:10:09]W: [Step 10/10] I0425 23:10:09.185264 32167 
linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS
[23:10:09]W: [Step 10/10] + 
/mnt/teamcity/work/4240ba9ddd0997c3/build/src/mesos-containerizer mount 
--help=false --operation=make-rslave --path=/
[23:10:09]W: [Step 10/10] + grep -E 
/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/.+
 /proc/self/mountinfo
[23:10:09]W: [Step 10/10] + grep -v 14a48c04-9157-4796-8743-f37ad5da05d1
[23:10:09]W: [Step 10/10] + cut '-d ' -f5
[23:10:09]W: [Step 10/10] + xargs --no-run-if-empty umount -l
[23:10:09]W: [Step 10/10] + mount -n --rbind /tmp/uK310o/sandbox/tmp 
/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/provisioner/containers/14a48c04-9157-4796-8743-f37ad5da05d1/backends/copy/rootfses/f47a2ca8-b8bc-4bfa-bfaf-788c2eb33b49/tmp
[23:10:09] : [Step 10/10] Changing root to 
/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/provisioner/containers/14a48c04-9157-4796-8743-f37ad5da05d1/backends/copy/rootfses/f47a2ca8-b8bc-4bfa-bfaf-788c2eb33b49
[23:10:09]W: [Step 10/10] I0425 23:10:09.284610 32170 
containerizer.cpp:1717] Executor for container 
'14a48c04-9157-4796-8743-f37ad5da05d1' has exited
[23:10:09]W: [Step 10/10] I0425 23:10:09.284638 32170 
containerizer.cpp:1481] Destroying container 
'14a48c04-9157-4796-8743-f37ad5da05d1'
[23:10:09]W: [Step 10/10] I0425 23:10:09.286339 32172 cgroups.cpp:2676] 
Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/14a48c04-9157-4796-8743-f37ad5da05d1
[23:10:09]W: [Step 10/10] I0425 23:10:09.287952 32169 cgroups.cpp:1409] 
Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/14a48c04-9157-4796-8743-f37ad5da05d1 after 
1.587712ms
[23:10:09]W: [Step 10/10] I0425 23:10:09.289566 32171 cgroups.cpp:2694] 
Thawing cgroup /sys/fs/cgroup/freezer/mesos/14a48c04-9157-4796-8743-f37ad5da05d1
[23:10:09]W: [Step 10/10] I0425 23:10:09.290956 32171 cgroups.cpp:1438] 
Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/14a48c04-9157-4796-8743-f37ad5da05d1 after 
1.371904ms
[23:10:09]W: [Step 10/10] I0425 23:10:09.292289 32167 linux.cpp:825] 
Unmounting sandbox/work directory 
'/mnt/teamcity/temp/buildTmp/LinuxFilesystemIsolatorTest_ROOT_VolumeFromSandbox_vndkVC/provisioner/containers/14a48c04-9157-4796-8743-f37ad5da05d1/backends/copy/rootfses/f47a2ca8-b8bc-4bfa-bfaf-788c2eb33b49/mnt/mesos/sandbox'
 for container 14a48c04-9157-4796-8743-f37ad5da05d1
[23:10:09]W: [Step 10/10] I0425 23:10:09.292472 32169 provisioner.cpp:338] 
Destroying container rootfs at 

[jira] [Commented] (MESOS-5155) Consolidate authorization actions for quota.

2016-04-25 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257557#comment-15257557
 ] 

Zhitao Li commented on MESOS-5155:
--

[~adam-mesos], I tried to implement the plan listed in option 1, but saw two 
potential issues:
1. Once we implement quota update in MESOS-4941 (which we plan to guard with 
{{UPDATE_QUOTA_WITH_ROLE}}), an operator cannot upgrade safely without 
temporarily losing ACLs on either update quota or set quota;
2. {{Master::QuotaHandler::authorizeRemoveQuota}} needs access to the 
{{ACLs}} so it can check which of {{removeQuotas}} or {{updateQuotas}} is 
empty in the input, because {{object}} will have different types. However, 
{{ACLs}} is only parsed and stored in either {{LocalAuthorizer}} or an external 
authorizer module, and is not exposed in the {{mesos::Authorizer}} interface 
right now. We would need to modify the {{mesos::Authorizer}} interface to 
either return {{ACLs}} or return more information than {{Future}}.

Option 2 (which requires the operator to provide {{ACLs.updateQuotas}} before 
the binary upgrade, and simply warns about and ignores old fields if they are 
not empty) does not have these problems, because we can simply cut the 
implementation in {{QuotaHandler}} over to the new action. The downside is that 
the operator needs to change {{--acls}} to include content the old binary does 
not recognize, and expect the later binary upgrade to pick up the new field, 
although I guess this is required to pick up any new action which requires ACLs.

Do you think we should still go with option 1? If yes, what's your suggestion 
on implementing {{Master::QuotaHandler::authorizeRemoveQuota}}?

> Consolidate authorization actions for quota.
> 
>
> Key: MESOS-5155
> URL: https://issues.apache.org/jira/browse/MESOS-5155
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Zhitao Li
>  Labels: mesosphere
>
> We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. In 
> retrospect, it was a mistake to introduce multiple actions.
> Actions that are not symmetrical are register/teardown and dynamic 
> reservations. They are implemented this way because entities 
> that do one action differ from entities that do the other. For example, 
> register framework is issued by a framework, teardown by an operator. What is 
> a good way to identify a framework? A role it runs in, which may be different 
> each launch and makes no sense in a multi-role framework setup, or, better, a 
> sort of group id, which is its principal. Dynamic reservations and 
> persistent volumes can be issued by both frameworks and operators, 
> hence similar reasoning applies. 
> Now, quota is associated with a role and set only by operators. Do we need to 
> care about principals that set it? Not that much. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5272) Support docker image labels.

2016-04-25 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257552#comment-15257552
 ] 

Gilbert Song commented on MESOS-5272:
-

We are still designing how this should be supported, so the final 
implementation is not determined yet. This is just one example of how image 
labels can be used. More generally, image labels are useful for custom 
metadata, which many users care about. We should support that.

> Support docker image labels.
> 
>
> Key: MESOS-5272
> URL: https://issues.apache.org/jira/browse/MESOS-5272
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, gpu
>
> Docker image labels should be supported in unified containerizer, which can 
> be used for applying custom metadata. Image labels are necessary for mesos 
> features to support docker in unified containerizer (e.g., for mesos GPU 
> device isolator).





[jira] [Updated] (MESOS-5215) Update the documentation for '/reserve' and '/create-volumes'

2016-04-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5215:
-
Description: There are a couple issues related to the {{principal}} field 
in {{DiskInfo}} and {{ReservationInfo}} (see linked JIRAs) that should be 
better documented. We need to help users understand the purpose of these fields 
and how they interact with the principal provided in the HTTP authentication 
header. See linked tickets for background.  (was: There are a couple issues 
related to the {{principal}} field in {{DiskInfo}} and {{ReservationInfo}} (see 
linked JIRAs) that should be better documented. We need to help users 
understand the purpose/significance of these fields, and how to use them 
properly.)

> Update the documentation for '/reserve' and '/create-volumes'
> -
>
> Key: MESOS-5215
> URL: https://issues.apache.org/jira/browse/MESOS-5215
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Greg Mann
>  Labels: documentation, mesosphere
>
> There are a couple issues related to the {{principal}} field in {{DiskInfo}} 
> and {{ReservationInfo}} (see linked JIRAs) that should be better documented. 
> We need to help users understand the purpose of these fields and how they 
> interact with the principal provided in the HTTP authentication header. See 
> linked tickets for background.





[jira] [Commented] (MESOS-5215) Update the documentation for '/reserve' and '/create-volumes'

2016-04-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257507#comment-15257507
 ] 

Greg Mann commented on MESOS-5215:
--

Good call, thanks [~neilc]!

> Update the documentation for '/reserve' and '/create-volumes'
> -
>
> Key: MESOS-5215
> URL: https://issues.apache.org/jira/browse/MESOS-5215
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.28.1
>Reporter: Greg Mann
>  Labels: documentation, mesosphere
>
> There are a couple issues related to the {{principal}} field in {{DiskInfo}} 
> and {{ReservationInfo}} (see linked JIRAs) that should be better documented. 
> We need to help users understand the purpose of these fields and how they 
> interact with the principal provided in the HTTP authentication header. See 
> linked tickets for background.





[jira] [Commented] (MESOS-1575) master sets failover timeout to 0 when framework requests a high value

2016-04-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257493#comment-15257493
 ] 

José Guilherme Vanz commented on MESOS-1575:


[~vinodkone], this is the preliminary version: 
https://github.com/jvanz/mesos/commit/6701767722f23503e6f1f2d8f958e4b6acb387a3
I've tested with a Java framework as in the issue description and it worked as 
expected: the subscription fails.

I'll write a test to cover this issue. 

> master sets failover timeout to 0 when framework requests a high value
> --
>
> Key: MESOS-1575
> URL: https://issues.apache.org/jira/browse/MESOS-1575
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kevin Sweeney
>Assignee: José Guilherme Vanz
>  Labels: newbie, twitter
>
> In response to a registered RPC we observed the following behavior:
> {noformat}
> W0709 19:07:32.982997 11400 master.cpp:612] Using the default value for 
> 'failover_timeout' becausethe input value is invalid: Argument out of the 
> range that a Duration can represent due to int64_t's size limit
> I0709 19:07:32.983008 11404 hierarchical_allocator_process.hpp:408] 
> Deactivated framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983013 11400 master.cpp:617] Giving framework 
> 20140709-184342-119646400-5050-11380-0003 0ns to failover
> I0709 19:07:32.983271 11404 master.cpp:2201] Framework failover timeout, 
> removing framework 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983294 11404 master.cpp:2688] Removing framework 
> 20140709-184342-119646400-5050-11380-0003
> I0709 19:07:32.983678 11404 hierarchical_allocator_process.hpp:363] Removed 
> framework 20140709-184342-119646400-5050-11380-0003
> {noformat}
> This was using the following frameworkInfo.
> {code}
> FrameworkInfo frameworkInfo = FrameworkInfo.newBuilder()
> .setUser("test")
> .setName("jvm")
> .setFailoverTimeout(Long.MAX_VALUE)
> .build();
> {code}
> Instead of silently defaulting large values to 0 the master should refuse to 
> process the request.
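A minimal sketch of the requested behavior, written in Python rather than the master's C++. It assumes the timeout arrives as a number of seconds and is stored as int64 nanoseconds, which is what the int64_t limit in the warning above suggests. The names `validate_failover_timeout` and `InvalidFailoverTimeout` are hypothetical and are not Mesos APIs.

```python
INT64_MAX = 2**63 - 1
NANOS_PER_SECOND = 1_000_000_000

# Largest number of whole seconds whose nanosecond count still fits in int64.
MAX_TIMEOUT_SECONDS = INT64_MAX // NANOS_PER_SECOND


class InvalidFailoverTimeout(ValueError):
    """Hypothetical error used to reject a registration request."""


def validate_failover_timeout(seconds: float) -> int:
    """Return the timeout in nanoseconds, or raise instead of defaulting to 0."""
    # The chained comparison is also False for NaN, so NaN is rejected too.
    if not (0 <= seconds <= MAX_TIMEOUT_SECONDS):
        raise InvalidFailoverTimeout(
            f"failover_timeout of {seconds} seconds cannot be represented "
            f"as int64 nanoseconds (max {MAX_TIMEOUT_SECONDS} seconds)"
        )
    return int(seconds * NANOS_PER_SECOND)
```

With this check, a value such as Long.MAX_VALUE seconds raises an error that the caller can turn into a failed registration, instead of being replaced by a 0ns timeout.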





[jira] [Commented] (MESOS-5155) Consolidate authorization actions for quota.

2016-04-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257410#comment-15257410
 ] 

Adam B commented on MESOS-5155:
---

Excellent point. Operators must be able to upgrade their clusters live without 
losing access control. But this can still be achieved with option 1. I only 
stated that we should fail master startup if we find *both* old and new in 
--acls. As AlexR suggests, if only the old format is specified, a deprecation 
warning will be printed.
1. Start with old ACL fields/values and old binaries. Works without warnings.
2. Upgrade to new binary, keep old ACLs. Master logs deprecation warning, but 
works with old ACLs.
3. Upgrade flags to new ACLs. New master works without warnings with new ACLs.
In some future (>6 months) release, we will remove the deprecated format and 
the warning. From that release on, only the new ACLs will be accepted.
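The startup rules above can be sketched as follows. This is only an illustration in Python, not the master's code: the field names in `OLD_QUOTA_FIELDS` and `NEW_QUOTA_FIELDS` are assumptions, and `check_quota_acls` is a hypothetical helper.

```python
import json
import warnings

# Assumed field names, for illustration only; check them against the ACLs
# definition of the release you run.
OLD_QUOTA_FIELDS = ("set_quotas", "remove_quotas")
NEW_QUOTA_FIELDS = ("update_quotas",)


def check_quota_acls(acls_json: str) -> dict:
    """Parse an --acls value and apply the startup rules described above."""
    acls = json.loads(acls_json) if acls_json else {}
    has_old = any(acls.get(name) for name in OLD_QUOTA_FIELDS)
    has_new = any(acls.get(name) for name in NEW_QUOTA_FIELDS)

    if has_old and has_new:
        # Both formats at once: fail startup.
        raise ValueError("--acls mixes deprecated and new quota ACL fields")
    if has_old:
        # Old format alone keeps working, with a deprecation warning.
        warnings.warn("deprecated quota ACL fields in --acls", DeprecationWarning)
    return acls
```

Steps 1 to 3 map onto the three non-failing paths: old fields on an old binary never reach this check, old fields on the new binary hit the warning branch, and new fields pass silently.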

> Consolidate authorization actions for quota.
> 
>
> Key: MESOS-5155
> URL: https://issues.apache.org/jira/browse/MESOS-5155
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Zhitao Li
>  Labels: mesosphere
>
> We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. In 
> retrospect, it was a mistake to introduce multiple actions.
> Actions that are not symmetrical are register/teardown and dynamic 
> reservations. They are implemented this way because entities 
> that do one action differ from entities that do the other. For example, 
> register framework is issued by a framework, teardown by an operator. What is 
> a good way to identify a framework? A role it runs in, which may be different 
> each launch and makes no sense in a multi-role framework setup, or, better, a 
> sort of group id, which is its principal. Dynamic reservations and 
> persistent volumes can be issued by both frameworks and operators, 
> hence similar reasoning applies. 
> Now, quota is associated with a role and set only by operators. Do we need to 
> care about principals that set it? Not that much. 





[jira] [Commented] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-04-25 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257405#comment-15257405
 ] 

Fan Du commented on MESOS-4492:
---

[~bmahler] Can you please help review this ticket?
RR: https://reviews.apache.org/r/44255/

Thanks a lot!

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable a user or operator to inspect statistics for 
> operations such as RESERVE, UNRESERVE, CREATE and DESTROY. The current 
> implementation only supports LAUNCH.
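As a rough illustration of what per-operation statistics could look like, here is a small Python sketch. The class `OperationMetrics` and the `operations/...` key names are made up for this example and do not reflect the metric names Mesos uses.

```python
from collections import Counter

# Operation types named in the ticket; LAUNCH is the one already counted.
OPERATION_TYPES = ("LAUNCH", "RESERVE", "UNRESERVE", "CREATE", "DESTROY")


class OperationMetrics:
    """Hypothetical per-operation counters; the key names are made up."""

    def __init__(self) -> None:
        # Start every known operation at zero so it always shows up.
        self._counts = Counter({op: 0 for op in OPERATION_TYPES})

    def record(self, operation: str) -> None:
        if operation not in OPERATION_TYPES:
            raise ValueError(f"unknown offer operation: {operation}")
        self._counts[operation] += 1

    def snapshot(self) -> dict:
        # Flat key/value form, one entry per operation type.
        return {f"operations/{op.lower()}": n for op, n in self._counts.items()}
```

Keeping zero-valued entries in the snapshot lets an operator tell "never happened" apart from "not instrumented".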





[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-04-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257380#comment-15257380
 ] 

Vinod Kone commented on MESOS-5278:
---

Would be great to add this as a subcommand to the existing "mesos" CLI (does 
that still work?) instead of creating a new one.

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with the container, and execute an 
> arbitrary command in that container (similar to `docker exec`).
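To make the idea concrete, the sketch below builds a command line for util-linux `nsenter`, which can join the namespaces of a running process. It is a stand-in and not the proposed Mesos CLI: it assumes the pid of the container's init process is already known, and `build_enter_command` is a hypothetical helper.

```python
# Long options accepted by util-linux nsenter(1) for each namespace type.
NSENTER_FLAGS = {
    "mount": "--mount",
    "net": "--net",
    "uts": "--uts",
    "ipc": "--ipc",
    "pid": "--pid",
}


def build_enter_command(pid: int, namespaces: list, command: list) -> list:
    """Build an nsenter argv that runs `command` in the namespaces of `pid`."""
    if pid <= 0:
        raise ValueError("pid must be positive")
    if not command:
        raise ValueError("command must not be empty")
    argv = ["nsenter", "--target", str(pid)]
    for name in namespaces:
        argv.append(NSENTER_FLAGS[name])  # KeyError for an unsupported type
    # "--" stops option parsing so the command's own flags are left alone.
    return argv + ["--"] + list(command)
```

The resulting list can be passed to `subprocess.run` by a caller with sufficient privileges. The function only builds the argument list, so it can be inspected without root.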





[jira] [Updated] (MESOS-4386) Deprecate 'authenticate' master flag in favor of 'authenticate_frameworks' flag

2016-04-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4386:
--
Shepherd: Benjamin Mahler
  Sprint: Mesosphere Sprint 33
Story Points: 1

> Deprecate 'authenticate' master flag in favor of 'authenticate_frameworks' 
> flag
> ---
>
> Key: MESOS-4386
> URL: https://issues.apache.org/jira/browse/MESOS-4386
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: mesosphere, newbie, security
>
> To be consistent with `authenticate_slaves` and `authenticate_http` flags, we 
> should rename `authenticate` to `authenticate_frameworks` flag.
> This should be done via deprecation cycle. 
> 1) Release X supports both `authenticate` and `authenticate_frameworks` flags
> 2)  Release X + n supports only `authenticate_frameworks` flag. 
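During release X, where both flag names are accepted, the effective value could be resolved roughly as in the Python sketch below. The helper `resolve_authenticate_frameworks`, the choice to reject conflicting values, and the default of False are assumptions made for illustration; the ticket does not specify them.

```python
import warnings


def resolve_authenticate_frameworks(flags: dict) -> bool:
    """Resolve the effective setting while both flag names are accepted."""
    old = flags.get("authenticate")
    new = flags.get("authenticate_frameworks")

    if old is not None and new is not None and old != new:
        # Assumed policy: refuse to guess when the two flags disagree.
        raise ValueError("'authenticate' and 'authenticate_frameworks' disagree")
    if new is not None:
        return new
    if old is not None:
        warnings.warn(
            "'authenticate' is deprecated; use 'authenticate_frameworks'",
            DeprecationWarning,
        )
        return old
    return False  # assumed default when neither flag is given
```

In release X + n the `old` branch would be deleted, which leaves only the new flag.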





[jira] [Assigned] (MESOS-4386) Deprecate 'authenticate' master flag in favor of 'authenticate_frameworks' flag

2016-04-25 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-4386:
-

Assignee: Vinod Kone

> Deprecate 'authenticate' master flag in favor of 'authenticate_frameworks' 
> flag
> ---
>
> Key: MESOS-4386
> URL: https://issues.apache.org/jira/browse/MESOS-4386
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: mesosphere, newbie, security
>
> To be consistent with `authenticate_slaves` and `authenticate_http` flags, we 
> should rename `authenticate` to `authenticate_frameworks` flag.
> This should be done via deprecation cycle. 
> 1) Release X supports both `authenticate` and `authenticate_frameworks` flags
> 2)  Release X + n supports only `authenticate_frameworks` flag. 





[jira] [Assigned] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

2016-04-25 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-4869:
--

Assignee: Benjamin Mahler

> /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
> ---
>
> Key: MESOS-4869
> URL: https://issues.apache.org/jira/browse/MESOS-4869
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.1
>Reporter: Anthony Scalisi
>Assignee: Benjamin Mahler
>Priority: Critical
>  Labels: health-check
> Fix For: 0.26.1, 0.25.1, 0.24.2, 0.28.1, 0.27.3
>
>
> We switched our health checks in Marathon from HTTP to COMMAND:
> {noformat}
> "healthChecks": [
> {
>   "protocol": "COMMAND",
>   "path": "/ops/ping",
>   "command": { "value": "curl --silent -f -X GET 
> http://$HOST:$PORT0/ops/ping > /dev/null" },
>   "gracePeriodSeconds": 90,
>   "intervalSeconds": 2,
>   "portIndex": 0,
>   "timeoutSeconds": 5,
>   "maxConsecutiveFailures": 3
> }
>   ]
> {noformat}
> All our applications have the same health check (and /ops/ping endpoint).
> Even though we have the issue on all our Mesos slaves, I'm going to focus on a 
> particular one: *mesos-slave-i-e3a9c724*.
> The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:
> !https://i.imgur.com/gbRf804.png!
> Here is a *docker ps* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724 # docker ps
> CONTAINER ID   IMAGE    COMMAND                  CREATED        STATUS        PORTS                     NAMES
> 4f7c0aa8d03a   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago    Up 6 hours    0.0.0.0:31926->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
> 66f2fc8f8056   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago    Up 6 hours    0.0.0.0:31939->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
> f7382f241fce   java:8   "/bin/sh -c 'JAVA_OPT"   6 hours ago    Up 6 hours    0.0.0.0:31656->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
> 880934c0049e   java:8   "/bin/sh -c 'JAVA_OPT"   24 hours ago   Up 24 hours   0.0.0.0:31371->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
> 5eab1f8dac4a   java:8   "/bin/sh -c 'JAVA_OPT"   46 hours ago   Up 46 hours   0.0.0.0:31500->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
> b63740fe56e7   java:8   "/bin/sh -c 'JAVA_OPT"   46 hours ago   Up 46 hours   0.0.0.0:31382->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
> 5c7a9ea77b0e   java:8   "/bin/sh -c 'JAVA_OPT"   2 days ago     Up 2 days     0.0.0.0:31186->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
> 53065e7a31ad   java:8   "/bin/sh -c 'JAVA_OPT"   2 days ago     Up 2 days     0.0.0.0:31839->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
> {noformat}
> Here is a *docker stats* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724  # docker stats
> CONTAINER      CPU %    MEM USAGE / LIMIT     MEM %    NET I/O               BLOCK I/O
> 4f7c0aa8d03a   2.93%    797.3 MB / 1.611 GB   49.50%   1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
> 53065e7a31ad   8.30%    738.9 MB / 1.611 GB   45.88%   419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
> 5c7a9ea77b0e   4.91%    1.081 GB / 1.611 GB   67.10%   423 MB / 526.5 MB     3.219 MB / 61.44 kB
> 5eab1f8dac4a   3.13%    1.007 GB / 1.611 GB   62.53%   2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
> 66f2fc8f8056   3.15%    768.1 MB / 1.611 GB   47.69%   258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
> 880934c0049e   10.07%   735.1 MB / 1.611 GB   45.64%   1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
> b63740fe56e7   12.04%   629 MB / 1.611 GB     39.06%   10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
> f7382f241fce   6.21%    505 MB / 1.611 GB     31.36%   153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
> {noformat}
> Not much else is running on the slave, yet the used memory doesn't map to the 
> tasks memory:
> {noformat}
> Mem:16047M used:13340M buffers:1139M cache:776M
> {noformat}
> If 

[jira] [Updated] (MESOS-5238) CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest

2016-04-25 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-5238:

Fix Version/s: 0.28.2
   0.29.0

> CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest
> -
>
> Key: MESOS-5238
> URL: https://issues.apache.org/jira/browse/MESOS-5238
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7 + SSL, x86-64
>Reporter: Neil Conway
>Assignee: Gilbert Song
>  Labels: containerizer, flaky, mesosphere
> Fix For: 0.29.0, 0.28.2
>
> Attachments: 5238_check_failure.txt
>
>
> Observed on the Mesosphere internal CI:
> {noformat}
> [22:56:28]W: [Step 10/10] F0420 22:56:28.056788   629 
> containerizer.cpp:1634] Check failed: containers_.contains(containerId)
> {noformat}
> Complete test log will be attached as a file.





[jira] [Updated] (MESOS-5238) CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest

2016-04-25 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-5238:

Affects Version/s: (was: 0.28.1)
   (was: 0.28.0)
Fix Version/s: (was: 0.28.2)
   (was: 0.29.0)

> CHECK failure in AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest
> -
>
> Key: MESOS-5238
> URL: https://issues.apache.org/jira/browse/MESOS-5238
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 7 + SSL, x86-64
>Reporter: Neil Conway
>Assignee: Gilbert Song
>  Labels: containerizer, flaky, mesosphere
> Attachments: 5238_check_failure.txt
>
>
> Observed on the Mesosphere internal CI:
> {noformat}
> [22:56:28]W: [Step 10/10] F0420 22:56:28.056788   629 
> containerizer.cpp:1634] Check failed: containers_.contains(containerId)
> {noformat}
> Complete test log will be attached as a file.





[jira] [Commented] (MESOS-5282) Destroy container while provisioning volume images may lead to a race.

2016-04-25 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257354#comment-15257354
 ] 

Gilbert Song commented on MESOS-5282:
-

This bug was found while investigating MESOS-5238.

> Destroy container while provisioning volume images may lead to a race.
> --
>
> Key: MESOS-5282
> URL: https://issues.apache.org/jira/browse/MESOS-5282
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.28.0, 0.28.1
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
> Fix For: 0.29.0, 0.28.2
>
>
> If the containerizer destroys a container while it is provisioning a volume 
> image during `_launch()` in the Mesos containerizer, there may be a race 
> between `destroy` and `prepare` that changes the container state from 
> TERMINATED to PREPARING. This is problematic because the container may then 
> be destroyed more than once.





[jira] [Created] (MESOS-5282) Destroy container while provisioning volume images may lead to a race.

2016-04-25 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5282:
---

 Summary: Destroy container while provisioning volume images may 
lead to a race.
 Key: MESOS-5282
 URL: https://issues.apache.org/jira/browse/MESOS-5282
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.28.1, 0.28.0
Reporter: Gilbert Song
Assignee: Gilbert Song
 Fix For: 0.29.0, 0.28.2


If the containerizer destroys a container while it is provisioning a volume 
image during `_launch()` in the Mesos containerizer, there may be a race 
between `destroy` and `prepare` that changes the container state from 
TERMINATED to PREPARING. This is problematic because the container may then be 
destroyed more than once.
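
One way to picture a guard against this race is a state check before the transition to PREPARING. The following is only a minimal sketch under stated assumptions: `SketchContainerizer`, its `State` enum and its method signatures are hypothetical stand-ins for illustration, not the Mesos containerizer API, and the real code is asynchronous.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified container states; the real containerizer has more.
enum class State { PROVISIONING, PREPARING, TERMINATED };

// Hypothetical stand-in for a containerizer; not the Mesos class.
class SketchContainerizer {
public:
  void launch(const std::string& id) { containers_[id] = State::PROVISIONING; }

  // Called once provisioning finishes. Returns false and leaves the state
  // untouched if the container is unknown or is no longer PROVISIONING
  // (for example because destroy() ran in the meantime).
  bool prepare(const std::string& id) {
    auto it = containers_.find(id);
    if (it == containers_.end() || it->second != State::PROVISIONING) {
      return false;
    }
    it->second = State::PREPARING;
    return true;
  }

  // Returns true only for the first destroy of a known, live container.
  bool destroy(const std::string& id) {
    auto it = containers_.find(id);
    if (it == containers_.end() || it->second == State::TERMINATED) {
      return false;
    }
    it->second = State::TERMINATED;
    return true;
  }

  // Throws std::out_of_range if the container is unknown.
  State state(const std::string& id) const { return containers_.at(id); }

private:
  std::map<std::string, State> containers_;
};
```

With this guard, a `prepare` that arrives after `destroy` is rejected, so the state stays TERMINATED and a second `destroy` is reported as a no-op to the caller.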





[jira] [Created] (MESOS-5281) Improve docker containerizer debug logging.

2016-04-25 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5281:
---

 Summary: Improve docker containerizer debug logging.
 Key: MESOS-5281
 URL: https://issues.apache.org/jira/browse/MESOS-5281
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


Currently, not every method (e.g., `update()`) in the docker containerizer 
emits a `LOG(INFO)` message indicating that it was invoked. This is important 
for debugging because, if there is a race, we at least know which methods were 
called.

This JIRA aims to improve/fix:
1. LOG(INFO) for each method in docker containerizer.
2. Broken VLOG(1) and WARNING logging. 
3. Log format improvement.





[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257308#comment-15257308
 ] 

Joseph Wu commented on MESOS-4760:
--

Most of the caching logic will remain the same.  I'll be changing parts of 
{{FetcherProcess::run}} and a tiny bit of logic in {{FetcherProcess::fetch}}.

It should be safe to add metrics in parallel, up until you need the injected 
fetcher object.

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.
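
The statistics named in the description (cache hit rate, occupied cache size, download time for missing resources) could be tracked with a few counters. This is a minimal sketch only: `FetcherCacheStats` and its members are hypothetical names for illustration and do not correspond to the Mesos metrics library or to the fetcher's actual fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters for a fetcher cache; not the Mesos metrics API.
struct FetcherCacheStats {
  std::uint64_t hits = 0;           // Requests served from the cache.
  std::uint64_t misses = 0;         // Requests that required a download.
  std::uint64_t occupiedBytes = 0;  // Bytes added to the cache by downloads.
  double downloadSeconds = 0.0;     // Total time spent downloading on misses.

  void recordHit() { ++hits; }

  void recordMiss(std::uint64_t bytes, double seconds) {
    ++misses;
    occupiedBytes += bytes;
    downloadSeconds += seconds;
  }

  // Fraction of requests served from the cache; 0.0 before any request.
  double hitRate() const {
    const std::uint64_t total = hits + misses;
    return total == 0
        ? 0.0
        : static_cast<double>(hits) / static_cast<double>(total);
  }
};
```

An operator could compare `hitRate()` and `occupiedBytes` against the configured fetcher_cache_size to judge whether the cache is too small. Eviction is not modelled here.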





[jira] [Commented] (MESOS-4189) Dynamic weights

2016-04-25 Thread Yongqiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257262#comment-15257262
 ] 

Yongqiao Wang commented on MESOS-4189:
--

[~adam-mesos], also thanks a million for your guidance, I was very happy to 
work with you over the past few months. 

> Dynamic weights
> ---
>
> Key: MESOS-4189
> URL: https://issues.apache.org/jira/browse/MESOS-4189
> Project: Mesos
>  Issue Type: Epic
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
> Fix For: 0.29.0
>
>
> Mesos currently uses a static list of weights that is configured when the 
> master starts up (via the --weights flag). This limits the ability to change 
> the resource allocation priority for a role or framework, because changing 
> the set of weights requires restarting all the masters. 
> This JIRA will add a new endpoint /weight to update/show the weight of a role 
> for authorized principals, and the non-default weights will be persisted 
> in the registry.





[jira] [Updated] (MESOS-5167) Add tests for `network/cni` isolator

2016-04-25 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-5167:
-
  Sprint: Mesosphere Sprint 33
Story Points: 5

> Add tests for `network/cni` isolator
> 
>
> Key: MESOS-5167
> URL: https://issues.apache.org/jira/browse/MESOS-5167
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> We need to add tests to verify the functionality of `network/cni` isolator.





[jira] [Commented] (MESOS-4763) Add test mock for CNI plugins.

2016-04-25 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257260#comment-15257260
 ] 

Avinash Sridharan commented on MESOS-4763:
--

Sounds good, thanks! Closing this and tracking MESOS-5167 in the Agile 
board. 

> Add test mock for CNI plugins.
> --
>
> Key: MESOS-4763
> URL: https://issues.apache.org/jira/browse/MESOS-4763
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> In order to test the network/cni isolator, we need to mock the behavior of a 
> CNI plugin. One option is to write a mock script which acts as a CNI plugin. 
> The isolator will talk to the mock script the same way it talks to an actual 
> CNI plugin.
> The mock script can just join the host network?





[jira] [Updated] (MESOS-5167) Add tests for `network/cni` isolator

2016-04-25 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-5167:
-
Labels: mesosphere  (was: )

> Add tests for `network/cni` isolator
> 
>
> Key: MESOS-5167
> URL: https://issues.apache.org/jira/browse/MESOS-5167
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> We need to add tests to verify the functionality of `network/cni` isolator.





[jira] [Created] (MESOS-5280) Inconsistent error checking in DRF sorter.

2016-04-25 Thread Yan Xu (JIRA)
Yan Xu created MESOS-5280:
-

 Summary: Inconsistent error checking in DRF sorter.
 Key: MESOS-5280
 URL: https://issues.apache.org/jira/browse/MESOS-5280
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Yan Xu
Assignee: Yan Xu


There exist a few different error handling styles in the sorter.

h2. Hard checks
e.g., 
[DRFSorter::update|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L62]
{code}
CHECK(weights.contains(name));
{code}

h2. No-op if it results in an error condition.
e.g., 
[DRFSorter::allocated|https://github.com/apache/mesos/blob/c530deb3050d862fd894a9c4ed0a8ddca8714a63/src/master/allocator/sorter/drf/sorter.cpp#L116]:
{code}
set::iterator it = find(name);

if (it != clients.end()) { // TODO(benh): This should really be a CHECK.
...
}
{code}

IMO there should never be silent no-ops. Short of a CHECK, we should return an 
error if it is indeed an error. If either path of the branch is valid and one 
is a no-op, we should log the no-op branch or return a 'bool' so the caller can 
distinguish the two.

Implicitness makes it hard to debug things and we have run into one instance of 
this.

My proposal is to use CHECKs consistently in sorter.
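
For comparison, the "return a 'bool'" alternative mentioned above could look roughly like the following. This is a sketch under stated assumptions: `SketchSorter` and its members are hypothetical and much simpler than the real DRFSorter; it only shows how an unknown client stops being a silent no-op.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a sorter; not the Mesos DRFSorter.
class SketchSorter {
public:
  void add(const std::string& name) { allocations_[name] = 0.0; }

  // Returns false instead of silently doing nothing when `name` is unknown,
  // so the caller can log, return an error, or CHECK on the result.
  bool allocated(const std::string& name, double amount) {
    auto it = allocations_.find(name);
    if (it == allocations_.end()) {
      return false;
    }
    it->second += amount;
    return true;
  }

  // Throws std::out_of_range if `name` is unknown.
  double allocation(const std::string& name) const {
    return allocations_.at(name);
  }

private:
  std::map<std::string, double> allocations_;
};
```

A caller that considers the unknown-client case a bug can wrap the call in a CHECK, which gives the same effect as the proposal while keeping the sorter method itself testable.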





[jira] [Commented] (MESOS-5272) Support docker image labels.

2016-04-25 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257220#comment-15257220
 ] 

Qian Zhang commented on MESOS-5272:
---

[~gilbert], can you please elaborate on why image labels are necessary for the 
Mesos GPU device isolator? Will any logic in that isolator depend on image 
labels?

> Support docker image labels.
> 
>
> Key: MESOS-5272
> URL: https://issues.apache.org/jira/browse/MESOS-5272
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, gpu
>
> Docker image labels, which can be used for applying custom metadata, should 
> be supported in the unified containerizer. Image labels are necessary for 
> Mesos features that support docker in the unified containerizer (e.g., the 
> Mesos GPU device isolator).





[jira] [Updated] (MESOS-4705) Linux 'perf' parsing logic may fail when OS distribution has perf backports.

2016-04-25 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-4705:
---
Summary: Linux 'perf' parsing logic may fail when OS distribution has perf 
backports.  (was: Slave failed to sample container with perf event)

> Linux 'perf' parsing logic may fail when OS distribution has perf backports.
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
> Fix For: 0.29.0, 0.27.3, 0.28.2, 0.26.2
>
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave logged the following error:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> This is caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  for kernel versions below 3.12. 
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.
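
A parser that keys on the number of fields rather than on the kernel version might look like the sketch below. Only the 6-token layout (value,unit,event,cgroup,running,ratio) comes from the report above; the 3-token and 4-token layouts are assumptions about other perf versions, and `Sample`, `split` and `parse` are hypothetical names, not the functions in src/linux/perf.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: the fields a caller would need from one line.
struct Sample {
  std::string value;
  std::string event;
  std::string cgroup;
};

// Splits on `delim`, keeping empty fields (the 'unit' field may be empty).
std::vector<std::string> split(const std::string& line, char delim) {
  std::vector<std::string> tokens;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type end = line.find(delim, start);
    if (end == std::string::npos) {
      tokens.push_back(line.substr(start));
      return tokens;
    }
    tokens.push_back(line.substr(start, end - start));
    start = end + 1;
  }
}

// Picks the field layout from the token count instead of the kernel version.
bool parse(const std::string& line, Sample* out) {
  const std::vector<std::string> t = split(line, ',');
  switch (t.size()) {
    case 3:  // Assumed layout: value,event,cgroup
      *out = Sample{t[0], t[1], t[2]};
      return true;
    case 4:  // Assumed layout: value,unit,event,cgroup
    case 6:  // From the report: value,unit,event,cgroup,running,ratio
      *out = Sample{t[0], t[2], t[3]};
      return true;
    default:
      return false;  // Unexpected number of fields.
  }
}
```

Because the layout is chosen per line, a distribution kernel that backports a newer perf output format is handled the same way as a newer upstream kernel.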





[jira] [Commented] (MESOS-4763) Add test mock for CNI plugins.

2016-04-25 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257204#comment-15257204
 ] 

Qian Zhang commented on MESOS-4763:
---

[~avin...@mesosphere.io], I think the JIRA you mean is: 
https://issues.apache.org/jira/browse/MESOS-5167. I have already moved that 
JIRA into "Reviewable" and posted the RBs there.

For this JIRA, I think we will not work on it since we do not need to implement 
a separate mock plugin script anymore (we have embedded such script in the 
tests).

> Add test mock for CNI plugins.
> --
>
> Key: MESOS-4763
> URL: https://issues.apache.org/jira/browse/MESOS-4763
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> In order to test the network/cni isolator, we need to mock the behavior of a 
> CNI plugin. One option is to write a mock script which acts as a CNI plugin. 
> The isolator will talk to the mock script the same way it talks to an actual 
> CNI plugin.
> The mock script can just join the host network?





[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Michael Browning (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257197#comment-15257197
 ] 

Michael Browning commented on MESOS-4760:
-

[~kaysoky], I looked over the epic and the associated tasks, but you probably 
have better context -- do you think it would be easier to do this work after 
the unification with `Uri::Fetcher` is complete, or would it be fine to proceed 
in parallel?

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.





[jira] [Comment Edited] (MESOS-4705) Slave failed to sample container with perf event

2016-04-25 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257195#comment-15257195
 ] 

Benjamin Mahler edited comment on MESOS-4705 at 4/25/16 10:49 PM:
--

[~fan.du] this is now committed, I'll backport this onto the stable branches 
for 0.28.x, 0.27.x, 0.26.x:

{noformat}
commit a5c81d4077400892cd3a5c306143f16903aac62c
Author: fan du 
Date:   Mon Apr 25 13:50:50 2016 -0700

Fixed the 'perf' parsing logic.

Previously the 'perf' parsing logic used the kernel version to
determine the token ordering. However, this approach breaks
when distributions backport perf parsing changes onto older
kernel versions. This updates the parsing logic to understand
all existing formats.

Co-authored with haosdent.

Review: https://reviews.apache.org/r/44379/
{noformat}


was (Author: bmahler):
[~fan.du] this is now committed, I'll backport this onto the stable branches 
for 0.28.x, 0.27.x, 0.26.x.

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
> Fix For: 0.29.0, 0.27.3, 0.28.2, 0.26.2
>
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave logged the following error:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> This is caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  for kernel versions below 3.12. 
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.





[jira] [Comment Edited] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-04-25 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257144#comment-15257144
 ] 

Yan Xu edited comment on MESOS-5278 at 4/25/16 10:13 PM:
-

This is basically MESOS-2349 right? 

-[~idownes] we are interested too and can help with review if you share it. :)-

-Wait... you are offering to review... not sharing it?-

Had to read it a third time. Anyhow, +1 on this and it would be great if you 
can share it. A simplified version is a good start.


was (Author: xujyan):
This is basically MESOS-2349 right? 

-[~idownes] we are interested too and can help with review if you share it. :)-

Wait... you are offering to review... not sharing it?

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with the container and execute an 
> arbitrary command in that container (similar to `docker exec`).





[jira] [Comment Edited] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-04-25 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257144#comment-15257144
 ] 

Yan Xu edited comment on MESOS-5278 at 4/25/16 10:10 PM:
-

This is basically MESOS-2349 right? 

-[~idownes] we are interested too and can help with review if you share it. :)-

Wait... you are offering to review... not sharing it?


was (Author: xujyan):
This is basically MESOS-2349 right? 

[~idownes] we are interested too and can help with review if you share it. :)

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with the container and execute an 
> arbitrary command in that container (similar to `docker exec`).





[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-04-25 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257144#comment-15257144
 ] 

Yan Xu commented on MESOS-5278:
---

This is basically MESOS-2349 right? 

[~idownes] we are interested too and can help with review if you share it. :)

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with the container and execute an 
> arbitrary command in that container (similar to `docker exec`).





[jira] [Commented] (MESOS-3914) Make request format consistent across endpoints

2016-04-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257143#comment-15257143
 ] 

Vinod Kone commented on MESOS-3914:
---

Moved some of the tickets from this epic to the Operator API v1 epic. MESOS-4791

> Make request format consistent across endpoints
> ---
>
> Key: MESOS-3914
> URL: https://issues.apache.org/jira/browse/MESOS-3914
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Alexander Rukletsov
>  Labels: http, mesosphere, tech-debt
>
> We are inconsistent with the format of requests we expect for operator 
> endpoints. For example, dynamic reservations take a string 
> "slaveId={{}}={{}}", while maintenance 
> expects a {{JSON}} object representing {{maintenance::Schedule}} protobuf 
> directly.
> We agreed to accept single {{JSON}} objects, provide a corresponding 
> {{*Request}} protobuf to document the schema, and leverage HTTP verbs where 
> appropriate. Next steps are:
> * document how Mesos HTTP operator endpoints should be implemented;
> * convert all nonconformant endpoints via a deprecation cycle.





[jira] [Commented] (MESOS-5155) Consolidate authorization actions for quota.

2016-04-25 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257139#comment-15257139
 ] 

Zhitao Li commented on MESOS-5155:
--

[~alexr] [~adam-mesos] I noticed that the previous deprecation of 
shutdown_framework -> teardown_framework more closely followed option 2 from 
Adam's list (warn instead of fail if values are specified in both fields, and 
ignore values from the deprecated field).

The benefit of this option is a clear upgrade path for operators without 
losing the ACL guard at any moment:

1. Roll out values for both old and new fields (I assume the new field will be 
dropped in {{protobuf::parse}} because it is an unknown value);
2. Roll out the new binary version; the new value will take effect instead of 
the old one;
3. Once stabilized, remove the old value within the deprecation cycle.

With option 1, I don't think there is a way to do this without either turning 
off ACLs for quota during the upgrade window, or forcing a coordinated upgrade 
of both the --acls flag and the binary version (which is not easy if an 
operator has multiple clusters).

What do you think?
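
The "warn instead of fail" behaviour of option 2 can be sketched as a small resolution function. This is illustrative only: `SketchAcls`, `Resolved` and `resolve` are hypothetical names, the fields are plain lists of principal names rather than the real ACL protobufs, and the warning is modelled as a flag instead of an actual log line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical ACL configuration with a deprecated field and its replacement.
struct SketchAcls {
  std::vector<std::string> deprecated_rules;  // Old field, to be removed.
  std::vector<std::string> new_rules;         // Field that replaces it.
};

// The rules that take effect, plus whether a deprecation warning is due.
struct Resolved {
  std::vector<std::string> rules;
  bool warned;
};

// If both fields are set, the new field wins and the caller should warn;
// the deprecated field is ignored rather than treated as an error.
Resolved resolve(const SketchAcls& acls) {
  if (!acls.new_rules.empty()) {
    return Resolved{acls.new_rules, !acls.deprecated_rules.empty()};
  }
  return Resolved{acls.deprecated_rules, false};
}
```

Under this scheme an operator can set both fields before upgrading, and some rule is in force at every point of the rollout.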

> Consolidate authorization actions for quota.
> 
>
> Key: MESOS-5155
> URL: https://issues.apache.org/jira/browse/MESOS-5155
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Zhitao Li
>  Labels: mesosphere
>
> We should have just a single authz action: {{UPDATE_QUOTA_WITH_ROLE}}. It was 
> a mistake in retrospect to introduce multiple actions.
> Actions that are not symmetrical are register/teardown and dynamic 
> reservations. They are implemented this way because the entities that 
> perform one action differ from the entities that perform the other. For 
> example, register framework is issued by a framework, teardown by an 
> operator. What is a good way to identify a framework? A role it runs in, 
> which may be different on each launch and makes no sense in a multi-role 
> framework setup, or better, a sort of group id, which is its principal. 
> Dynamic reservations and persistent volumes can be issued by both frameworks 
> and operators, hence similar reasoning applies. 
> Now, quota is associated with a role and set only by operators. Do we need to 
> care about principals that set it? Not that much. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5279) DRF sorter add/activate doesn't check if it's adding a duplicate entry

2016-04-25 Thread Yan Xu (JIRA)
Yan Xu created MESOS-5279:
-

 Summary: DRF sorter add/activate doesn't check if it's adding a 
duplicate entry
 Key: MESOS-5279
 URL: https://issues.apache.org/jira/browse/MESOS-5279
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Yan Xu
Assignee: Yan Xu


Currently the sorter relies on the caller to make sure the sorter is in a good 
state when add/activate is called. It's not defensive against caller mistakes 
as it should be. It's never an acceptable result if duplicates are added to 
{{DRFSorter::clients}}.
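
A defensive `add` could reject duplicates itself instead of trusting the caller. The sketch below assumes, for illustration only, that clients are kept in a set ordered by share and then by name, so that the set alone would accept the same name under two different shares; `Client`, `ByShare` and `DedupSorter` are hypothetical and not the Mesos types.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical client record; not the Mesos type.
struct Client {
  std::string name;
  double share;
};

// Orders by share first, so two entries with the same name but different
// shares compare as distinct and would both be accepted by the set.
struct ByShare {
  bool operator()(const Client& a, const Client& b) const {
    if (a.share != b.share) {
      return a.share < b.share;
    }
    return a.name < b.name;
  }
};

// Hypothetical sorter whose add() is defensive against caller mistakes.
class DedupSorter {
public:
  // Returns false if `name` is already present under any share.
  bool add(const std::string& name, double share) {
    for (const Client& client : clients_) {
      if (client.name == name) {
        return false;
      }
    }
    clients_.insert(Client{name, share});
    return true;
  }

  std::size_t size() const { return clients_.size(); }

private:
  std::set<Client, ByShare> clients_;
};
```

The linear scan keeps the sketch short; a name index would avoid it. The caller can CHECK on the returned value if a duplicate is considered a bug.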





[jira] [Commented] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-04-25 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257112#comment-15257112
 ] 

Ian Downes commented on MESOS-5278:
---

I wrote an internal version of this tool which was simplified to support our 
environment. I can share that if you're interested? Either way, I'm definitely 
interested in providing input and can review.

> Add a CLI allowing a user to enter a container.
> ---
>
> Key: MESOS-5278
> URL: https://issues.apache.org/jira/browse/MESOS-5278
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Containers created by the unified containerizer (Mesos containerizer) use 
> various namespaces (e.g., mount, network, etc.).
> To improve debuggability, we should create a CLI that allows an operator or a 
> user to enter the namespaces associated with the container and execute an 
> arbitrary command in that container (similar to `docker exec`).





[jira] [Created] (MESOS-5278) Add a CLI allowing a user to enter a container.

2016-04-25 Thread Jie Yu (JIRA)
Jie Yu created MESOS-5278:
-

 Summary: Add a CLI allowing a user to enter a container.
 Key: MESOS-5278
 URL: https://issues.apache.org/jira/browse/MESOS-5278
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu


Containers created by the unified containerizer (Mesos containerizer) use 
various namespaces (e.g., mount, network, etc.).

To improve debuggability, we should create a CLI that allows an operator or a 
user to enter the namespaces associated with the container and execute an 
arbitrary command in that container (similar to `docker exec`).





[jira] [Updated] (MESOS-5277) Need to add REMOVE semantics to the copy backend

2016-04-25 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5277:
--
Fix Version/s: 0.29.0

> Need to add REMOVE semantics to the copy backend
> 
>
> Key: MESOS-5277
> URL: https://issues.apache.org/jira/browse/MESOS-5277
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Gilbert Song
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Some Dockerfiles run the `rm` command to remove files from the base image 
> using the "RUN" directive. An example can be found here:
> https://github.com/ngineered/nginx-php-fpm.git
> The removed files should not be present in the final rootfs. Their presence 
> in the final image can make the container misbehave. For example, the 
> referenced nginx-php-fpm docker image removes the default nginx config and 
> replaces it with its own config to point to a different HTML root. If the 
> default nginx config is still present after building the image, nginx will 
> start pointing to a different HTML root than the one set in the Dockerfile.
> Currently the copy backend cannot handle removal of files from intermediate 
> layers. This can cause issues with docker images built using a Dockerfile 
> similar to the one referenced here. Hence, we need to add REMOVE semantics to 
> the copy backend.  





[jira] [Created] (MESOS-5277) Need to add REMOVE semantics to the copy backend

2016-04-25 Thread Avinash Sridharan (JIRA)
Avinash Sridharan created MESOS-5277:


 Summary: Need to add REMOVE semantics to the copy backend
 Key: MESOS-5277
 URL: https://issues.apache.org/jira/browse/MESOS-5277
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: linux
Reporter: Avinash Sridharan
Assignee: Gilbert Song


Some Dockerfiles run the `rm` command to remove files from the base image using 
the "RUN" directive. An example can be found here:
https://github.com/ngineered/nginx-php-fpm.git

The removed files should not be present in the final rootfs. Their presence in 
the final image can make the container misbehave. For example, the referenced 
nginx-php-fpm docker image removes the default nginx config and replaces it 
with its own config to point to a different HTML root. If the default nginx 
config is still present after building the image, nginx will start pointing to 
a different HTML root than the one set in the Dockerfile.


Currently the copy backend cannot handle removal of files from intermediate 
layers. This can cause issues with docker images built using a Dockerfile 
similar to the one referenced here. Hence, we need to add REMOVE semantics to 
the copy backend.  
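
Docker image layers record a removed file as a whiteout entry whose basename carries the `.wh.` prefix. Assuming the copy backend would honour those markers (the ticket does not say how REMOVE will be implemented), flattening layers could be sketched as below. The in-memory `Layer` map and the functions `whiteoutTarget` and `flatten` are hypothetical, and only plain file whiteouts are handled, not directories or opaque whiteouts.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A layer maps relative file paths to file contents (in-memory stand-in).
using Layer = std::map<std::string, std::string>;

// Basename prefix that marks a whiteout entry in a Docker image layer.
const std::string kWhiteout = ".wh.";

// If `path` is a whiteout entry, stores the path it removes in `target`.
bool whiteoutTarget(const std::string& path, std::string* target) {
  const std::string::size_type slash = path.rfind('/');
  const std::string::size_type base =
      (slash == std::string::npos) ? 0 : slash + 1;
  if (path.compare(base, kWhiteout.size(), kWhiteout) != 0) {
    return false;
  }
  *target = path.substr(0, base) + path.substr(base + kWhiteout.size());
  return true;
}

// Applies layers bottom to top. Within each layer, whiteouts are applied
// first so that they only remove files coming from lower layers.
Layer flatten(const std::vector<Layer>& layers) {
  Layer rootfs;
  for (const Layer& layer : layers) {
    std::string target;
    for (const auto& entry : layer) {
      if (whiteoutTarget(entry.first, &target)) {
        rootfs.erase(target);
      }
    }
    for (const auto& entry : layer) {
      if (!whiteoutTarget(entry.first, &target)) {
        rootfs[entry.first] = entry.second;
      }
    }
  }
  return rootfs;
}
```

In the nginx example above, an upper layer containing `etc/nginx/.wh.default.conf` would drop the base image's default config from the final rootfs, and the marker itself would not be copied.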





[jira] [Created] (MESOS-5276) HTTPCommandExecutor should terminate after it receives an ACK from the agent

2016-04-25 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-5276:
-

 Summary: HTTPCommandExecutor should terminate after it receives an 
ACK from the agent
 Key: MESOS-5276
 URL: https://issues.apache.org/jira/browse/MESOS-5276
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone


Currently, the HTTP command executor calls os::sleep() (similar to the 
driver-based command executor) after sending the terminal update as a hack to 
ensure the update is received by the agent. The right thing to do would be to 
terminate after receiving the ACK from the agent.

For this to work we need to ensure the agent always ACKs an executor update 
(MESOS-5262).
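
The ACK-driven shutdown described above amounts to tracking unacknowledged updates and exiting only once the terminal update has been acknowledged. The following is a minimal, single-threaded sketch; `SketchExecutor` and its methods are hypothetical names and do not reflect the executor's real message handling.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical executor bookkeeping; not the Mesos HTTP command executor.
class SketchExecutor {
public:
  // Records a status update identified by `uuid`; `terminal` marks the
  // update after which the executor wants to exit.
  void send(const std::string& uuid, bool terminal) {
    unacked_.insert(uuid);
    if (terminal) {
      terminalSent_ = true;
    }
  }

  // Called when the agent acknowledges the update with this `uuid`.
  void acknowledged(const std::string& uuid) { unacked_.erase(uuid); }

  // True once the terminal update was sent and no update is still pending,
  // which replaces the fixed sleep before exiting.
  bool canTerminate() const { return terminalSent_ && unacked_.empty(); }

private:
  std::set<std::string> unacked_;
  bool terminalSent_ = false;
};
```

This only works if every update is eventually acknowledged, which is the dependency on MESOS-5262 noted above. A real implementation would probably also want a timeout in case the agent never responds.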





[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256821#comment-15256821
 ] 

Joseph Wu commented on MESOS-4760:
--

It's somewhat likely that the fetcher unification (with the {{URI::Fetcher}}) 
will make the fetcher into an injectable object.  Added this issue to the Epic 
[MESOS-3918].

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.





[jira] [Updated] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4760:
-
Story Points: 2

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.





[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Michael Browning (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256760#comment-15256760
 ] 

Michael Browning commented on MESOS-4760:
-

[~bernd-mesos], great, I'll get on that. One question to guide my 
implementation: I discovered the need for this so that metrics could be 
correctly implemented, but other than that, what was the motivation for 
refactoring fetcher/agent/containerizer this way?

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.





[jira] [Issue Comment Deleted] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Michael Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Browning updated MESOS-4760:

Comment: was deleted

(was: Great, I'll get on that. One question to guide my implementation: I 
discovered the need for this so that metrics could be correctly implemented, 
but other than that, what was the motivation for refactoring 
fetcher/agent/containerizer this way?)

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.





[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Michael Browning (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256757#comment-15256757
 ] 

Michael Browning commented on MESOS-4760:
-

Great, I'll get on that. One question to guide my implementation: I discovered 
the need for this so that metrics could be correctly implemented, but other 
than that, what was the motivation for refactoring fetcher/agent/containerizer 
this way?

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.





[jira] [Updated] (MESOS-5259) Refactor the mesos-fetcher binary to use the uri::Fetcher as a backend

2016-04-25 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5259:
-
Sprint: Mesosphere Sprint 34

> Refactor the mesos-fetcher binary to use the uri::Fetcher as a backend
> --
>
> Key: MESOS-5259
> URL: https://issues.apache.org/jira/browse/MESOS-5259
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: fetcher, mesosphere
>
> This is an intermediate step for combining the {{mesos-fetcher}} binary and 
> {{uri::Fetcher}}.  
> The {{download}} method should be replaced with {{uri::Fetcher::fetch}}.
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/launcher/fetcher.cpp#L179
> Combining the two will:
> * Attach the {{uri::Fetcher}} to the existing Fetcher caching logic.
> * Remove some code duplication for downloading URIs.





[jira] [Updated] (MESOS-5261) Combine the internal::slave::Fetcher class and mesos-fetcher binary

2016-04-25 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5261:
-
Sprint: Mesosphere Sprint 34

> Combine the internal::slave::Fetcher class and mesos-fetcher binary
> ---
>
> Key: MESOS-5261
> URL: https://issues.apache.org/jira/browse/MESOS-5261
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: fetcher, mesosphere
>
> After [MESOS-5259], the {{mesos-fetcher}} will no longer need to be a 
> separate binary and can be safely folded back into the agent process.  (It 
> was a separate binary because libcurl has synchronous/blocking calls.)  
> This will likely mean:
> * A change to the {{fetch}} continuation chain:
>   
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/src/slave/containerizer/fetcher.cpp#L315
> * This protobuf can be deprecated (or just removed):
>   
> https://github.com/apache/mesos/blob/653eca74f1080f5f55cd5092423506163e65d402/include/mesos/fetcher/fetcher.proto





[jira] [Updated] (MESOS-3926) Modularize URI fetcher plugin interface.

2016-04-25 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3926:
-
Sprint: Mesosphere Sprint 34

> Modularize URI fetcher plugin interface.  
> --
>
> Key: MESOS-3926
> URL: https://issues.apache.org/jira/browse/MESOS-3926
> Project: Mesos
>  Issue Type: Task
>  Components: fetcher
>Reporter: Jie Yu
>Assignee: Shuai Lin
>  Labels: fetcher, mesosphere, module
>
> So that we can add custom URI fetcher plugins using modules.





[jira] [Updated] (MESOS-5275) Add capabilities support for unified containerizer.

2016-04-25 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5275:
--
Sprint: Mesosphere Sprint 34

> Add capabilities support for unified containerizer.
> ---
>
> Key: MESOS-5275
> URL: https://issues.apache.org/jira/browse/MESOS-5275
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> Add capabilities support for unified containerizer. 
> Requirements:
> 1. Use the mesos capabilities API.
> 2. Frameworks should be able to add capability requests for containers.
> 3. Agents should be able to add maximum allowed capabilities for all containers 
> launched.
> Design document: 
> https://docs.google.com/document/d/1YiTift8TQla2vq3upQr7K-riQ_pQ-FKOCOsysQJROGc/edit#heading=h.rgfwelqrskmd





[jira] [Created] (MESOS-5275) Add capabilities support for unified containerizer.

2016-04-25 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-5275:


 Summary: Add capabilities support for unified containerizer.
 Key: MESOS-5275
 URL: https://issues.apache.org/jira/browse/MESOS-5275
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Jojy Varghese
Assignee: Jojy Varghese


Add capabilities support for unified containerizer. 

Requirements:
1. Use the mesos capabilities API.
2. Frameworks should be able to add capability requests for containers.
3. Agents should be able to add maximum allowed capabilities for all containers 
launched.

Design document: 
https://docs.google.com/document/d/1YiTift8TQla2vq3upQr7K-riQ_pQ-FKOCOsysQJROGc/edit#heading=h.rgfwelqrskmd
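
One way to read requirements 2 and 3 together is sketched below: a container's requested capabilities are checked against the agent-wide maximum. The function name is hypothetical, and rejecting the request (rather than silently dropping the extra capabilities) is an assumption; the design document may choose differently.

```python
from typing import Set


def check_capabilities(requested: Set[str], allowed: Set[str]) -> Set[str]:
    """Return the requested capabilities if the agent allows all of them.

    Hypothetical helper: rejecting instead of clamping is an assumption.
    """
    denied = requested - allowed
    if denied:
        raise PermissionError(
            "capabilities not allowed by agent: %s" % sorted(denied))
    return set(requested)
```

For example, a request for `CAP_SYS_ADMIN` on an agent that only allows `CAP_CHOWN` would be refused before the container is launched.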






[jira] [Updated] (MESOS-5273) Document "/flags" endpoint authorization as in MESOS-4785

2016-04-25 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-5273:

Sprint: Mesosphere Sprint 34
Labels: documentation mesosphere security  (was: documentation security)

> Document "/flags" endpoint authorization as in MESOS-4785
> -
>
> Key: MESOS-5273
> URL: https://issues.apache.org/jira/browse/MESOS-5273
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: documentation, mesosphere, security
> Fix For: 0.29.0
>
>
> MESOS-4785 reorganizes the documentation of the authorization features that 
> are available in Mesos. The authorization of the "/flags" endpoint, 
> introduced in MESOS-5142 needs to be documented in a similar way.





[jira] [Updated] (MESOS-5249) Update CMake files to reflect reorganized 3rdparty

2016-04-25 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-5249:
--
Shepherd: Kapil Arya

> Update CMake files to reflect reorganized 3rdparty
> --
>
> Key: MESOS-5249
> URL: https://issues.apache.org/jira/browse/MESOS-5249
> Project: Mesos
>  Issue Type: Task
>  Components: build
>Reporter: Kapil Arya
>Assignee: Alex Clemmer
>  Labels: mesosphere
> Fix For: 0.29.0
>
>






[jira] [Assigned] (MESOS-5273) Document "/flags" endpoint authorization as in MESOS-4785

2016-04-25 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht reassigned MESOS-5273:
---

Assignee: Jan Schlicht

> Document "/flags" endpoint authorization as in MESOS-4785
> -
>
> Key: MESOS-5273
> URL: https://issues.apache.org/jira/browse/MESOS-5273
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: documentation, security
> Fix For: 0.29.0
>
>
> MESOS-4785 reorganizes the documentation of the authorization features that 
> are available in Mesos. The authorization of the "/flags" endpoint, 
> introduced in MESOS-5142 needs to be documented in a similar way.





[jira] [Commented] (MESOS-4763) Add test mock for CNI plugins.

2016-04-25 Thread Avinash Sridharan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256683#comment-15256683
 ] 

Avinash Sridharan commented on MESOS-4763:
--

Hi Qian,
I think there are a bunch of reviews out for this JIRA. Can we move this to 
"Reviewable" and point to the RBs that relate to this JIRA?

Thanks,
Avinash

> Add test mock for CNI plugins.
> --
>
> Key: MESOS-4763
> URL: https://issues.apache.org/jira/browse/MESOS-4763
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> In order to test the network/cni isolator, we need to mock the behavior of a 
> CNI plugin. One option is to write a mock script which acts as a CNI plugin. 
> The isolator will talk to the mock script the same way it talks to an actual 
> CNI plugin.
> The mock script can just join the host network?
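
A minimal sketch of what such a mock could look like follows. It assumes the usual CNI plugin contract (the command arrives in the `CNI_COMMAND` environment variable and the network configuration arrives as JSON on stdin). The result shape (`ip4`) is an assumption modelled on early versions of the CNI spec, the address is a placeholder from the documentation range, and the function names are hypothetical. It does not touch any network namespace.

```python
import json


def handle(command, config):
    """Return the result a mock plugin would print, or None for no output."""
    if command == "ADD":
        return {
            "cniVersion": config.get("cniVersion", "0.1.0"),
            # Placeholder address from the 192.0.2.0/24 documentation range.
            "ip4": {"ip": "192.0.2.10/24"},
        }
    if command == "DEL":
        return None  # Nothing to tear down in a mock.
    raise ValueError("unsupported CNI_COMMAND: %r" % command)


def run(environ, stdin, stdout):
    """Read the config from stdin and write the JSON result to stdout."""
    config = json.load(stdin)
    result = handle(environ.get("CNI_COMMAND", ""), config)
    if result is not None:
        json.dump(result, stdout)
```

Calling `run(os.environ, sys.stdin, sys.stdout)` from a script's entry point would let the isolator invoke it like any other plugin executable.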





[jira] [Updated] (MESOS-5250) Move 3rdparty/libprocess/3rdparty/* to 3rdparty/

2016-04-25 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-5250:
--
Shepherd: Joris Van Remoortere

> Move 3rdparty/libprocess/3rdparty/* to 3rdparty/
> 
>
> Key: MESOS-5250
> URL: https://issues.apache.org/jira/browse/MESOS-5250
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4690) Reorganize 3rdparty directory

2016-04-25 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-4690:
--
Shepherd: Joris Van Remoortere

> Reorganize 3rdparty directory
> -
>
> Key: MESOS-4690
> URL: https://issues.apache.org/jira/browse/MESOS-4690
> Project: Mesos
>  Issue Type: Epic
>  Components: build, libprocess, stout
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> This issue is currently being discussed on the dev mailing list:
> http://www.mail-archive.com/dev@mesos.apache.org/msg34349.html





[jira] [Updated] (MESOS-4434) Install 3rdparty package boost, glog, protobuf and picojson when installing Mesos

2016-04-25 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-4434:
--
Shepherd: Joris Van Remoortere

> Install 3rdparty package boost, glog, protobuf and picojson when installing 
> Mesos
> -
>
> Key: MESOS-4434
> URL: https://issues.apache.org/jira/browse/MESOS-4434
> Project: Mesos
>  Issue Type: Bug
>  Components: build, modules
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Mesos modules depend on having these packages installed with the exact 
> version as Mesos was compiled with.





[jira] [Commented] (MESOS-3319) Mesos will not build when configured with gperftools enabled

2016-04-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256677#comment-15256677
 ] 

Greg Mann commented on MESOS-3319:
--

And another review for docs: https://reviews.apache.org/r/46647/

> Mesos will not build when configured with gperftools enabled
> 
>
> Key: MESOS-3319
> URL: https://issues.apache.org/jira/browse/MESOS-3319
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: build
>
> Mesos configured with {{--enable-perftools}} currently will not build on OSX 
> 10.10.4 or Ubuntu 14.04, possibly because the bundled gperftools-2.0 is not 
> current. The stable release is now 2.4, which builds successfully on both of 
> these platforms.
> This issue is resolved once Mesos builds successfully out of the box with 
> gperftools enabled. After this ticket is resolved, the libprocess profiler 
> should be tested to confirm that it still works and, if not, it should be 
> fixed.





[jira] [Updated] (MESOS-5028) Copy provisioner cannot replace directory with symlink

2016-04-25 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5028:
--
Story Points: 3

> Copy provisioner cannot replace directory with symlink
> --
>
> Key: MESOS-5028
> URL: https://issues.apache.org/jira/browse/MESOS-5028
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Zhitao Li
>Assignee: Gilbert Song
>
> I'm trying to play with the new image provisioner on our custom docker 
> images, but one of the layers failed to get copied, possibly due to a dangling 
> symlink.
> Error log with GLOG_v=1:
> {quote}
> I0324 05:42:48.926678 15067 copy.cpp:127] Copying layer path 
> '/tmp/mesos/store/docker/layers/5df0888641196b88dcc1b97d04c74839f02a73b8a194a79e134426d6a8fcb0f1/rootfs'
>  to rootfs 
> '/var/lib/mesos/provisioner/containers/5f05be6c-c970-4539-aa64-fd0eef2ec7ae/backends/copy/rootfses/507173f3-e316-48a3-a96e-5fdea9ffe9f6'
> E0324 05:42:49.028506 15062 slave.cpp:3773] Container 
> '5f05be6c-c970-4539-aa64-fd0eef2ec7ae' for executor 'test' of framework 
> 75932a89-1514-4011-bafe-beb6a208bb2d-0004 failed to start: Collect failed: 
> Collect failed: Failed to copy layer: cp: cannot overwrite directory 
> ‘/var/lib/mesos/provisioner/containers/5f05be6c-c970-4539-aa64-fd0eef2ec7ae/backends/copy/rootfses/507173f3-e316-48a3-a96e-5fdea9ffe9f6/etc/apt’
>  with non-directory
> {quote}
> Content of 
> _/tmp/mesos/store/docker/layers/5df0888641196b88dcc1b97d04c74839f02a73b8a194a79e134426d6a8fcb0f1/rootfs/etc/apt_
>  points to a non-existing absolute path (cannot provide exact path but it's a 
> result of us trying to mount apt keys into docker container at build time).
> I believe what happened is that we executed a script at build time, which 
> contains the equivalent of:
> {quote}
> rm -rf /etc/apt/* && ln -sf /build-mount-point/ /etc/apt
> {quote}
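
The failure mode can be illustrated with a small sketch of a layer copy that lets a symlink (even a dangling one) replace an existing directory, which is the case plain `cp` refuses. The function name is hypothetical, this is not the copy backend's actual code, and Docker-specific details such as whiteout files are deliberately ignored.

```python
import os
import shutil


def apply_layer(layer: str, rootfs: str) -> None:
    """Copy `layer` over `rootfs`, letting non-directories replace directories."""
    for name in os.listdir(layer):
        src = os.path.join(layer, name)
        dst = os.path.join(rootfs, name)
        src_is_dir = os.path.isdir(src) and not os.path.islink(src)
        dst_is_dir = os.path.isdir(dst) and not os.path.islink(dst)

        if src_is_dir:
            if os.path.lexists(dst) and not dst_is_dir:
                os.remove(dst)  # A file or symlink is in the way.
            os.makedirs(dst, exist_ok=True)
            apply_layer(src, dst)
            continue

        # Source is a regular file or a (possibly dangling) symlink.
        if dst_is_dir:
            shutil.rmtree(dst)  # The case plain `cp` refuses to handle.
        elif os.path.lexists(dst):
            os.remove(dst)

        if os.path.islink(src):
            # Recreate the link itself; never follow it.
            os.symlink(os.readlink(src), dst)
        else:
            shutil.copy2(src, dst)
```

Because the link is recreated with `os.readlink` rather than followed, a target that does not exist on the agent (such as a build-time mount point) does not cause an error.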





[jira] [Updated] (MESOS-2222) Add ACLs for the maintenance HTTP endpoints.

2016-04-25 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-2222:

Sprint: Mesosphere Sprint 34

> Add ACLs for the maintenance HTTP endpoints.
> 
>
> Key: MESOS-2222
> URL: https://issues.apache.org/jira/browse/MESOS-2222
> Project: Mesos
>  Issue Type: Task
>  Components: master, security
>Affects Versions: 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: authorization, maintenance, mesosphere, security
>
> In order to authorize the HTTP endpoints for maintenance (to be added in 
> MESOS-2067), we will need to add an ACL definition for performing maintenance 
> operations.





[jira] [Commented] (MESOS-5164) Add authorization to agent's /monitor/statistics endpoint

2016-04-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256637#comment-15256637
 ] 

Adam B commented on MESOS-5164:
---

Committed one patch, still working on the other.

commit 365ec5915a29721fd04572ed891cab5ed35a78bb
Author: Benjamin Bannier 
Date:   Mon Apr 25 03:58:58 2016 -0700

Added helper to create test agent with injected `Authorizer`.

In addition to the fully generic interface we do provide a number of
short hand functions for creating agents in tests which allow injecting
just a single component. Add one such short hand function for creating
a test agent with an injected `Authorizer` which we will use in a
subsequent patch.

Review: https://reviews.apache.org/r/46318/

> Add authorization to agent's /monitor/statistics endpoint
> -
>
> Key: MESOS-5164
> URL: https://issues.apache.org/jira/browse/MESOS-5164
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Benjamin Bannier
>  Labels: authorization, mesosphere, security
> Fix For: 0.29.0
>
>
> Operators may want to enforce that only specific authorized users be able to 
> view per-executor resource usage statistics. For 0.29 MVP, we can make this 
> coarse-grained, and assume that only the operator or an operator-privileged 
> monitoring service will be accessing the endpoint.
> For a future release, we can consider fine-grained authz that filters 
> statistics like we plan to do for /tasks.





[jira] [Commented] (MESOS-5142) Add agent flags for HTTP authorization

2016-04-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256631#comment-15256631
 ] 

Adam B commented on MESOS-5142:
---

Committed the new --authorizer agent flag (first review above), but not 
authorization of /flags (2nd review).

commit a3da5811e0de83373f6ef5d98fbe9f72e65de046
Author: Jan Schlicht 
Date:   Mon Apr 25 03:57:31 2016 -0700

Added agent authorization flags.

Review: https://reviews.apache.org/r/45922/

> Add agent flags for HTTP authorization
> --
>
> Key: MESOS-5142
> URL: https://issues.apache.org/jira/browse/MESOS-5142
> Project: Mesos
>  Issue Type: Bug
>  Components: security, slave
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, security
> Fix For: 0.29.0
>
>
> Flags should be added to the agent to:
> 1. Enable authorization ({{--authorizers}})
> 2. Provide ACLs ({{--acls}})





[jira] [Updated] (MESOS-4279) Docker executor truncates task's output when the task is killed.

2016-04-25 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4279:
---
Affects Version/s: 0.28.1
Fix Version/s: 0.29.0
  Summary: Docker executor truncates task's output when the task is 
killed.  (was: Graceful restart of docker task)

> Docker executor truncates task's output when the task is killed.
> 
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.2, 0.28.1
>Reporter: Martin Bydzovsky
>Assignee: Qian Zhang
>  Labels: docker, mesosphere
> Fix For: 0.29.0
>
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I 
> ran into the following issue:
> (it was already discussed on 
> https://github.com/mesosphere/marathon/issues/2876 and the guys from Mesosphere 
> got to the point that it's probably a docker containerizer problem...)
> To sum it up:
> When I deploy a simple Python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result: the task receives SIGTERM and 
> dies peacefully (within my script-specified 2-second period).
> But when I wrap this Python script in a Docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application through Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>   type: "DOCKER",
>   docker: {
>   image: "bydga/marathon-test-api"
>   },
>   forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> The task during restart (issued from marathon) dies immediately without 
> having a chance to do any cleanup.
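
For reference, the behavior the reporter expects (SIGTERM first, a grace period, and only then a forced kill) can be sketched as below. This is an illustration of the general pattern using the standard library on POSIX, with a hypothetical function name; it is not a description of how the Docker executor is implemented.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Send SIGTERM, wait up to `grace_seconds`, then fall back to SIGKILL."""
    proc.terminate()  # Sends SIGTERM on POSIX.
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # Sends SIGKILL on POSIX.
        return proc.wait()
```

With a grace period longer than the script's 2-second cleanup, the handler in the script above would have time to print "ending" and flush its output before exiting.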





[jira] [Commented] (MESOS-3319) Mesos will not build when configured with gperftools enabled

2016-04-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256626#comment-15256626
 ] 

Greg Mann commented on MESOS-3319:
--

Another review here: https://reviews.apache.org/r/46643/

> Mesos will not build when configured with gperftools enabled
> 
>
> Key: MESOS-3319
> URL: https://issues.apache.org/jira/browse/MESOS-3319
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: build
>
> Mesos configured with {{--enable-perftools}} currently will not build on OSX 
> 10.10.4 or Ubuntu 14.04, possibly because the bundled gperftools-2.0 is not 
> current. The stable release is now 2.4, which builds successfully on both of 
> these platforms.
> This issue is resolved once Mesos builds successfully out of the box with 
> gperftools enabled. After this ticket is resolved, the libprocess profiler 
> should be tested to confirm that it still works and, if not, it should be 
> fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4902) Add authentication to libprocess endpoints

2016-04-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256618#comment-15256618
 ] 

Greg Mann commented on MESOS-4902:
--

Another review here: https://reviews.apache.org/r/46641/

> Add authentication to libprocess endpoints
> --
>
> Key: MESOS-4902
> URL: https://issues.apache.org/jira/browse/MESOS-4902
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: authentication, http, mesosphere, security
> Fix For: 0.29.0
>
>
> In addition to the endpoints addressed by MESOS-4850 and MESOS-5152, the 
> following endpoints would also benefit from HTTP authentication:
> * {{/profiler/*}}
> * {{/logging/toggle}}
> * {{/metrics/snapshot}}
> Adding HTTP authentication to these endpoints is a bit more complicated 
> because they are defined at the libprocess level.
> While working on MESOS-4850, it became apparent that since our tests use the 
> same instance of libprocess for both master and agent, different default 
> authentication realms must be used for master/agent so that HTTP 
> authentication can be independently enabled/disabled for each.
> We should establish a mechanism for making an endpoint authenticated that 
> allows us to:
> 1) Install an endpoint like {{/files}}, whose code is shared by the master 
> and agent, with different authentication realms for the master and agent
> 2) Avoid hard-coding a default authentication realm into libprocess, to 
> permit the use of different authentication realms for the master and agent 
> and to keep application-level concerns from leaking into libprocess
> Another option would be to use a single default authentication realm and 
> always enable or disable HTTP authentication for *both* the master and agent 
> in tests. However, this wouldn't allow us to test scenarios where HTTP 
> authentication is enabled on one but disabled on the other.
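
The two goals above can be sketched with a toy router in which the caller, not the HTTP layer, chooses the realm for a shared endpoint. Everything here is hypothetical (the `Router` class, the realm names, and the path prefixes); it models the idea only and does not reflect the libprocess API.

```python
from typing import Dict, Optional, Set


class Router:
    """Stand-in for the HTTP layer; it has no default realm of its own."""

    def __init__(self) -> None:
        self._realms: Dict[str, Optional[str]] = {}  # path -> realm (None = open)
        self._enabled: Set[str] = set()              # realms that require auth

    def route(self, path: str, realm: Optional[str] = None) -> None:
        self._realms[path] = realm

    def enable_authentication(self, realm: str) -> None:
        self._enabled.add(realm)

    def requires_authentication(self, path: str) -> bool:
        return self._realms[path] in self._enabled


def install_files_endpoint(router: Router, prefix: str, realm: str) -> None:
    # Shared code: the caller (master or agent) chooses the realm.
    router.route(prefix + "/files", realm)
```

With one router shared by a master and an agent in the same test process, enabling authentication for only the master's realm leaves the agent's copy of the endpoint open, which is the scenario a single default realm could not express.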





[jira] [Commented] (MESOS-4316) Support get non-default weights by /weights

2016-04-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256616#comment-15256616
 ] 

Adam B commented on MESOS-4316:
---

commit d014d994b4513f9dcf9d33a293e847505bfb10fb
Author: Adam B 
Date:   Mon Apr 25 05:09:44 2016 -0700

Fixed unsigned int comparison.

commit 03168ce650856bffdcf172c3e87d85c2e2dd8f6b
Author: Yongqiao Wang 
Date:   Mon Apr 25 04:23:16 2016 -0700

Added positive tests for /weights endpoint.

Review: https://reviews.apache.org/r/46139/

> Support get non-default weights by /weights
> ---
>
> Key: MESOS-4316
> URL: https://issues.apache.org/jira/browse/MESOS-4316
> Project: Mesos
>  Issue Type: Task
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>Priority: Minor
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Like /quota, we should also add query logic for /weights to keep consistent. 
> Then /roles no longer needs to show weight information.





[jira] [Updated] (MESOS-5222) Create a benchmark for scale testing HTTP frameworks

2016-04-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5222:
--
Sprint: Mesosphere Sprint 34

> Create a benchmark for scale testing HTTP frameworks
> 
>
> Key: MESOS-5222
> URL: https://issues.apache.org/jira/browse/MESOS-5222
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> It would be good to add a benchmark for scale testing the HTTP frameworks wrt 
> driver based frameworks. The benchmark can be as simple as trying to launch N 
> tasks (parameterized) with the old/new API. We can then focus on fixing 
> performance issues that we find as a result of this exercise.





[jira] [Updated] (MESOS-5181) Master should reject calls from the scheduler driver if the scheduler is not connected.

2016-04-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5181:
--
Sprint: Mesosphere Sprint 34

> Master should reject calls from the scheduler driver if the scheduler is not 
> connected.
> ---
>
> Key: MESOS-5181
> URL: https://issues.apache.org/jira/browse/MESOS-5181
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler driver
>Affects Versions: 0.24.0
>Reporter: Joseph Wu
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> When a scheduler registers, the master will create a link from master to 
> scheduler.  If this link breaks, the master will consider the scheduler 
> {{inactive}} and mark it as {{disconnected}}.
> This causes a couple of problems:
> 1) Master does not send offers to {{inactive}} schedulers.  But these 
> schedulers might consider themselves "registered" in a one-way network 
> partition scenario.
> 2) Any calls from the {{inactive}} scheduler are still accepted, which leaves 
> the scheduler in a starved, but semi-functional state.
> See the related issue for more context: MESOS-5180
> There should be an additional guard for registered, but {{inactive}} 
> schedulers here:
> https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/master.cpp#L1977
> The HTTP API already does this:
> https://github.com/apache/mesos/blob/94f4f4ebb7d491ec6da1473b619600332981dd8e/src/master/http.cpp#L459
> Since the scheduler driver cannot return a 403, it may be necessary to return 
> a {{Event::ERROR}} and force the scheduler to abort.
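
The proposed guard can be sketched as follows. The types and names are hypothetical stand-ins rather than the master's actual code, and raising an exception here merely stands in for sending an error event back to the driver.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    framework_id: str
    connected: bool = True


class SchedulerError(Exception):
    """Stands in for an ERROR event sent back to the driver."""


def receive_call(frameworks: dict, framework_id: str, call: str) -> str:
    framework = frameworks.get(framework_id)
    if framework is None:
        raise SchedulerError("Framework is not registered")
    if not framework.connected:
        # The additional guard: registered, but the master-to-scheduler
        # link is broken, so the call is rejected instead of accepted.
        raise SchedulerError("Framework is disconnected")
    return "accepted " + call
```

Rejecting the call gives the scheduler a clear signal to abort or re-register instead of continuing in a state where it can send calls but never receives offers.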





[jira] [Updated] (MESOS-5180) Scheduler driver does not detect disconnection with master and reregister.

2016-04-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5180:
--
  Sprint: Mesosphere Sprint 34
Story Points: 3  (was: 2)

> Scheduler driver does not detect disconnection with master and reregister.
> --
>
> Key: MESOS-5180
> URL: https://issues.apache.org/jira/browse/MESOS-5180
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler driver
>Affects Versions: 0.24.0
>Reporter: Joseph Wu
>Assignee: Anand Mazumdar
>  Labels: mesosphere
>
> The existing implementation of the scheduler driver does not re-register with 
> the master under some network partition cases.
> When a scheduler registers with the master:
> 1) master links to the framework
> 2) framework links to the master
> It is possible for either of these links to break *without* the master 
> changing.  (Currently, the scheduler driver will only re-register if the 
> master changes).
> If both links break or if just link (1) breaks, the master views the 
> framework as {{inactive}} and {{disconnected}}.  This means the framework 
> will not receive any more events (such as offers) from the master until it 
> re-registers.  There is currently no way for the scheduler to detect a 
> one-way link breakage.
> If link (2) breaks, it makes (almost) no difference to the scheduler.  The 
> scheduler usually uses the link to send messages to the master, but 
> libprocess will create another socket if the persistent one is not available.
> To fix link breakages for (1+2) and (2), the scheduler driver should 
> implement an `::exited` event handler for the master's {{pid}} and trigger a 
> master (re-)detection upon a disconnection. This in turn should make the 
> driver (re-)register with the master. The scheduler library already does 
> this: 
> https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L395
> See the related issue MESOS-5181 for link (1) breakage.
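
The proposed handler could look roughly like the toy model below. It is only a sketch under stated assumptions: `DriverSketch` and its members are hypothetical, pids are plain strings, and "re-detection" is reduced to a counter. The real driver would go through libprocess and the master detector instead.

```cpp
#include <string>
#include <utility>

// Toy model of a scheduler driver that reacts to a broken link to the master.
class DriverSketch {
public:
  explicit DriverSketch(std::string masterPid)
    : master(std::move(masterPid)) {}

  // Called when the link to `pid` is reported as broken.
  void exited(const std::string& pid) {
    if (pid != master) {
      return;  // Not the master we are registered with; ignore.
    }
    connected = false;
    ++redetections;  // Stand-in for restarting master detection.
  }

  // Called when detection reports a master (possibly the same one as before).
  void detected(const std::string& pid) {
    master = pid;
    connected = true;  // Stand-in for a successful (re-)registration.
  }

  bool isConnected() const { return connected; }
  int redetectionCount() const { return redetections; }

private:
  std::string master;
  bool connected = true;
  int redetections = 0;
};
```

The point of the sketch is that re-registration is triggered by the disconnection itself, even when detection then reports the same master again.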





[jira] [Created] (MESOS-5274) DockerRuntimeIsolatorTest.ROOT_DockerDefaultEntryptLocalPuller is flaky

2016-04-25 Thread Neil Conway (JIRA)
Neil Conway created MESOS-5274:
--

 Summary: 
DockerRuntimeIsolatorTest.ROOT_DockerDefaultEntryptLocalPuller is flaky
 Key: MESOS-5274
 URL: https://issues.apache.org/jira/browse/MESOS-5274
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway


Observed on Mesosphere internal CI:

{noformat}
[15:07:02] : [Step 10/10] [ RUN  ] 
DockerRuntimeIsolatorTest.ROOT_DockerDefaultEntryptLocalPuller
[15:07:02]W: [Step 10/10] I0425 15:07:02.166211 32147 cluster.cpp:149] 
Creating default 'local' authorizer
[15:07:02]W: [Step 10/10] I0425 15:07:02.178527 32147 leveldb.cpp:174] 
Opened db in 12.157082ms
[15:07:02]W: [Step 10/10] I0425 15:07:02.179869 32147 leveldb.cpp:181] 
Compacted db in 1.313946ms
[15:07:02]W: [Step 10/10] I0425 15:07:02.179893 32147 leveldb.cpp:196] 
Created db iterator in 3836ns
[15:07:02]W: [Step 10/10] I0425 15:07:02.179913 32147 leveldb.cpp:202] 
Seeked to beginning of db in 656ns
[15:07:02]W: [Step 10/10] I0425 15:07:02.179919 32147 leveldb.cpp:271] 
Iterated through 0 keys in the db in 553ns
[15:07:02]W: [Step 10/10] I0425 15:07:02.179934 32147 replica.cpp:779] 
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[15:07:02]W: [Step 10/10] I0425 15:07:02.180132 32165 recover.cpp:447] 
Starting replica recovery
[15:07:02]W: [Step 10/10] I0425 15:07:02.180217 32163 recover.cpp:473] 
Replica is in EMPTY status
[15:07:02]W: [Step 10/10] I0425 15:07:02.180500 32161 replica.cpp:673] 
Replica in EMPTY status received a broadcasted recover request from 
(17793)@172.30.2.13:42326
[15:07:02]W: [Step 10/10] I0425 15:07:02.180655 32167 recover.cpp:193] 
Received a recover response from a replica in EMPTY status
[15:07:02]W: [Step 10/10] I0425 15:07:02.180891 32165 recover.cpp:564] 
Updating replica status to STARTING
[15:07:02]W: [Step 10/10] I0425 15:07:02.181026 32168 master.cpp:382] 
Master f480c5f5-0eb6-43fe-a0c9-6bd9cd62f517 (ip-172-30-2-13.mesosphere.io) 
started on 172.30.2.13:42326
[15:07:02]W: [Step 10/10] I0425 15:07:02.181040 32168 master.cpp:384] Flags 
at startup: --acls="" --allocation_interval="1secs" 
--allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" 
--authenticate_http_frameworks="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/G2DxFl/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
--registry_strict="true" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/G2DxFl/master" 
--zk_session_timeout="10secs"
[15:07:02]W: [Step 10/10] I0425 15:07:02.181165 32168 master.cpp:433] 
Master only allowing authenticated frameworks to register
[15:07:02]W: [Step 10/10] I0425 15:07:02.181171 32168 master.cpp:439] 
Master only allowing authenticated agents to register
[15:07:02]W: [Step 10/10] I0425 15:07:02.181175 32168 master.cpp:445] 
Master only allowing authenticated HTTP frameworks to register
[15:07:02]W: [Step 10/10] I0425 15:07:02.181180 32168 credentials.hpp:37] 
Loading credentials for authentication from '/tmp/G2DxFl/credentials'
[15:07:02]W: [Step 10/10] I0425 15:07:02.181313 32168 master.cpp:489] Using 
default 'crammd5' authenticator
[15:07:02]W: [Step 10/10] I0425 15:07:02.181352 32168 master.cpp:560] Using 
default 'basic' HTTP authenticator
[15:07:02]W: [Step 10/10] I0425 15:07:02.181421 32168 master.cpp:640] Using 
default 'basic' HTTP framework authenticator
[15:07:02]W: [Step 10/10] I0425 15:07:02.181473 32168 master.cpp:687] 
Authorization enabled
[15:07:02]W: [Step 10/10] I0425 15:07:02.181556 32162 
whitelist_watcher.cpp:77] No whitelist given
[15:07:02]W: [Step 10/10] I0425 15:07:02.181576 32163 hierarchical.cpp:142] 
Initialized hierarchical allocator process
[15:07:02]W: [Step 10/10] I0425 15:07:02.182076 32164 master.cpp:1932] The 
newly elected leader is master@172.30.2.13:42326 with id 
f480c5f5-0eb6-43fe-a0c9-6bd9cd62f517
[15:07:02]W: [Step 10/10] I0425 15:07:02.182090 32164 master.cpp:1945] 
Elected as the leading master!
[15:07:02]W: [Step 10/10] I0425 15:07:02.182095 32164 master.cpp:1632] 
Recovering from registrar
[15:07:02]W: [Step 10/10] I0425 15:07:02.182142 32168 registrar.cpp:331] 
Recovering registrar
[15:07:02]W: [Step 10/10] 

[jira] [Commented] (MESOS-5236) OverlayBackendTest.ROOT_OVERLAYFS_OverlayFSBackend test flakiness

2016-04-25 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256427#comment-15256427
 ] 

Jan Schlicht commented on MESOS-5236:
-

Looks like the version of overlay used in the kernel of Ubuntu 14.04 is rather 
old. It's 3.13.0. Ubuntu 12.04 is using the same (backported) version. In this 
version overlay doesn't support {{workdir}} yet.
This small test worked:
{noformat}
cd /tmp
mkdir lower upper overlay
sudo mount -t overlayfs -o lowerdir=/tmp/lower,upperdir=/tmp/upper none 
/tmp/overlay
{noformat}
This one doesn't:
{noformat}
cd /tmp
mkdir lower upper workdir overlay
sudo mount -t overlayfs -o 
lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/workdir none /tmp/overlay
{noformat}
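
For illustration, the difference between the two mounts comes down to whether {{workdir}} appears in the option string. The helper below is a hypothetical sketch (`overlayOptions` is not a Mesos function) that leaves {{workdir}} out when the caller passes an empty path. It assumes the caller already knows whether the running overlay version accepts the option.

```cpp
#include <string>

// Build the `-o` option string for an overlay mount.
// Pass an empty `workdir` for overlay versions that predate the option.
std::string overlayOptions(
    const std::string& lowerdir,
    const std::string& upperdir,
    const std::string& workdir) {
  std::string options = "lowerdir=" + lowerdir + ",upperdir=" + upperdir;
  if (!workdir.empty()) {
    options += ",workdir=" + workdir;
  }
  return options;
}
```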

> OverlayBackendTest.ROOT_OVERLAYFS_OverlayFSBackend test flakiness
> -
>
> Key: MESOS-5236
> URL: https://issues.apache.org/jira/browse/MESOS-5236
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: flaky, flaky-test, mesosphere
>
> Observed on internal Mesosphere CI:
> {noformat}
> [13:31:06] : [Step 10/10] [ RUN  ] 
> OverlayBackendTest.ROOT_OVERLAYFS_OverlayFSBackend
> [13:31:06]W: [Step 10/10] I0419 13:31:06.708961 23289 overlay.cpp:161] 
> Provisioning image rootfs with overlayfs: 
> 'lowerdir=/tmp/Dkgh5V/source2:/tmp/Dkgh5V/source1,upperdir=/tmp/Dkgh5V/scratch/rootfs/upperdir,workdir=/tmp/Dkgh5V/scrat\
> ch/rootfs/workdir'
> [13:31:06] : [Step 10/10] 
> ../../src/tests/containerizer/provisioner_backend_tests.cpp:97: Failure
> [13:31:06] : [Step 10/10] (backends["overlay"]->provision( {layer1, 
> layer2}, rootfs, sandbox.get())).failure(): Failed to mount rootfs 
> '/tmp/Dkgh5V/rootfs' with overlayfs: No such device
> [13:31:06] : [Step 10/10] [  FAILED  ] 
> OverlayBackendTest.ROOT_OVERLAYFS_OverlayFSBackend (5 ms)
> {noformat}





[jira] [Commented] (MESOS-5242) pivot_root is not available on System z (s390x)

2016-04-25 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256409#comment-15256409
 ] 

haosdent commented on MESOS-5242:
-

Hi [~bingli1000], thank you for bringing this up. I believe your work 
duplicates [Looking for Shepherd for MESOS-5263
 | 
http://search-hadoop.com/m/0Vlr6WzbcX1eUz3d2=Looking+for+Shepherd+for+MESOS+5263]

> pivot_root is not available on System z (s390x)
> ---
>
> Key: MESOS-5242
> URL: https://issues.apache.org/jira/browse/MESOS-5242
> Project: Mesos
>  Issue Type: Bug
> Environment: Hardware: IBM System z
> OS: Linux on z SLES12SP1
>Reporter: Bing Li
>Assignee: Bing Li
>
> Got error "pivot_root is not available" which is similar to MESOS-5121 .
> Added syscall pivot_root definition.





[jira] [Assigned] (MESOS-5242) pivot_root is not available on System z (s390x)

2016-04-25 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li reassigned MESOS-5242:
--

Assignee: Bing Li



[jira] [Assigned] (MESOS-2222) Add ACLs for the maintenance HTTP endpoints.

2016-04-25 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-2222:
---

Assignee: Benjamin Bannier

> Add ACLs for the maintenance HTTP endpoints.
> 
>
> Key: MESOS-2222
> URL: https://issues.apache.org/jira/browse/MESOS-2222
> Project: Mesos
>  Issue Type: Task
>  Components: master, security
>Affects Versions: 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: authorization, maintenance, mesosphere, security
>
> In order to authorize the HTTP endpoints for maintenance (to be added in 
> MESOS-2067), we will need to add an ACL definition for performing maintenance 
> operations.





[jira] [Commented] (MESOS-5273) Document "/flags" endpoint authorization as in MESOS-4785

2016-04-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256211#comment-15256211
 ] 

Adam B commented on MESOS-5273:
---

I'd also like to see authz documentation added to the auto-generated endpoint 
help docs, just like we did for authentication in MESOS-4934

> Document "/flags" endpoint authorization as in MESOS-4785
> -
>
> Key: MESOS-5273
> URL: https://issues.apache.org/jira/browse/MESOS-5273
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jan Schlicht
>  Labels: documentation, security
> Fix For: 0.29.0
>
>
> MESOS-4785 reorganizes the documentation of the authorization features that 
> are available in Mesos. The authorization of the "/flags" endpoint, 
> introduced in MESOS-5142 needs to be documented in a similar way.





[jira] [Commented] (MESOS-5244) Compilation failure on Ubuntu 16.04

2016-04-25 Thread Chen Zhiwei (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256145#comment-15256145
 ] 

Chen Zhiwei commented on MESOS-5244:


Since distribute was merged into setuptools, I plan to replace it with 
setuptools.

> Compilation failure on Ubuntu 16.04
> ---
>
> Key: MESOS-5244
> URL: https://issues.apache.org/jira/browse/MESOS-5244
> Project: Mesos
>  Issue Type: Bug
>  Components: python api
>Reporter: Kapil Arya
>Assignee: Chen Zhiwei
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> I saw the following error when trying to compile Mesos on Ubuntu 16.04:
> {code}
> dekaksi:...mesos/build/src> make 
> ../3rdparty/protobuf-2.6.1/python/dist/protobuf-2.6.1-py2.7.egg
> Building protobuf Python egg ...
> cd ../3rdparty/protobuf-2.6.1/python && \
>   CC="gcc"  \
>   CXX="g++" \
>   CFLAGS="-g -O2 -Wno-unused-local-typedefs -g1 -O0"  
>   \
>   CXXFLAGS="-g -O2 -Wno-unused-local-typedefs -Wno-maybe-uninitialized 
> -std=c++11 -g1 -O0"   \
>   PYTHONPATH=/home/kapil/mesos/build/3rdparty/distribute-0.6.26 \
>   /usr/bin/python setup.py build bdist_egg
> Traceback (most recent call last):
>   File "setup.py", line 11, in 
> from setuptools import setup, Extension
>   File 
> "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/setuptools/__init__.py", 
> line 2, in 
> from setuptools.extension import Extension, Library
>   File 
> "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/setuptools/extension.py", 
> line 5, in 
> from setuptools.dist import _get_unpatched
>   File 
> "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/setuptools/dist.py", line 
> 6, in 
> from setuptools.command.install import install
>   File 
> "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/setuptools/command/__init__.py",
>  line 8, in 
> from setuptools.command import install_scripts
>   File 
> "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/setuptools/command/install_scripts.py",
>  line 3, in 
> from pkg_resources import Distribution, PathMetadata, ensure_directory
>   File "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/pkg_resources.py", 
> line 2731, in 
> add_activation_listener(lambda dist: dist.activate())
>   File "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/pkg_resources.py", 
> line 704, in subscribe
> callback(dist)
>   File "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/pkg_resources.py", 
> line 2731, in 
> add_activation_listener(lambda dist: dist.activate())
>   File "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/pkg_resources.py", 
> line 2231, in activate
> self.insert_on(path)
>   File "/home/kapil/mesos/build/3rdparty/distribute-0.6.26/pkg_resources.py", 
> line 2332, in insert_on
> "with distribute. Found one at %s" % str(self.location))
> ValueError: A 0.7-series setuptools cannot be installed with distribute. 
> Found one at /usr/lib/python2.7/dist-packages
> Makefile:10869: recipe for target 
> '../3rdparty/protobuf-2.6.1/python/dist/protobuf-2.6.1-py2.7.egg' failed
> make: *** [../3rdparty/protobuf-2.6.1/python/dist/protobuf-2.6.1-py2.7.egg] 
> Error 1
> {code}





[jira] [Commented] (MESOS-3367) Mesos fetcher does not extract archives for URI with parameters

2016-04-25 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256141#comment-15256141
 ] 

Bernd Mathiske commented on MESOS-3367:
---

Sorry, I wasn't following this, because I was OOO. Just FYI I agree with the 
resolution. :-)

> Mesos fetcher does not extract archives for URI with parameters
> ---
>
> Key: MESOS-3367
> URL: https://issues.apache.org/jira/browse/MESOS-3367
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.22.1, 0.23.0
> Environment: DCOS 1.1
>Reporter: Renat Zubairov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> I'm deploying applications with Marathon, with sources served from S3. I'm 
> using a signed URL to give only temporary access to the S3 resources, so the 
> URL of the resource has some query parameters.
> So the URI is 'https://foo.com/file.tgz?hasi', and the fetcher stores it in a 
> file named 'file.tgz?hasi'. It then concludes that the extension 'hasi' is 
> not tgz, so extraction is skipped, even though the MIME type of the HTTP 
> resource is 'application/x-tar'.
> Workaround: add an additional parameter like '=.tgz'.
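
One way to avoid the problem described above is to drop the query string before looking at the file extension. The sketch below is not the fix that was committed for this ticket. `basenameWithoutQuery`, `hasSuffix`, and `looksLikeTgz` are hypothetical helpers, and a real implementation would probably use a proper URI parser.

```cpp
#include <string>

// Return the last path component of `uri` with any query or fragment removed.
std::string basenameWithoutQuery(const std::string& uri) {
  // Everything from the first '?' or '#' onward is not part of the path.
  std::string path = uri.substr(0, uri.find_first_of("?#"));
  std::string::size_type slash = path.find_last_of('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// True if `s` ends with `suffix`.
bool hasSuffix(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Decide on extraction from the cleaned basename, not the raw URI.
bool looksLikeTgz(const std::string& uri) {
  return hasSuffix(basenameWithoutQuery(uri), ".tgz");
}
```

For the URI from the report, the cleaned basename is `file.tgz`, so the extension check would no longer see `hasi`.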





[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256089#comment-15256089
 ] 

haosdent commented on MESOS-5269:
-

OK, I think it isn't a good fit for this.

> Replace Master/Slave Terminology Phase I - Update Metrics
> -
>
> Key: MESOS-5269
> URL: https://issues.apache.org/jira/browse/MESOS-5269
> Project: Mesos
>  Issue Type: Task
>Reporter: Jay Guo
>
>   process::metrics::Gauge slaves_connected;
>   process::metrics::Gauge slaves_disconnected;
>   process::metrics::Gauge slaves_active;
>   process::metrics::Gauge slaves_inactive;
>   process::metrics::Counter messages_register_slave;
>   process::metrics::Counter messages_reregister_slave;
>   process::metrics::Counter messages_unregister_slave;
>   process::metrics::Counter messages_update_slave;
>   process::metrics::Counter recovery_slave_removals;
>   process::metrics::Counter slave_registrations;
>   process::metrics::Counter slave_reregistrations;
>   process::metrics::Counter slave_removals;
>   process::metrics::Counter slave_removals_reason_unhealthy;
>   process::metrics::Counter slave_removals_reason_unregistered;
>   process::metrics::Counter slave_removals_reason_registered;
>   process::metrics::Counter slave_shutdowns_scheduled;
>   process::metrics::Counter slave_shutdowns_completed;
>   process::metrics::Counter slave_shutdowns_canceled;





[jira] [Created] (MESOS-5273) Document "/flags" endpoint authorization as in MESOS-4785

2016-04-25 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-5273:
---

 Summary: Document "/flags" endpoint authorization as in MESOS-4785
 Key: MESOS-5273
 URL: https://issues.apache.org/jira/browse/MESOS-5273
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Reporter: Jan Schlicht
 Fix For: 0.29.0


MESOS-4785 reorganizes the documentation of the authorization features that are 
available in Mesos. The authorization of the "/flags" endpoint, introduced in 
MESOS-5142 needs to be documented in a similar way.





[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256083#comment-15256083
 ] 

haosdent commented on MESOS-5269:
-

[~guoger] Do you think {{multihashmap}} is a good fit for this case?



[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256081#comment-15256081
 ] 

Jay Guo commented on MESOS-5269:


[~vinodkone]



[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256080#comment-15256080
 ] 

Jay Guo commented on MESOS-5269:


Since {{hashmap metrics}} expects a _string_ 
(_metric.name_) as the key to avoid duplicated metrics, we probably need 
something like this:
{code}
Gauge(const std::string& primaryName, const std::vector<std::string>& aliases = 
None(), const Deferred<Future<double>()>& f)
  : Metric(primaryName, aliases, None()), data(new Data(f)) {}
{code}

Let me know what you think.
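
To make the duplicate-avoidance point concrete, here is a small, self-contained sketch of a registry keyed by name, where a primary name and its aliases all resolve to the same metric. It does not use libprocess: `MetricSketch` and `RegistrySketch` are hypothetical, the metric is reduced to a stored value, and `agents_connected` in the usage note is only an assumed example of a renamed metric.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a metric; the real Gauge wraps a deferred call.
struct MetricSketch {
  std::string primaryName;
  double value;
};

class RegistrySketch {
public:
  // Register a metric under its primary name and all aliases.
  // Returns false, without changing anything, if any name is already taken.
  bool add(
      const std::string& primaryName,
      const std::vector<std::string>& aliases,
      double value) {
    if (metrics.count(primaryName) > 0) {
      return false;
    }
    for (const std::string& alias : aliases) {
      if (metrics.count(alias) > 0) {
        return false;
      }
    }

    auto metric =
      std::make_shared<MetricSketch>(MetricSketch{primaryName, value});
    metrics[primaryName] = metric;
    for (const std::string& alias : aliases) {
      metrics[alias] = metric;  // Every alias points at the same object.
    }
    return true;
  }

  // Look a metric up by primary name or alias; nullptr if unknown.
  const MetricSketch* find(const std::string& name) const {
    auto it = metrics.find(name);
    return it == metrics.end() ? nullptr : it->second.get();
  }

private:
  std::map<std::string, std::shared_ptr<MetricSketch>> metrics;
};
```

For example, registering `agents_connected` with the alias `slaves_connected` would let both names resolve to one metric, and a later attempt to register `slaves_connected` on its own would be refused.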



[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256068#comment-15256068
 ] 

Bernd Mathiske commented on MESOS-4760:
---

 [~mrbrowning], I am not aware of near-term plans for injection of the fetcher 
process into the slave object. If you want to take this on, I am happy to 
shepherd it.

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.
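
As a rough idea of the statistics mentioned above, the sketch below keeps the raw counters and derives the hit rate from them. It is hypothetical and not tied to the fetcher's code: `FetcherCacheStats` and its members are illustrative names, and eviction (which would shrink the occupied size) is left out.

```cpp
#include <cstdint>

// Hypothetical counters an agent could expose for the fetcher cache.
struct FetcherCacheStats {
  std::uint64_t hits = 0;           // requests served from the cache
  std::uint64_t misses = 0;         // requests that required a download
  std::uint64_t occupiedBytes = 0;  // bytes currently held in the cache
  double downloadSeconds = 0.0;     // total time spent downloading on misses

  void recordHit() { ++hits; }

  // A miss downloads `bytes` into the cache, taking `seconds`.
  void recordMiss(std::uint64_t bytes, double seconds) {
    ++misses;
    occupiedBytes += bytes;
    downloadSeconds += seconds;
  }

  // Fraction of requests served from the cache; 0 when nothing was requested.
  double hitRate() const {
    std::uint64_t total = hits + misses;
    return total == 0
      ? 0.0
      : static_cast<double>(hits) / static_cast<double>(total);
  }
};
```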





[jira] [Comment Edited] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256068#comment-15256068
 ] 

Bernd Mathiske edited comment on MESOS-4760 at 4/25/16 8:28 AM:


 [~mrbrowning], I am not aware of near-term plans for injection of the fetcher 
into the slave object. If you want to take this on, I am happy to shepherd it.


was (Author: bernd-mesos):
 [~mrbrowning], I am not aware of near-term plans for injection of the fetcher 
process into the slave object. If you want to take this on, I am happy to 
shepherd it.



[jira] [Commented] (MESOS-4705) Slave failed to sample container with perf event

2016-04-25 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255998#comment-15255998
 ] 

Fan Du commented on MESOS-4705:
---

[~bmahler] Ping ;)

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave complained with the error below:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> This is caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  for kernel versions below 3.12. 
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens, as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.
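
One possible way to be robust against such format differences is to key the parser on the number of fields instead of the kernel version. The sketch below is not the local modification mentioned in the ticket. `PerfSample`, `split`, and `parseSample` are hypothetical, only the 6-field layout comes from the report, and the 3-field (`value,event,cgroup`) and 4-field (`value,unit,event,cgroup`) layouts are assumptions about the other formats.

```cpp
#include <string>
#include <vector>

// Fields of interest from one line of perf CSV output.
struct PerfSample {
  std::string value;
  std::string unit;
  std::string event;
  std::string cgroup;
  bool ok;  // false if the line had an unexpected number of fields
};

// Split `line` on `sep`, keeping empty fields (the unit is often empty).
std::vector<std::string> split(const std::string& line, char sep) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    std::string::size_type end = line.find(sep, start);
    if (end == std::string::npos) {
      fields.push_back(line.substr(start));
      break;
    }
    fields.push_back(line.substr(start, end - start));
    start = end + 1;
  }
  return fields;
}

// Pick the layout from the field count rather than the kernel version.
PerfSample parseSample(const std::string& line) {
  std::vector<std::string> f = split(line, ',');
  PerfSample sample{"", "", "", "", false};
  switch (f.size()) {
    case 3:  // assumed older layout: value,event,cgroup
      sample = {f[0], "", f[1], f[2], true};
      break;
    case 4:  // assumed layout: value,unit,event,cgroup
    case 6:  // reported layout: value,unit,event,cgroup,running,ratio
      sample = {f[0], f[1], f[2], f[3], true};
      break;
    default:
      break;  // Unexpected number of fields; leave ok == false.
  }
  return sample;
}
```

Applied to the line from the error message, this yields the value `25871993253`, an empty unit, the event `cycles`, and the cgroup path, instead of failing on the field count.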





[jira] [Created] (MESOS-5272) Support docker image labels.

2016-04-25 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5272:
---

 Summary: Support docker image labels.
 Key: MESOS-5272
 URL: https://issues.apache.org/jira/browse/MESOS-5272
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


Docker image labels should be supported in the unified containerizer, where 
they can be used for applying custom metadata. Image labels are necessary for 
Mesos features to support Docker in the unified containerizer (e.g., for the 
Mesos GPU device isolator).





[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255944#comment-15255944
 ] 

Jay Guo commented on MESOS-5269:


If we take a similar approach to multi-named flags, we could also have 
multi-named metrics, which take a vector of strings during construction and 
treat them as aliases for the same metric.

e.g.
{code}
Gauge(const std::vector<std::string>& names, const Deferred<Future<double>()>& 
f)
  : Metric(names, None()), data(new Data(f)) {}
{code}
