[jira] [Updated] (MESOS-4802) Update vendored leveldb to 1.18

2016-02-28 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-4802:
--
Description: 
See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements / 
bug fixes.
The motivation is that leveldb v1.18 officially supports IBM Power 
(ppc64le), so this update is needed by 
[MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].

  was:See: https://github.com/google/leveldb/releases/tag/v1.18 for 
improvements / bug fixes.


> Update vendored leveldb to 1.18
> ---
>
> Key: MESOS-4802
> URL: https://issues.apache.org/jira/browse/MESOS-4802
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Qian Zhang
>Assignee: Chen Zhiwei
>
> See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements / 
> bug fixes.
> The motivation is that leveldb v1.18 officially supports IBM Power 
> (ppc64le), so this update is needed by 
> [MESOS-4312|https://issues.apache.org/jira/browse/MESOS-4312].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4802) Update vendored leveldb to 1.18

2016-02-28 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-4802:
-

 Summary: Update vendored leveldb to 1.18
 Key: MESOS-4802
 URL: https://issues.apache.org/jira/browse/MESOS-4802
 Project: Mesos
  Issue Type: Improvement
Reporter: Qian Zhang
Assignee: Chen Zhiwei


See: https://github.com/google/leveldb/releases/tag/v1.18 for improvements / 
bug fixes.





[jira] [Commented] (MESOS-4447) Updated reserved() API

2016-02-28 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171547#comment-15171547
 ] 

Guangya Liu commented on MESOS-4447:


[~bmahler] could you please shepherd this? I have made it a single 
patch with no dependencies on other patches.

> Updated reserved() API
> --
>
> Key: MESOS-4447
> URL: https://issues.apache.org/jira/browse/MESOS-4447
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The current {{reserved()}} API has a problem, shown by the following code:
> {code}
> hashmap<string, Resources> Resources::reserved() const
> {
>   hashmap<string, Resources> result;
>   foreach (const Resource& resource, resources) {
>     if (isReserved(resource)) {
>       result[resource.role()] += resource;
>     }
>   }
>   return result;
> }
> Resources Resources::reserved(const string& role) const
> {
>   return filter(lambda::bind(isReserved, lambda::_1, role));
> }
> bool Resources::isReserved(
>     const Resource& resource,
>     const Option<string>& role)
> {
>   if (role.isSome()) {
>     return !isUnreserved(resource) && role.get() == resource.role();
>   } else {
>     return !isUnreserved(resource);
>   }
> }
> {code}
> As a result, {{reserved(const string& role)}} cannot be passed a {{None()}} 
> role to get all reserved resources in flattened mode.
> The solution is to remove {{reserved()}} and change {{reserved(const string& 
> role)}} to {{reserved(const Option<string>& role = None())}}.
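
The proposed signature can be illustrated with a minimal, self-contained sketch. It is not Mesos code: `Resource` is a simplified stand-in struct, `std::optional` stands in for stout's `Option`, a plain vector stands in for `Resources`, and treating the role "*" as unreserved is an assumption of this sketch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for a Mesos resource (assumption for illustration).
struct Resource {
  std::string name;
  double value;
  std::string role;  // "*" is treated as unreserved in this sketch.
};

bool isUnreserved(const Resource& resource) { return resource.role == "*"; }

// Same logic as the isReserved() quoted in the issue.
bool isReserved(const Resource& resource,
                const std::optional<std::string>& role = std::nullopt) {
  if (role.has_value()) {
    return !isUnreserved(resource) && *role == resource.role;
  }
  return !isUnreserved(resource);
}

// Single entry point: with no role it returns every reserved resource,
// with a role it returns only that role's reservations.
std::vector<Resource> reserved(
    const std::vector<Resource>& resources,
    const std::optional<std::string>& role = std::nullopt) {
  std::vector<Resource> result;
  for (const Resource& resource : resources) {
    if (isReserved(resource, role)) {
      result.push_back(resource);
    }
  }
  return result;
}
```

With a default argument, callers that want everything write `reserved(resources)` and callers that want one role write `reserved(resources, std::string("ads"))`, so the separate zero-argument overload is no longer needed.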





[jira] [Commented] (MESOS-4667) Expose persistent volume information in HTTP endpoints

2016-02-28 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171523#comment-15171523
 ] 

Michael Park commented on MESOS-4667:
-

{noformat}
commit e2a3cd63b558c2399c7341e1c232989ef73196d3
Author: Neil Conway 
Date:   Mon Feb 29 02:21:39 2016 -0500

Added full reserved resource info to `/slaves` master endpoint.

This allows operators to list all the dynamic reservations and
persistent volumes in a cluster. This is important in itself;
it also makes it easier to use the `/unreserve` and
`/destroy-volumes` endpoints.

Review: https://reviews.apache.org/r/44047/
{noformat}

> Expose persistent volume information in HTTP endpoints
> --
>
> Key: MESOS-4667
> URL: https://issues.apache.org/jira/browse/MESOS-4667
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: endpoint, mesosphere
> Fix For: 0.28.0
>
>
> The per-slave {{reserved_resources}} information returned by {{/state}} does 
> not seem to include information about persistent volumes. This makes it hard 
> for operators to use the {{/destroy-volumes}} endpoint.





[jira] [Created] (MESOS-4801) Updated `createFrameworkInfo` for hierarchical_allocator_tests.cpp.

2016-02-28 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4801:
--

 Summary: Updated `createFrameworkInfo` for 
hierarchical_allocator_tests.cpp.
 Key: MESOS-4801
 URL: https://issues.apache.org/jira/browse/MESOS-4801
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


The {{createFrameworkInfo}} function in hierarchical_allocator_tests.cpp 
should be updated so that the caller can pass a bool parameter to create a 
framework that can use revocable resources.
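
A rough sketch of what such a helper could look like follows. The `FrameworkInfo` struct and `Capability` enum here are hypothetical stand-ins for the real protobuf types, and the default user and name values are made up for illustration.

```cpp
#include <set>
#include <string>

// Hypothetical stand-ins for the protobuf types used by the real tests.
enum class Capability { REVOCABLE_RESOURCES };

struct FrameworkInfo {
  std::string user;
  std::string name;
  std::string role;
  std::set<Capability> capabilities;
};

// The extra bool defaults to false, so existing call sites are unaffected.
FrameworkInfo createFrameworkInfo(const std::string& role,
                                  bool revocable = false) {
  FrameworkInfo info;
  info.user = "user";
  info.name = "framework";
  info.role = role;
  if (revocable) {
    info.capabilities.insert(Capability::REVOCABLE_RESOURCES);
  }
  return info;
}
```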





[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec

2016-02-28 Thread Michael Korolyov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171480#comment-15171480
 ] 

Michael Korolyov commented on MESOS-2162:
-

Will this enable Clear Containers executions and scheduling?


> Consider a C++ implementation of CoreOS AppContainer spec
> -
>
> Key: MESOS-2162
> URL: https://issues.apache.org/jira/browse/MESOS-2162
> Project: Mesos
>  Issue Type: Story
>  Components: containerization
>Reporter: Dominic Hamon
>  Labels: gsoc2015, mesosphere, twitter
>
> CoreOS have released a 
> [specification|https://github.com/coreos/rocket/blob/master/app-container/SPEC.md]
>  for a container abstraction as an alternative to Docker. They have also 
> released a reference implementation, [rocket|https://coreos.com/blog/rocket/].
> We should consider a C++ implementation of the specification to have parity 
> with the community and then use this implementation for our containerizer.





[jira] [Commented] (MESOS-4253) Provide a minimalist "runtime context" to an Anonymous Module

2016-02-28 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171462#comment-15171462
 ] 

Marco Massenzio commented on MESOS-4253:


It would be good if whoever came up with the "security concerns" could clarify 
them further: in particular, when making an assertion about a particular 
feature introducing a "security vulnerability", it is best practice to describe 
a scenario, a potential attacker's capabilities, and the attack vector - 
otherwise, *anything* can be a "security concern."
{quote} 
What this means is that I have to retract the ship-it to discuss it further. 
One of the most important issues was the fact that exposing all Master/Agent 
flags could also mean sharing things like credentials and password info and any 
other information that is part of other modules' module.json parameters.
{quote}
I will be honest and confess that I don't understand the scenario here: please 
bear in mind that the module(s) can *only* be loaded at startup, by using the 
{{--modules}} flag (and associated JSON) by the same person/team/script that is 
launching the Master/Agent.

So, we are really *not* "exposing" the flags: these are already available (by 
definition) to the actor who launched the Agent (or Master), hence this 
facility does not further expand the surface of attack (provided, of course, 
that the module itself is designed according to security principles).

In other words, passing the Flags during module creation is simply a 
convenience compared with writing a "wrapper" script that duplicates the Flags 
of interest into the modules' "Parameters" in the JSON.
Also, it gives the modules access to default values that are not explicitly 
defined: as these are, by definition, "public" there is no increase in 
vulnerability.

Again, the very same person that launches Mesos is loading the module - how 
does that represent a greater security concern?

{quote} 
Having said that, I am not saying that Mesos is completely secure and these 
patches will make it less secure, but we do need to come up with a better plan 
going forward.
{quote}
"Better" can only be defined with respect to a security threat scenario: what 
is it?

{quote}
On a more detailed note, there are two main avenues that we need to pursue 
here. One, have the modules explicitly request the flags that are needed by 
them in order to work. At which point, the operator can pass in these flags as 
part of Master/Agent commandline and they will be forwarded to the respective 
modules.
{quote}
How would a module "explicitly request the flags"?
This seems rather cumbersome, and only minimally better than just the 
"wrapper" script that duplicates the flags inside the JSON's parameters.

It is also completely contrary to treating your cluster "as herd, not pets."
{quote}
Second, we can come up with a minimal set of Master/Agent flags that we 
consider "safe" and always pass to all modules as part of the `create` call 
along with Parameters. There is already a precedent in the way SSL flags are 
passed on via the Master/Agent command line.
{quote}
This seems to me to be really non-scalable and a bit cumbersome, but probably 
the only viable option, without a clearer definition of what the security 
concerns are.
{quote}
Finally, given the nature of the concerns, I wanted to see if you can join the 
next community sync and discuss it further while involving the whole community? 
After that, we might be able to create a small working group with all 
interested parties to come up with better design decisions.
{quote}

Considering that it's taken two months (of virtually no feedback at all) I 
honestly can't see how this is likely to elicit more interest, but we'll see, I 
guess.

> Provide a minimalist "runtime context" to an Anonymous Module
> -
>
> Key: MESOS-4253
> URL: https://issues.apache.org/jira/browse/MESOS-4253
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Marco Massenzio
>Assignee: Marco Massenzio
>
> Currently, {{Anonymous}} modules only receive at creation a copy of the 
> {{"parameters"}} passed in the JSON configuration file.
> However, at runtime, it would be useful to also have a "runtime context" for 
> the module developer to use, when implementing the functionality.
> I would suggest passing in the {{Flags}} object from the Master/Agent via 
> a {{setRuntimeContext(const Flags&)}}[0] method, called immediately 
> after {{create(const Parameters&)}}[1].
> Also, I would suggest adding a {{teardown()}} method too, in case the module 
> needs to release resources / conduct cleanup before exiting (there is a TODO 
> in the code to this effect, and adding this in this patch would be close to 
> trivial).
> [0] In practice, it won't be this trivial, as 
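
The lifecycle proposed in the description (create, then receive the runtime context, then tear down) might look roughly like the sketch below. Everything here is hypothetical: `Parameters` and `Flags` are plain string maps standing in for the real Mesos types, and `ExampleModule` and the `work_dir` lookup are invented for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the real Parameters and Flags types.
using Parameters = std::map<std::string, std::string>;
using Flags = std::map<std::string, std::string>;

// Base interface with the two proposed hooks. Both default to no-ops so
// that modules which do not need them would be unaffected.
class Anonymous {
public:
  virtual ~Anonymous() = default;
  virtual void setRuntimeContext(const Flags& /*flags*/) {}
  virtual void teardown() {}
};

// Example module that reads one flag and records that cleanup ran.
class ExampleModule : public Anonymous {
public:
  explicit ExampleModule(const Parameters& parameters)
    : parameters_(parameters) {}

  void setRuntimeContext(const Flags& flags) override {
    auto it = flags.find("work_dir");
    if (it != flags.end()) {
      workDir_ = it->second;
    }
  }

  void teardown() override { tornDown_ = true; }

  const std::string& workDir() const { return workDir_; }
  bool tornDown() const { return tornDown_; }

private:
  Parameters parameters_;
  std::string workDir_;
  bool tornDown_ = false;
};
```

In this sketch the loader would construct the module from its JSON parameters, call `setRuntimeContext()` once with the Master/Agent flags, and call `teardown()` before exit.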

[jira] [Commented] (MESOS-4381) Improve upgrade compatibility documentation.

2016-02-28 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171451#comment-15171451
 ] 

Joerg Schad commented on MESOS-4381:


https://reviews.apache.org/r/43792/
https://reviews.apache.org/r/43798/

> Improve upgrade compatibility documentation.
> 
>
> Key: MESOS-4381
> URL: https://issues.apache.org/jira/browse/MESOS-4381
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: documentation, mesosphere
>
> Investigate and document upgrade compatibility for 0.27 release.





[jira] [Updated] (MESOS-4381) Improve upgrade compatibility documentation.

2016-02-28 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad updated MESOS-4381:
---
Description: Investigate and document upgrade compatibility for 0.27 
release.  (was: https://reviews.apache.org/r/43798/)



[jira] [Updated] (MESOS-4711) Race condition in libevent poll implementation causes crash

2016-02-28 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas updated MESOS-4711:
---
Sprint: Mesosphere Sprint 29

> Race condition in libevent poll implementation causes crash
> ---
>
> Key: MESOS-4711
> URL: https://issues.apache.org/jira/browse/MESOS-4711
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.28.0
> Environment: CentOS 6.7 running in VirtualBox
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere
> Fix For: 0.28.0, 0.27.2, 0.26.1, 0.25.1, 0.24.2
>
>
> The issue first arose in MESOS-3271, but can be reproduced every time by 
> using the mentioned environment and running:
> {noformat}
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery" 
> --gtest_repeat=1000
> {noformat}
> The problem can be traced back to 
> [{{libevent_poll.cpp}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp].
>  If the event is triggered and the future associated with the event is 
> discarded, a situation arises in which 
> [{{pollCallback()}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp#L33]
>  starts executing just early enough to finish before 
> [{{pollDiscard()}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp#L53]
>  executes. If that happens, {{pollCallback()}} deletes the poll object and 
> {{pollDiscard()}} is left with a dangling pointer which crashes when it 
> executes the line {{event_active(ev, EV_READ, 0);}}.
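
One general way to avoid this kind of dangling pointer is to let the discard path hold only a weak reference to the poll object. The sketch below shows that idea in isolation. It is not the fix that was applied to libprocess, it does not use libevent, and the `Poll` struct and both function bodies are simplified assumptions; only the names `pollCallback` and `pollDiscard` come from the issue.

```cpp
#include <memory>

// Simplified stand-in for the poll object described in the issue.
struct Poll {
  bool fired = false;
  int activations = 0;  // Counts what would be event_active() calls.
};

// Callback path: handles the event, then drops the owning reference,
// which destroys the Poll if nobody else holds it.
void pollCallback(std::shared_ptr<Poll>& owner) {
  owner->fired = true;
  owner.reset();
}

// Discard path: holds only a weak reference. If the callback has already
// released the object, lock() returns null and nothing is dereferenced.
bool pollDiscard(const std::weak_ptr<Poll>& weak) {
  if (std::shared_ptr<Poll> poll = weak.lock()) {
    ++poll->activations;  // Stands in for event_active(ev, EV_READ, 0).
    return true;
  }
  return false;
}
```

If `pollDiscard()` runs first it activates the event as before; if `pollCallback()` wins the race, `pollDiscard()` becomes a no-op instead of touching freed memory.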





[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec

2016-02-28 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171419#comment-15171419
 ] 

Jie Yu commented on MESOS-2162:
---

Yes, currently, appc filesystem image is supported (with simple discovery).

We'll add some runtime configuration support soon (e.g., exec, env, workdir)



[jira] [Assigned] (MESOS-4629) Implement fault tolerance tests for the HTTP Scheduler API.

2016-02-28 Thread Shuai Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Lin reassigned MESOS-4629:


Assignee: Shuai Lin

> Implement fault tolerance tests for the HTTP Scheduler API.
> ---
>
> Key: MESOS-4629
> URL: https://issues.apache.org/jira/browse/MESOS-4629
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Shuai Lin
>  Labels: mesosphere
>
> Currently, the HTTP V1 API does not have fault tolerance tests similar to 
> those in {{src/tests/fault_tolerance_tests.cpp}}. 
> For more information see MESOS-3355.





[jira] [Created] (MESOS-4800) SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky

2016-02-28 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4800:
-

 Summary: SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky
 Key: MESOS-4800
 URL: https://issues.apache.org/jira/browse/MESOS-4800
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


Showed up on ASF CI:

https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/1743/changes

{code}
[ RUN  ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
I0229 02:11:01.321990  2124 leveldb.cpp:174] Opened db in 121.848194ms
I0229 02:11:01.363880  2124 leveldb.cpp:181] Compacted db in 41.823665ms
I0229 02:11:01.363965  2124 leveldb.cpp:196] Created db iterator in 27127ns
I0229 02:11:01.363984  2124 leveldb.cpp:202] Seeked to beginning of db in 3446ns
I0229 02:11:01.363996  2124 leveldb.cpp:271] Iterated through 0 keys in the db 
in 332ns
I0229 02:11:01.364050  2124 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0229 02:11:01.365196  2158 recover.cpp:447] Starting replica recovery
I0229 02:11:01.365492  2158 recover.cpp:473] Replica is in EMPTY status
I0229 02:11:01.366982  2151 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (9830)@172.17.0.3:36786
I0229 02:11:01.367451  2149 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0229 02:11:01.368335  2149 recover.cpp:564] Updating replica status to STARTING
I0229 02:11:01.372730  2158 master.cpp:375] Master 
d551df7b-0c69-4bc9-b113-eca605384c49 (3036a6611147) started on 172.17.0.3:36786
I0229 02:11:01.372764  2158 master.cpp:377] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/e9RAjp/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
--work_dir="/tmp/e9RAjp/master" --zk_session_timeout="10secs"
I0229 02:11:01.373164  2158 master.cpp:422] Master only allowing authenticated 
frameworks to register
I0229 02:11:01.373178  2158 master.cpp:427] Master only allowing authenticated 
slaves to register
I0229 02:11:01.373188  2158 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/e9RAjp/credentials'
I0229 02:11:01.373612  2158 master.cpp:467] Using default 'crammd5' 
authenticator
I0229 02:11:01.373793  2158 master.cpp:536] Using default 'basic' HTTP 
authenticator
I0229 02:11:01.373919  2158 master.cpp:570] Authorization enabled
I0229 02:11:01.376322  2153 whitelist_watcher.cpp:77] No whitelist given
I0229 02:11:01.376456  2158 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0229 02:11:01.378609  2144 master.cpp:1711] The newly elected leader is 
master@172.17.0.3:36786 with id d551df7b-0c69-4bc9-b113-eca605384c49
I0229 02:11:01.378674  2144 master.cpp:1724] Elected as the leading master!
I0229 02:11:01.378700  2144 master.cpp:1469] Recovering from registrar
I0229 02:11:01.378880  2154 registrar.cpp:307] Recovering registrar
I0229 02:11:01.413949  2149 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 45.305096ms
I0229 02:11:01.414049  2149 replica.cpp:320] Persisted replica status to 
STARTING
I0229 02:11:01.414481  2154 recover.cpp:473] Replica is in STARTING status
I0229 02:11:01.416136  2154 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (9832)@172.17.0.3:36786
I0229 02:11:01.416656  2149 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0229 02:11:01.417251  2154 recover.cpp:564] Updating replica status to VOTING
I0229 02:11:01.455773  2149 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 38.225441ms
I0229 02:11:01.455874  2149 replica.cpp:320] Persisted replica status to VOTING
I0229 02:11:01.456140  2154 recover.cpp:578] Successfully joined the Paxos group
I0229 02:11:01.456480  2154 recover.cpp:462] Recover process terminated
I0229 02:11:01.457126  2154 log.cpp:659] Attempting to start the writer
I0229 02:11:01.458848  2154 replica.cpp:493] Replica received implicit promise 
request from 

[jira] [Commented] (MESOS-4799) MAC OS build failed

2016-02-28 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171310#comment-15171310
 ] 

Guangya Liu commented on MESOS-4799:


I see, I will drop my patch, thanks [~gilbert]

> MAC OS build failed
> ---
>
> Key: MESOS-4799
> URL: https://issues.apache.org/jira/browse/MESOS-4799
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> Building the latest code on Mac OS failed.
> {code}
> In file included from 
> ../../src/slave/containerizer/mesos/provisioner/backend.cpp:21:
> ../../src/linux/fs.hpp:21:10: fatal error: 'mntent.h' file not found
> #include <mntent.h>
>          ^
> {code}





[jira] [Comment Edited] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-28 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171311#comment-15171311
 ] 

Jie Yu edited comment on MESOS-4757 at 2/29/16 2:16 AM:


BTW, I tested my patch on OS X (El Capitan, 10.11.3), and it works fine.

{noformat}
$ sudo sbin/mesos-master --work_dir=/tmp/mesos/master
$ sudo GLOG_v=1 sbin/mesos-slave --master=10.0.1.26:5050 
--work_dir=/tmp/mesos/slave --executor_environment_variables="{}"
$ bin/mesos-execute --master=10.0.1.26:5050 --name=test --command="id" # under 
my name 'jie'
Registered executor on 10.0.1.26
Starting task test
sh -c 'id'
Forked command at 86930
uid=501(jie) gid=20(staff) 
groups=20(staff),701(com.apple.sharepoint.group.1),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh)
Command exited with status 0 (pid: 86930)
Shutting down
Sending SIGTERM to process tree at pid 86930
Sent SIGTERM to the following process trees:
[ 

]
$ id
uid=501(jie) gid=20(staff) 
groups=20(staff),701(com.apple.sharepoint.group.1),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh)
{noformat}




> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in container's root filesystem. 
> We should instead, get the uid/gids before pivot_root, and call 
> setuid/setgroups after pivot_root.
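
The two-step approach in the description can be sketched as follows, assuming Linux with glibc. The `Identity` struct and the names `lookupIdentity` and `switchTo` are hypothetical and not taken from the Mesos patch; the POSIX calls (`getpwnam`, `getgrouplist`, `setgroups`, `setgid`, `setuid`) are real.

```cpp
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Numeric ids resolved up front (hypothetical helper type).
struct Identity {
  uid_t uid;
  gid_t gid;
  std::vector<gid_t> groups;
};

// Step 1, before pivot_root: resolve the ids while the host's
// /etc/passwd and /etc/group are still visible.
std::optional<Identity> lookupIdentity(const std::string& user) {
  const passwd* pw = ::getpwnam(user.c_str());
  if (pw == nullptr) {
    return std::nullopt;
  }

  std::vector<gid_t> groups(16);
  int count = static_cast<int>(groups.size());
  while (::getgrouplist(user.c_str(), pw->pw_gid, groups.data(), &count) == -1) {
    // Buffer too small: glibc reports the needed size in count;
    // doubling is a fallback in case it does not.
    groups.resize(std::max<std::size_t>(count, groups.size() * 2));
    count = static_cast<int>(groups.size());
  }
  groups.resize(count);

  return Identity{pw->pw_uid, pw->pw_gid, groups};
}

// Step 2, after pivot_root: switch using numeric ids only, so no lookup
// inside the container's root filesystem is needed. The supplementary
// groups and gid are set before the uid, while the process is still
// privileged.
bool switchTo(const Identity& id) {
  return ::setgroups(id.groups.size(), id.groups.data()) == 0 &&
         ::setgid(id.gid) == 0 &&
         ::setuid(id.uid) == 0;
}
```

`switchTo()` needs root privileges to succeed, so it is only meaningful inside the launcher process.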





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-28 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171311#comment-15171311
 ] 

Jie Yu commented on MESOS-4757:
---

BTW, I tested my patch on OS X (El Capitan, 10.11.3), and it works fine.

{noformat}
$ sudo sbin/mesos-master --work_dir=/tmp/mesos/master
$ sudo GLOG_v=1 sbin/mesos-slave --master=10.0.1.26:5050 
--work_dir=/tmp/mesos/slave --executor_environment_variables="{}"
$ bin/mesos-execute --master=10.0.1.26:5050 --name=test --command="id" # under 
my name 'jie'
Registered executor on 10.0.1.26
Starting task test
sh -c 'id'
Forked command at 86930
uid=501(jie) gid=20(staff) 
groups=20(staff),701(com.apple.sharepoint.group.1),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh)
Command exited with status 0 (pid: 86930)
Shutting down
Sending SIGTERM to process tree at pid 86930
Sent SIGTERM to the following process trees:
[ 

]
$ id
uid=501(jie) gid=20(staff) 
groups=20(staff),701(com.apple.sharepoint.group.1),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh)






[jira] [Commented] (MESOS-4799) MAC OS build failed

2016-02-28 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171305#comment-15171305
 ] 

Gilbert Song commented on MESOS-4799:
-

Hi [~gyliu], could you check again with the latest master branch? If fixed, 
please mark it as resolved :)



[jira] [Assigned] (MESOS-4799) MAC OS build failed

2016-02-28 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-4799:
--

Assignee: Guangya Liu



[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-02-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4029:
--
 Assignee: Anand Mazumdar  (was: Artem Harutyunyan)
Fix Version/s: (was: 0.28.0)

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving directory 

[jira] [Updated] (MESOS-4630) Implement partition tests for the HTTP Scheduler API.

2016-02-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4630:
--
Sprint:   (was: Mesosphere Sprint 30)

> Implement partition tests for the HTTP Scheduler API.
> -
>
> Key: MESOS-4630
> URL: https://issues.apache.org/jira/browse/MESOS-4630
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the HTTP V1 API does not have partition tests similar to those 
> in src/tests/partition_tests.cpp.
> For more information see MESOS-3355.





[jira] [Updated] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-02-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4029:
--
Sprint: Mesosphere Sprint 23, Mesosphere Sprint 30  (was: Mesosphere Sprint 
23)

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Artem Harutyunyan
>  Labels: flaky, flaky-test, mesosphere
> Fix For: 0.28.0
>
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *** [check-am] Error 2
> make[2]: Leaving 

[jira] [Updated] (MESOS-4629) Implement fault tolerance tests for the HTTP Scheduler API.

2016-02-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4629:
--
Sprint:   (was: Mesosphere Sprint 30)

> Implement fault tolerance tests for the HTTP Scheduler API.
> ---
>
> Key: MESOS-4629
> URL: https://issues.apache.org/jira/browse/MESOS-4629
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> Currently, the HTTP V1 API does not have fault tolerance tests similar to 
> those in {{src/tests/fault_tolerance_tests.cpp}}. 
> For more information see MESOS-3355.





[jira] [Updated] (MESOS-4798) Make existing scheduler library tests use the callback interface.

2016-02-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4798:
--
Sprint: Mesosphere Sprint 30

> Make existing scheduler library tests use the callback interface.
> -
>
> Key: MESOS-4798
> URL: https://issues.apache.org/jira/browse/MESOS-4798
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> We need to migrate the existing tests in {{src/tests/scheduler_tests.cpp}} 
> and {{src/tests/maintenance_tests.cpp}} to use the new callback interface 
> introduced in {{MESOS-3339}}. 
> For an example see {{SchedulerTest.SchedulerFailover}} which already uses 
> this new interface.





[jira] [Created] (MESOS-4799) MAC OS build failed

2016-02-28 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4799:
--

 Summary: MAC OS build failed
 Key: MESOS-4799
 URL: https://issues.apache.org/jira/browse/MESOS-4799
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu



Building the latest code on Mac OS failed.

{code}
In file included from 
../../src/slave/containerizer/mesos/provisioner/backend.cpp:21:
../../src/linux/fs.hpp:21:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
         ^
{code}
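
The error above arises because mntent.h is a Linux header that Mac OS does not provide. A minimal sketch of one common way to guard against this class of failure follows. It assumes the __linux__ preprocessor macro is the right switch, and the helper countMountEntries is hypothetical. This is not necessarily how the issue was resolved in Mesos.

```cpp
#include <cstdio>

#ifdef __linux__
#include <mntent.h>  // Linux-only header; not available on Mac OS.
#endif

// Hypothetical helper: counts the entries in a mount table file.
// Returns -1 if the mntent API is unavailable on this platform or
// if the file cannot be opened.
int countMountEntries(const char* path)
{
#ifdef __linux__
  FILE* file = setmntent(path, "r");
  if (file == nullptr) {
    return -1;
  }

  int count = 0;
  while (getmntent(file) != nullptr) {
    ++count;
  }

  endmntent(file);
  return count;
#else
  (void) path;  // Unused on platforms without mntent.h.
  return -1;
#endif
}
```

With a guard like this, a translation unit that includes the header still compiles on Mac OS, and callers see an explicit failure value instead of a build error.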









[jira] [Created] (MESOS-4798) Make existing scheduler library tests use the callback interface.

2016-02-28 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4798:
-

 Summary: Make existing scheduler library tests use the callback 
interface.
 Key: MESOS-4798
 URL: https://issues.apache.org/jira/browse/MESOS-4798
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


We need to migrate the existing tests in {{src/tests/scheduler_tests.cpp}} and 
{{src/tests/maintenance_tests.cpp}} to use the new callback interface 
introduced in {{MESOS-3339}}. 

For an example see {{SchedulerTest.SchedulerFailover}} which already uses this 
new interface.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-28 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171290#comment-15171290
 ] 

Jie Yu commented on MESOS-4757:
---

[~jamespeach] Can you also give me a pointer to the 'setgroups' problem you 
mentioned on Darwin?

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in the container's root filesystem. 
> We should instead get the uid/gids before pivot_root and call 
> setuid/setgroups after pivot_root.
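
The two-step approach proposed in the description can be sketched as follows. This is a minimal illustration, assuming Linux and glibc signatures for getgrouplist and setgroups. The names Credentials, resolveCredentials and switchCredentials are hypothetical and are not Mesos APIs. The second function only succeeds when run with sufficient privileges.

```cpp
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>
#include <vector>

// Credentials resolved while /etc/passwd and /etc/group are still visible.
struct Credentials
{
  uid_t uid;
  gid_t gid;
  std::vector<gid_t> groups;  // Group list as reported by getgrouplist().
};

// Step 1 (before pivot_root): look everything up by user name.
// Returns false if the user is unknown or the group list cannot be read.
bool resolveCredentials(const std::string& user, Credentials* out)
{
  const struct passwd* pw = getpwnam(user.c_str());
  if (pw == nullptr) {
    return false;
  }

  out->uid = pw->pw_uid;
  out->gid = pw->pw_gid;

  int count = 32;  // Initial guess for the number of groups.
  out->groups.resize(count);
  if (getgrouplist(user.c_str(), out->gid, out->groups.data(), &count) == -1) {
    // Buffer too small: `count` now holds the required size, so retry once.
    out->groups.resize(count);
    if (getgrouplist(user.c_str(), out->gid, out->groups.data(), &count) == -1) {
      return false;
    }
  }

  out->groups.resize(count);
  return true;
}

// Step 2 (after pivot_root): apply the numeric IDs without touching the
// user database. Groups and gid are changed first, the uid last, because
// dropping the uid first would remove the privilege needed for the rest.
bool switchCredentials(const Credentials& credentials)
{
  if (setgroups(credentials.groups.size(), credentials.groups.data()) != 0) {
    return false;
  }
  if (setgid(credentials.gid) != 0) {
    return false;
  }
  return setuid(credentials.uid) == 0;
}
```

Only the numeric IDs cross the root change, so the second step does not depend on files inside the container's root filesystem.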





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-28 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171286#comment-15171286
 ] 

Jie Yu commented on MESOS-4757:
---

I am not familiar with BSD. Is there a way to retain the capabilities needed 
to do pivot_root when switching the credentials?

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in the container's root filesystem. 
> We should instead get the uid/gids before pivot_root and call 
> setuid/setgroups after pivot_root.





[jira] [Commented] (MESOS-4757) Mesos containerizer should get uid/gids before pivot_root.

2016-02-28 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171267#comment-15171267
 ] 

James Peach commented on MESOS-4757:


I think this is a problematic approach. Switching credentials tends to be a bit 
subtle on many systems and it doesn't easily decompose into separate operations.

For example, BSD requires (or assumes) that the first {{setgroups(2)}} element 
is the primary GID. {{NGROUPS_MAX}} is a dynamic parameter on many systems. In 
Darwin, {{setgroups(2)}} just primes the kernel credential cache, but only if 
you call the {{initgroups}} system call afterwards.

I suggest that a more reliable approach is to keep doing a full credential 
switch before the {{pivot_root}}, but retain enough capabilities to be able to 
enter the chroot afterwards.

> Mesos containerizer should get uid/gids before pivot_root.
> --
>
> Key: MESOS-4757
> URL: https://issues.apache.org/jira/browse/MESOS-4757
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, we call os::su(user) after pivot_root. This is problematic because 
> /etc/passwd and /etc/group might be missing in the container's root filesystem. 
> We should instead get the uid/gids before pivot_root and call 
> setuid/setgroups after pivot_root.





[jira] [Commented] (MESOS-2717) Qemu/KVM containerizer

2016-02-28 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171250#comment-15171250
 ] 

James Peach commented on MESOS-2717:


Sure.

> Qemu/KVM containerizer
> --
>
> Key: MESOS-2717
> URL: https://issues.apache.org/jira/browse/MESOS-2717
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Pierre-Yves Ritschard
>Assignee: Abhishek Dasgupta
>
> I think it would make sense for Mesos to have the ability to treat 
> hypervisors as containerizers and the most sensible one to start with would 
> probably be Qemu/KVM.
> There are a few workloads that can require full-fledged VMs (the most obvious 
> one being Windows workloads).
> The containerization code is well decoupled and seems simple enough; I can 
> definitely take a shot at it. VMs do bring some questions with them; here is 
> my take on them:
> 1. Routing, network strategy
> ==
> The simplest approach here might very well be to go for bridged networks
> and leave the setup and inter-slave routing up to the administrator.
> 2. IP Address assignment
> 
> At first, it can be up to the Frameworks to deal with IP assignment.
> The simplest way to address this could be to have an executor running
> on slaves providing the qemu/kvm containerizer which would instrument a DHCP 
> server and collect IP + MAC address resources from slaves. While it may be up 
> to the frameworks to provide this, an example should most likely be provided.
> 3. VM Templates
> ==
> VM templates should probably leverage the fetcher and could thus be copied 
> locally or fetched from HTTP(s) / HDFS.
> 4. Resource limiting
> 
> Mapping resource constraints to the qemu command line is probably the easiest 
> part. Additional command-line arguments should also be fetchable. For Unix VMs, 
> the sandbox could show the output of the serial console.
> 5. Libvirt / plain Qemu
> =
> I tend to favor limiting the amount of necessary hoops to jump through and 
> would thus investigate working directly with Qemu, maintaining an open 
> connection to the monitor to assert status.
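
Points 4 and 5 above can be illustrated with a minimal sketch that maps a resource description onto a qemu argument list. The struct VmResources, the function qemuArguments and the sandbox file names are hypothetical. The sketch assumes the qemu-system-x86_64 binary and its -enable-kvm, -smp, -m, -drive, -display, -serial and -monitor options; it is not a proposal for the actual containerizer interface.

```cpp
#include <string>
#include <vector>

// Hypothetical resource description; field names are illustrative only.
struct VmResources
{
  unsigned cpus;       // Number of virtual CPUs.
  unsigned memoryMiB;  // Guest memory in MiB.
};

// Builds an argument list for a plain qemu invocation. The serial console
// is written to a file in the sandbox and the monitor is exposed on a Unix
// socket so that a supervisor can connect to it and query VM status.
std::vector<std::string> qemuArguments(
    const VmResources& resources,
    const std::string& diskImage,
    const std::string& sandbox)
{
  return {
    "qemu-system-x86_64",
    "-enable-kvm",
    "-smp", std::to_string(resources.cpus),
    "-m", std::to_string(resources.memoryMiB),
    "-drive", "file=" + diskImage,
    "-display", "none",
    "-serial", "file:" + sandbox + "/serial.log",
    "-monitor", "unix:" + sandbox + "/monitor.sock,server,nowait"
  };
}
```

Keeping the mapping in one pure function makes it easy to check which flags a given resource offer produces before anything is executed.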





[jira] [Updated] (MESOS-2717) Qemu/KVM containerizer

2016-02-28 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-2717:
---
Assignee: Abhishek Dasgupta  (was: James Peach)

> Qemu/KVM containerizer
> --
>
> Key: MESOS-2717
> URL: https://issues.apache.org/jira/browse/MESOS-2717
> Project: Mesos
>  Issue Type: Wish
>  Components: containerization
>Reporter: Pierre-Yves Ritschard
>Assignee: Abhishek Dasgupta
>
> I think it would make sense for Mesos to have the ability to treat 
> hypervisors as containerizers and the most sensible one to start with would 
> probably be Qemu/KVM.
> There are a few workloads that can require full-fledged VMs (the most obvious 
> one being Windows workloads).
> The containerization code is well decoupled and seems simple enough; I can 
> definitely take a shot at it. VMs do bring some questions with them; here is 
> my take on them:
> 1. Routing, network strategy
> ==
> The simplest approach here might very well be to go for bridged networks
> and leave the setup and inter-slave routing up to the administrator.
> 2. IP Address assignment
> 
> At first, it can be up to the Frameworks to deal with IP assignment.
> The simplest way to address this could be to have an executor running
> on slaves providing the qemu/kvm containerizer which would instrument a DHCP 
> server and collect IP + MAC address resources from slaves. While it may be up 
> to the frameworks to provide this, an example should most likely be provided.
> 3. VM Templates
> ==
> VM templates should probably leverage the fetcher and could thus be copied 
> locally or fetched from HTTP(s) / HDFS.
> 4. Resource limiting
> 
> Mapping resource constraints to the qemu command line is probably the easiest 
> part. Additional command-line arguments should also be fetchable. For Unix VMs, 
> the sandbox could show the output of the serial console.
> 5. Libvirt / plain Qemu
> =
> I tend to favor limiting the amount of necessary hoops to jump through and 
> would thus investigate working directly with Qemu, maintaining an open 
> connection to the monitor to assert status.





[jira] [Updated] (MESOS-338) Mesos 1.0

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-338:
-
Description: This ticket tracks the Mesos 1.0 road map. Specifically, the 
blockers, a.k.a roadmap items, for 1.0 are linked to this ticket.  (was: This 
ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a road 
map items, for 1.0 are linked to this ticket.)

> Mesos 1.0
> -
>
> Key: MESOS-338
> URL: https://issues.apache.org/jira/browse/MESOS-338
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Mahler
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> This ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a 
> roadmap items, for 1.0 are linked to this ticket.





[jira] [Updated] (MESOS-338) Mesos 1.0

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-338:
-
Description: This ticket tracks the Mesos 1.0 road map. Specifically, the 
blockers, a.k.a road map items, for 1.0 are linked to this ticket.  (was: This 
ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a road 
map items) for 1.0 are linked to this ticket.)

> Mesos 1.0
> -
>
> Key: MESOS-338
> URL: https://issues.apache.org/jira/browse/MESOS-338
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Mahler
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> This ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a 
> road map items, for 1.0 are linked to this ticket.





[jira] [Updated] (MESOS-338) Mesos 1.0

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-338:
-
Summary: Mesos 1.0  (was: Mesos 1.0 release)

> Mesos 1.0
> -
>
> Key: MESOS-338
> URL: https://issues.apache.org/jira/browse/MESOS-338
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Mahler
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> This ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a 
> road map items) for 1.0 are linked to this ticket.





[jira] [Updated] (MESOS-338) Mesos 1.0 release

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-338:
-
 Shepherd: Vinod Kone
 Assignee: (was: Benjamin Hindman)
Fix Version/s: 1.0.0
  Description: This ticket tracks the Mesos 1.0 road map. Specifically, the 
blockers, a.k.a road map items) for 1.0 are linked to this ticket.  (was: This 
is to establish a roadmap for the desired features for 1.0.0, this is not meant 
to be an authoritative list!

Slave Recovery: MESOS-110

Master State Registrar: MESOS-295

New API: MESOS-810
  Using Futures to replace callbacks and as a means of acknowledgement.
  First Class "grandfathered" Resources (e.g. cpu, memory, disk, ...).

Security and Authentication: MESOS-418

Reconfigurable Log: MESOS-683

Stateful Scheduler Driver: (no ticket yet)

Revocable offers: (no ticket yet)

Packaging: (needs an umbrella ticket)

Launching Frameworks: (no ticket yet))
   Issue Type: Task  (was: Story)
  Summary: Mesos 1.0 release  (was: 1.0.0 Roadmap)

> Mesos 1.0 release
> -
>
> Key: MESOS-338
> URL: https://issues.apache.org/jira/browse/MESOS-338
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Mahler
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> This ticket tracks the Mesos 1.0 road map. Specifically, the blockers, a.k.a 
> road map items) for 1.0 are linked to this ticket.





[jira] [Updated] (MESOS-810) New Scheduler API

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-810:
-
Fix Version/s: (was: 1.0.0)

> New Scheduler API
> -
>
> Key: MESOS-810
> URL: https://issues.apache.org/jira/browse/MESOS-810
> Project: Mesos
>  Issue Type: Story
>  Components: c++ api, java api, python api
>Reporter: Benjamin Mahler
>
> This ticket is an effort to capture requirements and link to related tickets 
> for the future version of the Scheduler API. We should split these out as 
> needed but for now I'll just document things off the top of my head:
> 1. Batch status update acknowledgements. The current mechanism for 
> acknowledging a status update is for the call to Scheduler::statusUpdate to 
> return. At this point we send an acknowledgement to the slave. This 
> simplistic approach forces schedulers to serially persist status updates if 
> operating in a stateful manner, ultimately leading to scaling issues.
> 2. Explicit behavior. By this I mean that when a Scheduler calls something 
> like killTask, we'll currently implicitly drop it when we're disconnected 
> from the Master. Rather than implicitly doing this, we either need to deliver 
> things reliably or inform schedulers when their request was dropped. 
> Returning Futures could be very powerful here.
> 3. Statefulness and access to state. The scheduler driver is currently 
> stateless (no persistence). This means schedulers currently have to persist 
> state using our State abstraction or the replicated log, or through their own 
> persistence mechanism. Providing a stateful scheduler driver increases the 
> simplicity of framework schedulers substantially, and providing access to 
> state makes it very simple to implement a framework.
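
Point 1 above (batch status update acknowledgements) can be sketched as follows. This is only an illustration under stated assumptions: the class AckBatcher and its methods are hypothetical and do not exist in the Mesos scheduler API, and the persist callback stands in for whatever storage a stateful scheduler uses.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: buffers status update IDs so that they can be
// persisted with one write and acknowledged together afterwards, instead
// of persisting and acknowledging each update serially.
class AckBatcher
{
public:
  // `persist` stores a whole batch and returns true on success.
  explicit AckBatcher(
      std::function<bool(const std::vector<std::string>&)> persist)
    : persist_(std::move(persist)) {}

  // Called once per status update; returns without acknowledging.
  void received(const std::string& updateId)
  {
    pending_.push_back(updateId);
  }

  // Persists everything buffered so far in a single call and returns the
  // IDs that may now be acknowledged. On failure nothing is acknowledged
  // and the updates stay buffered for a later retry.
  std::vector<std::string> flush()
  {
    if (pending_.empty() || !persist_(pending_)) {
      return {};
    }

    std::vector<std::string> acknowledged;
    acknowledged.swap(pending_);
    return acknowledged;
  }

private:
  std::function<bool(const std::vector<std::string>&)> persist_;
  std::vector<std::string> pending_;
};
```

The design choice here is to decouple receipt from acknowledgement: the callback returning no longer implies an acknowledgement, so the scheduler can amortize one persistence operation over many updates.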





[jira] [Updated] (MESOS-1027) IPv6 support

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1027:
--
Fix Version/s: (was: 1.0.0)

> IPv6 support
> 
>
> Key: MESOS-1027
> URL: https://issues.apache.org/jira/browse/MESOS-1027
> Project: Mesos
>  Issue Type: Epic
>  Components: framework, libprocess, master, slave
>Reporter: Dominic Hamon
>
> From the CLI down through the various layers of tech we should support IPv6.





[jira] [Commented] (MESOS-4792) Remove src/common/date_utils.{c,h}pp

2016-02-28 Thread Yong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171152#comment-15171152
 ] 

Yong Tang commented on MESOS-4792:
--

Seems to be an easy fix. Just created a review request:
https://reviews.apache.org/r/44147/

> Remove src/common/date_utils.{c,h}pp
> 
>
> Key: MESOS-4792
> URL: https://issues.apache.org/jira/browse/MESOS-4792
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Yong Tang
>Priority: Trivial
>  Labels: mesosphere, newbie, tech-debt
>
> AFAICT this is unused.





[jira] [Assigned] (MESOS-4792) Remove src/common/date_utils.{c,h}pp

2016-02-28 Thread Yong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Tang reassigned MESOS-4792:


Assignee: Yong Tang

> Remove src/common/date_utils.{c,h}pp
> 
>
> Key: MESOS-4792
> URL: https://issues.apache.org/jira/browse/MESOS-4792
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Yong Tang
>Priority: Trivial
>  Labels: mesosphere, newbie, tech-debt
>
> AFAICT this is unused.





[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3570:
--
Story Points: 8  (was: 5)

> Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
> 
>
> Key: MESOS-3570
> URL: https://issues.apache.org/jira/browse/MESOS-3570
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere, newbie
> Fix For: 0.28.0
>
>
> Currently, the scheduler library sends calls in order by chaining them and 
> sending them only when it has received a response for the earlier call. This 
> was done because there was no HTTP Pipelining abstraction in Libprocess 
> {{process::post}}.
> However, once {{MESOS-3332}} is resolved, we should be able to use the new 
> abstraction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3854:
--
Story Points: 5  (was: 2)

> Finalize design for generalized Authorizer interface
> 
>
> Key: MESOS-3854
> URL: https://issues.apache.org/jira/browse/MESOS-3854
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: authorization, mesosphere
>
> Finalize the structure of the interface and achieve consensus on the design doc 
> proposed in MESOS-2949.
> https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit





[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess

2016-02-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3570:
--
Story Points: 5  (was: 3)

> Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
> 
>
> Key: MESOS-3570
> URL: https://issues.apache.org/jira/browse/MESOS-3570
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere, newbie
>
> Currently, the scheduler library sends calls in order by chaining them and 
> sending them only when it has received a response for the earlier call. This 
> was done because there was no HTTP Pipelining abstraction in Libprocess 
> {{process::post}}.
> However, once {{MESOS-3332}} is resolved, we should be able to use the new 
> abstraction.





[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec

2016-02-28 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171091#comment-15171091
 ] 

Deshi Xiao commented on MESOS-2162:
---

+1

> Consider a C++ implementation of CoreOS AppContainer spec
> -
>
> Key: MESOS-2162
> URL: https://issues.apache.org/jira/browse/MESOS-2162
> Project: Mesos
>  Issue Type: Story
>  Components: containerization
>Reporter: Dominic Hamon
>  Labels: gsoc2015, mesosphere, twitter
>
> CoreOS have released a 
> [specification|https://github.com/coreos/rocket/blob/master/app-container/SPEC.md]
>  for a container abstraction as an alternative to Docker. They have also 
> released a reference implementation, [rocket|https://coreos.com/blog/rocket/].
> We should consider a C++ implementation of the specification to have parity 
> with the community and then use this implementation for our containerizer.





[jira] [Commented] (MESOS-1648) Add a --pidfile option to master and agent binaries.

2016-02-28 Thread Pradeep Chhetri (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171017#comment-15171017
 ] 

Pradeep Chhetri commented on MESOS-1648:


Is anyone working on this? If not, I would like to pick it up.


> Add a --pidfile option to master and agent binaries.
> 
>
> Key: MESOS-1648
> URL: https://issues.apache.org/jira/browse/MESOS-1648
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Tobias Weingartner
>  Labels: newbie, twitter
>
> Right now we use a number of wrapper scripts to try and keep up a 
> {{/var/run/mesos/mesos-slave.pid}} in order to be able to monitor the 
> process.  This has proven to be somewhat fragile due to the lack of locking 
> and the possibility of races and stale data.
> By adding a {{--pidfile}}, we can obtain a lock on the file to prevent 
> multiple binaries from starting, and to enable the tooling to validate that 
> the lock is held before doing any signaling. We can also do a best effort 
> unlink in the signal handler upon termination:
> {code}
> // Get exclusive access to the file.
> fd = open(O_CREAT ...)
> flock(fd, LOCK_EX)
> if not locked, abort
> ftruncate(fd, 0)
> // Write the pid.
> write(fd, "<pid>")
> // Inside signal handler..
> unlink(pidfile)
> {code}
> Digging around, it looks like the open, ftruncate, write pattern is pretty 
> common:
> http://man7.org/tlpi/code/online/diff/filelock/create_pid_file.c.html
> The tooling around it could check that the file is locked by the pid inside 
> it before taking any action (like signaling):
> *Case 1*: If the file does not exist or is not locked, then assume nothing is 
> running. It's possible for something to be running and about to grab the 
> lock, but we'll eventually read it correctly and converge on a single 
> instance started correctly.
> *Case 2*: If the file is locked, and the pid doesn't match, then assume it is 
> running but not (yet) as the pid in the file. Treat this the same as (1): 
> assume it's not running; the next attempts to start will eventually 
> converge on a single instance running.
> *Case 3*: If the file is locked, and the pid matches the locker process, then 
> assume it is running as that pid. Note that it's still possible that in 
> between matching the pid and taking an action (e.g. kill), the pid may become 
> stale, but the recycling pattern of pids makes it unlikely to be re-used 
> unless there is a large delay.
> It seems like some tools already do this signal wrapping (note the comment 
> about fcntl and note the race from (3) in the BUGS section):
> http://manpages.ubuntu.com/manpages/natty/man8/ovs-kill.8.html
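
The open / flock / ftruncate / write sequence quoted above, and the check a
tool would run before signaling, can be sketched in Python as follows. This is
an illustrative sketch only, not Mesos code: the function names
acquire_pidfile and locked_pid are hypothetical, and the non-blocking LOCK_NB
flag is an assumption made so that a second instance aborts instead of
waiting. Note one limitation: flock itself has no call that reports which
process owns the lock, so this sketch can only tell "locked" from "not
locked"; distinguishing Case 2 from Case 3 would need something else, such as
fcntl record locks queried with F_GETLK.

```python
import fcntl
import os


def acquire_pidfile(path):
    """Lock `path` and write our pid into it.

    Returns the open file descriptor (keep it open for the life of the
    process, since closing it releases the lock), or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Exclusive, non-blocking: fail immediately if already locked.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Only truncate once we own the lock, so we never clobber a live pid.
    os.ftruncate(fd, 0)
    os.write(fd, (str(os.getpid()) + "\n").encode())
    return fd


def locked_pid(path):
    """Return the pid stored in `path` if the file is currently locked.

    Returns None when the file is missing, not locked, or locked but
    still empty (all treated as "assume nothing is running").
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return None
    try:
        try:
            # If we can take a shared lock, nobody holds the exclusive one.
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            data = os.read(fd, 32).strip()
            return int(data) if data else None
        fcntl.flock(fd, fcntl.LOCK_UN)
        return None
    finally:
        os.close(fd)
```

A tool would call locked_pid() and only signal when it returns a pid. The race
from Case 3 still applies: the pid can go stale between the check and the
signal.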





[jira] [Updated] (MESOS-4784) SlaveTest.MetricsSlaveLaunchErrors test relies on implicit blocking behavior hitting the global metrics endpoint

2016-02-28 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4784:

Shepherd: Adam B

> SlaveTest.MetricsSlaveLaunchErrors test relies on implicit blocking behavior 
> hitting the global metrics endpoint
> 
>
> Key: MESOS-4784
> URL: https://issues.apache.org/jira/browse/MESOS-4784
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> The test attempts to observe a change in the 
> {{slave/container_launch_errors}} metric, but does not wait for the 
> triggering action to take place. Currently the test passes because hitting the 
> endpoint blocks for some rate-limit-related time, which in many circumstances 
> gives the action enough time to take place.
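
The general shape of the problem, and of a fix that waits explicitly for the
triggering action, can be sketched in Python. This is a generic illustration
and not the Mesos test or its harness: the Metrics class, the launch_failed
event and observe_launch_errors are all hypothetical names. The point is only
that the reader blocks on an explicit signal from the action rather than
relying on incidental delay elsewhere.

```python
import threading


class Metrics:
    """Toy metrics holder with an event that marks the triggering action."""

    def __init__(self):
        self._lock = threading.Lock()
        self.container_launch_errors = 0
        self.launch_failed = threading.Event()

    def record_launch_error(self):
        with self._lock:
            self.container_launch_errors += 1
        # Signal observers that the action has actually happened.
        self.launch_failed.set()


def observe_launch_errors():
    metrics = Metrics()
    worker = threading.Thread(target=metrics.record_launch_error)
    worker.start()
    # Wait for the action itself instead of hoping the read is slow enough.
    if not metrics.launch_failed.wait(timeout=5):
        raise TimeoutError("launch error was never recorded")
    worker.join()
    return metrics.container_launch_errors
```

With the explicit wait, the observed value no longer depends on how long the
metrics read happens to block.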





[jira] [Created] (MESOS-4797) Add a couple of registrar tests for /weights endpoint

2016-02-28 Thread Yongqiao Wang (JIRA)
Yongqiao Wang created MESOS-4797:


 Summary: Add a couple of registrar tests for /weights endpoint
 Key: MESOS-4797
 URL: https://issues.apache.org/jira/browse/MESOS-4797
 Project: Mesos
  Issue Type: Task
Reporter: Yongqiao Wang
Assignee: Yongqiao Wang
Priority: Minor





