[jira] [Updated] (MESOS-4691) Add a HierarchicalAllocator benchmark with reservation labels.

2016-03-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4691:

Shepherd: Joris Van Remoortere  (was: Michael Park)

> Add a HierarchicalAllocator benchmark with reservation labels.
> --
>
> Key: MESOS-4691
> URL: https://issues.apache.org/jira/browse/MESOS-4691
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> With {{Labels}} being part of the {{ReservationInfo}}, we should ensure that 
> we don't observe a significant performance degradation in the allocator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-03-01 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175065#comment-15175065
 ] 

Guangya Liu commented on MESOS-4816:


1) When the Kubernetes framework registers, it creates an executor:
https://github.com/kubernetes/kubernetes/blob/master/contrib/mesos/pkg/scheduler/service/service.go#L492-L499
2) Kubernetes then uses this executor to launch tasks:
https://github.com/kubernetes/kubernetes/blob/master/contrib/mesos/pkg/scheduler/podtask/pod_task.go#L191-L198

So the executor is launched with the first task, and all later tasks reuse this
executor; this is why the isolator cannot get the other tasks' infos.

> Expose TaskInfo to Isolators
> 
>
> Key: MESOS-4816
> URL: https://issues.apache.org/jira/browse/MESOS-4816
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, slave
>Reporter: Connor Doyle
>
> Authors of custom isolator modules frequently require access to the TaskInfo 
> in order to read custom metadata in task labels.
> Currently, it's possible to link containers to tasks within a module by 
> implementing both an isolator and the {{slaveRunTaskLabelDecorator}} hook, 
> and maintaining a shared map of containers to tasks.  This way works, but 
> adds unnecessary complexity.





[jira] [Commented] (MESOS-4828) XFS disk quota isolator

2016-03-01 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175062#comment-15175062
 ] 

James Peach commented on MESOS-4828:


[~jieyu] and [~xujyan] volunteered to shepherd.

> XFS disk quota isolator
> ---
>
> Key: MESOS-4828
> URL: https://issues.apache.org/jira/browse/MESOS-4828
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: James Peach
>Assignee: James Peach
>
> Implement a disk resource isolator using XFS project quotas. Compared to the 
> {{posix/disk}} isolator, this doesn't need to scan the filesystem 
> periodically, and applications receive an {{ENOSPC}} error instead of being 
> summarily killed.
> This initial implementation only isolates sandbox directory resources, since 
> the isolator doesn't have any visibility into the lifecycle of volumes, 
> which is needed to assign and track project IDs.
> The build dependencies for this are the XFS headers (from xfsprogs-devel) and 
> libblkid. We need libblkid or the equivalent to map filesystem paths to block 
> devices in order to apply quotas.





[jira] [Comment Edited] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-01 Thread Fan Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168583#comment-15168583
 ] 

Fan Du edited comment on MESOS-4492 at 3/2/16 5:16 AM:
---

Here goes the RR: (Discarded)
https://reviews.apache.org/r/44058/

Updated RR with document fix and test code addon:
https://reviews.apache.org/r/44255/



was (Author: fan.du):
Here goes the RR:
https://reviews.apache.org/r/44058/

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable users or operators to inspect offer operation 
> statistics such as RESERVE, UNRESERVE, CREATE and DESTROY; the current 
> implementation only supports LAUNCH.





[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-03-01 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175016#comment-15175016
 ] 

James Peach commented on MESOS-4816:


{quote}
 it will not work for some cases such as Kubernetes and Mesos integration where 
one executor can manage many tasks.
{quote}

How does this work in Kubernetes? Can you point me to code or something?






[jira] [Assigned] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-01 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-4831:
--

Assignee: Guangya Liu

> Master sometimes sends two inverse offers after the agent goes into 
> maintenance.
> 
>
> Key: MESOS-4831
> URL: https://issues.apache.org/jira/browse/MESOS-4831
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>  Labels: maintenance, mesosphere
>
> Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}
> https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull
> {code}
> I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
> allocate!
> I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
> slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
> I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
> fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
> framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
> {code}
> The ideal expected workflow for this test is something like:
> - The framework receives offers from master.
> - The framework updates its maintenance schedule.
> - The current offer is rescinded.
> - A new offer is received from the master with unavailability set.
> - After the agent goes for maintenance, an inverse offer is sent.
> For some reason, in the logs we see that the master is sending 2 inverse 
> offers. The test seems to pass as we just check for the initial inverse offer 
> being present. This can also be reproduced by a modified version of the 
> original test.
> {code}
> // Test ensures that an offer will have an `unavailability` set if the
> // slave is scheduled to go down for maintenance.
> TEST_F(MasterMaintenanceTest, PendingUnavailabilityTest)
> {
>   Try master = StartMaster();
>   ASSERT_SOME(master);
>   MockExecutor exec(DEFAULT_EXECUTOR_ID);
>   Try slave = StartSlave();
>   ASSERT_SOME(slave);
>   auto scheduler = std::make_shared();
>   EXPECT_CALL(*scheduler, heartbeat(_))
> .WillRepeatedly(Return()); // Ignore heartbeats.
>   Future connected;
>   EXPECT_CALL(*scheduler, connected(_))
> .WillOnce(FutureSatisfy())
> .WillRepeatedly(Return()); // Ignore future invocations.
>   scheduler::TestV1Mesos mesos(master.get(), ContentType::PROTOBUF, 
> scheduler);
>   AWAIT_READY(connected);
>   Future subscribed;
>   EXPECT_CALL(*scheduler, subscribed(_, _))
> .WillOnce(FutureArg<1>());
>   Future normalOffers;
>   Future unavailabilityOffers;
>   Future inverseOffers;
>   EXPECT_CALL(*scheduler, offers(_, _))
> .WillOnce(FutureArg<1>())
> .WillOnce(FutureArg<1>())
> .WillOnce(FutureArg<1>());
>   // The original offers should be rescinded when the unavailability is 
> changed.
>   Future offerRescinded;
>   EXPECT_CALL(*scheduler, rescind(_, _))
> .WillOnce(FutureSatisfy());
>   {
> Call call;
> call.set_type(Call::SUBSCRIBE);
> Call::Subscribe* subscribe = call.mutable_subscribe();
> subscribe->mutable_framework_info()->CopyFrom(DEFAULT_V1_FRAMEWORK_INFO);
> mesos.send(call);
>   }
>   AWAIT_READY(subscribed);
>   v1::FrameworkID frameworkId(subscribed->framework_id());
>   AWAIT_READY(normalOffers);
>   EXPECT_NE(0, normalOffers->offers().size());
>   // Regular offers shouldn't have unavailability.
>   foreach (const v1::Offer& offer, normalOffers->offers()) {
> EXPECT_FALSE(offer.has_unavailability());
>   }
>   // Schedule this slave for maintenance.
>   MachineID machine;
>   machine.set_hostname(maintenanceHostname);
>   machine.set_ip(stringify(slave.get().address.ip));
>   const Time start = Clock::now() + Seconds(60);
>   const Duration duration = Seconds(120);
>   const Unavailability unavailability = createUnavailability(start, duration);
>   // Post a valid schedule with one machine.
>   maintenance::Schedule schedule = createSchedule(
>   {createWindow({machine}, unavailability)});
>   // We have a few seconds between the first set of offers and the
>   // next allocation of offers. This should be enough time to perform
>   // a maintenance schedule update. This update will also trigger the
>   // rescinding of offers from the scheduled slave.
>   Future response = process::http::post(
>   master.get(),
>   "maintenance/schedule",
>   headers,
>   

[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-03-01 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174848#comment-15174848
 ] 

Guangya Liu commented on MESOS-4816:


I saw that MESOS-4500 enabled {{Expose ExecutorInfo and TaskInfo for isolators 
in prepare()}}, but as [~cdoyle] points out, this is not enough: {{prepare}} 
will only be invoked once per container executor, so it will not work for some 
cases, such as the Kubernetes and Mesos integration, where one executor can 
manage many tasks.

Does it make sense to keep this ticket and update the isolator API of 
{{update()}} to pass a list of {{TaskInfo}} to cover more cases?






[jira] [Created] (MESOS-4836) Fix rmdir for windows

2016-03-01 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-4836:
-

 Summary: Fix rmdir for windows
 Key: MESOS-4836
 URL: https://issues.apache.org/jira/browse/MESOS-4836
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Alex Clemmer


This is due to a bug in MESOS-4415 that landed for 0.27.0.





[jira] [Commented] (MESOS-4796) Debug ability enhancement for unified container

2016-03-01 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174788#comment-15174788
 ] 

Guangya Liu commented on MESOS-4796:


Thanks [~jieyu], there are still a couple of patches that need to address this 
in the backend, isolator, etc. Shall I reopen this ticket to continue the 
patches?

> Debug ability enhancement for unified container
> ---
>
> Key: MESOS-4796
> URL: https://issues.apache.org/jira/browse/MESOS-4796
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.28.0
>
>
> The following are some starting points for what I want to do here after some 
> discussion with [~jieyu]; there will be more enhancements later. 
> docker/local_puller:
> LocalPullerProcess::extractLayer: add some detail about how the layer is 
> extracted.
> LocalPullerProcess::pull: update the message to add the image info to the 
> log.
> docker/puller.cpp: 
> Puller::create: clarify which puller is in use: local or registry.
> docker/registry_puller.cpp:
> RegistryPullerProcess::pull: clarify which image is going to be pulled.
> RegistryPullerProcess::__pull: add some detail about roots, layerPath, 
> tarPath, JSON, etc. when creating the layer path.
> RegistryPullerProcess::fetchBlobs: update the log message for the reference: 
> stringify(reference).
> backends/bind.cpp:
> BindBackendProcess::provision: add more detail for provisioning, such as the 
> mount point.
> BindBackendProcess::destroy: log which rootfs is being destroyed.
> backends/copy.cpp:
> CopyBackendProcess::destroy: log which rootfs is being destroyed.
> CopyBackendProcess::provision: add more detail for provisioning, such as the 
> rootfs.
> mesos/isolators/docker/runtime.cpp:
> DockerRuntimeIsolatorProcess::prepare: add some logs to clarify how the 
> docker runtime isolator is prepared.





[jira] [Commented] (MESOS-2840) MesosContainerizer support multiple image provisioners

2016-03-01 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174727#comment-15174727
 ] 

Vinod Kone commented on MESOS-2840:
---

IIUC, the MVP for this feature is complete? If yes, can you move the unresolved 
issues into a new epic and close this one?

We also need a blurb for this in the CHANGELOG and user doc.

> MesosContainerizer support multiple image provisioners
> --
>
> Key: MESOS-2840
> URL: https://issues.apache.org/jira/browse/MESOS-2840
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization, docker
>Affects Versions: 0.23.0
>Reporter: Marco Massenzio
>Assignee: Timothy Chen
>  Labels: mesosphere, twitter
>
> We want to utilize the Appc integration interfaces to further make the 
> MesosContainerizer support multiple image formats.
> This allows our future work on isolators to support any container image 
> format.
> Design
> [open to public comments]
> https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing
> [original document, requires permission]
> https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing





[jira] [Commented] (MESOS-4343) Introduce the ability to assign network handles to mesos containers

2016-03-01 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174725#comment-15174725
 ] 

Vinod Kone commented on MESOS-4343:
---

Can you add a blurb in the CHANGELOG describing this feature? This is one of 
the few epics going into the 0.28.0 release. Great to see that there is already 
a user doc for this.

> Introduce the ability to assign network handles to mesos containers
> ---
>
> Key: MESOS-4343
> URL: https://issues.apache.org/jira/browse/MESOS-4343
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: containers, mesosphere
> Fix For: 0.28.0
>
>
> Linux provides net_cls as a cgroup subsystem. A net_cls cgroup is associated 
> with a 16-bit major handle and a 16-bit minor handle.  When a task is 
> associated with a net_cls cgroup, the kernel tags every packet being 
> generated by the task with the major and minor handle associated with the 
> net_cls cgroup. These tags are then used by network performance shaping and 
> firewall tools such as tc (traffic controller) and iptables. 
> Currently, mesos agents do not provide any isolator that can enable 
> mesos-containers in a net_cls cgroup, or assign network handles to a net_cls 
> cgroup. As part of this epic we plan to achieve the following:
> a)  Implement net_cls cgroup isolator for mesos agents.
> b)  Implement a manager for the net_cls handles.
> c)  Allow operators to set a major network handle when launching an agent. 
> d)  Expose the net_cls network handle allocated to a container, to entities 
> such as operators and frameworks. 
> Once the above goals are met operators can learn about network handles 
> allocated to containers and apply them to tools such as tc and iptables to 
> enforce network policies. 





[jira] [Updated] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-01 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4831:
--
Description: 
Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}

https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull

{code}
I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
allocate!
I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
{code}

The ideal expected workflow for this test is something like:

- The framework receives offers from master.
- The framework updates its maintenance schedule.
- The current offer is rescinded.
- A new offer is received from the master with unavailability set.
- After the agent goes for maintenance, an inverse offer is sent.

For some reason, in the logs we see that the master is sending 2 inverse 
offers. The test seems to pass as we just check for the initial inverse offer 
being present. This can also be reproduced by a modified version of the 
original test.

{code}
// Test ensures that an offer will have an `unavailability` set if the
// slave is scheduled to go down for maintenance.
TEST_F(MasterMaintenanceTest, PendingUnavailabilityTest)
{
  Try master = StartMaster();
  ASSERT_SOME(master);

  MockExecutor exec(DEFAULT_EXECUTOR_ID);

  Try slave = StartSlave();
  ASSERT_SOME(slave);

  auto scheduler = std::make_shared();

  EXPECT_CALL(*scheduler, heartbeat(_))
.WillRepeatedly(Return()); // Ignore heartbeats.

  Future connected;
  EXPECT_CALL(*scheduler, connected(_))
.WillOnce(FutureSatisfy())
.WillRepeatedly(Return()); // Ignore future invocations.

  scheduler::TestV1Mesos mesos(master.get(), ContentType::PROTOBUF, scheduler);

  AWAIT_READY(connected);

  Future subscribed;
  EXPECT_CALL(*scheduler, subscribed(_, _))
.WillOnce(FutureArg<1>());

  Future normalOffers;
  Future unavailabilityOffers;
  Future inverseOffers;
  EXPECT_CALL(*scheduler, offers(_, _))
.WillOnce(FutureArg<1>())
.WillOnce(FutureArg<1>())
.WillOnce(FutureArg<1>());

  // The original offers should be rescinded when the unavailability is changed.
  Future offerRescinded;
  EXPECT_CALL(*scheduler, rescind(_, _))
.WillOnce(FutureSatisfy());

  {
Call call;
call.set_type(Call::SUBSCRIBE);

Call::Subscribe* subscribe = call.mutable_subscribe();
subscribe->mutable_framework_info()->CopyFrom(DEFAULT_V1_FRAMEWORK_INFO);

mesos.send(call);
  }

  AWAIT_READY(subscribed);

  v1::FrameworkID frameworkId(subscribed->framework_id());

  AWAIT_READY(normalOffers);
  EXPECT_NE(0, normalOffers->offers().size());

  // Regular offers shouldn't have unavailability.
  foreach (const v1::Offer& offer, normalOffers->offers()) {
EXPECT_FALSE(offer.has_unavailability());
  }

  // Schedule this slave for maintenance.
  MachineID machine;
  machine.set_hostname(maintenanceHostname);
  machine.set_ip(stringify(slave.get().address.ip));

  const Time start = Clock::now() + Seconds(60);
  const Duration duration = Seconds(120);
  const Unavailability unavailability = createUnavailability(start, duration);

  // Post a valid schedule with one machine.
  maintenance::Schedule schedule = createSchedule(
  {createWindow({machine}, unavailability)});

  // We have a few seconds between the first set of offers and the
  // next allocation of offers. This should be enough time to perform
  // a maintenance schedule update. This update will also trigger the
  // rescinding of offers from the scheduled slave.
  Future response = process::http::post(
  master.get(),
  "maintenance/schedule",
  headers,
  stringify(JSON::protobuf(schedule)));

  AWAIT_EXPECT_RESPONSE_STATUS_EQ(OK().status, response);

  // The original offers should be rescinded when the unavailability
  // is changed.
  AWAIT_READY(offerRescinded);

  AWAIT_READY(unavailabilityOffers);
  EXPECT_NE(0, unavailabilityOffers->offers().size());

  // Make sure the new offers have the unavailability set.
  foreach (const v1::Offer& offer, unavailabilityOffers->offers()) {
EXPECT_TRUE(offer.has_unavailability());
EXPECT_EQ(
unavailability.start().nanoseconds(),
offer.unavailability().start().nanoseconds());

EXPECT_EQ(
unavailability.duration().nanoseconds(),

[jira] [Created] (MESOS-4834) Add 'file' fetcher plugin.

2016-03-01 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-4834:


 Summary: Add 'file' fetcher plugin.
 Key: MESOS-4834
 URL: https://issues.apache.org/jira/browse/MESOS-4834
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Jojy Varghese
Assignee: Jojy Varghese


Add support for "file" based URI fetcher. This could be useful for container 
image provisioning from local file system.





[jira] [Updated] (MESOS-4586) Resources clarification in Mesos UI

2016-03-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4586:
--
Affects Version/s: (was: 0.27.0)
   (was: 0.26.0)

In the main page, "Used" should actually be called "Allocated" because it 
represents the resources allocated.

"Offered" represents resources that are currently offered to framework(s) but 
that frameworks haven't accepted/declined yet.

"Idle" = Total - Allocated - Offered. Note that even though a resource 
might be idle, it might not be offered to framework(s) if there are filters set 
on it (e.g., declined by a framework for 1 day).

> Resources clarification in Mesos UI
> ---
>
> Key: MESOS-4586
> URL: https://issues.apache.org/jira/browse/MESOS-4586
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Craig W
>
> On the Mesos UI, under the "resources" section where it lists CPUs and Mem, 
> the values seem to be calculated by summing up every executor's cpu and 
> memory statistics, which would be <= the "allocated" resources.
> On the page that displays information for a slave, the CPUs and Mem columns 
> show used and allocated.
> On the Mesos UI front page, I was reading "Idle" resources as the amount of 
> resources I have available for offers. However, that's not the case. It 
> would be nice to show the amount of "free" or "available" resources as well 
> as "idle", so I can better determine how many resources I actually have 
> available for scheduling additional tasks.





[jira] [Commented] (MESOS-4825) Master's slave reregister logic does not update version field

2016-03-01 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174659#comment-15174659
 ] 

Klaus Ma commented on MESOS-4825:
-

RR: https://reviews.apache.org/r/44236/

> Master's slave reregister logic does not update version field
> -
>
> Key: MESOS-4825
> URL: https://issues.apache.org/jira/browse/MESOS-4825
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joris Van Remoortere
>Assignee: Klaus Ma
>Priority: Blocker
> Fix For: 0.28.0
>
>
> The master's logic for reregistering a slave does not update the version 
> field if the slave re-registers with a new version.





[jira] [Updated] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-01 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4831:
--
Description: 
Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}

https://builds.apache.org/job/Mesos/1748/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/consoleFull

{code}
I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
allocate!
I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
{code}

The ideal expected workflow for this test is something like:

- The framework receives offers from master.
- The framework updates its maintenance schedule.
- The current offer is rescinded.
- A new offer is received from the master with unavailability set.
- After the agent goes for maintenance, an inverse offer is sent.

For some reason, in the logs we see that the master is sending 2 inverse 
offers. The test seems to pass as we just check for the initial inverse offer 
being present. 

Also, unrelated, we need to clean up this test to not expect multiple offers 
i.e. remove {{numberOfOffers}} constant.

  was:
Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}

{code}
I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
allocate!
I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
{code}

The ideal expected workflow for this test is something like:

- The framework receives offers from master.
- The framework updates its maintenance schedule.
- The current offer is rescinded.
- A new offer is received from the master with unavailability set.
- After the agent goes for maintenance, an inverse offer is sent.

For some reason, in the logs we see that the master is sending 2 inverse 
offers. The test seems to pass as we just check for the initial inverse offer 
being present. 

Also, unrelated, we need to clean up this test to not expect multiple offers 
i.e. remove {{numberOfOffers}} constant.


> Master sometimes sends two inverse offers after the agent goes into 
> maintenance.
> 
>
> Key: MESOS-4831
> URL: https://issues.apache.org/jira/browse/MESOS-4831
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Anand Mazumdar
>  Labels: maintenance, mesosphere
>

[jira] [Updated] (MESOS-4740) Improve metrics/snapshot performace

2016-03-01 Thread David Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Robinson updated MESOS-4740:
--
Description: 
[~drobinson] noticed that retrieving metrics/snapshot statistics could be very 
inefficient.

{noformat}
[user@server ~]$ time curl -s localhost:5050/metrics/snapshot

real    0m35.654s
user    0m0.019s
sys     0m0.011s
{noformat}

MESOS-1287 introduced a timeout parameter for this query, but metric 
collectors like ours are not aware of such a URL-specific parameter, so we 
need to:

1) Always have a timeout, with some default value set.

2) Investigate why metrics/snapshot could take such a long time to complete 
under load, since we don't use history for these statistics and the values are 
just atomic reads.


  was:
David Robinson noticed that retrieving metrics/snapshot statistics could be 
very inefficient and cause the Mesos master to get stuck.

{noformat}
[root@atla-bny-34-sr1 ~]# time curl -s localhost:5051/metrics/snapshot

real    2m7.302s
user    0m0.001s
sys     0m0.004s
{noformat}

MESOS-1287 introduced a timeout parameter for this query, but observers 
like ours are not aware of such a URL-specific parameter, so we need to:

1) Always have a timeout, with some default value set.

2) Investigate why metrics/snapshot could take such a long time to complete 
under load, since we don't use history for these statistics and the values are 
just atomic reads.



> Improve metrics/snapshot performace
> ---
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>
> [~drobinson] noticed retrieving metrics/snapshot statistics could be very 
> inefficient.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real  0m35.654s
> user  0m0.019s
> sys   0m0.011s
> {noformat}
> MESOS-1287 introduces a timeout parameter for this query, but metric 
> collectors like ours are not aware of such a URL-specific 
> parameter, so we need to:
> 1) Always have a timeout, with a sensible default value.
> 2) Investigate why metrics/snapshot can take such a long time to complete 
> under load, since we don't keep history for these statistics and the values 
> are just atomic reads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4415) Implement stout/os/windows/rmdir.hpp

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174573#comment-15174573
 ] 

Joris Van Remoortere commented on MESOS-4415:
-

https://reviews.apache.org/r/43907/
https://reviews.apache.org/r/43908/

> Implement stout/os/windows/rmdir.hpp
> 
>
> Key: MESOS-4415
> URL: https://issues.apache.org/jira/browse/MESOS-4415
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Joris Van Remoortere
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows
> Fix For: 0.27.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4821) Introduce a port field in `ImageManifest` in order to set exposed ports for a container.

2016-03-01 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-4821:
-
Description: Networking isolators such as `network/cni` need to learn about 
the ports that a container wishes to expose to the outside world. This can be 
achieved by adding a field to the `ImageManifest` protobuf and allowing the 
`ImageProvisioner` to set this field to inform the isolator of the ports that 
the container wishes to expose.   (was: Networking isolators such as 
`network/cni` need to learn about the ports that a container wishes to expose 
to the outside world. This can be achieved by adding a field to the 
`ContainerConfig` protobuf and allowing the `Containerizer` or framework to 
set this field to inform the isolator of the ports that the container wishes 
to expose. )
Summary: Introduce a port field in `ImageManifest` in order to set 
exposed ports for a container.  (was: Introduce a port field in 
`ContainerConfig` in order to set exposed ports for a container.)

> Introduce a port field in `ImageManifest` in order to set exposed ports for a 
> container.
> 
>
> Key: MESOS-4821
> URL: https://issues.apache.org/jira/browse/MESOS-4821
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Networking isolators such as `network/cni` need to learn about the ports 
> that a container wishes to expose to the outside world. This can be achieved 
> by adding a field to the `ImageManifest` protobuf and allowing the 
> `ImageProvisioner` to set this field to inform the isolator of the ports 
> that the container wishes to expose. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4780) Remove `user` and `rootfs` flags in Windows launcher.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174440#comment-15174440
 ] 

Joris Van Remoortere edited comment on MESOS-4780 at 3/1/16 10:42 PM:
--

https://reviews.apache.org/r/43904/
https://reviews.apache.org/r/43905/
https://reviews.apache.org/r/40938/
https://reviews.apache.org/r/40939/


was (Author: jvanremoortere):
https://reviews.apache.org/r/43904/
https://reviews.apache.org/r/43905/

> Remove `user` and `rootfs` flags in Windows launcher.
> -
>
> Key: MESOS-4780
> URL: https://issues.apache.org/jira/browse/MESOS-4780
> Project: Mesos
>  Issue Type: Task
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows-mvp
> Fix For: 0.28.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4780) Remove `user` and `rootfs` flags in Windows launcher.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174457#comment-15174457
 ] 

Joris Van Remoortere edited comment on MESOS-4780 at 3/1/16 10:42 PM:
--

{code}
commit 9f1b115a67a1625a4807c2a7d4e1a41bca1af2a6
Author: Daniel Pravat 
Date:   Tue Mar 1 14:18:41 2016 -0800

Stout: Marked `os::su` as deleted on Windows.

Review: https://reviews.apache.org/r/40939/

commit a1f731746657b1cbcf136ddb2bf154ca3da271fc
Author: Daniel Pravat 
Date:   Tue Mar 1 14:16:08 2016 -0800

Stout: Marked `os::chroot` as deleted on Windows.

Review: https://reviews.apache.org/r/40938/

commit a1a9cd5939d25f82214a5c533bde96a3493f81f3
Author: Alex Clemmer 
Date:   Tue Mar 1 13:35:13 2016 -0800

Windows: Stout: Removed user based functions.

Review: https://reviews.apache.org/r/43905/

commit b9de8c6a06f0d0246ea38ab5586de1d0b2478c38
Author: Alex Clemmer 
Date:   Tue Mar 1 13:33:37 2016 -0800

Windows: Removed `user` launcher flag, preventing `su`.

`su` does not exist on Windows. Unfortunately, the launcher also depends
on it. In this commit, we remove Windows support for the launcher flag
`user`, which controls whether we use `su` in the launcher. This
allows us to divest ourselves of `su` altogether on Windows.

Review: https://reviews.apache.org/r/43905/
{code}


was (Author: jvanremoortere):
{code}
commit a1a9cd5939d25f82214a5c533bde96a3493f81f3
Author: Alex Clemmer 
Date:   Tue Mar 1 13:35:13 2016 -0800

Windows: Stout: Removed user based functions.

Review: https://reviews.apache.org/r/43905/

commit b9de8c6a06f0d0246ea38ab5586de1d0b2478c38
Author: Alex Clemmer 
Date:   Tue Mar 1 13:33:37 2016 -0800

Windows: Removed `user` launcher flag, preventing `su`.

`su` does not exist on Windows. Unfortunately, the launcher also depends
on it. In this commit, we remove Windows support for the launcher flag
`user`, which controls whether we use `su` in the launcher. This
allows us to divest ourselves of `su` altogether on Windows.

Review: https://reviews.apache.org/r/43905/
{code}

> Remove `user` and `rootfs` flags in Windows launcher.
> -
>
> Key: MESOS-4780
> URL: https://issues.apache.org/jira/browse/MESOS-4780
> Project: Mesos
>  Issue Type: Task
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows-mvp
> Fix For: 0.28.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4712) Remove 'force' field from the Subscribe Call in v1 Scheduler API

2016-03-01 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174542#comment-15174542
 ] 

Vinod Kone commented on MESOS-4712:
---

test

> Remove 'force' field from the Subscribe Call in v1 Scheduler API
> 
>
> Key: MESOS-4712
> URL: https://issues.apache.org/jira/browse/MESOS-4712
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Vinod Kone
> Fix For: 0.28.0
>
>
> We/I introduced the `force` field in the SUBSCRIBE call to deal with 
> scheduler partition cases. Having thought about it a bit more and discussed 
> it with a few other folks ([~anandmazumdar], [~greggomann]), I think we can 
> get away with not having that field in the v1 API. The obvious advantage of 
> removing the field is that framework devs don't have to think about how/when 
> to set it (the current semantics are a bit confusing).
> The new workflow when a master receives a SUBSCRIBE call is that the master 
> always accepts the call and closes any existing connection (after sending an 
> ERROR event) from the same scheduler (identified by framework id).  
> The expectation from schedulers is that they must close the old subscribe 
> connection before sending a new SUBSCRIBE call.
> Let's look at some tricky scenarios and see how this works and why it is safe.
> 1) Connection disconnection @ the scheduler but not @ the master
>
> Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends 
> ERROR on the old connection (won't be received by the scheduler because the 
> connection is already closed) and closes it.
> 2) Connection disconnection @ master but not @ scheduler
> Scheduler realizes this from lack of HEARTBEAT events. It then closes its 
> existing connection and sends a new SUBSCRIBE call. Master accepts the new 
> SUBSCRIBE call. There is no old connection to close on the master as it is 
> already closed.
> 3) Scheduler failover but no disconnection @ master
> Newly elected scheduler sends a SUBSCRIBE call. Master sends ERROR event and 
> closes the old connection (won't be received because the old scheduler failed 
> over).
> 4) If Scheduler A got partitioned (but is alive and connected with master) 
> and Scheduler B got elected as new leader.
> When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the 
> connection from Scheduler A. Master accepts Scheduler B's connection. 
> Typically Scheduler A aborts after receiving ERROR and gets restarted. After 
> restart it won't become the leader because Scheduler B is already elected.
> 5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A) 
> and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then 
> receives SUBSCRIBE (A) but doesn't see A's disconnection yet.
> Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends 
> ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE 
> (A) and tries to send SUBSCRIBED event the connection closure is detected. 
> Scheduler retries the SUBSCRIBE connection after a backoff. I think this 
> race is rare enough that it won't keep happening continuously in a loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-4712) Remove 'force' field from the Subscribe Call in v1 Scheduler API

2016-03-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4712:
--
Comment: was deleted

(was: test)

> Remove 'force' field from the Subscribe Call in v1 Scheduler API
> 
>
> Key: MESOS-4712
> URL: https://issues.apache.org/jira/browse/MESOS-4712
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Vinod Kone
> Fix For: 0.28.0
>
>
> We/I introduced the `force` field in the SUBSCRIBE call to deal with 
> scheduler partition cases. Having thought about it a bit more and discussed 
> it with a few other folks ([~anandmazumdar], [~greggomann]), I think we can 
> get away with not having that field in the v1 API. The obvious advantage of 
> removing the field is that framework devs don't have to think about how/when 
> to set it (the current semantics are a bit confusing).
> The new workflow when a master receives a SUBSCRIBE call is that the master 
> always accepts the call and closes any existing connection (after sending an 
> ERROR event) from the same scheduler (identified by framework id).  
> The expectation from schedulers is that they must close the old subscribe 
> connection before sending a new SUBSCRIBE call.
> Let's look at some tricky scenarios and see how this works and why it is safe.
> 1) Connection disconnection @ the scheduler but not @ the master
>
> Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends 
> ERROR on the old connection (won't be received by the scheduler because the 
> connection is already closed) and closes it.
> 2) Connection disconnection @ master but not @ scheduler
> Scheduler realizes this from lack of HEARTBEAT events. It then closes its 
> existing connection and sends a new SUBSCRIBE call. Master accepts the new 
> SUBSCRIBE call. There is no old connection to close on the master as it is 
> already closed.
> 3) Scheduler failover but no disconnection @ master
> Newly elected scheduler sends a SUBSCRIBE call. Master sends ERROR event and 
> closes the old connection (won't be received because the old scheduler failed 
> over).
> 4) If Scheduler A got partitioned (but is alive and connected with master) 
> and Scheduler B got elected as new leader.
> When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the 
> connection from Scheduler A. Master accepts Scheduler B's connection. 
> Typically Scheduler A aborts after receiving ERROR and gets restarted. After 
> restart it won't become the leader because Scheduler B is already elected.
> 5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A) 
> and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then 
> receives SUBSCRIBE (A) but doesn't see A's disconnection yet.
> Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends 
> ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE 
> (A) and tries to send SUBSCRIBED event the connection closure is detected. 
> Scheduler retries the SUBSCRIBE connection after a backoff. I think this 
> race is rare enough that it won't keep happening continuously in a loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4820) Need to set `EXPOSED` ports from docker images into `ContainerConfig`

2016-03-01 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan reassigned MESOS-4820:


Assignee: Avinash Sridharan

> Need to set `EXPOSED` ports from docker images into `ContainerConfig`
> -
>
> Key: MESOS-4820
> URL: https://issues.apache.org/jira/browse/MESOS-4820
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Critical
>  Labels: mesosphere
>
> Most docker images have an `EXPOSE` command associated with them. This tells 
> the container run-time the TCP ports that the micro-service "wishes" to 
> expose to the outside world. 
> With the `Unified containerizer` project, since `MesosContainerizer` is going 
> to natively support Docker images, it is imperative that the Mesos container 
> runtime have a mechanism to expose ports listed in a Docker image. The first 
> step to achieve this is to extract this information from the Docker image 
> and set it in the `ContainerConfig`. The `ContainerConfig` can then be used 
> to pass this information to any isolator (e.g., the `network/cni` isolator) 
> that will install port-forwarding rules to expose the desired ports.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4832) DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits when the /tmp directory is bind-mounted

2016-03-01 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174482#comment-15174482
 ] 

Jie Yu commented on MESOS-4832:
---

I think the problem is that we're trying to unmount the persistent volume twice:
{noformat}
I0226 03:17:28.127876  1114 docker.cpp:912] Unmounting volume for container 
'bcc90102-163d-4ff6-a3fc-a1b2e3fc3b7c'
I0226 03:17:28.127957  1114 docker.cpp:912] Unmounting volume for container 
'bcc90102-163d-4ff6-a3fc-a1b2e3fc3b7c'
{noformat}

Looking at the code:
{code}
Try<Nothing> unmountPersistentVolumes(const ContainerID& containerId)
{
  // We assume volumes are only supported on Linux, and also
  // the target path contains the containerId.
#ifdef __linux__
  Try<fs::MountInfoTable> table = fs::MountInfoTable::read();
  if (table.isError()) {
return Error("Failed to get mount table: " + table.error());
  }

  foreach (const fs::MountInfoTable::Entry& entry,
   adaptor::reverse(table.get().entries)) {
// TODO(tnachen): We assume there is only one docker container
// running per container Id and no other mounts will have the
// container Id name. We might need to revisit if this is no
// longer true.
if (strings::contains(entry.target, containerId.value())) {
  LOG(INFO) << "Unmounting volume for container '" << containerId
<< "'"; 
  Try<Nothing> unmount = fs::unmount(entry.target);
  if (unmount.isError()) {
return Error("Failed to unmount volume '" + entry.target +
 "': " + unmount.error());
  }
}
  }
#endif // __linux__
  return Nothing();
}
{code}

We rely on {noformat}if (strings::contains(entry.target, containerId.value())) 
{noformat} to discover persistent volume mounts. But with some system 
configurations, if the slave's work_dir is under a bind mount and the parent 
of that bind mount is a 'shared' mount, the mount of a persistent volume will 
be propagated to another mount point. That means there will be two mounts in 
the mount table that contain the 'containerId'.

There are two issues:
1) We should modify unmountPersistentVolumes to be more robust. One simple fix 
is to check whether 'entry.target' is under the slave's work_dir or not.
2) Ideally, we should do the same as we did in LinuxFilesystemIsolator and make 
the slave's work_dir a slave+shared mount. I'll add a TODO.

> DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits 
> when the /tmp directory is bind-mounted
> --
>
> Key: MESOS-4832
> URL: https://issues.apache.org/jira/browse/MESOS-4832
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.27.0
> Environment: Seen on CentOS 7 & Debian 8.
>Reporter: Joseph Wu
>Assignee: Jie Yu
>  Labels: mesosphere, test
> Fix For: 0.28.0
>
>
> If the {{/tmp}} directory (where Mesos tests create temporary directories) is 
> a bind mount, the test suite will exit here:
> {code}
> [ RUN  ] 
> DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes
> I0226 03:17:26.722806  1097 leveldb.cpp:174] Opened db in 12.587676ms
> I0226 03:17:26.723496  1097 leveldb.cpp:181] Compacted db in 636999ns
> I0226 03:17:26.723536  1097 leveldb.cpp:196] Created db iterator in 18271ns
> I0226 03:17:26.723547  1097 leveldb.cpp:202] Seeked to beginning of db in 
> 1555ns
> I0226 03:17:26.723554  1097 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 363ns
> I0226 03:17:26.723593  1097 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0226 03:17:26.724128  1117 recover.cpp:447] Starting replica recovery
> I0226 03:17:26.724367  1117 recover.cpp:473] Replica is in EMPTY status
> I0226 03:17:26.725237  1117 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (13810)@172.30.2.151:51934
> I0226 03:17:26.725744  1114 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0226 03:17:26.726356   master.cpp:376] Master 
> 5cc57c0e-f1ad-4107-893f-420ed1a1db1a (ip-172-30-2-151.mesosphere.io) started 
> on 172.30.2.151:51934
> I0226 03:17:26.726369  1118 recover.cpp:564] Updating replica status to 
> STARTING
> I0226 03:17:26.726378   master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/djHTVQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" 

[jira] [Updated] (MESOS-4740) Improve metrics/snapshot performace

2016-03-01 Thread Cong Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Wang updated MESOS-4740:
-
Description: 
David Robinson noticed retrieving metrics/snapshot statistics could be very 
inefficient and cause Mesos master stuck.

{noformat}
[root@atla-bny-34-sr1 ~]# time curl -s localhost:5051/metrics/snapshot

real2m7.302s
user0m0.001s
sys0m0.004s
{noformat}

MESOS-1287 introduces a timeout parameter for this query, but for observers 
like ours they are not aware of such URL-specific parameter, so we need:

1) We should always have a timeout and set some default value to it

2) Investigate why metrics/snapshot could take such a long time to complete 
under load, since we don't use history for these statistics and the values are 
just some atomic read.


  was:
David Robinson noticed retrieving metrics/snapshot statistics could be very 
inefficient and cause Mesos master stuck.

{noformat}
[root@atla-bny-34-sr1 ~]# time curl -s localhost:5051/metrics/snapshot

real2m7.302s
user0m0.001s
sys0m0.004s
{noformat}

From a quick glance at the code, this *seems* to be because we sort all the 
values saved in the time series when calculating percentiles.

{noformat}
foreach (const typename TimeSeries<T>::Value& value, values_) {
  values.push_back(value.data);
}

std::sort(values.begin(), values.end());
{noformat}



> Improve metrics/snapshot performace
> ---
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>
> David Robinson noticed retrieving metrics/snapshot statistics could be very 
> inefficient and cause Mesos master stuck.
> {noformat}
> [root@atla-bny-34-sr1 ~]# time curl -s localhost:5051/metrics/snapshot
> real2m7.302s
> user0m0.001s
> sys0m0.004s
> {noformat}
> MESOS-1287 introduces a timeout parameter for this query, but for observers 
> like ours they are not aware of such URL-specific parameter, so we need:
> 1) We should always have a timeout and set some default value to it
> 2) Investigate why metrics/snapshot could take such a long time to complete 
> under load, since we don't use history for these statistics and the values 
> are just some atomic read.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4832) DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits when the /tmp directory is bind-mounted

2016-03-01 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4832:
--
Fix Version/s: 0.28.0

> DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits 
> when the /tmp directory is bind-mounted
> --
>
> Key: MESOS-4832
> URL: https://issues.apache.org/jira/browse/MESOS-4832
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.27.0
> Environment: Seen on CentOS 7 & Debian 8.
>Reporter: Joseph Wu
>Assignee: Jie Yu
>  Labels: mesosphere, test
> Fix For: 0.28.0
>
>
> If the {{/tmp}} directory (where Mesos tests create temporary directories) is 
> a bind mount, the test suite will exit here:
> {code}
> [ RUN  ] 
> DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes
> I0226 03:17:26.722806  1097 leveldb.cpp:174] Opened db in 12.587676ms
> I0226 03:17:26.723496  1097 leveldb.cpp:181] Compacted db in 636999ns
> I0226 03:17:26.723536  1097 leveldb.cpp:196] Created db iterator in 18271ns
> I0226 03:17:26.723547  1097 leveldb.cpp:202] Seeked to beginning of db in 
> 1555ns
> I0226 03:17:26.723554  1097 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 363ns
> I0226 03:17:26.723593  1097 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0226 03:17:26.724128  1117 recover.cpp:447] Starting replica recovery
> I0226 03:17:26.724367  1117 recover.cpp:473] Replica is in EMPTY status
> I0226 03:17:26.725237  1117 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (13810)@172.30.2.151:51934
> I0226 03:17:26.725744  1114 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0226 03:17:26.726356   master.cpp:376] Master 
> 5cc57c0e-f1ad-4107-893f-420ed1a1db1a (ip-172-30-2-151.mesosphere.io) started 
> on 172.30.2.151:51934
> I0226 03:17:26.726369  1118 recover.cpp:564] Updating replica status to 
> STARTING
> I0226 03:17:26.726378   master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/djHTVQ/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/djHTVQ/master" 
> --zk_session_timeout="10secs"
> I0226 03:17:26.726605   master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0226 03:17:26.726616   master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0226 03:17:26.726632   credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/djHTVQ/credentials'
> I0226 03:17:26.726860   master.cpp:468] Using default 'crammd5' 
> authenticator
> I0226 03:17:26.726977   master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0226 03:17:26.727092   master.cpp:571] Authorization enabled
> I0226 03:17:26.727243  1118 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0226 03:17:26.727285  1116 whitelist_watcher.cpp:77] No whitelist given
> I0226 03:17:26.728852  1114 master.cpp:1712] The newly elected leader is 
> master@172.30.2.151:51934 with id 5cc57c0e-f1ad-4107-893f-420ed1a1db1a
> I0226 03:17:26.728876  1114 master.cpp:1725] Elected as the leading master!
> I0226 03:17:26.728891  1114 master.cpp:1470] Recovering from registrar
> I0226 03:17:26.728977  1117 registrar.cpp:307] Recovering registrar
> I0226 03:17:26.731503  1112 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 4.977811ms
> I0226 03:17:26.731539  1112 replica.cpp:320] Persisted replica status to 
> STARTING
> I0226 03:17:26.731711   recover.cpp:473] Replica is in STARTING status
> I0226 03:17:26.732501  1114 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (13812)@172.30.2.151:51934
> I0226 03:17:26.732862   recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0226 

[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174464#comment-15174464
 ] 

Erik Weathers commented on MESOS-4735:
--

[~dma1982] : kind of correct. I'm saying that people who use file-downloading 
tools (e.g., curl, wget, every web browser) have the option of choosing the 
resulting filename of the download.  e.g.,
* {{curl -o bar-executor-binary.tgz 
http://somewebserver/bar-executor-binary.tgz.foobarbazblahblahblah}}
* {{wget -O bar-executor-binary.tgz 
http://somewebserver/bar-executor-binary.tgz.foobarbazblahblahblah}}

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4833) Poor allocator performance with labeled resources and/or persistent volumes

2016-03-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4833:

Priority: Blocker  (was: Critical)

> Poor allocator performance with labeled resources and/or persistent volumes
> ---
>
> Key: MESOS-4833
> URL: https://issues.apache.org/jira/browse/MESOS-4833
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Blocker
>  Labels: mesosphere, resources
> Fix For: 0.28.0
>
>
> Modifying the {{HierarchicalAllocator_BENCHMARK_Test.ResourceLabels}} 
> benchmark from https://reviews.apache.org/r/43686/ to use distinct labels 
> between different slaves, performance regresses from ~2 seconds to ~3 
> minutes. The culprit seems to be the way in which the allocator merges 
> together resources; reserved resource labels (or persistent volume IDs) 
> inhibit merging, which causes performance to be much worse.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4833) Poor allocator performance with labeled resources and/or persistent volumes

2016-03-01 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4833:
---
Shepherd: Joris Van Remoortere

> Poor allocator performance with labeled resources and/or persistent volumes
> ---
>
> Key: MESOS-4833
> URL: https://issues.apache.org/jira/browse/MESOS-4833
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere, resources
> Fix For: 0.28.0
>
>
> Modifying the {{HierarchicalAllocator_BENCHMARK_Test.ResourceLabels}} 
> benchmark from https://reviews.apache.org/r/43686/ to use distinct labels 
> between different slaves, performance regresses from ~2 seconds to ~3 
> minutes. The culprit seems to be the way in which the allocator merges 
> together resources; reserved resource labels (or persistent volume IDs) 
> inhibit merging, which causes performance to be much worse.





[jira] [Created] (MESOS-4833) Poor allocator performance with labeled resources and/or persistent volumes

2016-03-01 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4833:
--

 Summary: Poor allocator performance with labeled resources and/or 
persistent volumes
 Key: MESOS-4833
 URL: https://issues.apache.org/jira/browse/MESOS-4833
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Neil Conway
Priority: Critical
 Fix For: 0.28.0


Modifying the {{HierarchicalAllocator_BENCHMARK_Test.ResourceLabels}} benchmark 
from https://reviews.apache.org/r/43686/ to use distinct labels across 
different slaves causes performance to regress from ~2 seconds to ~3 minutes. 
The culprit seems to be the way in which the allocator merges resources: 
reserved resource labels (or persistent volume IDs) inhibit merging, which 
makes performance much worse.
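The merging behavior can be illustrated with a simplified model (this is not the real Mesos {{Resources}} implementation; the type names and the mergeability rule here are illustrative): two resource objects combine only when their name and reservation label both match, so distinct labels across slaves prevent collapsing, the entry list grows linearly, and every insertion becomes a full scan.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model: a resource can be merged into an existing entry
// only when both the name and the reservation label match.
struct SimpleResource {
  std::string name;
  double amount;
  std::string label;  // reservation label; empty means unlabeled
};

// Add `r` to `total`, merging where possible; returns the number of
// distinct entries afterwards. Each call scans the existing entries,
// so N unmergeable additions cost O(N^2) overall.
size_t addResource(std::vector<SimpleResource>& total,
                   const SimpleResource& r) {
  for (SimpleResource& item : total) {
    if (item.name == r.name && item.label == r.label) {
      item.amount += r.amount;  // merged: entry count unchanged
      return total.size();
    }
  }
  total.push_back(r);  // distinct label: a brand-new entry
  return total.size();
}

// Helper: add one "cpus" resource per slave and report how many
// entries survive merging.
size_t entriesAfterAdding(int slaves, bool distinctLabels) {
  std::vector<SimpleResource> total;
  size_t entries = 0;
  for (int i = 0; i < slaves; i++) {
    std::string label =
        distinctLabels ? "label-" + std::to_string(i) : "shared";
    entries = addResource(total, {"cpus", 1.0, label});
  }
  return entries;
}
```

With a shared label everything collapses into a single entry; with per-slave labels nothing merges, which matches the observed 2-second vs. 3-minute gap in spirit.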





[jira] [Assigned] (MESOS-4833) Poor allocator performance with labeled resources and/or persistent volumes

2016-03-01 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-4833:
--

Assignee: Neil Conway

> Poor allocator performance with labeled resources and/or persistent volumes
> ---
>
> Key: MESOS-4833
> URL: https://issues.apache.org/jira/browse/MESOS-4833
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere, resources
> Fix For: 0.28.0
>
>
> Modifying the {{HierarchicalAllocator_BENCHMARK_Test.ResourceLabels}} 
> benchmark from https://reviews.apache.org/r/43686/ to use distinct labels 
> across different slaves causes performance to regress from ~2 seconds to ~3 
> minutes. The culprit seems to be the way in which the allocator merges 
> resources: reserved resource labels (or persistent volume IDs) inhibit 
> merging, which makes performance much worse.





[jira] [Comment Edited] (MESOS-4780) Remove `user` and `rootfs` flags in Windows launcher.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174440#comment-15174440
 ] 

Joris Van Remoortere edited comment on MESOS-4780 at 3/1/16 9:31 PM:
-

https://reviews.apache.org/r/43904/
https://reviews.apache.org/r/43905/


was (Author: jvanremoortere):
https://reviews.apache.org/r/43904/

> Remove `user` and `rootfs` flags in Windows launcher.
> -
>
> Key: MESOS-4780
> URL: https://issues.apache.org/jira/browse/MESOS-4780
> Project: Mesos
>  Issue Type: Task
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: mesosphere, windows-mvp
> Fix For: 0.28.0
>
>






[jira] [Created] (MESOS-4832) DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits when the /tmp directory is bind-mounted

2016-03-01 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4832:


 Summary: 
DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes exits when 
the /tmp directory is bind-mounted
 Key: MESOS-4832
 URL: https://issues.apache.org/jira/browse/MESOS-4832
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker
Affects Versions: 0.27.0
 Environment: Seen on CentOS 7 & Debian 8.
Reporter: Joseph Wu
Assignee: Jie Yu


If the {{/tmp}} directory (where Mesos tests create temporary directories) is a 
bind mount, the test suite will exit here:
{code}
[ RUN  ] 
DockerContainerizerTest.ROOT_DOCKER_RecoverOrphanedPersistentVolumes
I0226 03:17:26.722806  1097 leveldb.cpp:174] Opened db in 12.587676ms
I0226 03:17:26.723496  1097 leveldb.cpp:181] Compacted db in 636999ns
I0226 03:17:26.723536  1097 leveldb.cpp:196] Created db iterator in 18271ns
I0226 03:17:26.723547  1097 leveldb.cpp:202] Seeked to beginning of db in 1555ns
I0226 03:17:26.723554  1097 leveldb.cpp:271] Iterated through 0 keys in the db 
in 363ns
I0226 03:17:26.723593  1097 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0226 03:17:26.724128  1117 recover.cpp:447] Starting replica recovery
I0226 03:17:26.724367  1117 recover.cpp:473] Replica is in EMPTY status
I0226 03:17:26.725237  1117 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (13810)@172.30.2.151:51934
I0226 03:17:26.725744  1114 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0226 03:17:26.726356   master.cpp:376] Master 
5cc57c0e-f1ad-4107-893f-420ed1a1db1a (ip-172-30-2-151.mesosphere.io) started on 
172.30.2.151:51934
I0226 03:17:26.726369  1118 recover.cpp:564] Updating replica status to STARTING
I0226 03:17:26.726378   master.cpp:378] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/djHTVQ/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/djHTVQ/master" 
--zk_session_timeout="10secs"
I0226 03:17:26.726605   master.cpp:423] Master only allowing authenticated 
frameworks to register
I0226 03:17:26.726616   master.cpp:428] Master only allowing authenticated 
slaves to register
I0226 03:17:26.726632   credentials.hpp:35] Loading credentials for 
authentication from '/tmp/djHTVQ/credentials'
I0226 03:17:26.726860   master.cpp:468] Using default 'crammd5' 
authenticator
I0226 03:17:26.726977   master.cpp:537] Using default 'basic' HTTP 
authenticator
I0226 03:17:26.727092   master.cpp:571] Authorization enabled
I0226 03:17:26.727243  1118 hierarchical.cpp:144] Initialized hierarchical 
allocator process
I0226 03:17:26.727285  1116 whitelist_watcher.cpp:77] No whitelist given
I0226 03:17:26.728852  1114 master.cpp:1712] The newly elected leader is 
master@172.30.2.151:51934 with id 5cc57c0e-f1ad-4107-893f-420ed1a1db1a
I0226 03:17:26.728876  1114 master.cpp:1725] Elected as the leading master!
I0226 03:17:26.728891  1114 master.cpp:1470] Recovering from registrar
I0226 03:17:26.728977  1117 registrar.cpp:307] Recovering registrar
I0226 03:17:26.731503  1112 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 4.977811ms
I0226 03:17:26.731539  1112 replica.cpp:320] Persisted replica status to 
STARTING
I0226 03:17:26.731711   recover.cpp:473] Replica is in STARTING status
I0226 03:17:26.732501  1114 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (13812)@172.30.2.151:51934
I0226 03:17:26.732862   recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0226 03:17:26.733264  1117 recover.cpp:564] Updating replica status to VOTING
I0226 03:17:26.733836  1118 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 388246ns
I0226 03:17:26.733855  1118 replica.cpp:320] Persisted replica status to VOTING
I0226 03:17:26.733979  1113 recover.cpp:578] Successfully joined the Paxos group
I0226 03:17:26.734149  1113 recover.cpp:462] Recover process terminated
I0226 03:17:26.734478   log.cpp:659] Attempting to start the 

[jira] [Updated] (MESOS-4824) "filesystem/linux" isolator does not unmount orphaned persistent volumes

2016-03-01 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4824:
-
   Sprint: Mesosphere Sprint 30
 Priority: Blocker  (was: Major)
Fix Version/s: 0.28.0

> "filesystem/linux" isolator does not unmount orphaned persistent volumes
> 
>
> Key: MESOS-4824
> URL: https://issues.apache.org/jira/browse/MESOS-4824
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.24.0, 0.25.0, 0.26.0, 0.27.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: containerizer, mesosphere, persistent-volumes
> Fix For: 0.28.0
>
>
> A persistent volume can be orphaned when:
> # A framework registers with checkpointing enabled.
> # The framework starts a task + a persistent volume.
> # The agent exits.  The task continues to run.
> # Something wipes the agent's {{meta}} directory.  This removes the 
> checkpointed framework info from the agent.
> # The agent comes back and recovers.  The framework for the task is not 
> found, so the task is considered orphaned now.
> The agent currently does not unmount the persistent volume, saying (with 
> {{GLOG_v=1}}) 
> {code}
> I0229 23:55:42.078940  5635 linux.cpp:711] Ignoring cleanup request for 
> unknown container: a35189d3-85d5-4d02-b568-67f675b6dc97
> {code}
> Test implemented here: https://reviews.apache.org/r/44122/
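The "unknown container" behavior in the log above can be modeled with a toy isolator (names are illustrative; the real {{filesystem/linux}} isolator is far richer): cleanup only acts on containers present in its bookkeeping map, and an orphan recovered without checkpointed framework info never entered that map, so its volume mount is never removed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of the isolator's bookkeeping: the mounts it knows about,
// keyed by container id, plus a simulated host mount table.
struct ToyIsolator {
  std::map<std::string, std::string> mounts;  // containerId -> mountpoint
  std::set<std::string>* hostMounts;          // simulated mount table

  void isolate(const std::string& containerId,
               const std::string& mountpoint) {
    mounts[containerId] = mountpoint;
    hostMounts->insert(mountpoint);
  }

  // Mirrors the logged behavior: cleanup of an unknown container is
  // ignored, so an orphan's mount survives on the host.
  bool cleanup(const std::string& containerId) {
    auto it = mounts.find(containerId);
    if (it == mounts.end()) {
      return false;  // "Ignoring cleanup request for unknown container"
    }
    hostMounts->erase(it->second);
    mounts.erase(it);
    return true;
  }
};
```

After a simulated restart with wiped state, cleanup returns false and the mount is leaked, which is the gap the linked review addresses.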





[jira] [Updated] (MESOS-4708) Provide a Mesos build for Ubuntu 15.10 Wily

2016-03-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4708:
--
Affects Version/s: (was: 0.27.0)

> Provide a Mesos build for Ubuntu 15.10 Wily
> ---
>
> Key: MESOS-4708
> URL: https://issues.apache.org/jira/browse/MESOS-4708
> Project: Mesos
>  Issue Type: Wish
>Reporter: Ludovic Claude
>
> Hello,
> I am running Mesos on Ubuntu. Recently I was using Ubuntu 15.04, but because 
> Docker no longer supports that version, I decided to upgrade to 15.10. 
> Then I realised - too late - that Mesos does not officially support Ubuntu 
> 15.10. Is there a way out?
> Thanks





[jira] [Created] (MESOS-4831) Master sometimes sends two inverse offers after the agent goes into maintenance.

2016-03-01 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4831:
-

 Summary: Master sometimes sends two inverse offers after the agent 
goes into maintenance.
 Key: MESOS-4831
 URL: https://issues.apache.org/jira/browse/MESOS-4831
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27.0
Reporter: Anand Mazumdar


Showed up on ASF CI for {{MasterMaintenanceTest.PendingUnavailabilityTest}}

{code}
I0229 11:08:57.027559   668 hierarchical.cpp:1437] No resources available to 
allocate!
I0229 11:08:57.027745   668 hierarchical.cpp:1150] Performed allocation for 
slave fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b-S0 in 272747ns
I0229 11:08:57.027757   675 master.cpp:5369] Sending 1 offers to framework 
fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.028586   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
I0229 11:08:57.029039   675 master.cpp:5459] Sending 1 inverse offers to 
framework fd39ca89-d7fd-4df8-ad50-dbb493d1cd7b- (default)
{code}

The ideal expected workflow for this test is something like:

- The framework receives offers from master.
- The framework updates its maintenance schedule.
- The current offer is rescinded.
- A new offer is received from the master with unavailability set.
- After the agent goes for maintenance, an inverse offer is sent.

For some reason, the logs show the master sending 2 inverse offers. The test 
still passes because we only check that the initial inverse offer is present.

Also, unrelated: we need to clean up this test so that it does not expect 
multiple offers, i.e. remove the {{numberOfOffers}} constant.





[jira] [Updated] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4735:
--
Affects Version/s: (was: 0.27.0)

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.
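The interaction between query parameters and suffix-based extraction can be sketched as follows (the helper names are hypothetical, not the actual Mesos fetcher API): today the local filename is derived from the URI's basename, so query parameters leak into the name and defeat the suffix check; an explicit target filename would restore extraction.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive the local filename from the URI's
// basename, as the fetcher effectively does today.
std::string derivedFilename(const std::string& uri) {
  size_t slash = uri.find_last_of('/');
  return slash == std::string::npos ? uri : uri.substr(slash + 1);
}

// The extraction logic only looks at the filename suffix.
bool looksExtractable(const std::string& filename) {
  const std::string suffixes[] = {".tar.gz", ".tgz", ".zip"};
  for (const std::string& s : suffixes) {
    if (filename.size() >= s.size() &&
        filename.compare(filename.size() - s.size(), s.size(), s) == 0) {
      return true;
    }
  }
  return false;
}

// With an explicit target filename (the proposed addition), the caller
// overrides the derived name.
std::string localFilename(const std::string& uri,
                          const std::string& target) {
  return target.empty() ? derivedFilename(uri) : target;
}
```

For `http://host/path/app.tar.gz?token=abc`, the derived name ends in `?token=abc` and fails the suffix check, while an explicit `app.tar.gz` target passes it.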





[jira] [Updated] (MESOS-4700) Allow agent to configure net_cls handle minor range.

2016-03-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4700:
--
Shepherd: Jie Yu

> Allow agent to configure net_cls handle minor range.
> 
>
> Key: MESOS-4700
> URL: https://issues.apache.org/jira/browse/MESOS-4700
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Bugs exist in some user libraries that prevent certain minor net_cls 
> handles from being used. It would be great if we could configure the minor 
> range through agent flags.





[jira] [Updated] (MESOS-4830) Bind docker runtime isolator with docker image provider.

2016-03-01 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-4830:

Summary: Bind docker runtime isolator with docker image provider.  (was: 
Bind docker runtime isolator with docker image provider)

> Bind docker runtime isolator with docker image provider.
> 
>
> Key: MESOS-4830
> URL: https://issues.apache.org/jira/browse/MESOS-4830
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, mesosphere
> Fix For: 0.28.0
>
>
> If the image provider is specified as `docker` but the docker/runtime 
> isolator is not enabled, the configuration is not meaningful because the 
> container would have no executables. A check should be added to make sure the 
> docker runtime isolator is enabled when docker is used as the image provider.





[jira] [Created] (MESOS-4830) Bind docker runtime isolator with docker image provider

2016-03-01 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4830:
---

 Summary: Bind docker runtime isolator with docker image provider
 Key: MESOS-4830
 URL: https://issues.apache.org/jira/browse/MESOS-4830
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song
 Fix For: 0.28.0


If the image provider is specified as `docker` but the docker/runtime isolator 
is not enabled, the configuration is not meaningful because the container would 
have no executables. A check should be added to make sure the docker runtime 
isolator is enabled when docker is used as the image provider.
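The proposed check could look roughly like this (illustrative only; the actual flag names and where the validation lives in the containerizer may differ): reject any configuration that lists `docker` among the image providers without `docker/runtime` among the isolators.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sketch of the proposed startup validation: using the docker image
// provider requires the docker/runtime isolator; other providers are
// unaffected.
bool imageProviderFlagsValid(const std::vector<std::string>& imageProviders,
                             const std::vector<std::string>& isolation) {
  bool usesDockerImages =
      std::find(imageProviders.begin(), imageProviders.end(), "docker") !=
      imageProviders.end();
  bool hasDockerRuntime =
      std::find(isolation.begin(), isolation.end(), "docker/runtime") !=
      isolation.end();
  // docker images require the runtime isolator (for entrypoint/cmd etc.);
  // anything else is fine as-is.
  return !usesDockerImages || hasDockerRuntime;
}
```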





[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-03-01 Thread Connor Doyle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174232#comment-15174232
 ] 

Connor Doyle commented on MESOS-4816:
-

Thanks for the update, James; my experience with this is also pre-0.28. As I 
understand it, {{prepare()}} gets invoked just once per container (for an 
executor's first task), so it might not be sufficient given a framework that 
launches multiple tasks per executor.  However, if {{ContainerConfig}} covers 
most uses then maybe this issue can be dropped?

> Expose TaskInfo to Isolators
> 
>
> Key: MESOS-4816
> URL: https://issues.apache.org/jira/browse/MESOS-4816
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, slave
>Reporter: Connor Doyle
>
> Authors of custom isolator modules frequently require access to the TaskInfo 
> in order to read custom metadata in task labels.
> Currently, it's possible to link containers to tasks within a module by 
> implementing both an isolator and the {{slaveRunTaskLabelDecorator}} hook, 
> and maintaining a shared map of containers to tasks.  This way works, but 
> adds unnecessary complexity.
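The workaround described above can be sketched as a toy model (the real module and hook interfaces are richer than this): a label-decorator hook sees the TaskInfo, records the relevant metadata per container in shared state, and the isolator, which only receives a container id, looks it up there.

```cpp
#include <cassert>
#include <map>
#include <string>

// State shared between the hook and the isolator, mapping containers
// to the task metadata the isolator needs.
struct SharedTaskState {
  std::map<std::string, std::string> labels;  // containerId -> label value
};

// Stand-in for the slaveRunTaskLabelDecorator hook: it sees the
// TaskInfo (reduced here to a single label) and records it.
void onRunTask(SharedTaskState& state,
               const std::string& containerId,
               const std::string& labelValue) {
  state.labels[containerId] = labelValue;
}

// Stand-in for the isolator: it only receives the container id, so it
// must consult the shared map to recover task metadata.
std::string labelForContainer(const SharedTaskState& state,
                              const std::string& containerId) {
  auto it = state.labels.find(containerId);
  return it == state.labels.end() ? "" : it->second;
}
```

This works, but as the description notes, keeping the map consistent across both extension points is exactly the unnecessary complexity the ticket wants to remove.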





[jira] [Updated] (MESOS-4829) Remove `grace_period_seconds` field from Shutdown event v1 protobuf.

2016-03-01 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4829:
--
Fix Version/s: 0.28.0

> Remove `grace_period_seconds` field from Shutdown event v1 protobuf.
> 
>
> Key: MESOS-4829
> URL: https://issues.apache.org/jira/browse/MESOS-4829
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> There are two ways in which a shutdown of the executor can be triggered:
> 1. If it receives an explicit `Shutdown` message from the agent.
> 2. If the recovery timeout period has elapsed, and the executor still hasn’t 
> been able to (re-)connect with the agent.
> Currently, the executor library relies on the field `grace_period_seconds` 
> having a default value of 5 seconds to handle the second scenario. 
> https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608
> The driver used to trigger the grace period via a constant defined in 
> src/slave/constants.cpp. 
> https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92
> The agent may want to force a shorter shutdown grace period (e.g. 
> oversubscription eviction may have shorter deadline) in the future. For now, 
> we can just read the value via an environment variable.
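The environment-variable direction can be sketched as follows (the parsing and the fallback constant are illustrative, not the actual Mesos code): the executor library reads the grace period from its environment and falls back to the old 5-second driver default when it is unset.

```cpp
#include <cassert>
#include <string>

// Sketch: resolve the shutdown grace period from an environment value
// instead of the protobuf field's default. An unset or empty value
// falls back to the 5 seconds previously supplied by
// `grace_period_seconds`.
double shutdownGracePeriodSecs(const char* envValue) {
  if (envValue == nullptr || *envValue == '\0') {
    return 5.0;  // legacy default
  }
  return std::stod(std::string(envValue));
}
```

In real code the argument would come from something like `getenv()` on an agent-provided variable, letting the agent shorten the period for cases such as oversubscription eviction.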





[jira] [Assigned] (MESOS-4829) Remove `grace_period_seconds` field from Shutdown event v1 protobuf.

2016-03-01 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-4829:
-

Assignee: Anand Mazumdar

> Remove `grace_period_seconds` field from Shutdown event v1 protobuf.
> 
>
> Key: MESOS-4829
> URL: https://issues.apache.org/jira/browse/MESOS-4829
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> There are two ways in which a shutdown of the executor can be triggered:
> 1. If it receives an explicit `Shutdown` message from the agent.
> 2. If the recovery timeout period has elapsed, and the executor still hasn’t 
> been able to (re-)connect with the agent.
> Currently, the executor library relies on the field `grace_period_seconds` 
> having a default value of 5 seconds to handle the second scenario. 
> https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608
> The driver used to trigger the grace period via a constant defined in 
> src/slave/constants.cpp. 
> https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92
> The agent may want to force a shorter shutdown grace period (e.g. 
> oversubscription eviction may have shorter deadline) in the future. For now, 
> we can just read the value via an environment variable.





[jira] [Updated] (MESOS-4829) Remove `grace_period_seconds` field from Shutdown event v1 protobuf.

2016-03-01 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4829:
--
Summary: Remove `grace_period_seconds` field from Shutdown event v1 
protobuf.  (was: Remove `grace_period_seconds` field from Shutdown event.)

> Remove `grace_period_seconds` field from Shutdown event v1 protobuf.
> 
>
> Key: MESOS-4829
> URL: https://issues.apache.org/jira/browse/MESOS-4829
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> There are two ways in which a shutdown of the executor can be triggered:
> 1. If it receives an explicit `Shutdown` message from the agent.
> 2. If the recovery timeout period has elapsed, and the executor still hasn’t 
> been able to (re-)connect with the agent.
> Currently, the executor library relies on the field `grace_period_seconds` 
> having a default value of 5 seconds to handle the second scenario. 
> https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608
> The driver used to trigger the grace period via a constant defined in 
> src/slave/constants.cpp. 
> https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92
> The agent may want to force a shorter shutdown grace period (e.g. 
> oversubscription eviction may have shorter deadline) in the future. For now, 
> we can just read the value via an environment variable.





[jira] [Updated] (MESOS-4829) Remove `grace_period_seconds` field from Shutdown event.

2016-03-01 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4829:
--
Description: 
There are two ways in which a shutdown of the executor can be triggered:
1. If it receives an explicit `Shutdown` message from the agent.
2. If the recovery timeout period has elapsed, and the executor still hasn’t 
been able to (re-)connect with the agent.

Currently, the executor library relies on the field `grace_period_seconds` 
having a default value of 5 seconds to handle the second scenario. 
https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608

The driver used to trigger the grace period via a constant defined in 
src/slave/constants.cpp. 
https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92

The agent may want to force a shorter shutdown grace period (e.g. 
oversubscription eviction may have shorter deadline) in the future. For now, we 
can just read the value via an environment variable.

  was:
There are two ways in which a shutdown of executor can be triggered:
1. If it receives an explicit `Shutdown` message from the agent.
2. If the recovery timeout period has elapsed, and the executor still hasn’t 
been able to (re-)connect with the agent.

Currently, the executor library relies on the field `grace_period_seconds` 
having a default value of 5 seconds to handle the second scenario. 
https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608

The driver used to trigger the grace period via a constant defined in 
src/slave/constants.cpp. 
https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92

The agent may want to force a shorter shutdown grace period (e.g. 
oversubscription eviction may have shorter deadline).


> Remove `grace_period_seconds` field from Shutdown event.
> 
>
> Key: MESOS-4829
> URL: https://issues.apache.org/jira/browse/MESOS-4829
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: mesosphere
>
> There are two ways in which a shutdown of the executor can be triggered:
> 1. If it receives an explicit `Shutdown` message from the agent.
> 2. If the recovery timeout period has elapsed, and the executor still hasn’t 
> been able to (re-)connect with the agent.
> Currently, the executor library relies on the field `grace_period_seconds` 
> having a default value of 5 seconds to handle the second scenario. 
> https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608
> The driver used to trigger the grace period via a constant defined in 
> src/slave/constants.cpp. 
> https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92
> The agent may want to force a shorter shutdown grace period (e.g. 
> oversubscription eviction may have shorter deadline) in the future. For now, 
> we can just read the value via an environment variable.





[jira] [Updated] (MESOS-4712) Remove 'force' field from the Subscribe Call in v1 Scheduler API

2016-03-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4712:
--
Fix Version/s: 0.28.0

> Remove 'force' field from the Subscribe Call in v1 Scheduler API
> 
>
> Key: MESOS-4712
> URL: https://issues.apache.org/jira/browse/MESOS-4712
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Vinod Kone
> Fix For: 0.28.0
>
>
> We/I introduced the `force` field in the SUBSCRIBE call to deal with scheduler 
> partition cases. Having thought a bit more and discussed it with a few other 
> folks ([~anandmazumdar], [~greggomann]), I think we can get away with not 
> having that field in the v1 API. The obvious advantage of removing the field 
> is that framework devs don't have to think about how/when to set the field 
> (the current semantics are a bit confusing).
> The new workflow when a master receives a SUBSCRIBE call is that master 
> always accepts this call and closes any existing connection (after sending 
> ERROR event) from the same scheduler (identified by framework id).  
> The expectation from schedulers is that they must close the old subscribe 
> connection before resending a new SUBSCRIBE call.
> Let's look at some tricky scenarios to see how this works and why it is safe.
> 1) Connection disconnection @ the scheduler but not @ the master
>
> Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends 
> ERROR on the old connection (won't be received by the scheduler because the 
> connection is already closed) and closes it.
> 2) Connection disconnection @ master but not @ scheduler
> Scheduler realizes this from lack of HEARTBEAT events. It then closes its 
> existing connection and sends a new SUBSCRIBE call. Master accepts the new 
> SUBSCRIBE call. There is no old connection to close on the master as it is 
> already closed.
> 3) Scheduler failover but no disconnection @ master
> Newly elected scheduler sends a SUBSCRIBE call. Master sends ERROR event and 
> closes the old connection (won't be received because the old scheduler failed 
> over).
> 4) If Scheduler A got partitioned (but is alive and connected with master) 
> and Scheduler B got elected as new leader.
> When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the 
> connection from Scheduler A. Master accepts Scheduler B's connection. 
> Typically Scheduler A aborts after receiving ERROR and gets restarted. After 
> restart it won't become the leader because Scheduler B is already elected.
> 5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A) 
> and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then 
> receives SUBSCRIBE (A) but doesn't see A's disconnection yet.
> Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends 
> ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE 
> (A) and tries to send SUBSCRIBED event the connection closure is detected. 
> Scheduler retries the SUBSCRIBE connection after a backoff. I think this race 
> is rare enough that it won't happen continuously in a loop.
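The core of the proposed workflow can be captured in a toy model (this is not the real Master class; the names and event strings are illustrative): every SUBSCRIBE is accepted, and if the framework id already has a connection, an ERROR is sent on the old connection before it is closed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy connection: tracks open/closed state and the events sent on it.
struct ToyConnection {
  bool open = true;
  std::vector<std::string> events;
};

// Toy master: always accepts the newest SUBSCRIBE per framework id,
// erroring out and closing any previous connection.
struct ToyMaster {
  std::map<std::string, ToyConnection*> subscribed;  // frameworkId -> conn

  void subscribe(const std::string& frameworkId, ToyConnection* conn) {
    auto it = subscribed.find(frameworkId);
    if (it != subscribed.end() && it->second != conn) {
      it->second->events.push_back("ERROR");  // may never be delivered
      it->second->open = false;               // close the old connection
    }
    subscribed[frameworkId] = conn;
    conn->events.push_back("SUBSCRIBED");
  }
};
```

Scenarios 1-4 above all reduce to this rule: the newest SUBSCRIBE wins, and the ERROR on the stale connection is best-effort since that connection may already be gone on the scheduler side.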





[jira] [Created] (MESOS-4829) Remove `grace_period_seconds` field from Shutdown event.

2016-03-01 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4829:
-

 Summary: Remove `grace_period_seconds` field from Shutdown event.
 Key: MESOS-4829
 URL: https://issues.apache.org/jira/browse/MESOS-4829
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar


There are two ways in which a shutdown of the executor can be triggered:
1. If it receives an explicit `Shutdown` message from the agent.
2. If the recovery timeout period has elapsed, and the executor still hasn’t 
been able to (re-)connect with the agent.

Currently, the executor library relies on the field `grace_period_seconds` 
having a default value of 5 seconds to handle the second scenario. 
https://github.com/apache/mesos/blob/master/src/executor/executor.cpp#L608

The driver used to trigger the grace period via a constant defined in 
src/slave/constants.cpp. 
https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L92

The agent may want to force a shorter shutdown grace period (e.g. 
oversubscription eviction may have shorter deadline).





[jira] [Commented] (MESOS-3525) Figure out how to enforce 64-bit builds on Windows.

2016-03-01 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174156#comment-15174156
 ] 

Joris Van Remoortere commented on MESOS-3525:
-

https://reviews.apache.org/r/43692/
https://reviews.apache.org/r/43693/
https://reviews.apache.org/r/43694/
https://reviews.apache.org/r/43695/
https://reviews.apache.org/r/43689/

> Figure out how to enforce 64-bit builds on Windows.
> ---
>
> Key: MESOS-3525
> URL: https://issues.apache.org/jira/browse/MESOS-3525
> Project: Mesos
>  Issue Type: Task
>  Components: build
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: build, cmake, mesosphere
> Fix For: 0.28.0
>
>
> We need to make sure people don't try to compile Mesos on 32-bit 
> architectures. We don't want a Windows repeat of something like this:
> https://issues.apache.org/jira/browse/MESOS-267





[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-03-01 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4053:
-
Environment: CentOS 6.6  (was: CentOS 6.6, Ubuntu 14.04)

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}





[jira] [Comment Edited] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-03-01 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174054#comment-15174054
 ] 

Greg Mann edited comment on MESOS-4053 at 3/1/16 6:30 PM:
--

I also produced this error with 0.25.1-rc1 on Ubuntu 14.04 using gcc, with 
libevent and SSL enabled. Tests were run as root.

However, rebooting and running {{sudo make check}} with the current master 
yields no test failures at all, so this doesn't seem to currently be an issue 
on Ubuntu 14.04.


was (Author: greggomann):
I also produced this error with 0.25.1-rc1 on Ubuntu 14.04 using gcc, with 
libevent and SSL enabled. Tests were run as root.

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6, Ubuntu 14.04
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}





[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-03-01 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174136#comment-15174136
 ] 

James Peach commented on MESOS-4816:


The isolator that I have that consumes {{TaskInfo}} labels was written for 
Mesos 0.27. Since 0.28, {{prepare()}} gets a {{ContainerConfig}} which looks 
like it should have the {{TaskInfo}}.

> Expose TaskInfo to Isolators
> 
>
> Key: MESOS-4816
> URL: https://issues.apache.org/jira/browse/MESOS-4816
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, slave
>Reporter: Connor Doyle
>
> Authors of custom isolator modules frequently require access to the TaskInfo 
> in order to read custom metadata in task labels.
> Currently, it's possible to link containers to tasks within a module by 
> implementing both an isolator and the {{slaveRunTaskLabelDecorator}} hook, 
> and maintaining a shared map of containers to tasks.  This way works, but 
> adds unnecessary complexity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4828) XFS disk quota isolator

2016-03-01 Thread James Peach (JIRA)
James Peach created MESOS-4828:
--

 Summary: XFS disk quota isolator
 Key: MESOS-4828
 URL: https://issues.apache.org/jira/browse/MESOS-4828
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: James Peach
Assignee: James Peach


Implement a disk resource isolator using XFS project quotas. Compared to the 
{{posix/disk}} isolator, this doesn't need to scan the filesystem periodically, 
and applications receive an {{ENOSPC}} error instead of being summarily killed.

This initial implementation only isolates sandbox directory resources, since 
the isolator doesn't have any visibility into the lifecycle of volumes, which 
is needed to assign and track project IDs.

The build dependencies for this are the XFS headers (from xfsprogs-devel) and 
libblkid. We need libblkid or the equivalent to map filesystem paths to block 
devices in order to apply quotas.
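
Mechanically, applying a project quota goes through the Linux XQM {{quotactl(2)}} interface. The sketch below is illustrative only: the helper names and the round-up policy are assumptions, not the Mesos implementation. XFS expresses block limits in 512-byte "basic blocks":

```cpp
#include <sys/quota.h>
#include <sys/types.h>
#include <linux/dqblk_xfs.h>

#include <cstdint>
#include <cstring>

#ifndef PRJQUOTA
#define PRJQUOTA 2  // Project quota type; exposed via <linux/quota.h> on newer systems.
#endif

// XFS expresses block limits in 512-byte "basic blocks"; round byte
// counts up so the enforced limit is never smaller than requested.
uint64_t bytesToBasicBlocks(uint64_t bytes)
{
  return (bytes + 511) / 512;
}

// Apply a hard block limit to an XFS project. `blockDevice` is the
// filesystem's backing device (the lookup libblkid performs) and
// `projectId` is a project ID previously assigned to the sandbox
// directory. Returns 0 on success, -1 (with errno set) otherwise.
int setProjectQuota(
    const char* blockDevice,
    uint32_t projectId,
    uint64_t limitBytes)
{
  fs_disk_quota quota;
  memset(&quota, 0, sizeof(quota));

  quota.d_version = FS_DQUOT_VERSION;
  quota.d_id = projectId;
  quota.d_flags = FS_PROJ_QUOTA;
  quota.d_fieldmask = FS_DQ_BHARD;
  quota.d_blk_hardlimit = bytesToBasicBlocks(limitBytes);

  return quotactl(
      QCMD(Q_XSETQLIM, PRJQUOTA),
      blockDevice,
      projectId,
      reinterpret_cast<caddr_t>(&quota));
}
```

Actually invoking {{setProjectQuota}} requires root and an XFS filesystem mounted with project quotas enabled; the unit conversion is the only part exercised here.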





[jira] [Commented] (MESOS-4816) Expose TaskInfo to Isolators

2016-03-01 Thread Connor Doyle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174099#comment-15174099
 ] 

Connor Doyle commented on MESOS-4816:
-

Hi [~gyliu],

I agree the optional task info argument in the comment is awkward, but a list 
sounds pretty good.  Existing isolators could continue to look only at the 
aggregated resources.

This question came up during the isolation WG meeting last week.  I and others 
have used this workaround while prototyping isolators for networking, but in 
general people tend to pass information to isolators via task labels before 
concepts become first-class in ContainerInfo or elsewhere.  [~jamespeach] and 
[~idownes] may be able to fill in more details.

> Expose TaskInfo to Isolators
> 
>
> Key: MESOS-4816
> URL: https://issues.apache.org/jira/browse/MESOS-4816
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules, slave
>Reporter: Connor Doyle
>
> Authors of custom isolator modules frequently require access to the TaskInfo 
> in order to read custom metadata in task labels.
> Currently, it's possible to link containers to tasks within a module by 
> implementing both an isolator and the {{slaveRunTaskLabelDecorator}} hook, 
> and maintaining a shared map of containers to tasks.  This way works, but 
> adds unnecessary complexity.





[jira] [Commented] (MESOS-1796) Support multiple working paths

2016-03-01 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174095#comment-15174095
 ] 

James Peach commented on MESOS-1796:


This sounds like a duplicate or a subset of MESOS-1650.

> Support multiple working paths
> --
>
> Key: MESOS-1796
> URL: https://issues.apache.org/jira/browse/MESOS-1796
> Project: Mesos
>  Issue Type: Wish
>  Components: slave
>Reporter: Charles Allen
>Priority: Minor
>
> As a framework developer, I would like the ability to have multiple working 
> paths as part of a slave reporting its resources.
> Currently, if a slave (like an ec2 instance) has multiple disks, the disks 
> must be combined in an MD array or similar in order to be fully utilized in 
> Mesos. This ask is to allow multiple disks to be mounted on multiple paths, 
> and have the slave be able to support and report availability on these 
> various working paths.





[jira] [Updated] (MESOS-4718) Add allocator metric for number of completed allocation runs

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4718:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add allocator metric for number of completed allocation runs
> 
>
> Key: MESOS-4718
> URL: https://issues.apache.org/jira/browse/MESOS-4718
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4723) Add allocator metric for currently satisfied quotas

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4723:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add allocator metric for currently satisfied quotas
> ---
>
> Key: MESOS-4723
> URL: https://issues.apache.org/jira/browse/MESOS-4723
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4576) Introduce a stout helper for "which"

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4576:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>Assignee: Disha Singh
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.
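
The proposed helper could be fleshed out along these lines; the sketch substitutes {{std::optional}} and {{::access(X_OK)}} for stout's {{Option}} and {{os::exists}} so that it is self-contained (both substitutions are assumptions, not stout's API):

```cpp
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <unistd.h>

// Resolve `command` against the PATH environment variable, returning
// the first executable candidate, as the `which` utility would.
std::optional<std::string> which(const std::string& command)
{
  const char* path = std::getenv("PATH");
  if (path == nullptr) {
    return std::nullopt;
  }

  // Scan PATH entries in order and return the first executable match.
  std::istringstream stream(path);
  std::string directory;
  while (std::getline(stream, directory, ':')) {
    const std::string candidate = directory + "/" + command;
    if (::access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }
  }

  return std::nullopt;
}
```

Checking {{X_OK}} rather than bare existence matches what the `which` utility does: a non-executable file on PATH is not a usable command.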





[jira] [Updated] (MESOS-4720) Add allocator metric for current allocation breakdown

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4720:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add allocator metric for current allocation breakdown
> -
>
> Key: MESOS-4720
> URL: https://issues.apache.org/jira/browse/MESOS-4720
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> We likely want to expose allocated/available/total.





[jira] [Updated] (MESOS-1571) Signal escalation timeout is not configurable.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-1571:
-
Sprint: Mesosphere Q4 Sprint 2 - 11/14, Mesosphere Q4 Sprint 3 - 12/7, 
Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Q4 Sprint 2 - 
11/14, Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Sprint 29)

> Signal escalation timeout is not configurable.
> --
>
> Key: MESOS-1571
> URL: https://issues.apache.org/jira/browse/MESOS-1571
> Project: Mesos
>  Issue Type: Bug
>Reporter: Niklas Quarfot Nielsen
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Even though the executor shutdown grace period is set to a larger interval, 
> the signal escalation timeout will still be 3 seconds. It should either be 
> configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
> Thoughts?





[jira] [Updated] (MESOS-4683) Document docker runtime isolator.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4683:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Document docker runtime isolator.
> -
>
> Key: MESOS-4683
> URL: https://issues.apache.org/jira/browse/MESOS-4683
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer, documentation
>
> Should include the following information:
> * What features are currently supported in the docker runtime isolator.
> * How to use the docker runtime isolator (user manual).
> * Compare the different semantics vs. the docker containerizer, and explain why.





[jira] [Updated] (MESOS-4233) Logging is too verbose for sysadmins / syslog

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4233:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28, 
Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 26, 
Mesosphere Sprint 27, Mesosphere Sprint 28, Mesosphere Sprint 29)

> Logging is too verbose for sysadmins / syslog
> -
>
> Key: MESOS-4233
> URL: https://issues.apache.org/jira/browse/MESOS-4233
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cody Maloney
>Assignee: Kapil Arya
>  Labels: mesosphere
> Attachments: giant_port_range_logging
>
>
> Currently mesos logs a lot. When launching a thousand tasks in the space of 
> 10 seconds it will print tens of thousands of log lines, overwhelming syslog 
> (there is a max rate at which a process can send stuff over a unix socket) 
> and not giving useful information to a sysadmin who cares about just the 
> high-level activity and when something goes wrong.
> Note mesos also blocks writing to its log locations, so when writing a lot of 
> log messages, it can fill up the write buffer in the kernel, and be suspended 
> until the syslog agent catches up reading from the socket (GLOG does a 
> blocking fwrite to stderr). GLOG also has a big mutex around logging so only 
> one thing logs at a time.
> While for "internal debugging" it is useful to see things like "message went 
> from internal component x to internal component y", from a sysadmin 
> perspective I only care about the high level actions taken (launched task for 
> framework x), sent offer to framework y, got task failed from host z. Note 
> those are what I'd expect at the "INFO" level. At the "WARNING" level I'd 
> expect very little to be logged / almost nothing in normal operation. Just 
> things like "WARN: Replicated log write took longer than expected". WARN 
> would also get things like backtraces on crashes and abnormal exits / abort.
> When trying to launch 3k+ tasks inside a second, mesos logging currently 
> overwhelms syslog with 100k+ messages, many of which are thousands of bytes. 
> Sysadmins expect to be able to use syslog to monitor basic events in their 
> system. This is too much.
> We can keep logging the messages to files, but the logging to stderr needs to 
> be reduced significantly (stderr gets picked up and forwarded to syslog / 
> central aggregation).
> What I would like is if I can set the stderr logging level to be different / 
> independent from the file logging level (Syslog giving the "sysadmin" 
> aggregated overview, files useful for debugging in depth what happened in a 
> cluster). A lot of what mesos currently logs at info is really debugging info 
> / should show up as debug log level.
> Some samples of mesos logging a lot more than a sysadmin would want / expect 
> are attached, and some are below:
>  - Every task gets printed multiple times for a basic launch:
> {noformat}
> Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: 
> I1215 22:58:29.382644  1315 master.cpp:3248] Launching task 
> envy.5b19a713-a37f-11e5-8b3e-0251692d6109 of framework 
> 5178f46d-71d6-422f-922c-5bbe82dff9cc- (marathon)
> Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: 
> I1215 22:58:29.382925  1315 master.hpp:176] Adding task 
> envy.5b1958f2-a37f-11e5-8b3e-0251692d6109 with resources cpus(​*):0.0001; 
> mem(*​):16; ports(*):[14047-14047]
> {noformat}
>  - Every task status update prints many log lines, successful ones are part 
> of normal operation and maybe should be logged at info / debug levels, but 
> not to a sysadmin (Just show when things fail, and maybe aggregate counters 
> to tell the volume of work being done)
>  - No log messages should be really big / more than 1k characters (Would 
> prevent the giant port list attached, make that easily discoverable / bug 
> filable / fixable) 
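
The key request, independent severity thresholds for the stderr (syslog-bound) and file sinks, can be sketched with a toy logger. The API below is hypothetical, not glog's; glog's {{stderrthreshold}} flag is the closest existing knob:

```cpp
#include <sstream>
#include <string>

enum class Severity { DEBUG, INFO, WARNING, ERROR };

// Toy logger: one message, two sinks, each with its own independent
// threshold. Files stay verbose for in-depth debugging while stderr
// (forwarded to syslog) only carries high-level warnings and errors.
struct Logger {
  Severity fileThreshold = Severity::INFO;
  Severity stderrThreshold = Severity::WARNING;

  std::ostringstream file;        // stands in for the on-disk log
  std::ostringstream stderrSink;  // stands in for stderr/syslog

  void log(Severity severity, const std::string& message) {
    if (severity >= fileThreshold) {
      file << message << '\n';
    }
    if (severity >= stderrThreshold) {
      stderrSink << message << '\n';
    }
  }
};
```

With these defaults, the per-task launch chatter from the examples above would still reach the file sink but never syslog.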





[jira] [Updated] (MESOS-4721) Add allocator metric for allocation duration

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4721:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add allocator metric for allocation duration
> 
>
> Key: MESOS-4721
> URL: https://issues.apache.org/jira/browse/MESOS-4721
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4684) Create base docker image for test suite.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4684:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Create base docker image for test suite.
> 
>
> Key: MESOS-4684
> URL: https://issues.apache.org/jira/browse/MESOS-4684
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
>
> This should be widely used for unified containerizer testing. It should 
> basically include:
> * at least one layer.
> * repositories.
> For each layer:
> * the root file system as a layer tarball.
> * the docker image JSON (manifest).
> * the docker version.





[jira] [Updated] (MESOS-4719) Add allocator metric for number of offers each framework received

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4719:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add allocator metric for number of offers each framework received
> -
>
> Key: MESOS-4719
> URL: https://issues.apache.org/jira/browse/MESOS-4719
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4214) Introduce HTTP endpoint /weights for updating weight

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4214:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28, 
Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 26, 
Mesosphere Sprint 27, Mesosphere Sprint 28, Mesosphere Sprint 29)

> Introduce HTTP endpoint /weights for updating weight
> 
>
> Key: MESOS-4214
> URL: https://issues.apache.org/jira/browse/MESOS-4214
> Project: Mesos
>  Issue Type: Task
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4381) Improve upgrade compatibility documentation.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4381:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Improve upgrade compatibility documentation.
> 
>
> Key: MESOS-4381
> URL: https://issues.apache.org/jira/browse/MESOS-4381
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: documentation, mesosphere
>
> Investigate and document upgrade compatibility for 0.27 release.





[jira] [Updated] (MESOS-4748) Add Appc image fetcher tests.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4748:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add Appc image fetcher tests.
> -
>
> Key: MESOS-4748
> URL: https://issues.apache.org/jira/browse/MESOS-4748
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> Mesos now has support for fetching Appc images. Add tests that verify the 
> new component.





[jira] [Updated] (MESOS-4691) Add a HierarchicalAllocator benchmark with reservation labels.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4691:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add a HierarchicalAllocator benchmark with reservation labels.
> --
>
> Key: MESOS-4691
> URL: https://issues.apache.org/jira/browse/MESOS-4691
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Neil Conway
>  Labels: mesosphere
>
> With {{Labels}} being part of the {{ReservationInfo}}, we should ensure that 
> we don't observe a significant performance degradation in the allocator.





[jira] [Updated] (MESOS-3945) Add operator documentation for /weight endpoint

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3945:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add operator documentation for /weight endpoint
> ---
>
> Key: MESOS-3945
> URL: https://issues.apache.org/jira/browse/MESOS-3945
> Project: Mesos
>  Issue Type: Task
>Reporter: James Wang
>Assignee: Yongqiao Wang
>
> This JIRA ticket will update the related doc to apply to dynamic weights, and 
> add a new operator guide for dynamic weights that describes basic usage of 
> the /weights endpoint.





[jira] [Updated] (MESOS-4633) Tests will dereference stack allocated agent objects upon assertion/expectation failure.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4633:
-
Sprint: Mesosphere Sprint 28, Mesosphere Sprint 29, Mesosphere Sprint 30  
(was: Mesosphere Sprint 28, Mesosphere Sprint 29)

> Tests will dereference stack allocated agent objects upon 
> assertion/expectation failure.
> 
>
> Key: MESOS-4633
> URL: https://issues.apache.org/jira/browse/MESOS-4633
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, tech-debt, test
>
> Tests that use the {{StartSlave}} test helper are generally fragile when the 
> test fails an assert/expect in the middle of the test.  This is because the 
> {{StartSlave}} helper takes raw pointer arguments, which may be 
> stack-allocated.
> In case of an assert failure, the test immediately exits (destroying stack 
> allocated objects) and proceeds onto test cleanup.  The test cleanup may 
> dereference some of these destroyed objects, leading to a test crash like:
> {code}
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure 
> virtual method called
> [18:27:36][Step 8/8] @ 0x7f7077055e1c  google::LogMessage::Fail()
> [18:27:36][Step 8/8] @ 0x7f707705ba6f  google::RawLog__()
> [18:27:36][Step 8/8] @ 0x7f70760f76c9  __cxa_pure_virtual
> [18:27:36][Step 8/8] @   0xa9423c  
> mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8] @  0x1074e45  
> mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8] @  0x1074de4  
> mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8] @  0x1070ec7  
> mesos::internal::tests::MesosTest::TearDown()
> {code}
> The {{StartSlave}} helper should take {{shared_ptr}} arguments instead.
> This also means that we can remove the {{Shutdown}} helper from most of these 
> tests.
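
The failure mode and the proposed fix can be illustrated with a minimal sketch (hypothetical types, not Mesos code): a raw pointer held by the cleanup path dangles once the test's stack frame unwinds, while a {{shared_ptr}} extends the object's lifetime into teardown.

```cpp
#include <memory>

// Hypothetical stand-ins for an agent object and the fixture's
// teardown state; not Mesos code.
struct Agent {
  bool running = true;
  ~Agent() { running = false; }
};

struct Teardown {
  // Holding a shared_ptr (rather than Agent*) means the agent created
  // in makeTeardown() survives the early return that a failed
  // assertion triggers, so teardown never touches a destroyed object.
  std::shared_ptr<Agent> agent;

  bool agentStillRunning() const {
    return agent != nullptr && agent->running;
  }
};

Teardown makeTeardown()
{
  // If `agent` were stack-allocated and Teardown stored its address,
  // it would be destroyed right here at scope exit, which is exactly
  // what happens when a test body exits early on a failed ASSERT.
  auto agent = std::make_shared<Agent>();
  return Teardown{agent};
}
```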





[jira] [Updated] (MESOS-4544) Propose design doc for agent partitioning behavior

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4544:
-
Sprint: Mesosphere Sprint 28, Mesosphere Sprint 29, Mesosphere Sprint 30  
(was: Mesosphere Sprint 28, Mesosphere Sprint 29)

> Propose design doc for agent partitioning behavior
> --
>
> Key: MESOS-4544
> URL: https://issues.apache.org/jira/browse/MESOS-4544
> Project: Mesos
>  Issue Type: Task
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3854:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28, Mesosphere Sprint 29, 
Mesosphere Sprint 30  (was: Mesosphere Sprint 27, Mesosphere Sprint 28, 
Mesosphere Sprint 29)

> Finalize design for generalized Authorizer interface
> 
>
> Key: MESOS-3854
> URL: https://issues.apache.org/jira/browse/MESOS-3854
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: authorization, mesosphere
>
> Finalize the structure the interface and achieve consensus on the design doc 
> proposed in MESOS-2949.
> https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit





[jira] [Updated] (MESOS-4634) Tests will dereference stack allocated master objects upon assertion/expectation failure.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4634:
-
Sprint: Mesosphere Sprint 28, Mesosphere Sprint 29, Mesosphere Sprint 30  
(was: Mesosphere Sprint 28, Mesosphere Sprint 29)

> Tests will dereference stack allocated master objects upon 
> assertion/expectation failure.
> -
>
> Key: MESOS-4634
> URL: https://issues.apache.org/jira/browse/MESOS-4634
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, tech-debt, test
>
> Tests that use the {{StartMaster}} test helper are generally fragile when the 
> test fails an assert/expect in the middle of the test.  This is because the 
> {{StartMaster}} helper takes raw pointer arguments, which may be 
> stack-allocated.
> In case of an assert failure, the test immediately exits (destroying stack 
> allocated objects) and proceeds onto test cleanup.  The test cleanup may 
> dereference some of these destroyed objects, leading to a test crash like:
> {code}
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure 
> virtual method called
> [18:27:36][Step 8/8] @ 0x7f7077055e1c  google::LogMessage::Fail()
> [18:27:36][Step 8/8] @ 0x7f707705ba6f  google::RawLog__()
> [18:27:36][Step 8/8] @ 0x7f70760f76c9  __cxa_pure_virtual
> [18:27:36][Step 8/8] @   0xa9423c  
> mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8] @  0x1074e45  
> mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8] @  0x1074de4  
> mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8] @  0x1070ec7  
> mesos::internal::tests::MesosTest::TearDown()
> {code}
> The {{StartMaster}} helper should take {{shared_ptr}} arguments instead.
> This also means that we can remove the {{Shutdown}} helper from most of these 
> tests.





[jira] [Updated] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4610:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> MasterContender/MasterDetector should be loadable as modules
> 
>
> Key: MESOS-4610
> URL: https://issues.apache.org/jira/browse/MESOS-4610
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Mark Cavage
>Assignee: Mark Cavage
>
> Currently mesos depends on Zookeeper for leader election and notification to 
> slaves, although there is a C++ hierarchy in the code to support alternatives 
> (e.g., unit tests use an in-memory implementation). From an operational 
> perspective, many organizations/users do not want to take a dependency on 
> Zookeeper, and use an alternative solution to implementing leader election. 
> Our organization, in particular, very much wants this, and as a reference 
> there have been several requests from the community (see referenced tickets) 
> to replace with etcd/consul/etc.
> This ticket will serve as the work effort to modularize the 
> MasterContender/MasterDetector APIs such that integrators can build a 
> pluggable solution of their choice; this ticket will not fold in any 
> implementations such as etcd et al., but simply move this hierarchy to be 
> fully pluggable.





[jira] [Updated] (MESOS-4722) Add allocator metric for number of active offer filters

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4722:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Add allocator metric for number of active offer filters
> ---
>
> Key: MESOS-4722
> URL: https://issues.apache.org/jira/browse/MESOS-4722
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4673) Agent fails to shutdown after re-registering period timed-out.

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4673:
-
Sprint: Mesosphere Sprint 29, Mesosphere Sprint 30  (was: Mesosphere Sprint 
29)

> Agent fails to shutdown after re-registering period timed-out.
> --
>
> Key: MESOS-4673
> URL: https://issues.apache.org/jira/browse/MESOS-4673
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> Under certain conditions, when a Mesos agent loses connection to the master 
> for an extended period of time (say, a switch fails), the master will 
> deregister the agent, and then, when the agent comes back up, refuse to let 
> it register: {{Slave asked to shut down by master@10.102.25.1:5050 because 
> 'Slave attempted to re-register after removal'}}.
> The agent doesn't seem to be able to properly shut down and remove running 
> tasks, as it should do to register as a new agent. Hence this message will 
> persist until it's resolved by manual intervention.
> This seems to be caused by Docker tasks that couldn't shut down cleanly when 
> the agent is asked to shut down running tasks in order to register as a new 
> agent with the master.





[jira] [Updated] (MESOS-2317) Remove deprecated checkpoint=false code

2016-03-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2317:
-
Sprint: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, 
Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 
10, Mesosphere Sprint 11, Mesosphere Sprint 26, Mesosphere Sprint 27, 
Mesosphere Sprint 28, Mesosphere Sprint 29, Mesosphere Sprint 30  (was: 
Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 
Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere 
Sprint 11, Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28, 
Mesosphere Sprint 29)

> Remove deprecated checkpoint=false code
> ---
>
> Key: MESOS-2317
> URL: https://issues.apache.org/jira/browse/MESOS-2317
> Project: Mesos
>  Issue Type: Epic
>Affects Versions: 0.22.0
>Reporter: Adam B
>Assignee: Joerg Schad
>  Labels: checkpoint, mesosphere
>
> Cody's plan from MESOS-444 was:
> 1) -Make it so the flag can't be changed at the command line-
> 2) -Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
> fairly involved change since a number of unit tests depend on manually 
> setting the flag, as well as the default being non-checkpointing.-
> 3) -Remove logic around checkpointing in the slave, remove logic inside the 
> master.-
> 4) Drop the flag from the SlaveInfo struct (Will require a deprecation cycle).





[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-03-01 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4053:
-
Environment: CentOS 6.6, Ubuntu 14.04  (was: CentOS 6.6)

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6, Ubuntu 14.04
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}





[jira] [Commented] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-03-01 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174054#comment-15174054
 ] 

Greg Mann commented on MESOS-4053:
--

I also produced this error with 0.25.1-rc1 on Ubuntu 14.04 using gcc, with 
libevent and SSL enabled. Tests were run as root.

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6, Ubuntu 14.04
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}





[jira] [Updated] (MESOS-4827) Destroy Docker container from Marathon kills Mesos slave

2016-03-01 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4827:
--
Labels:   (was: newbie)

> Destroy Docker container from Marathon kills Mesos slave
> 
>
> Key: MESOS-4827
> URL: https://issues.apache.org/jira/browse/MESOS-4827
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework, slave
>Affects Versions: 0.25.0
>Reporter: Zhenzhong Shi
>
> The details of this issue originally [posted on 
> StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
>  
> In short, the problem is that when we destroy/re-deploy a docker-containerized 
> task, the mesos-slave gets killed from time to time. It happened in our 
> production environment and I can't reproduce it.
> Please refer to the StackOverflow post for the error message I got and 
> details of the environment.





[jira] [Assigned] (MESOS-4811) Reusable/Cacheable Offer

2016-03-01 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-4811:
---

Assignee: Klaus Ma

> Reusable/Cacheable Offer
> 
>
> Key: MESOS-4811
> URL: https://issues.apache.org/jira/browse/MESOS-4811
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>
> Currently, resources are returned to the allocator when a task finishes, and 
> those resources are not allocated to a framework until the next allocation 
> cycle. Performance is poor for short-running tasks (MESOS-3078). The proposed 
> solution is to let the framework keep using the offer until the allocator 
> decides to rescind it.





[jira] [Updated] (MESOS-4811) Reusable/Cacheable Offer

2016-03-01 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4811:

Labels: tech-debt  (was: )

> Reusable/Cacheable Offer
> 
>
> Key: MESOS-4811
> URL: https://issues.apache.org/jira/browse/MESOS-4811
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Klaus Ma
>Assignee: Klaus Ma
>  Labels: tech-debt
>
> Currently, resources are returned to the allocator when a task finishes, and 
> those resources are not allocated to a framework until the next allocation 
> cycle. Performance is poor for short-running tasks (MESOS-3078). The proposed 
> solution is to let the framework keep using the offer until the allocator 
> decides to rescind it.





[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173867#comment-15173867
 ] 

Klaus Ma commented on MESOS-4735:
-

[~gyliu], I think [~erikdw] is talking about the type/extension of the downloaded 
file; for example, some URLs do not include a file name, 
so it's hard for the {{fetcher}} to determine the file type. The proposal of 
this JIRA is to add the file type to {{CommandInfo.URI}}, so the {{fetcher}} can use 
the right tool to unpack the downloaded file.

[~erikdw], please correct me if I'm misunderstanding.

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.





[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173854#comment-15173854
 ] 

Guangya Liu commented on MESOS-4735:


Hi [~erikdw], can you please give more detail on {{choosing the filename to 
save the downloaded file as}}?

The curl fetcher now supports {{"http", "https", "ftp", "ftps"}}; what other 
kinds of URIs does the curl fetcher need to support?

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.





[jira] [Created] (MESOS-4827) Destroy Docker container from Marathon kills Mesos slave

2016-03-01 Thread Zhenzhong Shi (JIRA)
Zhenzhong Shi created MESOS-4827:


 Summary: Destroy Docker container from Marathon kills Mesos slave
 Key: MESOS-4827
 URL: https://issues.apache.org/jira/browse/MESOS-4827
 Project: Mesos
  Issue Type: Bug
  Components: docker, framework, slave
Affects Versions: 0.25.0
Reporter: Zhenzhong Shi


The details of this issue originally [posted on 
StackOverflow|http://stackoverflow.com/questions/35713985/destroy-docker-container-from-marathon-kills-mesos-slave].
 

In short, the problem is that when we destroy/re-deploy a docker-containerized 
task, the mesos-slave gets killed from time to time. It happened in our 
production environment and I can't reproduce it.

Please refer to the StackOverflow post for the error message I got and 
details of the environment.





[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2016-03-01 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173700#comment-15173700
 ] 

Bernd Mathiske commented on MESOS-2858:
---

Thanks! Having looked through this log once, I have not found the culprit yet. 
According to the sandbox dumps, the 3 tasks run as intended, but somehow the 
TASK_FINISHED status updates get hung somewhere along the way to an AWAIT. 
Investigation to be continued...

> FetcherCacheHttpTest.HttpMixed is flaky.
> 
>
> Key: MESOS-2858
> URL: https://issues.apache.org/jira/browse/MESOS-2858
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheHttpTest.HttpMixed
> Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
> I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
> I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
> I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
> I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
> 2112ns
> I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 392ns
> I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
> I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
> I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
> STARTING
> I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 590673ns
> I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
> STARTING
> I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
> I0611 00:40:28.214774 26061 master.cpp:363] Master 
> 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
> 172.17.0.116:33349
> I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" 
> --zk_session_timeout="10secs"
> I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
> I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
> I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
> I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
> I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 374189ns
> I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
> VOTING
> I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
> group
> I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
> master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
> I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
> I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
> I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering 

[jira] [Comment Edited] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173519#comment-15173519
 ] 

Erik Weathers edited comment on MESOS-4735 at 3/1/16 9:57 AM:
--

[~gyliu], MESOS-3367 fixes one of the issues I appended above, but not the 
other.  This proposal is more general than either of those issues, providing 
some level of future-proofness against other unforeseen issues.

Ignoring those other issues, the Mesos fetcher is acting as an HTTP downloader, 
and all of the utilities I use on a day-to-day basis for that already support 
the functionality this ticket is requesting:  choosing the filename to save the 
downloaded file as.  Browsers let you do that, as do {{curl}} and {{wget}}.  So 
it's just something that should be added sooner or later to the Mesos fetcher, 
and the fact that this would allow for other various problems to be overcome by 
a framework author is just another benefit.


was (Author: erikdw):
[~gyliu] MESOS-3367 fixes one of the issues I appended above, but not the 
other.  This proposal is more general than either of those issues, providing 
some level of future-proofness against other unforeseen issues.

Ignoring those other issues, the Mesos fetcher is acting as an HTTP downloader, 
and all of the utilities I use on a day-to-day basis for that already support 
the functionality this ticket is requesting:  choosing the filename to save the 
downloaded file as.  Browsers let you do that, as do {{curl}} and {{wget}}.  So 
it's just something that should be added sooner or later to the Mesos fetcher, 
and the fact that this would allow for other various problems to be overcome by 
a framework author is just another benefit.

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.





[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173519#comment-15173519
 ] 

Erik Weathers commented on MESOS-4735:
--

[~gyliu] MESOS-3367 fixes one of the issues I appended above, but not the 
other.  This proposal is more general than either of those issues, providing 
some level of future-proofness against other unforeseen issues.

Ignoring those other issues, the Mesos fetcher is acting as an HTTP downloader, 
and all of the utilities I use on a day-to-day basis for that already support 
the functionality this ticket is requesting:  choosing the filename to save the 
downloaded file as.  Browsers let you do that, as do {{curl}} and {{wget}}.  So 
it's just something that should be added sooner or later to the Mesos fetcher, 
and the fact that this would allow for other various problems to be overcome by 
a framework author is just another benefit.

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.





[jira] [Updated] (MESOS-4709) Enable compiler optimization by default

2016-03-01 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4709:
--
Shepherd: Till Toenshoff

> Enable compiler optimization by default
> ---
>
> Key: MESOS-4709
> URL: https://issues.apache.org/jira/browse/MESOS-4709
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: autoconf, configure, mesosphere
>
> At present, Mesos defaults to compiling with "-O0"; to enable compiler
> optimizations, the user needs to specify "--enable-optimize" when running 
> {{configure}}.
> We should change the default for the following reasons:
> (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally,
> I think most software packages compile with a reasonable level of
> optimizations enabled by default.
> (2) I think we should make the default configure flags appropriate for
> end-users (rather than Mesos developers): developers will be familiar
> enough with Mesos to tune the configure flags according to their own
> preferences.
> (3) The performance consequences of not enabling compiler
> optimizations can be pretty severe: 5x in a benchmark I just ran, and
> we've seen between 2x and 30x (!) performance differences for some
> real-world workloads.





[jira] [Commented] (MESOS-4735) CommandInfo.URI should allow specifying target filename

2016-03-01 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173462#comment-15173462
 ] 

Guangya Liu commented on MESOS-4735:


[~erikdw] I think that MESOS-3367 is going to fix the issue that you appended 
above, right? From your comment in MESOS-3367, it seems you filed this JIRA 
ticket because you want the URI to be able to specify some local files?

> CommandInfo.URI should allow specifying target filename
> ---
>
> Key: MESOS-4735
> URL: https://issues.apache.org/jira/browse/MESOS-4735
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.27.0
>Reporter: Erik Weathers
>Assignee: Guangya Liu
>Priority: Minor
>
> The {{CommandInfo.URI}} message should allow explicitly choosing the 
> downloaded file's name, to better mimic functionality present in tools like 
> {{wget}} and {{curl}}.
> This relates to issues when the {{CommandInfo.URI}} is pointing to a URL that 
> has query parameters at the end of the path, resulting in the downloaded 
> filename having those elements.  This also prevents extracting of such files, 
> since the extraction logic is simply looking at the file's suffix. See 
> MESOS-3367, MESOS-1686, and MESOS-1509 for more info.  If this issue was 
> fixed, then I could workaround the other issues not being fixed by modifying 
> my framework's scheduler to set the target filename.





[jira] [Created] (MESOS-4826) Test helper function mesos::internal::tests::Metrics has name not following mesos style

2016-03-01 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4826:
---

 Summary: Test helper function mesos::internal::tests::Metrics has 
name not following mesos style 
 Key: MESOS-4826
 URL: https://issues.apache.org/jira/browse/MESOS-4826
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier
Priority: Trivial


The test helper function {{mesos::internal::tests::Metrics}} has a name that does 
not follow Mesos style. The expected name would be {{metrics}}.


