[jira] [Created] (MESOS-1797) Packaged Zookeeper does not compile on OSX Yosemite

2014-09-16 Thread Dario Rexin (JIRA)
Dario Rexin created MESOS-1797:
--

 Summary: Packaged Zookeeper does not compile on OSX Yosemite
 Key: MESOS-1797
 URL: https://issues.apache.org/jira/browse/MESOS-1797
 Project: Mesos
  Issue Type: Improvement
  Components: build
Affects Versions: 0.19.1, 0.20.0, 0.21.0
Reporter: Dario Rexin
Priority: Minor


I have been struggling with this for some time (due to my lack of knowledge 
about C compiler error messages) and finally found a way to make it compile. 
The problem is that Zookeeper defines a function `htonll` that is a builtin 
function in Yosemite. For me it worked to just remove this function, but as it 
needs to keep working on other systems as well, we would need some check for 
the OS version, or for whether the function is already defined.

Here are the links to the source:

https://github.com/apache/zookeeper/blob/trunk/src/c/include/recordio.h#L73
https://github.com/apache/zookeeper/blob/trunk/src/c/src/recordio.c#L83-L97
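A sketch of the kind of guard this could use (assuming the platform behavior described above; `zoo_htonll` is an illustrative name, not ZooKeeper's actual patch): define a namespaced implementation unconditionally, and only alias it to `htonll` when the platform has not already provided one.

```cpp
#include <arpa/inet.h>  // htonl
#include <stdint.h>
#include <cassert>

// Hypothetical fix sketch: OS X Yosemite ships htonll as a macro, so
// the #ifndef below leaves the system definition in place there while
// still providing one on platforms that lack it.
static uint64_t zoo_htonll(uint64_t v) {
  const uint32_t probe = 1;
  // Little-endian host: swap the two 32-bit halves via htonl.
  if (*(const unsigned char*)&probe == 1) {
    return ((uint64_t)htonl((uint32_t)(v & 0xffffffffULL)) << 32) |
           htonl((uint32_t)(v >> 32));
  }
  return v;  // big-endian host already matches network byte order
}

#ifndef htonll
#define htonll zoo_htonll
#endif
```

Applying the conversion twice is a no-op on either endianness, which makes the helper easy to sanity-check without knowing the host byte order.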




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-1764) Build Fixes from 0.20 release

2014-09-16 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair resolved MESOS-1764.
--
Resolution: Fixed

 Build Fixes from 0.20 release
 -

 Key: MESOS-1764
 URL: https://issues.apache.org/jira/browse/MESOS-1764
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair
 Fix For: 0.20.1


 This ticket is a catch-all for minor issues caught during a rebase and 
 testing.
 + Add package configuration file to deployment
 + Updates deploy_dir from localstatedir to sysconfdir





[jira] [Comment Edited] (MESOS-1764) Build Fixes from 0.20 release

2014-09-16 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130604#comment-14130604
 ] 

Timothy St. Clair edited comment on MESOS-1764 at 9/16/14 2:32 PM:
---

Punting last update to https://issues.apache.org/jira/browse/MESOS-1675


was (Author: tstclair):
add initial -version-info for shared library
http://reviews.apache.org/r/25551/

 Build Fixes from 0.20 release
 -

 Key: MESOS-1764
 URL: https://issues.apache.org/jira/browse/MESOS-1764
 Project: Mesos
  Issue Type: Bug
  Components: build
Affects Versions: 0.20.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair
 Fix For: 0.20.1


 This ticket is a catch-all for minor issues caught during a rebase and 
 testing.
 + Add package configuration file to deployment
 + Updates deploy_dir from localstatedir to sysconfdir





[jira] [Commented] (MESOS-1675) Decouple version of the mesos library from the package release version

2014-09-16 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135526#comment-14135526
 ] 

Timothy St. Clair commented on MESOS-1675:
--

[~vinodkone] Did you want to elaborate on your thoughts here? 

 Decouple version of the mesos library from the package release version
 --

 Key: MESOS-1675
 URL: https://issues.apache.org/jira/browse/MESOS-1675
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone

 This discussion should be rolled into the larger discussion around how to 
 version Mesos (APIs, packages, libraries etc).
 Some notes from libtool docs.
 http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
 http://www.gnu.org/software/libtool/manual/html_node/Release-numbers.html#Release-numbers





[jira] [Commented] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-16 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135531#comment-14135531
 ] 

Timothy St. Clair commented on MESOS-1621:
--

I'll open up a separate ticket to discuss the API + override conversation. 

 Docker run networking should be configurable and support bridge network
 ---

 Key: MESOS-1621
 URL: https://issues.apache.org/jira/browse/MESOS-1621
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen
  Labels: Docker
 Fix For: 0.20.1


 Currently, to easily support running executors in a Docker image, we hardcode 
 --net=host into Docker run so the slave and executor can reuse the same 
 mechanism to communicate, which is to pass the slave IP/PORT for the framework 
 to respond with its own hostname and port information back to set up the tunnel.
 We want to see how to abstract this or even get rid of host networking 
 altogether if we have a good way to not rely on it.
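As a sketch of what "configurable" might mean here (the helper and mode names below are hypothetical, not the actual Mesos API): the containerizer could map a configured network mode onto the `docker run` argument instead of hardcoding `--net=host`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: translate a slave-level network setting into
// the argument handed to `docker run`. Today's hardcoded behavior is
// equivalent to always calling this with mode == "host".
std::vector<std::string> dockerNetworkArgs(const std::string& mode) {
  if (mode == "host" || mode == "bridge" || mode == "none") {
    return {"--net=" + mode};
  }
  throw std::invalid_argument("unsupported Docker network mode: " + mode);
}
```

Bridge mode would then require a separate mechanism for the executor to report its mapped address back, which is the open question in this ticket.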





[jira] [Updated] (MESOS-1195) systemd.slice + cgroup enablement fails in multiple ways.

2014-09-16 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair updated MESOS-1195:
-
Target Version/s: 0.21.0

reviews.apache.org/r/25695/

 systemd.slice + cgroup enablement fails in multiple ways. 
 --

 Key: MESOS-1195
 URL: https://issues.apache.org/jira/browse/MESOS-1195
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.18.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair

 When attempting to configure mesos to use systemd slices on a 'rawhide/f21' 
 machine, it fails creating the isolator: 
 I0407 12:39:28.035354 14916 containerizer.cpp:180] Using isolation: 
 cgroups/cpu,cgroups/mem
 Failed to create a containerizer: Could not create isolator cgroups/cpu: 
 Failed to create isolator: The cpu subsystem is co-mounted at 
 /sys/fs/cgroup/cpu with other subsytems
 -- details --
 /sys/fs/cgroup
 total 0
 drwxr-xr-x. 12 root root 280 Mar 18 08:47 .
 drwxr-xr-x.  6 root root   0 Mar 18 08:47 ..
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 blkio
 lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpu -> cpu,cpuacct
 lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpuacct -> cpu,cpuacct
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpu,cpuacct
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpuset
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 devices
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 freezer
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 hugetlb
 drwxr-xr-x.  3 root root   0 Apr  3 11:26 memory
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 net_cls
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 perf_event
 drwxr-xr-x.  4 root root   0 Mar 18 08:47 systemd





[jira] [Commented] (MESOS-1675) Decouple version of the mesos library from the package release version

2014-09-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135698#comment-14135698
 ] 

Vinod Kone commented on MESOS-1675:
---

If adding version info is backwards compatible, i.e., the new lib can be a drop-in 
replacement for the old lib, then that should be fine.

{quote}
However the release wrangler will need to add a step to their punch-list prior 
to adoption.
{quote}

Not sure what this means?

 Decouple version of the mesos library from the package release version
 --

 Key: MESOS-1675
 URL: https://issues.apache.org/jira/browse/MESOS-1675
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone

 This discussion should be rolled into the larger discussion around how to 
 version Mesos (APIs, packages, libraries etc).
 Some notes from libtool docs.
 http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
 http://www.gnu.org/software/libtool/manual/html_node/Release-numbers.html#Release-numbers





[jira] [Commented] (MESOS-1797) Packaged Zookeeper does not compile on OSX Yosemite

2014-09-16 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135714#comment-14135714
 ] 

Benjamin Mahler commented on MESOS-1797:


Is there a ZooKeeper ticket related to this?

 Packaged Zookeeper does not compile on OSX Yosemite
 ---

 Key: MESOS-1797
 URL: https://issues.apache.org/jira/browse/MESOS-1797
 Project: Mesos
  Issue Type: Improvement
  Components: build
Affects Versions: 0.20.0, 0.21.0, 0.19.1
Reporter: Dario Rexin
Priority: Minor

 I have been struggling with this for some time (due to my lack of knowledge 
 about C compiler error messages) and finally found a way to make it compile. 
 The problem is that Zookeeper defines a function `htonll` that is a builtin 
 function in Yosemite. For me it worked to just remove this function, but as 
 it needs to keep working on other systems as well, we would need some check 
 for the OS version or if the function is already defined.
 Here are the links to the source:
 https://github.com/apache/zookeeper/blob/trunk/src/c/include/recordio.h#L73
 https://github.com/apache/zookeeper/blob/trunk/src/c/src/recordio.c#L83-L97





[jira] [Commented] (MESOS-1797) Packaged Zookeeper does not compile on OSX Yosemite

2014-09-16 Thread Dario Rexin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135719#comment-14135719
 ] 

Dario Rexin commented on MESOS-1797:


I didn't find one.

 Packaged Zookeeper does not compile on OSX Yosemite
 ---

 Key: MESOS-1797
 URL: https://issues.apache.org/jira/browse/MESOS-1797
 Project: Mesos
  Issue Type: Improvement
  Components: build
Affects Versions: 0.20.0, 0.21.0, 0.19.1
Reporter: Dario Rexin
Priority: Minor

 I have been struggling with this for some time (due to my lack of knowledge 
 about C compiler error messages) and finally found a way to make it compile. 
 The problem is that Zookeeper defines a function `htonll` that is a builtin 
 function in Yosemite. For me it worked to just remove this function, but as 
 it needs to keep working on other systems as well, we would need some check 
 for the OS version or if the function is already defined.
 Here are the links to the source:
 https://github.com/apache/zookeeper/blob/trunk/src/c/include/recordio.h#L73
 https://github.com/apache/zookeeper/blob/trunk/src/c/src/recordio.c#L83-L97





[jira] [Commented] (MESOS-1675) Decouple version of the mesos library from the package release version

2014-09-16 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135816#comment-14135816
 ] 

Timothy St. Clair commented on MESOS-1675:
--

Folks will need to check compatibility and update the revision in 
src/Makefile.am as outlined here:  
http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
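For reference, this is roughly what that step looks like in libtool terms (a sketch; the exact variable in src/Makefile.am may differ):

```makefile
# -version-info is current:revision:age, updated per the libtool rules:
# bump revision for internal changes, current/age for interface changes.
libmesos_la_LDFLAGS = -version-info 0:0:0
```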

 Decouple version of the mesos library from the package release version
 --

 Key: MESOS-1675
 URL: https://issues.apache.org/jira/browse/MESOS-1675
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone

 This discussion should be rolled into the larger discussion around how to 
 version Mesos (APIs, packages, libraries etc).
 Some notes from libtool docs.
 http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
 http://www.gnu.org/software/libtool/manual/html_node/Release-numbers.html#Release-numbers





[jira] [Commented] (MESOS-444) Remove --checkpoint flag in the slave once checkpointing is stable.

2014-09-16 Thread Kevin Sweeney (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135880#comment-14135880
 ] 

Kevin Sweeney commented on MESOS-444:
-

Any activity here? I'd like to simplify this flag: 
https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/DriverFactory.java#L75-L101

 Remove --checkpoint flag in the slave once checkpointing is stable.
 ---

 Key: MESOS-444
 URL: https://issues.apache.org/jira/browse/MESOS-444
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Mahler
  Labels: newbie

 In the interim of slave recovery being worked on (see: MESOS-110), we've 
 added a --checkpoint flag to the slave to enable or disable the feature.
 Prior to releasing this feature, we need to remove this flag so that all 
 slaves have checkpointing available, and frameworks can choose to use it. 
 There's no need to keep this flag around and add configuration complexity.





[jira] [Commented] (MESOS-1746) clear TaskStatus data to avoid OOM

2014-09-16 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135982#comment-14135982
 ] 

Timothy St. Clair commented on MESOS-1746:
--

Maybe I'm missing something, but how is this a Mesos problem?  It seems like an 
Executor sizing constraint issue in the Spark Scheduler. 

 clear TaskStatus data to avoid OOM
 --

 Key: MESOS-1746
 URL: https://issues.apache.org/jira/browse/MESOS-1746
 Project: Mesos
  Issue Type: Bug
 Environment: mesos-0.19.0
Reporter: Chengwei Yang
Assignee: Chengwei Yang

 Spark on mesos may use TaskStatus to transfer computed result between worker 
 and scheduler, the source code like below (spark 1.0.2)
 {code}
 val serializedResult = {
   if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
       AkkaUtils.reservedSizeBytes) {
     logInfo("Storing result for " + taskId + " in local BlockManager")
     val blockId = TaskResultBlockId(taskId)
     env.blockManager.putBytes(
       blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
     ser.serialize(new IndirectTaskResult[Any](blockId))
   } else {
     logInfo("Sending result for " + taskId + " directly to driver")
     serializedDirectResult
   }
 }
 {code}
 In our test environment we enlarged akkaFrameSize to 128MB from the default 
 value (10MB), and this causes our mesos-master process to OOM within tens of 
 minutes when running spark tasks in fine-grained mode.
 As you can see, even with akkaFrameSize changed back to the default value 
 (10MB), it's very likely to make mesos-master OOM too, just more slowly.
 So I think it's good to delete the data from TaskStatus, since it is only 
 intended for the framework on top and we aren't interested in it.
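A toy sketch of the proposed mitigation (types and names are stand-ins, not the Mesos protobufs): once the master has recorded the state transition, the framework-bound `data` payload could be dropped so it no longer accumulates in master memory.

```cpp
#include <cassert>
#include <string>

// Stand-in for the TaskStatus protobuf: a state plus an opaque,
// potentially large framework payload.
struct TaskStatus {
  std::string state;
  std::string data;
};

// Hypothetical master-side step: keep the transition, drop the payload
// the master never inspects, bounding per-task memory.
void recordStatus(TaskStatus& status) {
  status.data.clear();
  status.data.shrink_to_fit();
}
```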





[jira] [Comment Edited] (MESOS-1746) clear TaskStatus data to avoid OOM

2014-09-16 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135982#comment-14135982
 ] 

Timothy St. Clair edited comment on MESOS-1746 at 9/16/14 7:05 PM:
---

Are you saying a task status update is OOM-killing the mesos-master?


was (Author: tstclair):
Maybe I'm missing something, but how is this a Mesos problem?  It seems like an 
Executor sizing constraint issue in the Spark Scheduler. 

 clear TaskStatus data to avoid OOM
 --

 Key: MESOS-1746
 URL: https://issues.apache.org/jira/browse/MESOS-1746
 Project: Mesos
  Issue Type: Bug
 Environment: mesos-0.19.0
Reporter: Chengwei Yang
Assignee: Chengwei Yang

 Spark on mesos may use TaskStatus to transfer computed result between worker 
 and scheduler, the source code like below (spark 1.0.2)
 {code}
 val serializedResult = {
   if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
       AkkaUtils.reservedSizeBytes) {
     logInfo("Storing result for " + taskId + " in local BlockManager")
     val blockId = TaskResultBlockId(taskId)
     env.blockManager.putBytes(
       blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
     ser.serialize(new IndirectTaskResult[Any](blockId))
   } else {
     logInfo("Sending result for " + taskId + " directly to driver")
     serializedDirectResult
   }
 }
 {code}
 In our test environment we enlarged akkaFrameSize to 128MB from the default 
 value (10MB), and this causes our mesos-master process to OOM within tens of 
 minutes when running spark tasks in fine-grained mode.
 As you can see, even with akkaFrameSize changed back to the default value 
 (10MB), it's very likely to make mesos-master OOM too, just more slowly.
 So I think it's good to delete the data from TaskStatus, since it is only 
 intended for the framework on top and we aren't interested in it.





[jira] [Created] (MESOS-1799) Reconciliation can send out-of-order updates.

2014-09-16 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1799:
--

 Summary: Reconciliation can send out-of-order updates.
 Key: MESOS-1799
 URL: https://issues.apache.org/jira/browse/MESOS-1799
 Project: Mesos
  Issue Type: Bug
  Components: master, slave
Reporter: Benjamin Mahler


When a slave re-registers with the master, it currently sends the latest task 
state for all tasks that are not both terminal and acknowledged.

However, reconciliation assumes that we always have the latest unacknowledged 
state of the task represented in the master.

As a result, out-of-order updates are possible, e.g.

(1) Slave has task T in TASK_FINISHED, with unacknowledged updates: 
[TASK_RUNNING, TASK_FINISHED].
(2) Master fails over.
(3) New master re-registers the slave with T in TASK_FINISHED.
(4) Reconciliation request arrives, master sends TASK_FINISHED.
(5) Slave sends TASK_RUNNING to master, master sends TASK_RUNNING.

I think the fix here is to preserve the task state invariants in the master, 
namely, that the master has the latest unacknowledged state of the task. This 
means when the slave re-registers, it should instead send the latest 
unacknowledged state of each task.
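The proposed invariant can be illustrated with a toy model (these are not the Mesos types): on re-registration the slave would report the oldest unacknowledged state so the master can replay updates in order, rather than the terminal state.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy model: a task's latest state plus its queue of unacknowledged
// status updates, oldest first.
struct Task {
  std::string latestState;
  std::deque<std::string> unacknowledged;
};

// Proposed re-registration behavior: send the oldest unacknowledged
// state; fall back to the latest state only when every update has
// been acknowledged.
std::string stateToSend(const Task& t) {
  return t.unacknowledged.empty() ? t.latestState : t.unacknowledged.front();
}
```

In the scenario above, the slave would report TASK_RUNNING rather than TASK_FINISHED, so reconciliation could never emit the two states out of order.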





[jira] [Commented] (MESOS-1027) IPv6 support

2014-09-16 Thread Oskar Stenman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136253#comment-14136253
 ] 

Oskar Stenman commented on MESOS-1027:
--

This would be great if it were resolved. We have a few things holding us back 
from v6-only (which would allow us to greatly simplify a lot of our 
infrastructure): one is Mesos; the others are most likely weird services we 
haven't discovered are an issue yet, since we can't even run Mesos on v6-only. :)

 IPv6 support
 

 Key: MESOS-1027
 URL: https://issues.apache.org/jira/browse/MESOS-1027
 Project: Mesos
  Issue Type: Epic
  Components: framework, libprocess, master, slave
Reporter: Dominic Hamon
 Fix For: 1.0.0


 From the CLI down through the various layers of tech we should support IPv6.





[jira] [Created] (MESOS-1800) The slave does not send pending executors during re-registration.

2014-09-16 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1800:
--

 Summary: The slave does not send pending executors during 
re-registration.
 Key: MESOS-1800
 URL: https://issues.apache.org/jira/browse/MESOS-1800
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler


In what looks like an oversight, the pending executors in the slave are not 
sent in the re-registration message.

This can lead to under-accounting in the master, causing an overcommit on the 
slave.





[jira] [Assigned] (MESOS-1466) Race between executor exited event and launch task can cause overcommit of resources

2014-09-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-1466:
--

Assignee: (was: Benjamin Mahler)

 Race between executor exited event and launch task can cause overcommit of 
 resources
 

 Key: MESOS-1466
 URL: https://issues.apache.org/jira/browse/MESOS-1466
 Project: Mesos
  Issue Type: Bug
  Components: allocation, master
Reporter: Vinod Kone
  Labels: reliability

 The following sequence of events can cause an overcommit
 -- Launch task is called for a task whose executor is already running
 -- Executor's resources are not accounted for on the master
 -- Executor exits and the event is enqueued behind launch tasks on the master
 -- Master sends the task to the slave, which needs to commit resources 
 for the task and the (new) executor.
 -- Master processes the executor exited event and re-offers the executor's 
 resources causing an overcommit of resources.





[jira] [Commented] (MESOS-1688) No offers if no memory is allocatable

2014-09-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136441#comment-14136441
 ] 

Bhuvan Arumugam commented on MESOS-1688:


Targeting it for 0.21.0.

 No offers if no memory is allocatable
 -

 Key: MESOS-1688
 URL: https://issues.apache.org/jira/browse/MESOS-1688
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1
Reporter: Martin Weindel
Priority: Critical
 Fix For: 0.21.0


 The [Spark 
 scheduler|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala]
  allocates memory only for the executor and cpu only for its tasks.
 So it can happen that all memory is nearly completely allocated by Spark 
 executors, but all cpu resources are idle.
 In this case Mesos does not offer resources anymore, as less than MIN_MEM 
 (=32MB) memory is allocatable.
 This effectively causes a deadlock in the Spark job, as it is not offered the 
 cpu resources needed for launching new tasks.
 see {{HierarchicalAllocatorProcess::allocatable(const Resources&)}} called in 
 {{HierarchicalAllocatorProcess::allocate(const hashset<SlaveID>&)}}
 {code}
 template <class RoleSorter, class FrameworkSorter>
 bool
 HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocatable(
     const Resources& resources)
 {
   ...
   Option<double> cpus = resources.cpus();
   Option<Bytes> mem = resources.mem();

   if (cpus.isSome() && mem.isSome()) {
     return cpus.get() >= MIN_CPUS && mem.get() >= MIN_MEM;
   }

   return false;
 }
 {code}
 A possible solution may be to completely drop the condition on allocatable 
 memory.





[jira] [Commented] (MESOS-1195) systemd.slice + cgroup enablement fails in multiple ways.

2014-09-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136449#comment-14136449
 ] 

Bhuvan Arumugam commented on MESOS-1195:


[~tstclair] the patch is still not reviewed; I'm going to offload it to 0.21.0.

 systemd.slice + cgroup enablement fails in multiple ways. 
 --

 Key: MESOS-1195
 URL: https://issues.apache.org/jira/browse/MESOS-1195
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.18.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair

 When attempting to configure mesos to use systemd slices on a 'rawhide/f21' 
 machine, it fails creating the isolator: 
 I0407 12:39:28.035354 14916 containerizer.cpp:180] Using isolation: 
 cgroups/cpu,cgroups/mem
 Failed to create a containerizer: Could not create isolator cgroups/cpu: 
 Failed to create isolator: The cpu subsystem is co-mounted at 
 /sys/fs/cgroup/cpu with other subsytems
 -- details --
 /sys/fs/cgroup
 total 0
 drwxr-xr-x. 12 root root 280 Mar 18 08:47 .
 drwxr-xr-x.  6 root root   0 Mar 18 08:47 ..
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 blkio
 lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpu -> cpu,cpuacct
 lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpuacct -> cpu,cpuacct
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpu,cpuacct
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpuset
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 devices
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 freezer
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 hugetlb
 drwxr-xr-x.  3 root root   0 Apr  3 11:26 memory
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 net_cls
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 perf_event
 drwxr-xr-x.  4 root root   0 Mar 18 08:47 systemd





[jira] [Updated] (MESOS-1724) Can't include port in DockerInfo's image

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1724:
---
Fix Version/s: 0.20.1

 Can't include port in DockerInfo's image
 

 Key: MESOS-1724
 URL: https://issues.apache.org/jira/browse/MESOS-1724
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Jay Buffington
Assignee: Timothy Chen
Priority: Minor
  Labels: docker
 Fix For: 0.20.1


 The current git tree doesn't allow you to specify a docker image with 
 multiple colons, yet it is valid for multiple colons to exist in a docker 
 image name, e.g. docker-registry.example.com:80/centos:6u5.
 From 
 https://github.com/apache/mesos/blob/02a35ab213fb074f6c532075cada76f13eb9d552/src/slave/containerizer/docker.cpp#L441
 {code}
   vector<string> parts = strings::split(dockerInfo.image(), ":");
   if (parts.size() > 2) {
     return Failure("Not expecting multiple ':' in image: " +
                    dockerInfo.image());
   }
 {code}
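One possible fix (a sketch, not necessarily the patch that landed): treat a colon as the tag separator only when it appears after the last `/`, so a registry port is left alone.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical parser: "registry.example.com:80/centos:6u5" splits
// into {"registry.example.com:80/centos", "6u5"}; a colon before the
// last '/' belongs to the registry host, not the tag.
std::pair<std::string, std::string> splitImageTag(const std::string& image) {
  const size_t slash = image.rfind('/');
  const size_t colon = image.rfind(':');
  if (colon != std::string::npos &&
      (slash == std::string::npos || colon > slash)) {
    return {image.substr(0, colon), image.substr(colon + 1)};
  }
  return {image, "latest"};  // Docker's implicit default tag
}
```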





[jira] [Updated] (MESOS-1737) Isolation=external result in core dump on 0.20.0

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1737:
---
Target Version/s: 0.20.1

 Isolation=external result in core dump on 0.20.0
 

 Key: MESOS-1737
 URL: https://issues.apache.org/jira/browse/MESOS-1737
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.20.0
Reporter: Tim Nolet
Assignee: Timothy Chen
 Fix For: 0.20.1


 When upgrading from 0.19.1 to 0.20.0, any slaves started with the standard 
 deimos setup fail hard on startup. The following command spits out about 
 20,000 errors before core dumping:
 /etc/mesos-slave# /usr/local/sbin/mesos-slave 
 --master=zk://localhost:2181/mesos --port=5051 --log_dir=/var/log/mesos 
 --ip=172.17.8.101 --work_dir=/var/lib/mesos --isolation=external 
 --containerizer_path=/usr/local/bin/deimos
 output:
 
 W0827 15:20:18.366271   721 containerizer.cpp:159] The 'external' isolation 
 flag is deprecated, please update your flags to '--containerizers=external'.
 W0827 15:20:18.366580   721 containerizer.cpp:159] The 'external' isolation 
 flag is deprecated, please update your flags to '--containerizers=external'.
 W0827 15:20:18.366631   721 containerizer.cpp:159] The 'external' isolation 
 flag is deprecated, please update your flags to '--containerizers=external'.
 W0827 15:20:18.366683   721 containerizer.cpp:159] The 'external' isolation 
 flag is deprecated, please update your flags to '--containerizers=external'.
 W0827 15:20:18.366714   721 containerizer.cpp:159] The 'external' isolation 
 flag is deprecated, please update your flags to '--containerizers=external'.
 W0827 15:20:18.366752   721 containerizer.cpp:159] The 'external' isolation 
 flag is deprecated, please update your flags to '--containerizers=external'.
 Segmentation fault (core dumped)





[jira] [Updated] (MESOS-1643) Provide APIs to return port resource for a given role

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1643:
---
Target Version/s: 0.20.1
   Fix Version/s: (was: 0.21.0)
  0.20.1

Trivial enough to accommodate in 0.20.1.

 Provide APIs to return port resource for a given role
 -

 Key: MESOS-1643
 URL: https://issues.apache.org/jira/browse/MESOS-1643
 Project: Mesos
  Issue Type: Improvement
Reporter: Zuyu Zhang
Assignee: Zuyu Zhang
Priority: Trivial
 Fix For: 0.20.1


 It makes more sense to return the port resource for a given role, rather than all 
 ports in Resources.
 In mesos/resource.hpp:
 Option<Value::Ranges> Resources::ports(const string& role = "*");
 // Check whether Resources have the given number (num_port) of ports, and 
 // return the begin number of the port range.
 Option<long> Resources::getPorts(long num_port, const string& role = "*");
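A toy sketch of what the proposed signatures could do (the containers and names below are stand-ins for the Resources class, not the real API):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Inclusive [begin, end] port range.
using Range = std::pair<long, long>;
// Stand-in for Resources: port ranges grouped by role.
using PortsByRole = std::map<std::string, std::vector<Range>>;

// Proposed ports(): only the ranges reserved for `role`, not all ports.
std::vector<Range> ports(const PortsByRole& r, const std::string& role = "*") {
  auto it = r.find(role);
  return it == r.end() ? std::vector<Range>{} : it->second;
}

// Proposed getPorts(): begin of a range holding num_port contiguous
// ports for the role, or -1 when no range is large enough.
long getPorts(const PortsByRole& r, long num_port,
              const std::string& role = "*") {
  for (const Range& range : ports(r, role)) {
    if (range.second - range.first + 1 >= num_port) {
      return range.first;
    }
  }
  return -1;
}
```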





[jira] [Commented] (MESOS-1675) Decouple version of the mesos library from the package release version

2014-09-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136550#comment-14136550
 ] 

Vinod Kone commented on MESOS-1675:
---

I see. For the patch you sent, which sets version to 0.0.0, do frameworks have 
to do anything specific to use the new lib (assuming it's compatible)?

 Decouple version of the mesos library from the package release version
 --

 Key: MESOS-1675
 URL: https://issues.apache.org/jira/browse/MESOS-1675
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone

 This discussion should be rolled into the larger discussion around how to 
 version Mesos (APIs, packages, libraries etc).
 Some notes from libtool docs.
 http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
 http://www.gnu.org/software/libtool/manual/html_node/Release-numbers.html#Release-numbers





[jira] [Created] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2014-09-16 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1802:
--

 Summary: HealthCheckTest.HealthStatusChange is flaky on jenkins.
 Key: MESOS-1802
 URL: https://issues.apache.org/jira/browse/MESOS-1802
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Mahler
Assignee: Timothy Chen


https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull

{noformat}
[ RUN  ] HealthCheckTest.HealthStatusChange
Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 642ns
I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the db 
in 343ns
I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
positions 0 - 0 with 1 holes and 0 unlearned
I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received a 
broadcasted recover request
I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from a 
replica in EMPTY status
I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to STARTING
I0916 22:56:14.036603 21046 master.cpp:286] Master 
20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
67.195.81.186:47865
I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing authenticated 
frameworks to register
I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing authenticated 
slaves to register
I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
authentication from '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 480322ns
I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
STARTING
I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
offers for all slaves
I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
Initializing hierarchical allocator process with master : 
master@67.195.81.186:47865
I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
received a broadcasted recover request
I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from a 
replica in STARTING status
I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 330251ns
I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to VOTING
I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos group
I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 92623ns
I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to fill 
missing position
I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit promise 
request for position 0 with proposal 2
I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 144215ns
I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request for 
position 0
I0916 22:56:14.040267 21047 leveldb.cpp:438] Reading position from leveldb took 
10323ns
I0916 22:56:14.040362 21047 leveldb.cpp:343] Persisting action (14 bytes) to 
leveldb took 79471ns
I0916 22:56:14.040375 21047 replica.cpp:676] Persisted action at 0
I0916 22:56:14.040556 21054 replica.cpp:655] Replica received learned notice 
for position 0
I0916 22:56:14.040658 21054 leveldb.cpp:343] Persisting action (16 bytes) to 
leveldb took 83975ns
I0916 22:56:14.040676 21054 replica.cpp:676

[jira] [Created] (MESOS-1803) Strict/RegistrarTest.remove test is flaky on jenkins.

2014-09-16 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1803:
--

 Summary: Strict/RegistrarTest.remove test is flaky on jenkins.
 Key: MESOS-1803
 URL: https://issues.apache.org/jira/browse/MESOS-1803
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull

{noformat}
[ RUN  ] Strict/RegistrarTest.remove/1
Using temporary directory '/tmp/Strict_RegistrarTest_remove_1_3QvnOW'
I0916 22:59:02.112568 21026 leveldb.cpp:176] Opened db in 1.779835ms
I0916 22:59:02.112896 21026 leveldb.cpp:183] Compacted db in 301862ns
I0916 22:59:02.112916 21026 leveldb.cpp:198] Created db iterator in 3065ns
I0916 22:59:02.112926 21026 leveldb.cpp:204] Seeked to beginning of db in 475ns
I0916 22:59:02.112936 21026 leveldb.cpp:273] Iterated through 0 keys in the db 
in 330ns
I0916 22:59:02.112951 21026 replica.cpp:741] Replica recovered with log 
positions 0 - 0 with 1 holes and 0 unlearned
I0916 22:59:02.113654 21054 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 421460ns
I0916 22:59:02.113674 21054 replica.cpp:320] Persisted replica status to VOTING
I0916 22:59:02.115900 21026 leveldb.cpp:176] Opened db in 1.947919ms
I0916 22:59:02.116263 21026 leveldb.cpp:183] Compacted db in 338043ns
I0916 22:59:02.116283 21026 leveldb.cpp:198] Created db iterator in 2809ns
I0916 22:59:02.116293 21026 leveldb.cpp:204] Seeked to beginning of db in 468ns
I0916 22:59:02.116302 21026 leveldb.cpp:273] Iterated through 0 keys in the db 
in 195ns
I0916 22:59:02.116317 21026 replica.cpp:741] Replica recovered with log 
positions 0 - 0 with 1 holes and 0 unlearned
I0916 22:59:02.117013 21043 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 472891ns
I0916 22:59:02.117034 21043 replica.cpp:320] Persisted replica status to VOTING
I0916 22:59:02.119240 21026 leveldb.cpp:176] Opened db in 1.950367ms
I0916 22:59:02.120455 21026 leveldb.cpp:183] Compacted db in 1.188056ms
I0916 22:59:02.120481 21026 leveldb.cpp:198] Created db iterator in 4370ns
I0916 22:59:02.120499 21026 leveldb.cpp:204] Seeked to beginning of db in 7977ns
I0916 22:59:02.120517 21026 leveldb.cpp:273] Iterated through 1 keys in the db 
in 8479ns
I0916 22:59:02.120533 21026 replica.cpp:741] Replica recovered with log 
positions 0 - 0 with 1 holes and 0 unlearned
I0916 22:59:02.122890 21026 leveldb.cpp:176] Opened db in 2.301327ms
I0916 22:59:02.124325 21026 leveldb.cpp:183] Compacted db in 1.406223ms
I0916 22:59:02.124351 21026 leveldb.cpp:198] Created db iterator in 4185ns
I0916 22:59:02.124368 21026 leveldb.cpp:204] Seeked to beginning of db in 7167ns
I0916 22:59:02.124387 21026 leveldb.cpp:273] Iterated through 1 keys in the db 
in 8182ns
I0916 22:59:02.124403 21026 replica.cpp:741] Replica recovered with log 
positions 0 - 0 with 1 holes and 0 unlearned
I0916 22:59:02.124579 21047 recover.cpp:425] Starting replica recovery
I0916 22:59:02.124651 21047 recover.cpp:451] Replica is in VOTING status
I0916 22:59:02.124793 21047 recover.cpp:440] Recover process terminated
I0916 22:59:02.126404 21046 registrar.cpp:313] Recovering registrar
I0916 22:59:02.126597 21050 log.cpp:656] Attempting to start the writer
I0916 22:59:02.127259 21041 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0916 22:59:02.127321 21050 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0916 22:59:02.127835 21041 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 547018ns
I0916 22:59:02.127858 21041 replica.cpp:342] Persisted promised to 1
I0916 22:59:02.127835 21050 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 487588ns
I0916 22:59:02.127887 21050 replica.cpp:342] Persisted promised to 1
I0916 22:59:02.128387 21055 coordinator.cpp:230] Coordinator attemping to fill 
missing position
I0916 22:59:02.129546 21042 replica.cpp:375] Replica received explicit promise 
request for position 0 with proposal 2
I0916 22:59:02.129600 21053 replica.cpp:375] Replica received explicit promise 
request for position 0 with proposal 2
I0916 22:59:02.129982 21042 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 406954ns
I0916 22:59:02.129982 21053 leveldb.cpp:343] Persisting action (8 bytes) to 
leveldb took 357253ns
I0916 22:59:02.130009 21042 replica.cpp:676] Persisted action at 0
I0916 22:59:02.130029 21053 replica.cpp:676] Persisted action at 0
I0916 22:59:02.130543 21041 replica.cpp:508] Replica received write request for 
position 0
I0916 22:59:02.130585 21041 leveldb.cpp:438] Reading position from leveldb took 
17424ns
I0916 22:59:02.130599 21046 replica.cpp:508] Replica received write request for 
position 0
I0916 22:59:02.130635 21046 leveldb.cpp:438] Reading position from leveldb took 
12702ns
I0916 22:59:02.130728 

[jira] [Updated] (MESOS-1760) MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1760:
---
Target Version/s: 0.20.1  (was: 0.21.0)

 MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky
 -

 Key: MESOS-1760
 URL: https://issues.apache.org/jira/browse/MESOS-1760
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.20.1


 Observed this on Apache CI: 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2355/changes
 {code}
 [ RUN] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration
 Using temporary directory 
 '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z'
 I0903 22:04:33.520237 25565 leveldb.cpp:176] Opened db in 49.073821ms
 I0903 22:04:33.538331 25565 leveldb.cpp:183] Compacted db in 18.065051ms
 I0903 22:04:33.538363 25565 leveldb.cpp:198] Created db iterator in 4826ns
 I0903 22:04:33.538377 25565 leveldb.cpp:204] Seeked to beginning of db in 
 682ns
 I0903 22:04:33.538385 25565 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 312ns
 I0903 22:04:33.538399 25565 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0903 22:04:33.538624 25593 recover.cpp:425] Starting replica recovery
 I0903 22:04:33.538707 25598 recover.cpp:451] Replica is in EMPTY status
 I0903 22:04:33.540909 25590 master.cpp:286] Master 
 20140903-220433-453759884-44122-25565 (hemera.apache.org) started on 
 140.211.11.27:44122
 I0903 22:04:33.540932 25590 master.cpp:332] Master only allowing 
 authenticated frameworks to register
 I0903 22:04:33.540936 25590 master.cpp:337] Master only allowing 
 authenticated slaves to register
 I0903 22:04:33.540941 25590 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z/credentials'
 I0903 22:04:33.541337 25590 master.cpp:366] Authorization enabled
 I0903 22:04:33.541508 25597 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0903 22:04:33.542343 25582 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@140.211.11.27:44122
 I0903 22:04:33.542445 25592 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0903 22:04:33.543175 25602 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0903 22:04:33.543637 25587 recover.cpp:542] Updating replica status to 
 STARTING
 I0903 22:04:33.544256 25579 master.cpp:1205] The newly elected leader is 
 master@140.211.11.27:44122 with id 20140903-220433-453759884-44122-25565
 I0903 22:04:33.544275 25579 master.cpp:1218] Elected as the leading master!
 I0903 22:04:33.544282 25579 master.cpp:1036] Recovering from registrar
 I0903 22:04:33.544401 25579 registrar.cpp:313] Recovering registrar
 I0903 22:04:33.558487 25593 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 14.678563ms
 I0903 22:04:33.558531 25593 replica.cpp:320] Persisted replica status to 
 STARTING
 I0903 22:04:33.558653 25593 recover.cpp:451] Replica is in STARTING status
 I0903 22:04:33.559867 25588 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0903 22:04:33.560057 25602 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0903 22:04:33.561280 25584 recover.cpp:542] Updating replica status to VOTING
 I0903 22:04:33.576900 25581 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 14.712427ms
 I0903 22:04:33.576942 25581 replica.cpp:320] Persisted replica status to 
 VOTING
 I0903 22:04:33.577018 25581 recover.cpp:556] Successfully joined the Paxos 
 group
 I0903 22:04:33.577108 25581 recover.cpp:440] Recover process terminated
 I0903 22:04:33.577401 25581 log.cpp:656] Attempting to start the writer
 I0903 22:04:33.578559 25589 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0903 22:04:33.594611 25589 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 16.029152ms
 I0903 22:04:33.594640 25589 replica.cpp:342] Persisted promised to 1
 I0903 22:04:33.595391 25584 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0903 22:04:33.597512 25588 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0903 22:04:33.613037 25588 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 15.502568ms
 I0903 22:04:33.613065 25588 replica.cpp:676] Persisted action at 0
 I0903 22:04:33.615435 25585 replica.cpp:508] Replica received write request 
 for position 0
 I0903 22:04:33.615463 25585 leveldb.cpp:438] 

[jira] [Updated] (MESOS-1760) MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1760:
---
Fix Version/s: 0.20.1  (was: 0.21.0)

 MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky
 -

 Key: MESOS-1760
 URL: https://issues.apache.org/jira/browse/MESOS-1760
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.20.1


 Observed this on Apache CI: 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2355/changes
 {code}
 [ RUN] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration
 Using temporary directory 
 '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z'
 I0903 22:04:33.520237 25565 leveldb.cpp:176] Opened db in 49.073821ms
 I0903 22:04:33.538331 25565 leveldb.cpp:183] Compacted db in 18.065051ms
 I0903 22:04:33.538363 25565 leveldb.cpp:198] Created db iterator in 4826ns
 I0903 22:04:33.538377 25565 leveldb.cpp:204] Seeked to beginning of db in 
 682ns
 I0903 22:04:33.538385 25565 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 312ns
 I0903 22:04:33.538399 25565 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0903 22:04:33.538624 25593 recover.cpp:425] Starting replica recovery
 I0903 22:04:33.538707 25598 recover.cpp:451] Replica is in EMPTY status
 I0903 22:04:33.540909 25590 master.cpp:286] Master 
 20140903-220433-453759884-44122-25565 (hemera.apache.org) started on 
 140.211.11.27:44122
 I0903 22:04:33.540932 25590 master.cpp:332] Master only allowing 
 authenticated frameworks to register
 I0903 22:04:33.540936 25590 master.cpp:337] Master only allowing 
 authenticated slaves to register
 I0903 22:04:33.540941 25590 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z/credentials'
 I0903 22:04:33.541337 25590 master.cpp:366] Authorization enabled
 I0903 22:04:33.541508 25597 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0903 22:04:33.542343 25582 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@140.211.11.27:44122
 I0903 22:04:33.542445 25592 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0903 22:04:33.543175 25602 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0903 22:04:33.543637 25587 recover.cpp:542] Updating replica status to 
 STARTING
 I0903 22:04:33.544256 25579 master.cpp:1205] The newly elected leader is 
 master@140.211.11.27:44122 with id 20140903-220433-453759884-44122-25565
 I0903 22:04:33.544275 25579 master.cpp:1218] Elected as the leading master!
 I0903 22:04:33.544282 25579 master.cpp:1036] Recovering from registrar
 I0903 22:04:33.544401 25579 registrar.cpp:313] Recovering registrar
 I0903 22:04:33.558487 25593 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 14.678563ms
 I0903 22:04:33.558531 25593 replica.cpp:320] Persisted replica status to 
 STARTING
 I0903 22:04:33.558653 25593 recover.cpp:451] Replica is in STARTING status
 I0903 22:04:33.559867 25588 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0903 22:04:33.560057 25602 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0903 22:04:33.561280 25584 recover.cpp:542] Updating replica status to VOTING
 I0903 22:04:33.576900 25581 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 14.712427ms
 I0903 22:04:33.576942 25581 replica.cpp:320] Persisted replica status to 
 VOTING
 I0903 22:04:33.577018 25581 recover.cpp:556] Successfully joined the Paxos 
 group
 I0903 22:04:33.577108 25581 recover.cpp:440] Recover process terminated
 I0903 22:04:33.577401 25581 log.cpp:656] Attempting to start the writer
 I0903 22:04:33.578559 25589 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0903 22:04:33.594611 25589 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 16.029152ms
 I0903 22:04:33.594640 25589 replica.cpp:342] Persisted promised to 1
 I0903 22:04:33.595391 25584 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0903 22:04:33.597512 25588 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0903 22:04:33.613037 25588 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 15.502568ms
 I0903 22:04:33.613065 25588 replica.cpp:676] Persisted action at 0
 I0903 22:04:33.615435 25585 replica.cpp:508] Replica received write request 
 for position 0
 I0903 22:04:33.615463 25585 

[jira] [Commented] (MESOS-1760) MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky

2014-09-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136577#comment-14136577
 ] 

Bhuvan Arumugam commented on MESOS-1760:


Including in 0.20.1.

 MasterAuthorizationTest.FrameworkRemovedBeforeReregistration is flaky
 -

 Key: MESOS-1760
 URL: https://issues.apache.org/jira/browse/MESOS-1760
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.20.1


 Observed this on Apache CI: 
 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2355/changes
 {code}
 [ RUN] MasterAuthorizationTest.FrameworkRemovedBeforeReregistration
 Using temporary directory 
 '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z'
 I0903 22:04:33.520237 25565 leveldb.cpp:176] Opened db in 49.073821ms
 I0903 22:04:33.538331 25565 leveldb.cpp:183] Compacted db in 18.065051ms
 I0903 22:04:33.538363 25565 leveldb.cpp:198] Created db iterator in 4826ns
 I0903 22:04:33.538377 25565 leveldb.cpp:204] Seeked to beginning of db in 
 682ns
 I0903 22:04:33.538385 25565 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 312ns
 I0903 22:04:33.538399 25565 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0903 22:04:33.538624 25593 recover.cpp:425] Starting replica recovery
 I0903 22:04:33.538707 25598 recover.cpp:451] Replica is in EMPTY status
 I0903 22:04:33.540909 25590 master.cpp:286] Master 
 20140903-220433-453759884-44122-25565 (hemera.apache.org) started on 
 140.211.11.27:44122
 I0903 22:04:33.540932 25590 master.cpp:332] Master only allowing 
 authenticated frameworks to register
 I0903 22:04:33.540936 25590 master.cpp:337] Master only allowing 
 authenticated slaves to register
 I0903 22:04:33.540941 25590 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/MasterAuthorizationTest_FrameworkRemovedBeforeReregistration_0tw16Z/credentials'
 I0903 22:04:33.541337 25590 master.cpp:366] Authorization enabled
 I0903 22:04:33.541508 25597 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0903 22:04:33.542343 25582 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@140.211.11.27:44122
 I0903 22:04:33.542445 25592 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0903 22:04:33.543175 25602 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0903 22:04:33.543637 25587 recover.cpp:542] Updating replica status to 
 STARTING
 I0903 22:04:33.544256 25579 master.cpp:1205] The newly elected leader is 
 master@140.211.11.27:44122 with id 20140903-220433-453759884-44122-25565
 I0903 22:04:33.544275 25579 master.cpp:1218] Elected as the leading master!
 I0903 22:04:33.544282 25579 master.cpp:1036] Recovering from registrar
 I0903 22:04:33.544401 25579 registrar.cpp:313] Recovering registrar
 I0903 22:04:33.558487 25593 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 14.678563ms
 I0903 22:04:33.558531 25593 replica.cpp:320] Persisted replica status to 
 STARTING
 I0903 22:04:33.558653 25593 recover.cpp:451] Replica is in STARTING status
 I0903 22:04:33.559867 25588 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0903 22:04:33.560057 25602 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0903 22:04:33.561280 25584 recover.cpp:542] Updating replica status to VOTING
 I0903 22:04:33.576900 25581 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 14.712427ms
 I0903 22:04:33.576942 25581 replica.cpp:320] Persisted replica status to 
 VOTING
 I0903 22:04:33.577018 25581 recover.cpp:556] Successfully joined the Paxos 
 group
 I0903 22:04:33.577108 25581 recover.cpp:440] Recover process terminated
 I0903 22:04:33.577401 25581 log.cpp:656] Attempting to start the writer
 I0903 22:04:33.578559 25589 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0903 22:04:33.594611 25589 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 16.029152ms
 I0903 22:04:33.594640 25589 replica.cpp:342] Persisted promised to 1
 I0903 22:04:33.595391 25584 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0903 22:04:33.597512 25588 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0903 22:04:33.613037 25588 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 15.502568ms
 I0903 22:04:33.613065 25588 replica.cpp:676] Persisted action at 0
 I0903 22:04:33.615435 25585 replica.cpp:508] Replica received write request 
 for position 0
 I0903 

[jira] [Updated] (MESOS-1766) MasterAuthorizationTest.DuplicateRegistration test is flaky

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1766:
---
Fix Version/s: 0.20.1  (was: 0.21.0)

 MasterAuthorizationTest.DuplicateRegistration test is flaky
 ---

 Key: MESOS-1766
 URL: https://issues.apache.org/jira/browse/MESOS-1766
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.20.1


 {code}
 [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
 Using temporary directory 
 '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m'
 I0905 15:53:16.398993 25769 leveldb.cpp:176] Opened db in 2.601036ms
 I0905 15:53:16.399566 25769 leveldb.cpp:183] Compacted db in 546216ns
 I0905 15:53:16.399590 25769 leveldb.cpp:198] Created db iterator in 2787ns
 I0905 15:53:16.399605 25769 leveldb.cpp:204] Seeked to beginning of db in 
 500ns
 I0905 15:53:16.399617 25769 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 185ns
 I0905 15:53:16.399633 25769 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0905 15:53:16.399817 25786 recover.cpp:425] Starting replica recovery
 I0905 15:53:16.399952 25793 recover.cpp:451] Replica is in EMPTY status
 I0905 15:53:16.400683 25795 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0905 15:53:16.400795 25787 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0905 15:53:16.401005 25783 recover.cpp:542] Updating replica status to 
 STARTING
 I0905 15:53:16.401470 25786 master.cpp:286] Master 
 20140905-155316-3125920579-49188-25769 (penates.apache.org) started on 
 67.195.81.186:49188
 I0905 15:53:16.401521 25786 master.cpp:332] Master only allowing 
 authenticated frameworks to register
 I0905 15:53:16.401533 25786 master.cpp:337] Master only allowing 
 authenticated slaves to register
 I0905 15:53:16.401543 25786 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m/credentials'
 I0905 15:53:16.401558 25793 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 474683ns
 I0905 15:53:16.401582 25793 replica.cpp:320] Persisted replica status to 
 STARTING
 I0905 15:53:16.401667 25793 recover.cpp:451] Replica is in STARTING status
 I0905 15:53:16.401669 25786 master.cpp:366] Authorization enabled
 I0905 15:53:16.401898 25795 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0905 15:53:16.401936 25796 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@67.195.81.186:49188
 I0905 15:53:16.402160 25784 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0905 15:53:16.402333 25790 master.cpp:1205] The newly elected leader is 
 master@67.195.81.186:49188 with id 20140905-155316-3125920579-49188-25769
 I0905 15:53:16.402359 25790 master.cpp:1218] Elected as the leading master!
 I0905 15:53:16.402371 25790 master.cpp:1036] Recovering from registrar
 I0905 15:53:16.402472 25798 registrar.cpp:313] Recovering registrar
 I0905 15:53:16.402529 25791 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0905 15:53:16.402782 25788 recover.cpp:542] Updating replica status to VOTING
 I0905 15:53:16.403002 25795 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 116403ns
 I0905 15:53:16.403020 25795 replica.cpp:320] Persisted replica status to 
 VOTING
 I0905 15:53:16.403081 25791 recover.cpp:556] Successfully joined the Paxos 
 group
 I0905 15:53:16.403197 25791 recover.cpp:440] Recover process terminated
 I0905 15:53:16.403388 25796 log.cpp:656] Attempting to start the writer
 I0905 15:53:16.403993 25784 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0905 15:53:16.404147 25784 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 132156ns
 I0905 15:53:16.404167 25784 replica.cpp:342] Persisted promised to 1
 I0905 15:53:16.404542 25795 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0905 15:53:16.405498 25787 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0905 15:53:16.405868 25787 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 347231ns
 I0905 15:53:16.405886 25787 replica.cpp:676] Persisted action at 0
 I0905 15:53:16.406553 25788 replica.cpp:508] Replica received write request 
 for position 0
 I0905 15:53:16.406582 25788 leveldb.cpp:438] Reading position from leveldb 
 took 11402ns
 I0905 15:53:16.529067 25788 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 535803ns
 I0905 15:53:16.529088 25788 replica.cpp:676] Persisted 

[jira] [Commented] (MESOS-1766) MasterAuthorizationTest.DuplicateRegistration test is flaky

2014-09-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136579#comment-14136579
 ] 

Bhuvan Arumugam commented on MESOS-1766:


Related to MESOS-1760; including in 0.20.1.

 MasterAuthorizationTest.DuplicateRegistration test is flaky
 ---

 Key: MESOS-1766
 URL: https://issues.apache.org/jira/browse/MESOS-1766
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.20.1


 {code}
 [ RUN  ] MasterAuthorizationTest.DuplicateRegistration
 Using temporary directory 
 '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m'
 I0905 15:53:16.398993 25769 leveldb.cpp:176] Opened db in 2.601036ms
 I0905 15:53:16.399566 25769 leveldb.cpp:183] Compacted db in 546216ns
 I0905 15:53:16.399590 25769 leveldb.cpp:198] Created db iterator in 2787ns
 I0905 15:53:16.399605 25769 leveldb.cpp:204] Seeked to beginning of db in 
 500ns
 I0905 15:53:16.399617 25769 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 185ns
 I0905 15:53:16.399633 25769 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0905 15:53:16.399817 25786 recover.cpp:425] Starting replica recovery
 I0905 15:53:16.399952 25793 recover.cpp:451] Replica is in EMPTY status
 I0905 15:53:16.400683 25795 replica.cpp:638] Replica in EMPTY status received 
 a broadcasted recover request
 I0905 15:53:16.400795 25787 recover.cpp:188] Received a recover response from 
 a replica in EMPTY status
 I0905 15:53:16.401005 25783 recover.cpp:542] Updating replica status to 
 STARTING
 I0905 15:53:16.401470 25786 master.cpp:286] Master 
 20140905-155316-3125920579-49188-25769 (penates.apache.org) started on 
 67.195.81.186:49188
 I0905 15:53:16.401521 25786 master.cpp:332] Master only allowing 
 authenticated frameworks to register
 I0905 15:53:16.401533 25786 master.cpp:337] Master only allowing 
 authenticated slaves to register
 I0905 15:53:16.401543 25786 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/MasterAuthorizationTest_DuplicateRegistration_pVJg7m/credentials'
 I0905 15:53:16.401558 25793 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 474683ns
 I0905 15:53:16.401582 25793 replica.cpp:320] Persisted replica status to 
 STARTING
 I0905 15:53:16.401667 25793 recover.cpp:451] Replica is in STARTING status
 I0905 15:53:16.401669 25786 master.cpp:366] Authorization enabled
 I0905 15:53:16.401898 25795 master.cpp:120] No whitelist given. Advertising 
 offers for all slaves
 I0905 15:53:16.401936 25796 hierarchical_allocator_process.hpp:299] 
 Initializing hierarchical allocator process with master : 
 master@67.195.81.186:49188
 I0905 15:53:16.402160 25784 replica.cpp:638] Replica in STARTING status 
 received a broadcasted recover request
 I0905 15:53:16.402333 25790 master.cpp:1205] The newly elected leader is 
 master@67.195.81.186:49188 with id 20140905-155316-3125920579-49188-25769
 I0905 15:53:16.402359 25790 master.cpp:1218] Elected as the leading master!
 I0905 15:53:16.402371 25790 master.cpp:1036] Recovering from registrar
 I0905 15:53:16.402472 25798 registrar.cpp:313] Recovering registrar
 I0905 15:53:16.402529 25791 recover.cpp:188] Received a recover response from 
 a replica in STARTING status
 I0905 15:53:16.402782 25788 recover.cpp:542] Updating replica status to VOTING
 I0905 15:53:16.403002 25795 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 116403ns
 I0905 15:53:16.403020 25795 replica.cpp:320] Persisted replica status to 
 VOTING
 I0905 15:53:16.403081 25791 recover.cpp:556] Successfully joined the Paxos 
 group
 I0905 15:53:16.403197 25791 recover.cpp:440] Recover process terminated
 I0905 15:53:16.403388 25796 log.cpp:656] Attempting to start the writer
 I0905 15:53:16.403993 25784 replica.cpp:474] Replica received implicit 
 promise request with proposal 1
 I0905 15:53:16.404147 25784 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 132156ns
 I0905 15:53:16.404167 25784 replica.cpp:342] Persisted promised to 1
 I0905 15:53:16.404542 25795 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0905 15:53:16.405498 25787 replica.cpp:375] Replica received explicit 
 promise request for position 0 with proposal 2
 I0905 15:53:16.405868 25787 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 347231ns
 I0905 15:53:16.405886 25787 replica.cpp:676] Persisted action at 0
 I0905 15:53:16.406553 25788 replica.cpp:508] Replica received write request 
 for position 0
 I0905 15:53:16.406582 25788 leveldb.cpp:438] Reading position from leveldb 
 took 11402ns
 I0905 15:53:16.529067 25788 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 535803ns
 I0905 15:53:16.529088 

[jira] [Updated] (MESOS-1219) Master should disallow frameworks that reconnect after failover timeout.

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1219:
---
Fix Version/s: (was: 0.20.1)
   0.21.0

 Master should disallow frameworks that reconnect after failover timeout.
 

 Key: MESOS-1219
 URL: https://issues.apache.org/jira/browse/MESOS-1219
 Project: Mesos
  Issue Type: Bug
  Components: master, webui
Reporter: Robert Lacroix
Assignee: Vinod Kone
 Fix For: 0.21.0


 When a scheduler reconnects after the failover timeout has elapsed, the 
 framework id is usually reused because the scheduler doesn't know that the 
 timeout was exceeded; the master, however, treats it as a new framework.
 The /framework/:framework_id route of the Web UI doesn't handle those cases 
 very well because the key is reused: it only shows the terminated framework.
 Would it make sense to ignore the provided framework id when a scheduler 
 reconnects to a terminated framework, and to generate a new id to make sure 
 it's unique?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1801) MESOS_work_dir and MESOS_master env vars not honoured

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1801:
--
Fix Version/s: (was: 0.20.1)

 MESOS_work_dir and MESOS_master env vars not honoured
 -

 Key: MESOS-1801
 URL: https://issues.apache.org/jira/browse/MESOS-1801
 Project: Mesos
  Issue Type: Bug
  Components: cli
Affects Versions: 0.20.0
 Environment: CentOS 7
Reporter: Cosmin Lehene

 The documentation states that cli params should be substitutable by 
 environment variables
 {quote}
  Each option can be set in two ways:
 By passing it to the binary using --option_name=value.
 By setting the environment variable MESOS_OPTION_NAME (the option name with a 
 MESOS_ prefix added to it).
 {quote}
 However, at least the master's MESOS_work_dir and the slave's MESOS_master 
 env vars seem to be ignored:
 {noformat}
 [root@localhost ~]# echo $MESOS_master
 zk://localhost:2181/mesos
 [root@localhost ~]# mesos-slave
 Missing required option --master
 [root@localhost ~]# echo $MESOS_work_dir
 /var/lib/mesos
 [root@localhost ~]# mesos-master
 I0917 08:36:46.242200 31325 main.cpp:155] Build: 2014-08-22 05:06:06 by root
 I0917 08:36:46.242369 31325 main.cpp:157] Version: 0.20.0
 I0917 08:36:46.242377 31325 main.cpp:160] Git tag: 0.20.0
 I0917 08:36:46.242382 31325 main.cpp:164] Git SHA: 
 f421ffdf8d32a8834b3a6ee483b5b59f65956497
 --work_dir needed for replicated log based registry
 {noformat}
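The documented fallback (a MESOS_-prefixed environment variable standing in for each --option) can be sketched as a small resolver; `resolve_option` and its argument names are hypothetical illustrations, not Mesos code:

```python
def resolve_option(name, argv, env):
    """Resolve a Mesos-style option: --name=value on the command line
    wins; otherwise fall back to the MESOS_-prefixed env var."""
    prefix = "--" + name + "="
    for arg in argv:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    # The docs describe MESOS_OPTION_NAME; the report above sets the
    # lowercase form, so check both spellings.
    return env.get("MESOS_" + name.upper()) or env.get("MESOS_" + name)

# The reported scenario: the env var is set, yet the binary behaves
# as if the option were missing.
env = {"MESOS_work_dir": "/var/lib/mesos"}
assert resolve_option("work_dir", [], env) == "/var/lib/mesos"
assert resolve_option("work_dir", ["--work_dir=/tmp/m"], env) == "/tmp/m"
```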



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1728) Libprocess: report bind parameters on failure

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1728:
--
Fix Version/s: (was: 0.20.1)
   0.21.0

 Libprocess: report bind parameters on failure
 -

 Key: MESOS-1728
 URL: https://issues.apache.org/jira/browse/MESOS-1728
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Nikita Vetoshkin
Assignee: Nikita Vetoshkin
Priority: Trivial
 Fix For: 0.21.0


 When you attempt to start a slave or master while another one is already 
 running there, it would be nice to report the actual parameters of the 
 {{bind}} call that failed.
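The requested behavior can be illustrated with a Python sketch (`bind_or_explain` is a hypothetical name, not libprocess code): on failure, the raised error names the address and port rather than only the bare errno text.

```python
import socket

def bind_or_explain(sock, ip, port):
    """bind() wrapper that, on failure, reports the address being
    bound instead of only 'Address already in use'."""
    try:
        sock.bind((ip, port))
    except OSError as e:
        raise OSError(e.errno,
                      "Failed to bind on %s:%d: %s" % (ip, port, e.strerror))

# Reproduce the scenario: a second process binding the same port.
first, second = socket.socket(), socket.socket()
first.bind(("127.0.0.1", 0))              # grab an ephemeral port
port = first.getsockname()[1]
try:
    bind_or_explain(second, "127.0.0.1", port)
except OSError as e:
    # The error now says *where* the bind failed.
    assert ("127.0.0.1:%d" % port) in str(e)
finally:
    first.close(); second.close()
```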



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1643) Provide APIs to return port resource for a given role

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1643:
--
Fix Version/s: (was: 0.20.1)
   0.21.0

 Provide APIs to return port resource for a given role
 -

 Key: MESOS-1643
 URL: https://issues.apache.org/jira/browse/MESOS-1643
 Project: Mesos
  Issue Type: Improvement
Reporter: Zuyu Zhang
Assignee: Zuyu Zhang
Priority: Trivial
 Fix For: 0.21.0


 It makes more sense to return the port resource for a given role, rather than 
 all ports in Resources.
 In mesos/resource.hpp:
 {code}
 Option<Value::Ranges> Resources::ports(const string& role = "*");

 // Check whether Resources have the given number (num_port) of ports, and
 // return the begin number of the port range.
 Option<long> Resources::getPorts(long num_port, const string& role = "*");
 {code}
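A rough sketch of the proposed role-aware lookups, in Python for brevity (the dict model and function names are illustrative, not the actual Resources API):

```python
# Hypothetical model: port resources as {role: [(begin, end), ...]}.
def ports(resources, role="*"):
    """Return only the port ranges reserved for `role`."""
    return resources.get(role, [])

def get_ports(resources, num_port, role="*"):
    """If some range for `role` holds `num_port` consecutive ports,
    return the begin number of that range; otherwise None."""
    for begin, end in ports(resources, role):
        if end - begin + 1 >= num_port:
            return begin
    return None

res = {"*": [(31000, 31099)], "web": [(80, 81)]}
assert ports(res, "web") == [(80, 81)]
assert get_ports(res, 50, "*") == 31000   # 100 ports available
assert get_ports(res, 3, "web") is None   # only 2 ports in the role
```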



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1716) The slave does not add pending tasks as part of the staging tasks metric.

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1716:
--
Fix Version/s: (was: 0.20.1)
   0.21.0

 The slave does not add pending tasks as part of the staging tasks metric.
 -

 Key: MESOS-1716
 URL: https://issues.apache.org/jira/browse/MESOS-1716
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
Priority: Trivial
 Fix For: 0.21.0


 The slave does not represent pending tasks in the tasks_staging metric.
 This should be a trivial fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-1803) Strict/RegistrarTest.remove test is flaky on jenkins.

2014-09-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler resolved MESOS-1803.

Resolution: Cannot Reproduce

The log timings here look as if the threads were starved of CPU:

{noformat}
I0916 22:59:02.136256 21049 leveldb.cpp:343] Persisting action (165 bytes) to 
leveldb took 141908ns
I0916 22:59:02.136267 21047 leveldb.cpp:343] Persisting action (165 bytes) to 
leveldb took 111061ns
I../../src/tests/registrar_tests.cpp:257: Failure
0916 22:59:02.136276 21049 replica.cpp:676] Persisted action at 1
Failed to wait 10secs for registrar.recover(master)
I0916 22:59:14.265326 21049 replica.cpp:661] Replica learned APPEND action at 
position 1
I0916 22:59:02.136291 21047 replica.cpp:676] Persisted action at 1
E0916 22:59:07.135143 21046 registrar.cpp:500] Registrar aborting: Failed to 
update 'registry': Failed to perform store within 5secs
I0916 22:59:14.265393 21047 replica.cpp:661] Replica learned APPEND action at 
position 1
{noformat}

The logging timestamp is determined at the beginning of the LOG(INFO) 
expression, when the initial LogMessage object is created. The interleaving of 
timestamps suggests a stall of the VM or thread starvation:

{noformat}
22:59:02.136267 21047 // Thread 1, 1st LogMessage flushed.
22:59:02.136276 21049 // Thread 2, 2nd LogMessage flushed.
22:59:14.265326 21049 // Thread 2, 5th LogMessage flushed.
22:59:02.136291 21047 // Thread 1, 3rd LogMessage flushed.
22:59:07.135143 21046 // Thread 3, 4th LogMessage flushed.
22:59:14.265393 21047 // Thread 1, 6th LogMessage flushed.
{noformat}
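The diagnosis rests on glog recording the timestamp when a LogMessage is constructed, while the line only reaches the log when it is flushed; under a stall the two orders diverge. A toy check over the excerpt above:

```python
# (timestamp, thread) pairs in the order they appear in the log file,
# i.e. flush order, copied from the excerpt above.
flushed = [
    ("22:59:02.136267", 21047),
    ("22:59:02.136276", 21049),
    ("22:59:14.265326", 21049),
    ("22:59:02.136291", 21047),
    ("22:59:07.135143", 21046),
    ("22:59:14.265393", 21047),
]
timestamps = [t for t, _ in flushed]
# Timestamps are not monotone in flush order: messages created ~12s
# apart were written out of order, consistent with threads being
# descheduled between creating a LogMessage and flushing it.
assert timestamps != sorted(timestamps)
```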

 Strict/RegistrarTest.remove test is flaky on jenkins.
 -

 Key: MESOS-1803
 URL: https://issues.apache.org/jira/browse/MESOS-1803
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler

 https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
 {noformat}
 [ RUN  ] Strict/RegistrarTest.remove/1
 Using temporary directory '/tmp/Strict_RegistrarTest_remove_1_3QvnOW'
 I0916 22:59:02.112568 21026 leveldb.cpp:176] Opened db in 1.779835ms
 I0916 22:59:02.112896 21026 leveldb.cpp:183] Compacted db in 301862ns
 I0916 22:59:02.112916 21026 leveldb.cpp:198] Created db iterator in 3065ns
 I0916 22:59:02.112926 21026 leveldb.cpp:204] Seeked to beginning of db in 
 475ns
 I0916 22:59:02.112936 21026 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 330ns
 I0916 22:59:02.112951 21026 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0916 22:59:02.113654 21054 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 421460ns
 I0916 22:59:02.113674 21054 replica.cpp:320] Persisted replica status to 
 VOTING
 I0916 22:59:02.115900 21026 leveldb.cpp:176] Opened db in 1.947919ms
 I0916 22:59:02.116263 21026 leveldb.cpp:183] Compacted db in 338043ns
 I0916 22:59:02.116283 21026 leveldb.cpp:198] Created db iterator in 2809ns
 I0916 22:59:02.116293 21026 leveldb.cpp:204] Seeked to beginning of db in 
 468ns
 I0916 22:59:02.116302 21026 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 195ns
 I0916 22:59:02.116317 21026 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0916 22:59:02.117013 21043 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 472891ns
 I0916 22:59:02.117034 21043 replica.cpp:320] Persisted replica status to 
 VOTING
 I0916 22:59:02.119240 21026 leveldb.cpp:176] Opened db in 1.950367ms
 I0916 22:59:02.120455 21026 leveldb.cpp:183] Compacted db in 1.188056ms
 I0916 22:59:02.120481 21026 leveldb.cpp:198] Created db iterator in 4370ns
 I0916 22:59:02.120499 21026 leveldb.cpp:204] Seeked to beginning of db in 
 7977ns
 I0916 22:59:02.120517 21026 leveldb.cpp:273] Iterated through 1 keys in the 
 db in 8479ns
 I0916 22:59:02.120533 21026 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0916 22:59:02.122890 21026 leveldb.cpp:176] Opened db in 2.301327ms
 I0916 22:59:02.124325 21026 leveldb.cpp:183] Compacted db in 1.406223ms
 I0916 22:59:02.124351 21026 leveldb.cpp:198] Created db iterator in 4185ns
 I0916 22:59:02.124368 21026 leveldb.cpp:204] Seeked to beginning of db in 
 7167ns
 I0916 22:59:02.124387 21026 leveldb.cpp:273] Iterated through 1 keys in the 
 db in 8182ns
 I0916 22:59:02.124403 21026 replica.cpp:741] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0916 22:59:02.124579 21047 recover.cpp:425] Starting replica recovery
 I0916 22:59:02.124651 21047 recover.cpp:451] Replica is in VOTING status
 I0916 22:59:02.124793 21047 recover.cpp:440] Recover process terminated
 I0916 22:59:02.126404 21046 registrar.cpp:313] Recovering 

[jira] [Comment Edited] (MESOS-1746) clear TaskStatus data to avoid OOM

2014-09-16 Thread Chengwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136605#comment-14136605
 ] 

Chengwei Yang edited comment on MESOS-1746 at 9/17/14 1:06 AM:
---

[~tstclair], yes, Spark stores very large data in TaskStatus. Since the data 
field in TaskStatus is meant to hold application-specific data, we cannot 
prevent applications (like Spark) from doing so.

please help to review: https://reviews.apache.org/r/25184/


was (Author: chengwei-yang):
[~tstclair], yes, spark stores very large data into TaskStatus, since there is 
a data field in TaskStatus which was supposed to be used to store application 
specific data, so we can not prevent applications (like spark) from doing so.

 clear TaskStatus data to avoid OOM
 --

 Key: MESOS-1746
 URL: https://issues.apache.org/jira/browse/MESOS-1746
 Project: Mesos
  Issue Type: Bug
 Environment: mesos-0.19.0
Reporter: Chengwei Yang
Assignee: Chengwei Yang

 Spark on mesos may use TaskStatus to transfer computed result between worker 
 and scheduler, the source code like below (spark 1.0.2)
 {code}
 val serializedResult = {
   if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
       AkkaUtils.reservedSizeBytes) {
     logInfo("Storing result for " + taskId + " in local BlockManager")
     val blockId = TaskResultBlockId(taskId)
     env.blockManager.putBytes(
       blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
     ser.serialize(new IndirectTaskResult[Any](blockId))
   } else {
     logInfo("Sending result for " + taskId + " directly to driver")
     serializedDirectResult
   }
 }
 {code}
 In our test environment we enlarged akkaFrameSize to 128MB from its default 
 value (10MB), and this caused our mesos-master process to OOM within tens of 
 minutes when running Spark tasks in fine-grained mode.
 Even with akkaFrameSize left at the default value (10MB), mesos-master is 
 still very likely to OOM, only more slowly.
 So I think it's good to delete the data from TaskStatus, since it is only 
 intended for the framework on top and the master is not interested in it.
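The proposed fix, dropping the framework-specific data payload before the master retains a status update, can be sketched as follows (the class and function names are illustrative, not the actual master code):

```python
class TaskStatus:
    """Minimal stand-in for the protobuf: only the fields we need."""
    def __init__(self, task_id, state, data=b""):
        self.task_id, self.state, self.data = task_id, state, data

def archive_status(cache, status):
    """Cache a status update master-side, but drop the opaque
    framework payload first: the master never inspects it."""
    status.data = b""
    cache.append(status)

cache = []
big = TaskStatus("t1", "TASK_FINISHED", b"x" * (10 << 20))  # ~10MB payload
archive_status(cache, big)
assert cache[0].data == b""          # payload not retained in master memory
assert cache[0].state == "TASK_FINISHED"
```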



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1195) systemd.slice + cgroup enablement fails in multiple ways.

2014-09-16 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-1195:
---
Target Version/s: 0.21.0  (was: 0.20.1)

 systemd.slice + cgroup enablement fails in multiple ways. 
 --

 Key: MESOS-1195
 URL: https://issues.apache.org/jira/browse/MESOS-1195
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.18.0
Reporter: Timothy St. Clair
Assignee: Timothy St. Clair

 When attempting to configure mesos to use systemd slices on a 'rawhide/f21' 
 machine, it fails creating the isolator: 
 I0407 12:39:28.035354 14916 containerizer.cpp:180] Using isolation: 
 cgroups/cpu,cgroups/mem
 Failed to create a containerizer: Could not create isolator cgroups/cpu: 
 Failed to create isolator: The cpu subsystem is co-mounted at 
 /sys/fs/cgroup/cpu with other subsytems
 -- details --
 /sys/fs/cgroup
 total 0
 drwxr-xr-x. 12 root root 280 Mar 18 08:47 .
 drwxr-xr-x.  6 root root   0 Mar 18 08:47 ..
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 blkio
 lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpu -> cpu,cpuacct
 lrwxrwxrwx.  1 root root  11 Mar 18 08:47 cpuacct -> cpu,cpuacct
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpu,cpuacct
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 cpuset
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 devices
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 freezer
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 hugetlb
 drwxr-xr-x.  3 root root   0 Apr  3 11:26 memory
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 net_cls
 drwxr-xr-x.  2 root root   0 Mar 18 08:47 perf_event
 drwxr-xr-x.  4 root root   0 Mar 18 08:47 systemd
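The isolator's complaint can be reproduced with a small check over cgroup mount options; `comounted_with` is a hypothetical helper for illustration, not the Mesos isolator code:

```python
SUBSYSTEMS = ("cpu", "cpuacct", "memory", "cpuset", "blkio")

def comounted_with(subsystem, mounts):
    """Given /proc/mounts-style (mountpoint, options) pairs for cgroup
    mounts, return the other subsystems co-mounted with `subsystem`."""
    for _, options in mounts:
        subs = [o for o in options.split(",") if o in SUBSYSTEMS]
        if subsystem in subs:
            return [s for s in subs if s != subsystem]
    return []

# The layout from the report: 'cpu' is a symlink into a combined
# cpu,cpuacct hierarchy, so cpu cannot be used on its own.
mounts = [("/sys/fs/cgroup/cpu,cpuacct", "rw,cpu,cpuacct"),
          ("/sys/fs/cgroup/memory", "rw,memory")]
assert comounted_with("cpu", mounts) == ["cpuacct"]   # triggers the error
assert comounted_with("memory", mounts) == []         # memory is fine
```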



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1195) systemd.slice + cgroup enablement fails in multiple ways.

2014-09-16 Thread Bhuvan Arumugam (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136614#comment-14136614
 ] 

Bhuvan Arumugam commented on MESOS-1195:


moving it to 0.21.0, as discussed in reviewboard.
  http://reviews.apache.org/r/25695/




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (MESOS-1747) Docker image parsing for private repositories

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reopened MESOS-1747:
---

 Docker image parsing for private repositories
 -

 Key: MESOS-1747
 URL: https://issues.apache.org/jira/browse/MESOS-1747
 Project: Mesos
  Issue Type: Bug
  Components: containerization, slave
Affects Versions: 0.20.0
Reporter: Don Laidlaw
Assignee: Timothy Chen
  Labels: docker
 Fix For: 0.20.1


 You cannot specify a port number for the host of a private docker repository. 
 Specified as follows:
 {noformat}
 "container": {
   "type": "DOCKER",
   "docker": {
     "image": "docker-repo:5000/app-base:v0.1"
   }
 }
 {noformat}
 results in an error:
 {noformat}
 Aug 29 14:33:29 ip-172-16-2-22 mesos-slave[1128]: E0829 14:33:29.487470  1153 
 slave.cpp:2484] Container '250e0479-552f-4e6f-81dd-71550e45adae' for executor 
 't1-java.71d50bd1-2f89-11e4-ba9a-0adfe6b11716' of framework 
 '20140829-121838-184684716-5050-1177-' failed to start:Not expecting 
 multiple ':' in image: docker-repo:5000/app-base:v0.1
 {noformat}
 The message indicates only one colon character is allowed, but to supply a 
 port number for a private docker repository host you need to have two colons.
 Also if you use a '-' character in a host name you also get an error:
 {noformat}
 Invalid namespace name (docker-repo), only [a-z0-9_] are allowed, size 
 between 4 and 30
 {noformat}
 The hostname parts should not be limited to [a-z0-9_].
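One way to parse such references without tripping over the second ':' is to treat only a ':' after the last '/' as the tag separator; a hypothetical sketch (not the actual Docker or Mesos parser):

```python
def parse_docker_image(image):
    """Split REGISTRY[:PORT]/REPO[:TAG]. The first ':' cannot be taken
    as the tag separator: only a ':' after the last '/' starts a tag."""
    registry, rest = None, image
    if "/" in image:
        head, _, tail = image.partition("/")
        # Heuristic: a registry host contains '.' or ':' or is "localhost".
        if "." in head or ":" in head or head == "localhost":
            registry, rest = head, tail
    repo, sep, tag = rest.rpartition(":")
    if not sep:                       # no tag given
        repo, tag = rest, "latest"
    return registry, repo, tag

# The reported image now parses instead of erroring on the second ':'.
assert parse_docker_image("docker-repo:5000/app-base:v0.1") == \
    ("docker-repo:5000", "app-base", "v0.1")
assert parse_docker_image("ubuntu") == (None, "ubuntu", "latest")
```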



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-1747) Docker image parsing for private repositories

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone resolved MESOS-1747.
---
   Resolution: Duplicate
Fix Version/s: (was: 0.20.1)

 Docker image parsing for private repositories
 -

 Key: MESOS-1747
 URL: https://issues.apache.org/jira/browse/MESOS-1747
 Project: Mesos
  Issue Type: Bug
  Components: containerization, slave
Affects Versions: 0.20.0
Reporter: Don Laidlaw
Assignee: Timothy Chen
  Labels: docker




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-1621) Docker run networking should be configurable and support bridge network

2014-09-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone resolved MESOS-1621.
---
Resolution: Fixed

commit 1453a477511c8f6f22ff16e3dd13d0532e019c5b
Author: Timothy Chen tnac...@apache.org
Date:   Tue Sep 16 18:29:36 2014 -0700

Enabled bridge network for Docker Containerizer.

Review: https://reviews.apache.org/r/25270


 Docker run networking should be configurable and support bridge network
 ---

 Key: MESOS-1621
 URL: https://issues.apache.org/jira/browse/MESOS-1621
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Timothy Chen
Assignee: Timothy Chen
  Labels: Docker
 Fix For: 0.20.1


 Currently, to easily support running executors in a Docker image, we hardcode 
 --net=host into docker run so that the slave and executor can reuse the same 
 mechanism to communicate: the slave's IP/port is passed in, and the framework 
 responds with its own hostname and port to set up the tunnel.
 We want to see how to abstract this, or even get rid of host networking 
 altogether if we have a good way to not rely on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)