[jira] [Updated] (MESOS-4413) libprocess tests is flaky

2016-01-15 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-4413:

Description: 
{code}
./libprocess-tests
[==] Running 155 tests from 23 test cases.
[--] Global test environment set-up.
[--] 3 tests from CollectTest
[ RUN  ] CollectTest.Ready
[   OK ] CollectTest.Ready (1 ms)
[ RUN  ] CollectTest.Failure
[   OK ] CollectTest.Failure (1 ms)
[ RUN  ] CollectTest.DiscardPropagation
[   OK ] CollectTest.DiscardPropagation (1 ms)
[--] 3 tests from CollectTest (3 ms total)

[--] 4 tests from AwaitTest
[ RUN  ] AwaitTest.Success
[   OK ] AwaitTest.Success (0 ms)
[ RUN  ] AwaitTest.Failure
[   OK ] AwaitTest.Failure (1 ms)
[ RUN  ] AwaitTest.Discarded
[   OK ] AwaitTest.Discarded (0 ms)
[ RUN  ] AwaitTest.DiscardPropagation
[   OK ] AwaitTest.DiscardPropagation (0 ms)
[--] 4 tests from AwaitTest (1 ms total)

[--] 6 tests from DecoderTest
[ RUN  ] DecoderTest.Request
[   OK ] DecoderTest.Request (1 ms)
[ RUN  ] DecoderTest.RequestHeaderContinuation
[   OK ] DecoderTest.RequestHeaderContinuation (0 ms)
[ RUN  ] DecoderTest.RequestHeaderCaseInsensitive
[   OK ] DecoderTest.RequestHeaderCaseInsensitive (0 ms)
[ RUN  ] DecoderTest.Response
[   OK ] DecoderTest.Response (0 ms)
[ RUN  ] DecoderTest.StreamingResponse
[   OK ] DecoderTest.StreamingResponse (0 ms)
[ RUN  ] DecoderTest.StreamingResponseFailure
[   OK ] DecoderTest.StreamingResponseFailure (0 ms)
[--] 6 tests from DecoderTest (1 ms total)

[--] 2 tests from EncoderTest
[ RUN  ] EncoderTest.Response
[   OK ] EncoderTest.Response (0 ms)
[ RUN  ] EncoderTest.AcceptableEncodings
[   OK ] EncoderTest.AcceptableEncodings (1 ms)
[--] 2 tests from EncoderTest (1 ms total)

[--] 1 test from FutureTest
[ RUN  ] FutureTest.ArrowOperator
[   OK ] FutureTest.ArrowOperator (0 ms)
[--] 1 test from FutureTest (0 ms total)

[--] 17 tests from HTTPTest
[ RUN  ] HTTPTest.Auth
[   OK ] HTTPTest.Auth (5 ms)
[ RUN  ] HTTPTest.Endpoints
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0116 15:26:53.831742 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPTest.Endpoints (3 ms)
[ RUN  ] HTTPTest.PipeEOF
[   OK ] HTTPTest.PipeEOF (0 ms)
[ RUN  ] HTTPTest.PipeFailure
[   OK ] HTTPTest.PipeFailure (1 ms)
[ RUN  ] HTTPTest.PipeReaderCloses
[   OK ] HTTPTest.PipeReaderCloses (0 ms)
[ RUN  ] HTTPTest.Encode
[   OK ] HTTPTest.Encode (0 ms)
[ RUN  ] HTTPTest.PathParse
[   OK ] HTTPTest.PathParse (0 ms)
[ RUN  ] HTTPTest.Get
[   OK ] HTTPTest.Get (3 ms)
[ RUN  ] HTTPTest.NestedGet
[   OK ] HTTPTest.NestedGet (4 ms)
[ RUN  ] HTTPTest.StreamingGetComplete
[   OK ] HTTPTest.StreamingGetComplete (3 ms)
[ RUN  ] HTTPTest.StreamingGetFailure
[   OK ] HTTPTest.StreamingGetFailure (3 ms)
[ RUN  ] HTTPTest.PipeEquality
[   OK ] HTTPTest.PipeEquality (0 ms)
[ RUN  ] HTTPTest.Post
[   OK ] HTTPTest.Post (3 ms)
[ RUN  ] HTTPTest.Delete
[   OK ] HTTPTest.Delete (1 ms)
[ RUN  ] HTTPTest.QueryEncodeDecode
[   OK ] HTTPTest.QueryEncodeDecode (1 ms)
[ RUN  ] HTTPTest.CaseInsensitiveHeaders
[   OK ] HTTPTest.CaseInsensitiveHeaders (0 ms)
[ RUN  ] HTTPTest.Accepts
[   OK ] HTTPTest.Accepts (0 ms)
[--] 17 tests from HTTPTest (28 ms total)

[--] 6 tests from HTTPConnectionTest
[ RUN  ] HTTPConnectionTest.Serial
E0116 15:26:53.856267 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPConnectionTest.Serial (5 ms)
[ RUN  ] HTTPConnectionTest.Pipeline
E0116 15:26:53.861946 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPConnectionTest.Pipeline (6 ms)
[ RUN  ] HTTPConnectionTest.ClosingRequest
[   OK ] HTTPConnectionTest.ClosingRequest (4 ms)
[ RUN  ] HTTPConnectionTest.ClosingResponse
[   OK ] HTTPConnectionTest.ClosingResponse (4 ms)
[ RUN  ] HTTPConnectionTest.ReferenceCounting
E0116 15:26:53.871278 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPConnectionTest.ReferenceCounting (1 ms)
[ RUN  ] HTTPConnectionTest.Equality
E0116 15:26:53.873129 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
E0116 15:26:53.873286 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 11: Socket is not connected
[   OK ] HTTPConnectionTest.Equality (2 ms)
[--] 6 tests from HTTPConnectionTest (22 ms total)

[--] 2 tests from URLTest
[ RUN  ] URLTest.Stringification
[   OK ] URLTest.Stringification (0

[jira] [Created] (MESOS-4413) libprocess tests is flaky

2016-01-15 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-4413:
---

 Summary: libprocess tests is flaky
 Key: MESOS-4413
 URL: https://issues.apache.org/jira/browse/MESOS-4413
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
 Environment: Darwin Klauss-MacBook-Pro.local 15.2.0 Darwin Kernel 
Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; 
root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64

Reporter: Klaus Ma


./libprocess-tests
[==] Running 155 tests from 23 test cases.
[--] Global test environment set-up.
[--] 3 tests from CollectTest
[ RUN  ] CollectTest.Ready
[   OK ] CollectTest.Ready (1 ms)
[ RUN  ] CollectTest.Failure
[   OK ] CollectTest.Failure (1 ms)
[ RUN  ] CollectTest.DiscardPropagation
[   OK ] CollectTest.DiscardPropagation (1 ms)
[--] 3 tests from CollectTest (3 ms total)

[--] 4 tests from AwaitTest
[ RUN  ] AwaitTest.Success
[   OK ] AwaitTest.Success (0 ms)
[ RUN  ] AwaitTest.Failure
[   OK ] AwaitTest.Failure (1 ms)
[ RUN  ] AwaitTest.Discarded
[   OK ] AwaitTest.Discarded (0 ms)
[ RUN  ] AwaitTest.DiscardPropagation
[   OK ] AwaitTest.DiscardPropagation (0 ms)
[--] 4 tests from AwaitTest (1 ms total)

[--] 6 tests from DecoderTest
[ RUN  ] DecoderTest.Request
[   OK ] DecoderTest.Request (1 ms)
[ RUN  ] DecoderTest.RequestHeaderContinuation
[   OK ] DecoderTest.RequestHeaderContinuation (0 ms)
[ RUN  ] DecoderTest.RequestHeaderCaseInsensitive
[   OK ] DecoderTest.RequestHeaderCaseInsensitive (0 ms)
[ RUN  ] DecoderTest.Response
[   OK ] DecoderTest.Response (0 ms)
[ RUN  ] DecoderTest.StreamingResponse
[   OK ] DecoderTest.StreamingResponse (0 ms)
[ RUN  ] DecoderTest.StreamingResponseFailure
[   OK ] DecoderTest.StreamingResponseFailure (0 ms)
[--] 6 tests from DecoderTest (1 ms total)

[--] 2 tests from EncoderTest
[ RUN  ] EncoderTest.Response
[   OK ] EncoderTest.Response (0 ms)
[ RUN  ] EncoderTest.AcceptableEncodings
[   OK ] EncoderTest.AcceptableEncodings (1 ms)
[--] 2 tests from EncoderTest (1 ms total)

[--] 1 test from FutureTest
[ RUN  ] FutureTest.ArrowOperator
[   OK ] FutureTest.ArrowOperator (0 ms)
[--] 1 test from FutureTest (0 ms total)

[--] 17 tests from HTTPTest
[ RUN  ] HTTPTest.Auth
[   OK ] HTTPTest.Auth (5 ms)
[ RUN  ] HTTPTest.Endpoints
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0116 15:26:53.831742 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPTest.Endpoints (3 ms)
[ RUN  ] HTTPTest.PipeEOF
[   OK ] HTTPTest.PipeEOF (0 ms)
[ RUN  ] HTTPTest.PipeFailure
[   OK ] HTTPTest.PipeFailure (1 ms)
[ RUN  ] HTTPTest.PipeReaderCloses
[   OK ] HTTPTest.PipeReaderCloses (0 ms)
[ RUN  ] HTTPTest.Encode
[   OK ] HTTPTest.Encode (0 ms)
[ RUN  ] HTTPTest.PathParse
[   OK ] HTTPTest.PathParse (0 ms)
[ RUN  ] HTTPTest.Get
[   OK ] HTTPTest.Get (3 ms)
[ RUN  ] HTTPTest.NestedGet
[   OK ] HTTPTest.NestedGet (4 ms)
[ RUN  ] HTTPTest.StreamingGetComplete
[   OK ] HTTPTest.StreamingGetComplete (3 ms)
[ RUN  ] HTTPTest.StreamingGetFailure
[   OK ] HTTPTest.StreamingGetFailure (3 ms)
[ RUN  ] HTTPTest.PipeEquality
[   OK ] HTTPTest.PipeEquality (0 ms)
[ RUN  ] HTTPTest.Post
[   OK ] HTTPTest.Post (3 ms)
[ RUN  ] HTTPTest.Delete
[   OK ] HTTPTest.Delete (1 ms)
[ RUN  ] HTTPTest.QueryEncodeDecode
[   OK ] HTTPTest.QueryEncodeDecode (1 ms)
[ RUN  ] HTTPTest.CaseInsensitiveHeaders
[   OK ] HTTPTest.CaseInsensitiveHeaders (0 ms)
[ RUN  ] HTTPTest.Accepts
[   OK ] HTTPTest.Accepts (0 ms)
[--] 17 tests from HTTPTest (28 ms total)

[--] 6 tests from HTTPConnectionTest
[ RUN  ] HTTPConnectionTest.Serial
E0116 15:26:53.856267 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPConnectionTest.Serial (5 ms)
[ RUN  ] HTTPConnectionTest.Pipeline
E0116 15:26:53.861946 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPConnectionTest.Pipeline (6 ms)
[ RUN  ] HTTPConnectionTest.ClosingRequest
[   OK ] HTTPConnectionTest.ClosingRequest (4 ms)
[ RUN  ] HTTPConnectionTest.ClosingResponse
[   OK ] HTTPConnectionTest.ClosingResponse (4 ms)
[ RUN  ] HTTPConnectionTest.ReferenceCounting
E0116 15:26:53.871278 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
[   OK ] HTTPConnectionTest.ReferenceCounting (1 ms)
[ RUN  ] HTTPConnectionTest.Equality
E0116 15:26:53.873129 4820992 process.cpp:1966] Failed to shutdown socket with 
fd 9: Socket is not connected
E0116 15:26:53.873286 4820992 process.

[jira] [Commented] (MESOS-4391) docker pull a remote image conflict

2016-01-15 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103018#comment-15103018
 ] 

Klaus Ma commented on MESOS-4391:
-

Good find :).

It seems this should be handled by the Docker daemon, but I'm not sure of 
Docker's current behaviour: does it pull the image in two processes, or does 
only one process fetch the image while the other waits for it? 
[~jieyu]/[~qianzhang], do you know any details about that? If not, maybe it 
can be improved by running a single docker executor instead of one docker 
executor per task: for example, sync up with the docker daemon when the 
executor launches, and queue incoming tasks until the required image is ready.
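
To make the idea concrete, here is a standalone sketch (illustrative only, not 
Docker or Mesos code) of deduplicating concurrent pulls: the first request for 
an image starts the fetch, and later requests wait on the same shared future.

{code}
#include <chrono>
#include <future>
#include <iostream>
#include <map>
#include <mutex>
#include <string>
#include <thread>

std::mutex mutex;
std::map<std::string, std::shared_future<void>> pulls;

// Returns a future for the pull of `image`, starting a new pull only
// if none for that image is already in flight.
std::shared_future<void> pull(const std::string& image)
{
  std::lock_guard<std::mutex> lock(mutex);

  auto it = pulls.find(image);
  if (it != pulls.end()) {
    return it->second;  // A pull for this image is already running.
  }

  std::shared_future<void> future =
    std::async(std::launch::async, [image] {
      std::cout << "pulling " << image << std::endl;  // Simulated fetch.
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }).share();

  pulls.emplace(image, future);
  return future;
}

int main()
{
  // Two tasks request the same image; only one "pulling" line is printed.
  auto f1 = pull("solr:latest");
  auto f2 = pull("solr:latest");
  f1.wait();
  f2.wait();
}
{code}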

> docker pull a remote image conflict
> ---
>
> Key: MESOS-4391
> URL: https://issues.apache.org/jira/browse/MESOS-4391
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, framework
>Affects Versions: 0.26.0
> Environment: CentOS Linux release 7.2.1511 (Core)
> 3.10.0-327.el7.x86_64
>Reporter: qinlu
>
> I run a docker app with 3 tasks,and the docker image not exist in the slave 
> ,it must to pull from docker.io.
> Marathon assign 2 app run in a slave,and the last in another.
> I see the log by journalctl,it show me like this :level=error msg="HTTP 
> Error" err="No such image: solr:latest" statusCode=404.
> There is two threads to pull the image
> [root@** ~]# ps -ef|grep solr
> root 30113 10735  0 12:17 ?00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest
> root 30114 10735  0 12:17 ?00:00:00 docker -H 
> unix:///var/run/docker.sock pull solr:latest





[jira] [Commented] (MESOS-4102) Quota doesn't allocate resources on slave joining.

2016-01-15 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103007#comment-15103007
 ] 

Klaus Ma commented on MESOS-4102:
-

Ping [~alexr]/[~neilc] :).

> Quota doesn't allocate resources on slave joining.
> --
>
> Key: MESOS-4102
> URL: https://issues.apache.org/jira/browse/MESOS-4102
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere, quota
> Attachments: quota_absent_framework_test-1.patch
>
>
> See attached patch. {{framework1}} is not allocated any resources, despite 
> the fact that the resources on {{agent2}} can safely be allocated to it 
> without risk of violating {{quota1}}. If I understand the intended quota 
> behavior correctly, this doesn't seem intended.
> Note that if the framework is added _after_ the slaves are added, the 
> resources on {{agent2}} are allocated to {{framework1}}.





[jira] [Commented] (MESOS-3838) Put authorize logic for teardown into a common function

2016-01-15 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102958#comment-15102958
 ] 

Guangya Liu commented on MESOS-3838:


Sure, thanks [~vi...@twitter.com], you really helped a lot with the items on 
my plate. ;-)

[~adam-mesos], can you please help shepherd this? It is also related to 
{{teardown}}-ing a framework, which relates to MESOS-4154.

> Put authorize logic for teardown into a common function
> ---
>
> Key: MESOS-3838
> URL: https://issues.apache.org/jira/browse/MESOS-3838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.28.0
>
>
> Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may gain 
> {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. 
> But {{Master::Http::teardown()}} currently contains its authorization logic 
> inline; it would be better to move the teardown authorization logic into a 
> common function, {{authorizeTeardown()}}.
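
For illustration, a standalone sketch of the proposed factoring; the types and 
the policy below are hypothetical, only the {{authorizeTeardown()}} name comes 
from this issue.

{code}
#include <iostream>
#include <string>

// Common helper: one place to decide who may tear down a framework.
// The policy here is purely illustrative; real ACL evaluation would
// live behind this function.
bool authorizeTeardown(const std::string& principal,
                       const std::string& frameworkPrincipal)
{
  return principal == frameworkPrincipal;
}

// The HTTP handler now delegates instead of inlining the check.
std::string teardown(const std::string& principal,
                     const std::string& frameworkPrincipal)
{
  if (!authorizeTeardown(principal, frameworkPrincipal)) {
    return "401 Unauthorized";
  }
  return "200 OK";
}

int main()
{
  std::cout << teardown("alice", "alice") << std::endl;    // 200 OK
  std::cout << teardown("mallory", "alice") << std::endl;  // 401 Unauthorized
}
{code}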





[jira] [Updated] (MESOS-3838) Put authorize logic for teardown into a common function

2016-01-15 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-3838:
---
Shepherd:   (was: Vinod Kone)

> Put authorize logic for teardown into a common function
> ---
>
> Key: MESOS-3838
> URL: https://issues.apache.org/jira/browse/MESOS-3838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.28.0
>
>
> Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may gain 
> {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. 
> But {{Master::Http::teardown()}} currently contains its authorization logic 
> inline; it would be better to move the teardown authorization logic into a 
> common function, {{authorizeTeardown()}}.





[jira] [Commented] (MESOS-4279) Graceful restart of docker task

2016-01-15 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102952#comment-15102952
 ] 

Qian Zhang commented on MESOS-4279:
---

[~bydga], I think the only difference between my env and your env is that I run 
everything (Docker, Mesos master and agent, ZooKeeper and Marathon) on a single 
Ubuntu machine, so can you switch to that setup to see if it makes any 
difference?

> Graceful restart of docker task
> ---
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
>Reporter: Martin Bydzovsky
>Assignee: Qian Zhang
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I 
> ran into the following issue:
> (it was already discussed on 
> https://github.com/mesosphere/marathon/issues/2876 and the folks from 
> mesosphere got to the point that it's probably a docker containerizer 
> problem...)
> To sum it up:
> When I deploy a simple python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
>
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
>
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
>
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result - the task receives SIGTERM 
> and dies peacefully (within my script-specified 2-second period).
> But when I wrap this python script in a docker image:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the corresponding application via Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>   type: "DOCKER",
>   docker: {
>   image: "bydga/marathon-test-api"
>   },
>   forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> During a restart (issued from marathon) the task dies immediately, without 
> getting a chance to do any cleanup.





[jira] [Commented] (MESOS-4412) MesosZookeeperTest doesn't allow multiple masters

2016-01-15 Thread Dario Rexin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102892#comment-15102892
 ] 

Dario Rexin commented on MESOS-4412:


I tried that a while back and unfortunately don't have the code anymore, but it 
should be as easy as starting two masters from a MesosZooKeeperTest.
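
Something like the following hedged sketch, assuming the {{StartMaster()}} 
helper from the Mesos test harness (src/tests/mesos.hpp); exact signatures 
vary across Mesos versions.

{code}
// Starting one master works today; the second StartMaster() call is
// where the reported timeout occurs.
TEST_F(MesosZooKeeperTest, MultipleMasters)
{
  Try<process::PID<master::Master>> master1 = StartMaster();
  ASSERT_SOME(master1);

  Try<process::PID<master::Master>> master2 = StartMaster();
  ASSERT_SOME(master2);
}
{code}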

> MesosZookeeperTest doesn't allow multiple masters
> -
>
> Key: MESOS-4412
> URL: https://issues.apache.org/jira/browse/MESOS-4412
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
>Reporter: Dario Rexin
>
> In order to test certain behavior of non-leading nodes - e.g. redirecting to 
> the leading master when sending http api requests to a non-leading node - it 
> would be helpful to be able to spin up multiple masters in a test. The 
> ZooKeeperTest class should allow doing this, but it fails when more than one 
> master is started: the test runs into a timeout when the second master is 
> started and exits with an error.





[jira] [Commented] (MESOS-4412) MesosZookeeperTest doesn't allow multiple masters

2016-01-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102827#comment-15102827
 ] 

Joseph Wu commented on MESOS-4412:
--

Thanks for confirming this!  (See [MESOS-2976].)

Could you post the code you tried to use to start a second master?  (Possibly 
as a review.)

> MesosZookeeperTest doesn't allow multiple masters
> -
>
> Key: MESOS-4412
> URL: https://issues.apache.org/jira/browse/MESOS-4412
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
>Reporter: Dario Rexin
>
> In order to test certain behavior of non-leading nodes - e.g. redirecting to 
> the leading master when sending http api requests to a non-leading node - it 
> would be helpful to be able to spin up multiple masters in a test. The 
> ZooKeeperTest class should allow doing this, but it fails when more than one 
> master is started: the test runs into a timeout when the second master is 
> started and exits with an error.





[jira] [Updated] (MESOS-3379) LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed

2016-01-15 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-3379:

Shepherd: Timothy Chen

> LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint is failed
> --
>
> Key: MESOS-3379
> URL: https://issues.apache.org/jira/browse/MESOS-3379
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
>
> {code}
> sudo GLOG_v=1 ./bin/mesos-tests.sh 
> --gtest_filter="LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint"
>  --verbose
> {code}
> This failed on Ubuntu 14.04.
> It is a problem found while investigating [MESOS-3349 
> PersistentVolumeTest.AccessPersistentVolume fails when run as 
> root.|https://issues.apache.org/jira/browse/MESOS-3349]
> In LinuxFilesystemIsolatorProcess::cleanup, when we read the mount table and 
> unmount, we should unmount in reverse order. Suppose our mount order is 
> {code}
> mount /tmp/a /tmp/b
> mount /tmp/c /tmp/b/c
> {code}
> Currently the unmount logic in cleanup is 
> {code}
> umount /tmp/b
> umount /tmp/b/c <- Wrong
> {code}
> This is the reason why ROOT_VolumeFromHostSandboxMountPoint failed.
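
For illustration, a standalone sketch of the reverse-order unmount idea 
(example paths, not the actual isolator cleanup code): the innermost mount 
must be released before its parent.

{code}
#include <sys/mount.h>

#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

int main()
{
  // Mount targets in the order they were mounted.
  std::vector<std::string> targets = {"/tmp/b", "/tmp/b/c"};

  // Walk the list in reverse so /tmp/b/c is unmounted before /tmp/b.
  for (auto it = targets.rbegin(); it != targets.rend(); ++it) {
    if (::umount(it->c_str()) != 0) {
      std::cerr << "Failed to unmount '" << *it << "': "
                << std::strerror(errno) << std::endl;
      return 1;
    }
  }

  return 0;
}
{code}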





[jira] [Comment Edited] (MESOS-4136) Add a ContainerLogger module that restrains log sizes

2016-01-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074359#comment-15074359
 ] 

Joseph Wu edited comment on MESOS-4136 at 1/16/16 12:30 AM:


|| Review || Summary ||
| https://reviews.apache.org/r/42052/ | Cleanup/Refactor of {{Subprocess::IO}} |
| https://reviews.apache.org/r/41779/
https://reviews.apache.org/r/41780/ | Add non-dup option to 
{{Subprocess::IO::FD}} |
| https://reviews.apache.org/r/42358/ | Refactor {{SandboxContainerLogger}} |
| https://reviews.apache.org/r/42374/ | Add {{LOGROTATE}} test filter |
| https://reviews.apache.org/r/41781/ | Add rotating logger test |
| https://reviews.apache.org/r/41782/ | Makefile and test config changes |
| https://reviews.apache.org/r/41783/ | Implement the rotating logger |


was (Author: kaysoky):
|| Review || Summary ||
| https://reviews.apache.org/r/42052/ | Cleanup/Refactor of {{Subprocess::IO}} |
| https://reviews.apache.org/r/41779/
https://reviews.apache.org/r/41780/ | Add non-dup option to 
{{Subprocess::IO::FD}} |
| https://reviews.apache.org/r/42358/ | Refactor {{SandboxContainerLogger}} |
| https://reviews.apache.org/r/41781/ | Add rotating logger test |
| https://reviews.apache.org/r/41782/ | Makefile and test config changes |
| https://reviews.apache.org/r/41783/ | Implement the rotating logger |

> Add a ContainerLogger module that restrains log sizes
> -
>
> Key: MESOS-4136
> URL: https://issues.apache.org/jira/browse/MESOS-4136
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: logging, mesosphere
>
> One of the major problems this logger module aims to solve is overflowing 
> executor/task log files.  Log files are simply written to disk, and are not 
> managed other than via occasional garbage collection by the agent process 
> (and this only deals with terminated executors).
> We should add a {{ContainerLogger}} module that truncates logs as it reaches 
> a configurable maximum size.  Additionally, we should determine if the web 
> UI's {{pailer}} needs to be changed to deal with logs that are not 
> append-only.
> This will be a non-default module which will also serve as an example for how 
> to implement the module.
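
As a rough illustration of the intended behavior (not the actual 
{{ContainerLogger}} module API), a standalone sketch that appends log data to 
a file and rotates it once a configurable cap is exceeded:

{code}
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Assumed configurable cap on a single log file: 10 MiB.
const std::size_t MAX_SIZE = 10 * 1024 * 1024;

void append(const std::string& path, const std::string& data)
{
  {
    std::ofstream out(path, std::ios::app);
    out << data;
  }

  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (in && static_cast<std::size_t>(in.tellg()) > MAX_SIZE) {
    // Rotate: keep one previous generation and start fresh. This makes
    // the log non-append-only, which is why the issue asks whether the
    // web UI's pailer needs to change.
    std::rename(path.c_str(), (path + ".1").c_str());
  }
}

int main()
{
  append("stdout.log", "Hello from the executor\n");
}
{code}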





[jira] [Commented] (MESOS-3578) ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky

2016-01-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102784#comment-15102784
 ] 

Timothy Chen commented on MESOS-3578:
-

I don't think this is being worked on yet, and it's not a blocker since the 
Provisioner is not yet ready.

> ProvisionerDockerLocalStoreTest.MetadataManagerInitialization is flaky
> --
>
> Key: MESOS-3578
> URL: https://issues.apache.org/jira/browse/MESOS-3578
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/881/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
> {code}
> [ RUN  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
> Using temporary directory 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE'
> I0929 02:36:44.066397 30457 local_puller.cpp:127] Untarring image from 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/store/staging/aZND7C'
>  to 
> '/tmp/ProvisionerDockerLocalStoreTest_MetadataManagerInitialization_9ynmgE/images/abc:latest.tar'
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:843: Failure
> (layers).failure(): Collect failed: Untar failed with exit code: exited with 
> status 2
> [  FAILED  ] ProvisionerDockerLocalStoreTest.MetadataManagerInitialization 
> (181 ms)
> {code}





[jira] [Commented] (MESOS-3807) RegistryClientTest.SimpleGetManifest is flaky

2016-01-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102780#comment-15102780
 ] 

Timothy Chen commented on MESOS-3807:
-

I think we can close this since we are retiring the registry client.

> RegistryClientTest.SimpleGetManifest is flaky
> -
>
> Key: MESOS-3807
> URL: https://issues.apache.org/jira/browse/MESOS-3807
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/976/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=centos:7,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] RegistryClientTest.SimpleGetManifest
> I1026 18:02:45.320374 31975 registry_client.cpp:264] Response status: 401 
> Unauthorized
> I1026 18:02:45.323772 31982 libevent_ssl_socket.cpp:1025] Socket error: 
> Connection reset by peer
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:718: Failure
> (socket).failure(): Failed accept: connection error: Connection reset by peer
> [  FAILED  ] RegistryClientTest.SimpleGetManifest (13 ms)
> {code}
> Logs from a good run:
> {code}
> [ RUN  ] RegistryClientTest.SimpleGetManifest
> I1025 15:35:36.248955 31970 registry_client.cpp:264] Response status: 401 
> Unauthorized
> I1025 15:35:36.267873 31979 registry_client.cpp:264] Response status: 200 OK
> [   OK ] RegistryClientTest.SimpleGetManifest (32 ms)
> {code}





[jira] [Commented] (MESOS-4029) ContentType/SchedulerTest is flaky.

2016-01-15 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102760#comment-15102760
 ] 

Timothy Chen commented on MESOS-4029:
-

Looks like this is not a blocker for 0.27.0 as it's only local to tests.

> ContentType/SchedulerTest is flaky.
> ---
>
> Key: MESOS-4029
> URL: https://issues.apache.org/jira/browse/MESOS-4029
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Artem Harutyunyan
>  Labels: flaky, flaky-test, mesosphere
> Fix For: 0.28.0
>
>
> SSL build, [Ubuntu 
> 14.04|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh],
>  non-root test run.
> {noformat}
> [--] 22 tests from ContentType/SchedulerTest
> [ RUN  ] ContentType/SchedulerTest.Subscribe/0
> [   OK ] ContentType/SchedulerTest.Subscribe/0 (48 ms)
> *** Aborted at 1448928007 (unix time) try "date -d @1448928007" if you are 
> using GNU date ***
> [ RUN  ] ContentType/SchedulerTest.Subscribe/1
> PC: @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> *** SIGSEGV (@0x10030) received by PID 21320 (TID 0x2b549e5d4700) from 
> PID 48; stack trace: ***
> @ 0x2b54c95940b7 os::Linux::chained_handler()
> @ 0x2b54c9598219 JVM_handle_linux_signal
> @ 0x2b5496300340 (unknown)
> @  0x1451b8e 
> testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> @   0xe2ea6d 
> _ZN7testing8internal18FunctionMockerBaseIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS6_SaIS6_E10InvokeWithERKSt5tupleIJSC_EE
> @   0xe2b1bc testing::internal::FunctionMocker<>::Invoke()
> @  0x1118aed 
> mesos::internal::tests::SchedulerTest::Callbacks::received()
> @  0x111c453 
> _ZNKSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS0_2v19scheduler5EventESt5dequeIS8_SaIS8_EclIJSE_EvEEvRS4_DpOT_
> @  0x111c001 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi16__callIvJSF_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @  0x111b90d 
> _ZNSt5_BindIFSt7_Mem_fnIMN5mesos8internal5tests13SchedulerTest9CallbacksEFvRKSt5queueINS1_2v19scheduler5EventESt5dequeIS9_SaIS9_ESt17reference_wrapperIS5_ESt12_PlaceholderILi1clIJSF_EvEET0_DpOT_
> @  0x111ae09 std::_Function_handler<>::_M_invoke()
> @ 0x2b5493c6da09 std::function<>::operator()()
> @ 0x2b5493c688ee process::AsyncExecutorProcess::execute<>()
> @ 0x2b5493c6db2a 
> _ZZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeIS8_SaIS8_ESC_PvSG_SC_SJ_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSO_FSL_T1_T2_T3_ET4_T5_T6_ENKUlPNS_11ProcessBaseEE_clES11_
> @ 0x2b5493c765a4 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingNS0_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISC_SaISC_ESG_PvSK_SG_SN_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSS_FSP_T1_T2_T3_ET4_T5_T6_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2b54946b1201 std::function<>::operator()()
> @ 0x2b549469960f process::ProcessBase::visit()
> @ 0x2b549469d480 process::DispatchEvent::visit()
> @   0x9dc0ba process::ProcessBase::serve()
> @ 0x2b54946958cc process::ProcessManager::resume()
> @ 0x2b5494692a9c 
> _ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
> @ 0x2b549469ccac 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
> @ 0x2b549469cc5c 
> _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
> @ 0x2b549469cbee 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
> @ 0x2b549469cb45 
> _ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
> @ 0x2b549469cade 
> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
> @ 0x2b5495b81a40 (unknown)
> @ 0x2b54962f8182 start_thread
> @ 0x2b549660847d (unknown)
> make[3]: *** [check-local] Segmentation fault
> make[3]: Leaving directory `/home/vagrant/mesos/build/src'
> make[2]: *

[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102731#comment-15102731
 ] 

Neil Conway commented on MESOS-3832:


I think 4412 duplicates MESOS-2976.

I think it could be possible, anyway: as 2976 notes, you'd need to create 
multiple OS processes, but that doesn't seem infeasible.

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
> Fix For: 0.27.0
>
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP api.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}
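
A standalone sketch of that proposed control flow (illustrative types, not the 
actual {{Master::Http}} interface):

{code}
#include <iostream>
#include <string>

struct Response
{
  int code;
  std::string location;  // "Location" header; set for redirects.
};

// Check leadership first; non-leaders answer with a 307 pointing at
// the leading master, mirroring the documented behavior.
Response schedulerApi(bool isLeader, const std::string& leaderUrl)
{
  if (!isLeader) {
    return Response{307, leaderUrl};
  }
  return Response{200, ""};
}

int main()
{
  Response r =
    schedulerApi(false, "http://leader.example:5050/api/v1/scheduler");
  std::cout << r.code << " -> " << r.location << std::endl;
}
{code}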





[jira] [Updated] (MESOS-4283) Accept 3-field version of HDFS du output

2016-01-15 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-4283:
---
Fix Version/s: 0.27.0

> Accept 3-field version of HDFS du output
> 
>
> Key: MESOS-4283
> URL: https://issues.apache.org/jira/browse/MESOS-4283
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Reporter: James Peach
>Assignee: James Peach
> Fix For: 0.27.0
>
>
> The HDFS {{du}} command can output 3 fields in later Hadoop versions. We 
> should accept both 2-field and 3-field versions.
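
A standalone sketch of tolerant parsing (illustrative, not the actual Mesos 
hdfs client code): accept either form and take the size from the first field.

{code}
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Parses one line of `hadoop fs -du` output. The 2-field form is
// "<size> <path>"; the newer 3-field form is "<size> <disk-usage> <path>".
bool parseDuLine(const std::string& line, long long* size)
{
  std::istringstream stream(line);
  std::vector<std::string> fields;
  std::string field;
  while (stream >> field) {
    fields.push_back(field);
  }

  if (fields.size() != 2 && fields.size() != 3) {
    return false;
  }

  *size = std::stoll(fields[0]);
  return true;
}

int main()
{
  long long size = 0;
  if (parseDuLine("1024 /path/to/file", &size)) {       // 2-field form
    std::cout << size << std::endl;
  }
  if (parseDuLine("1024 3072 /path/to/file", &size)) {  // 3-field form
    std::cout << size << std::endl;
  }
}
{code}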





[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Dario Rexin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102716#comment-15102716
 ] 

Dario Rexin commented on MESOS-3832:


It certainly would, but it's currently not possible. I just created a ticket to 
fix MesosZooKeeperTest: https://issues.apache.org/jira/browse/MESOS-4412

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
> Fix For: 0.27.0
>
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP api.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}





[jira] [Updated] (MESOS-4220) Introduce result_of with C++14 semantics to stout.

2016-01-15 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4220:
---
Labels: mesosphere  (was: mesospheree)

> Introduce result_of with C++14 semantics to stout.
> --
>
> Key: MESOS-4220
> URL: https://issues.apache.org/jira/browse/MESOS-4220
> Project: Mesos
>  Issue Type: Task
>  Components: stout
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere
>
> The {{std::result_of}} in VS 2015 Update 1 implements C++11 semantics which 
> does not allow it to be used in SFINAE contexts.
> Introduce a C++14 {{std::result_of}} into stout until we get to VS 2015 
> Update 2, at which point we can switch back to simply using 
> {{std::result_of}}.
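
For illustration, a simplified standalone sketch of an SFINAE-friendly 
{{result_of}} in the C++14 style; the actual stout version is more complete.

{code}
#include <utility>

// Primary template: no `type` member, so a failed substitution is a
// soft SFINAE failure instead of a hard error (C++14 semantics).
template <typename, typename = void>
struct result_of {};

// Specialization chosen only when the call expression is well-formed.
template <typename F, typename... Args>
struct result_of<
    F(Args...),
    decltype(void(std::declval<F>()(std::declval<Args>()...)))>
{
  using type = decltype(std::declval<F>()(std::declval<Args>()...));
};

// This overload participates in resolution only when `F` is callable
// with an `int`, which is exactly the SFINAE use case.
template <typename F>
typename result_of<F(int)>::type call(F f) { return f(42); }

int main()
{
  call([](int i) { return i + 1; });  // result_of<F(int)>::type is int.
}
{code}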





[jira] [Created] (MESOS-4412) MesosZookeeperTest doesn't allow multiple masters

2016-01-15 Thread Dario Rexin (JIRA)
Dario Rexin created MESOS-4412:
--

 Summary: MesosZookeeperTest doesn't allow multiple masters
 Key: MESOS-4412
 URL: https://issues.apache.org/jira/browse/MESOS-4412
 Project: Mesos
  Issue Type: Bug
  Components: tests
Affects Versions: 0.25.0, 0.26.0, 0.27.0
Reporter: Dario Rexin


In order to test certain behavior of non-leading nodes - e.g. redirecting to 
the leading master when sending http api requests to a non-leading node - it 
would be helpful to be able to spin up multiple masters in a test. The 
ZooKeeperTest class should allow doing this, but it fails when more than one 
master is started: the test runs into a timeout when the second master is 
started and exits with an error.





[jira] [Commented] (MESOS-4411) Traverse all roles for quota allocation

2016-01-15 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102707#comment-15102707
 ] 

Alexander Rukletsov commented on MESOS-4411:


Initial review, discussing the problem and (partially) addressing it: 
https://reviews.apache.org/r/41769/

> Traverse all roles for quota allocation
> ---
>
> Key: MESOS-4411
> URL: https://issues.apache.org/jira/browse/MESOS-4411
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Critical
>  Labels: mesosphere
>
> There might be a bug in how resources are allocated to multiple quota'ed 
> roles if one role's quota is met. We need to investigate this behavior.





[jira] [Created] (MESOS-4411) Traverse all roles for quota allocation

2016-01-15 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-4411:
--

 Summary: Traverse all roles for quota allocation
 Key: MESOS-4411
 URL: https://issues.apache.org/jira/browse/MESOS-4411
 Project: Mesos
  Issue Type: Bug
  Components: allocation
Reporter: Alexander Rukletsov
Assignee: Guangya Liu
Priority: Critical


There might be a bug in how resources are allocated to multiple quota'ed roles 
if one role's quota is met. We need to investigate this behavior.





[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102706#comment-15102706
 ] 

Neil Conway commented on MESOS-3832:


Wouldn't this change benefit from a unit test?

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
> Fix For: 0.27.0
>
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP api.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}





[jira] [Created] (MESOS-4410) Introduce protobuf for quota set request.

2016-01-15 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-4410:
--

 Summary: Introduce protobuf for quota set request.
 Key: MESOS-4410
 URL: https://issues.apache.org/jira/browse/MESOS-4410
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov


To document the quota request JSON schema and simplify request processing, 
introduce a {{QuotaRequest}} protobuf wrapper.





[jira] [Commented] (MESOS-4379) Design doc for reservation IDs/labels

2016-01-15 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102675#comment-15102675
 ] 

Neil Conway commented on MESOS-4379:


The design doc can be found here: 
https://docs.google.com/document/d/1E2KpJpTY2BeA_rvUeGHIt5XsBA7fU3mACfBE1NZe82M/edit#

Comments welcome!

> Design doc for reservation IDs/labels
> -
>
> Key: MESOS-4379
> URL: https://issues.apache.org/jira/browse/MESOS-4379
> Project: Mesos
>  Issue Type: Task
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, reservations
>






[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Dario Rexin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102665#comment-15102665
 ] 

Dario Rexin commented on MESOS-3832:


Vinod, thanks for the clarification. I addressed your comments in my patch and 
posted the changes. I'm always happy to help ;)

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP api.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}





[jira] [Updated] (MESOS-4262) Enable net_cls subsytem in cgroup infrastructure

2016-01-15 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4262:

Target Version/s: 0.28.0  (was: 0.27.0)

> Enable net_cls subsytem in cgroup infrastructure
> 
>
> Key: MESOS-4262
> URL: https://issues.apache.org/jira/browse/MESOS-4262
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: container, mesosphere
>
> Currently the control group infrastructure within mesos supports only the 
> memory and CPU subsystems. We need to enhance this infrastructure to support 
> the net_cls subsystem as well. Details of the net_cls subsystem and its 
> use-cases can be found here:
> https://www.kernel.org/doc/Documentation/cgroups/net_cls.txt
> Enabling net_cls will, potentially, allow operators to regulate framework 
> traffic on a per-container basis.
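
At the cgroup level this boils down to writing a classid into the container's 
{{net_cls.classid}} file; a standalone sketch, assuming a typical cgroup v1 
layout (the hierarchy path below is an example):

{code}
#include <fstream>
#include <iostream>
#include <string>

int main()
{
  // Example cgroup for one container; mount point and layout are
  // assumptions about a common cgroup v1 setup.
  const std::string cgroup =
    "/sys/fs/cgroup/net_cls/mesos/example-container";

  std::ofstream classid(cgroup + "/net_cls.classid");
  if (!classid) {
    std::cerr << "Cannot open net_cls.classid" << std::endl;
    return 1;
  }

  // 0xAAAABBBB encodes major handle 0xAAAA and minor 0xBBBB; tc or
  // iptables rules can then match the container's traffic by classid.
  classid << "0x00100001";

  return 0;
}
{code}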





[jira] [Updated] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.

2016-01-15 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-4304:

Priority: Blocker  (was: Major)

> hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
> 
>
> Key: MESOS-4304
> URL: https://issues.apache.org/jira/browse/MESOS-4304
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.26.0
>Reporter: James Cunningham
>Assignee: Bernd Mathiske
>Priority: Blocker
>  Labels: mesosphere
>
> This bug was resolved for the hdfs protocol in MESOS-3602, but since the 
> process checks for the "hdfs" protocol at the beginning of the URI, the fix 
> does not extend to non-hdfs hadoop clients.
> {code}
> I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started!
> I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: 
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"}
> I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the 
> sandbox directory
> I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop 
> client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'
> copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at 
> index 7: maprfs:
> Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc]  ]
> E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''
>  failed; this is the output:
> Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal 
> failed: Failed to execute 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'';
>  the command was either not found or exited with a non-zero exit status: 255
> Failed to synchronize with slave (it's probably exited)
> {code}
> After a brief chat with [~jieyu], it was recommended to fix the current hdfs 
> client code because the new hadoop fetcher plugin is slated to use it.
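
A standalone sketch (illustrative, not the actual fetcher code) of 
scheme-agnostic handling: any URI with an explicit "<scheme>://" is passed 
through untouched instead of getting a '/' prepended.

{code}
#include <iostream>
#include <string>

// URIs with an explicit scheme (hdfs://, maprfs://, s3n://, ...) go to
// the hadoop client verbatim; only bare paths get a leading '/'.
std::string normalize(const std::string& uri)
{
  if (uri.find("://") != std::string::npos) {
    return uri;
  }
  return (!uri.empty() && uri[0] == '/') ? uri : "/" + uri;
}

int main()
{
  std::cout << normalize("maprfs:///mesos/storm-mesos-0.9.3.tgz") << std::endl;
  std::cout << normalize("mesos/storm-mesos-0.9.3.tgz") << std::endl;
}
{code}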





[jira] [Comment Edited] (MESOS-4409) MasterTest.MaxCompletedFrameworksFlag is flaky

2016-01-15 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102634#comment-15102634
 ] 

Greg Mann edited comment on MESOS-4409 at 1/15/16 10:58 PM:


You could try introducing a {{Clock::settle()}} before you hit the state 
endpoint to make sure the master is aware of all framework shutdowns first. 
This will allow all existing scheduled work to complete before continuing.


was (Author: greggomann):
You could try introducing a {{Clock::settle()}} before you hit the state 
endpoint to make sure the master is aware of all framework shutdowns first.

> MasterTest.MaxCompletedFrameworksFlag is flaky
> --
>
> Key: MESOS-4409
> URL: https://issues.apache.org/jira/browse/MESOS-4409
> Project: Mesos
>  Issue Type: Bug
>  Components: master, tests
>Affects Versions: 0.26.0
> Environment: On Jenkins CI: gcc,--verbose,ubuntu:14.04,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere, tests
>
> Saw this failure on Jenkins CI:
> {code}
> [ RUN  ] MasterTest.MaxCompletedFrameworksFlag
> I0115 21:24:50.344116 31507 leveldb.cpp:174] Opened db in 2.062201ms
> I0115 21:24:50.344874 31507 leveldb.cpp:181] Compacted db in 716863ns
> I0115 21:24:50.344923 31507 leveldb.cpp:196] Created db iterator in 19087ns
> I0115 21:24:50.344949 31507 leveldb.cpp:202] Seeked to beginning of db in 
> 1897ns
> I0115 21:24:50.344965 31507 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 298ns
> I0115 21:24:50.345012 31507 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 21:24:50.345432 31536 recover.cpp:447] Starting replica recovery
> I0115 21:24:50.345657 31536 recover.cpp:473] Replica is in EMPTY status
> I0115 21:24:50.346535 31539 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (6089)@172.17.0.4:52665
> I0115 21:24:50.347028 31540 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 21:24:50.347554 31526 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 21:24:50.348175 31540 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 433937ns
> I0115 21:24:50.348215 31526 master.cpp:374] Master 
> bf6ba047-245f-4e65-986c-1880cef81248 (4e6fbf10d387) started on 
> 172.17.0.4:52665
> I0115 21:24:50.349417 31540 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 21:24:50.349630 31536 recover.cpp:473] Replica is in STARTING status
> I0115 21:24:50.349421 31526 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/2wURTY/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="0" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/2wURTY/master" --zk_session_timeout="10secs"
> I0115 21:24:50.349720 31526 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 21:24:50.349737 31526 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 21:24:50.349750 31526 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/2wURTY/credentials'
> I0115 21:24:50.350005 31526 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 21:24:50.350132 31526 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 21:24:50.350256 31526 master.cpp:569] Authorization enabled
> I0115 21:24:50.350546 31529 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 21:24:50.350626 31536 whitelist_watcher.cpp:77] No whitelist given
> I0115 21:24:50.350559 31538 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (6090)@172.17.0.4:52665
> I0115 21:24:50.351049 31534 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 21:24:50.351704 31537 recover.cpp:564] Updating replica status to VOTING
> I0115 21:24:50.352221 31532 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38ns
> I0115 21:24:50.352246 31532 

[jira] [Commented] (MESOS-4409) MasterTest.MaxCompletedFrameworksFlag is flaky

2016-01-15 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102634#comment-15102634
 ] 

Greg Mann commented on MESOS-4409:
--

You could try introducing a {{Clock::settle()}} before you hit the state 
endpoint to make sure the master is aware of all framework shutdowns first.
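
For reference, a hedged fragment of that pattern using the libprocess test 
clock ({{Clock::pause()}}/{{Clock::settle()}} from {{process/clock.hpp}}); the 
surrounding test setup is elided.

{code}
#include <process/clock.hpp>

using process::Clock;

// Inside the test body:
Clock::pause();

// ... drive the framework shutdowns under test ...

// Block until all pending libprocess events have been processed, so the
// master has observed every shutdown.
Clock::settle();

// Only now query the master's /state endpoint and assert on
// completed_frameworks.

Clock::resume();
{code}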

> MasterTest.MaxCompletedFrameworksFlag is flaky
> --
>
> Key: MESOS-4409
> URL: https://issues.apache.org/jira/browse/MESOS-4409
> Project: Mesos
>  Issue Type: Bug
>  Components: master, tests
>Affects Versions: 0.26.0
> Environment: On Jenkins CI: gcc,--verbose,ubuntu:14.04,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere, tests
>
> Saw this failure on Jenkins CI:
> {code}
> [ RUN  ] MasterTest.MaxCompletedFrameworksFlag
> I0115 21:24:50.344116 31507 leveldb.cpp:174] Opened db in 2.062201ms
> I0115 21:24:50.344874 31507 leveldb.cpp:181] Compacted db in 716863ns
> I0115 21:24:50.344923 31507 leveldb.cpp:196] Created db iterator in 19087ns
> I0115 21:24:50.344949 31507 leveldb.cpp:202] Seeked to beginning of db in 
> 1897ns
> I0115 21:24:50.344965 31507 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 298ns
> I0115 21:24:50.345012 31507 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 21:24:50.345432 31536 recover.cpp:447] Starting replica recovery
> I0115 21:24:50.345657 31536 recover.cpp:473] Replica is in EMPTY status
> I0115 21:24:50.346535 31539 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (6089)@172.17.0.4:52665
> I0115 21:24:50.347028 31540 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 21:24:50.347554 31526 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 21:24:50.348175 31540 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 433937ns
> I0115 21:24:50.348215 31526 master.cpp:374] Master 
> bf6ba047-245f-4e65-986c-1880cef81248 (4e6fbf10d387) started on 
> 172.17.0.4:52665
> I0115 21:24:50.349417 31540 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 21:24:50.349630 31536 recover.cpp:473] Replica is in STARTING status
> I0115 21:24:50.349421 31526 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/2wURTY/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="0" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/2wURTY/master" --zk_session_timeout="10secs"
> I0115 21:24:50.349720 31526 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 21:24:50.349737 31526 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 21:24:50.349750 31526 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/2wURTY/credentials'
> I0115 21:24:50.350005 31526 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 21:24:50.350132 31526 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 21:24:50.350256 31526 master.cpp:569] Authorization enabled
> I0115 21:24:50.350546 31529 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 21:24:50.350626 31536 whitelist_watcher.cpp:77] No whitelist given
> I0115 21:24:50.350559 31538 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (6090)@172.17.0.4:52665
> I0115 21:24:50.351049 31534 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 21:24:50.351704 31537 recover.cpp:564] Updating replica status to VOTING
> I0115 21:24:50.352221 31532 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38ns
> I0115 21:24:50.352246 31532 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 21:24:50.352371 31541 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 21:24:50.352620 31541 recover.cpp:462] Recover process terminated
> I0115 21:24:50.353121 31528 master.cpp:1710] The newly elected leader is 
> master@172.1

[jira] [Assigned] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Dario Rexin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dario Rexin reassigned MESOS-3832:
--

Assignee: Dario Rexin  (was: Vinod Kone)

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master, an “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A possible fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}
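A minimal sketch of that fix, assuming hypothetical {{elected()}} and {{redirect()}} 
helpers modeled on the existing code in {{src/master/http.cpp}} (an illustration, 
not the committed patch):

{code}
// Hypothetical sketch only: the names below are modeled on
// src/master/http.cpp but are not the exact Mesos API.
Future<Response> Master::Http::scheduler(const Request& request) const
{
  // If this master has not been elected leader, send the client to the
  // leading master instead of handling the call locally.
  if (!master->elected()) {
    // The existing redirect() helper responds with a "307 Temporary
    // Redirect" whose "Location" header points at the leading master.
    return redirect(request);
  }

  // ... normal handling of the scheduler call ...
}
{code}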



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4409) MasterTest.MaxCompletedFrameworksFlag is flaky

2016-01-15 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102601#comment-15102601
 ] 

Kevin Klues commented on MESOS-4409:


Yeah, this is the new unit test I just pushed through last night.  It looks 
like one of the frameworks I create fails to shut down properly. Any insight 
into why this might happen?  It never happened locally for me.

> MasterTest.MaxCompletedFrameworksFlag is flaky
> --
>
> Key: MESOS-4409
> URL: https://issues.apache.org/jira/browse/MESOS-4409
> Project: Mesos
>  Issue Type: Bug
>  Components: master, tests
>Affects Versions: 0.26.0
> Environment: On Jenkins CI: gcc,--verbose,ubuntu:14.04,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere, tests
>
> Saw this failure on Jenkins CI:
> {code}
> [ RUN  ] MasterTest.MaxCompletedFrameworksFlag
> I0115 21:24:50.344116 31507 leveldb.cpp:174] Opened db in 2.062201ms
> I0115 21:24:50.344874 31507 leveldb.cpp:181] Compacted db in 716863ns
> I0115 21:24:50.344923 31507 leveldb.cpp:196] Created db iterator in 19087ns
> I0115 21:24:50.344949 31507 leveldb.cpp:202] Seeked to beginning of db in 
> 1897ns
> I0115 21:24:50.344965 31507 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 298ns
> I0115 21:24:50.345012 31507 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 21:24:50.345432 31536 recover.cpp:447] Starting replica recovery
> I0115 21:24:50.345657 31536 recover.cpp:473] Replica is in EMPTY status
> I0115 21:24:50.346535 31539 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (6089)@172.17.0.4:52665
> I0115 21:24:50.347028 31540 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 21:24:50.347554 31526 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 21:24:50.348175 31540 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 433937ns
> I0115 21:24:50.348215 31526 master.cpp:374] Master 
> bf6ba047-245f-4e65-986c-1880cef81248 (4e6fbf10d387) started on 
> 172.17.0.4:52665
> I0115 21:24:50.349417 31540 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 21:24:50.349630 31536 recover.cpp:473] Replica is in STARTING status
> I0115 21:24:50.349421 31526 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/2wURTY/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="0" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/2wURTY/master" --zk_session_timeout="10secs"
> I0115 21:24:50.349720 31526 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 21:24:50.349737 31526 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 21:24:50.349750 31526 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/2wURTY/credentials'
> I0115 21:24:50.350005 31526 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 21:24:50.350132 31526 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 21:24:50.350256 31526 master.cpp:569] Authorization enabled
> I0115 21:24:50.350546 31529 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 21:24:50.350626 31536 whitelist_watcher.cpp:77] No whitelist given
> I0115 21:24:50.350559 31538 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (6090)@172.17.0.4:52665
> I0115 21:24:50.351049 31534 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 21:24:50.351704 31537 recover.cpp:564] Updating replica status to VOTING
> I0115 21:24:50.352221 31532 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38ns
> I0115 21:24:50.352246 31532 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 21:24:50.352371 31541 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 21:24:50.352620 31541 recover.cpp:462] Recover process terminated
> I0115 21:24:50.3

[jira] [Updated] (MESOS-4396) Adding Tachyon to the list of frameworks

2016-01-15 Thread Jiri Simsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiri Simsa updated MESOS-4396:
--
Description: The Tachyon project provides a Mesos framework. Update the 
Mesos documentation to reflect this fact.  (was: The Tachyon project provided a 
Mesos framework. Update the Mesos documentation to reflect this fact.)

> Adding Tachyon to the list of frameworks
> 
>
> Key: MESOS-4396
> URL: https://issues.apache.org/jira/browse/MESOS-4396
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.26.0
>Reporter: Jiri Simsa
>Assignee: Jiri Simsa
>Priority: Minor
>  Labels: documentation
>
> The Tachyon project provides a Mesos framework. Update the Mesos 
> documentation to reflect this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4396) Adding Tachyon to the list of frameworks

2016-01-15 Thread Jiri Simsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiri Simsa reassigned MESOS-4396:
-

Assignee: Jiri Simsa

> Adding Tachyon to the list of frameworks
> 
>
> Key: MESOS-4396
> URL: https://issues.apache.org/jira/browse/MESOS-4396
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.26.0
>Reporter: Jiri Simsa
>Assignee: Jiri Simsa
>Priority: Minor
>  Labels: documentation
>
> The Tachyon project provided a Mesos framework. Update the Mesos 
> documentation to reflect this fact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4302) Offer filter timeouts are ignored if the allocator is slow or backlogged.

2016-01-15 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-4302:

Priority: Blocker  (was: Critical)

> Offer filter timeouts are ignored if the allocator is slow or backlogged.
> -
>
> Key: MESOS-4302
> URL: https://issues.apache.org/jira/browse/MESOS-4302
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> Currently, when the allocator recovers resources from an offer, it creates a 
> filter timeout based on the time at which the call is processed.
> This means that if it takes longer than the filter duration for the allocator 
> to perform an allocation for the relevant agent, then the filter is never 
> applied.
> This leads to pathological behavior: if the framework sets a filter duration 
> that is smaller than the wall-clock time it takes for us to perform the next 
> allocation, then the filters will have no effect. This can mean that 
> low-share frameworks may continue receiving offers that they have no intent 
> to use, without other frameworks ever receiving these offers.
> The workaround is for frameworks to set high filter durations and possibly 
> revive offers when they need more resources; however, we should fix this 
> issue in the allocator (i.e. derive the timeout deadlines and expiry from 
> allocation times).
> This seems to warrant cherry-picking into bug fix releases.
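To make the timing issue concrete, here is a small self-contained illustration 
(plain C++, hypothetical names, not Mesos code) of anchoring the filter expiry 
at the next allocation rather than at call-processing time:

{code}
#include <chrono>

using Clock = std::chrono::steady_clock;

// A filter whose expiry was fixed at decline-processing time can lapse
// before the allocator ever runs, if an allocation cycle takes longer
// than the filter duration.
struct OfferFilter {
  Clock::time_point expiry;
  bool active(Clock::time_point now) const { return now < expiry; }
};

// Current (buggy) behavior: expiry anchored at the time the decline call
// is processed.
OfferFilter makeFilterCurrent(Clock::duration timeout) {
  return OfferFilter{Clock::now() + timeout};
}

// Proposed behavior: expiry anchored at the time of the next allocation,
// so the framework is filtered for the full requested duration of actual
// offer traffic.
OfferFilter makeFilterProposed(Clock::time_point nextAllocation,
                               Clock::duration timeout) {
  return OfferFilter{nextAllocation + timeout};
}
{code}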



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102540#comment-15102540
 ] 

Vinod Kone commented on MESOS-3832:
---

Hey Dario. It was totally not my intention to blindside you. I think Anand 
jumped the gun because he wanted to get this into the 0.27 release happening 
next week. Glad to hear you are still interested in driving this. I will ask 
Anand to discard his review. Could you please update your review with the 
feedback (and also assign the ticket back to yourself)? Would love to get the 
patch in for 0.27.0! Thanks for filing the bug and submitting the patch, by 
the way.

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Vinod Kone
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master, an “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A possible fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4404) SlaveTest.HTTPSchedulerSlaveRestart is flaky

2016-01-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102530#comment-15102530
 ] 

Anand Mazumdar commented on MESOS-4404:
---

[~qiujian] Do you want to have a look at this? I'm not sure, but it might be 
related to your recent patch to speed up this test.

> SlaveTest.HTTPSchedulerSlaveRestart is flaky
> 
>
> Key: MESOS-4404
> URL: https://issues.apache.org/jira/browse/MESOS-4404
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, slave
>Affects Versions: 0.26.0
> Environment: From the Jenkins CI: gcc,--verbose --enable-libevent 
> --enable-ssl,centos:7,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere
>
> Saw this failure on the Jenkins CI:
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0115 18:42:25.393354  1762 leveldb.cpp:174] Opened db in 3.456169ms
> I0115 18:42:25.394310  1762 leveldb.cpp:181] Compacted db in 922588ns
> I0115 18:42:25.394361  1762 leveldb.cpp:196] Created db iterator in 18529ns
> I0115 18:42:25.394378  1762 leveldb.cpp:202] Seeked to beginning of db in 
> 1933ns
> I0115 18:42:25.394390  1762 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 280ns
> I0115 18:42:25.394430  1762 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 18:42:25.394963  1791 recover.cpp:447] Starting replica recovery
> I0115 18:42:25.395396  1791 recover.cpp:473] Replica is in EMPTY status
> I0115 18:42:25.396589  1795 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (11302)@172.17.0.2:49129
> I0115 18:42:25.397101  1785 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 18:42:25.397721  1791 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 18:42:25.398764  1789 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 684584ns
> I0115 18:42:25.398807  1789 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 18:42:25.398947  1795 master.cpp:374] Master 
> 544823be-76b5-47be-b326-2cd6d6a700b8 (e648fe109cb1) started on 
> 172.17.0.2:49129
> I0115 18:42:25.399209  1788 recover.cpp:473] Replica is in STARTING status
> I0115 18:42:25.398980  1795 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/BOGaaq/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/BOGaaq/master" --zk_session_timeout="10secs"
> I0115 18:42:25.399435  1795 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 18:42:25.399451  1795 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 18:42:25.399461  1795 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/BOGaaq/credentials'
> I0115 18:42:25.399884  1795 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 18:42:25.400060  1795 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 18:42:25.400254  1795 master.cpp:569] Authorization enabled
> I0115 18:42:25.400439  1785 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 18:42:25.400470  1789 whitelist_watcher.cpp:77] No whitelist given
> I0115 18:42:25.400656  1792 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (11303)@172.17.0.2:49129
> I0115 18:42:25.400943  1781 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 18:42:25.401612  1791 recover.cpp:564] Updating replica status to VOTING
> I0115 18:42:25.402313  1785 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 458849ns
> I0115 18:42:25.402345  1785 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 18:42:25.402510  1788 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 18:42:25.402848  1788 recover.cpp:462] Recover process terminated
> I0115 18:42:25.402997  1784 master.cpp:1710] The newly elected leader is 

[jira] [Commented] (MESOS-4409) MasterTest.MaxCompletedFrameworksFlag is flaky

2016-01-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102504#comment-15102504
 ] 

Anand Mazumdar commented on MESOS-4409:
---

[~klueska] Do you want to have a look at this?

> MasterTest.MaxCompletedFrameworksFlag is flaky
> --
>
> Key: MESOS-4409
> URL: https://issues.apache.org/jira/browse/MESOS-4409
> Project: Mesos
>  Issue Type: Bug
>  Components: master, tests
>Affects Versions: 0.26.0
> Environment: On Jenkins CI: gcc,--verbose,ubuntu:14.04,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere, tests
>
> Saw this failure on Jenkins CI:
> {code}
> [ RUN  ] MasterTest.MaxCompletedFrameworksFlag
> I0115 21:24:50.344116 31507 leveldb.cpp:174] Opened db in 2.062201ms
> I0115 21:24:50.344874 31507 leveldb.cpp:181] Compacted db in 716863ns
> I0115 21:24:50.344923 31507 leveldb.cpp:196] Created db iterator in 19087ns
> I0115 21:24:50.344949 31507 leveldb.cpp:202] Seeked to beginning of db in 
> 1897ns
> I0115 21:24:50.344965 31507 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 298ns
> I0115 21:24:50.345012 31507 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 21:24:50.345432 31536 recover.cpp:447] Starting replica recovery
> I0115 21:24:50.345657 31536 recover.cpp:473] Replica is in EMPTY status
> I0115 21:24:50.346535 31539 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (6089)@172.17.0.4:52665
> I0115 21:24:50.347028 31540 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 21:24:50.347554 31526 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 21:24:50.348175 31540 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 433937ns
> I0115 21:24:50.348215 31526 master.cpp:374] Master 
> bf6ba047-245f-4e65-986c-1880cef81248 (4e6fbf10d387) started on 
> 172.17.0.4:52665
> I0115 21:24:50.349417 31540 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 21:24:50.349630 31536 recover.cpp:473] Replica is in STARTING status
> I0115 21:24:50.349421 31526 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/2wURTY/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="0" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/2wURTY/master" --zk_session_timeout="10secs"
> I0115 21:24:50.349720 31526 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 21:24:50.349737 31526 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 21:24:50.349750 31526 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/2wURTY/credentials'
> I0115 21:24:50.350005 31526 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 21:24:50.350132 31526 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 21:24:50.350256 31526 master.cpp:569] Authorization enabled
> I0115 21:24:50.350546 31529 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 21:24:50.350626 31536 whitelist_watcher.cpp:77] No whitelist given
> I0115 21:24:50.350559 31538 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (6090)@172.17.0.4:52665
> I0115 21:24:50.351049 31534 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 21:24:50.351704 31537 recover.cpp:564] Updating replica status to VOTING
> I0115 21:24:50.352221 31532 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 38ns
> I0115 21:24:50.352246 31532 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 21:24:50.352371 31541 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 21:24:50.352620 31541 recover.cpp:462] Recover process terminated
> I0115 21:24:50.353121 31528 master.cpp:1710] The newly elected leader is 
> master@172.17.0.4:52665 with id bf6ba047-245f-4e65-986c-1880cef81248
> I0115 21:24:50.353152 31528 master

[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Dario Rexin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102502#comment-15102502
 ] 

Dario Rexin commented on MESOS-3832:


Hi Vinod,

I am sorry to say this, but I am not OK with this. I posted the patch more 
than a month ago and waited for feedback, and not even 24 hours after I 
finally got that feedback, someone else took the ticket without even asking 
whether I was still working on it. It's ultimately your decision, but I really 
think it would hurt the community to permit practices like this.

Please let me know what you think.

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Vinod Kone
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master, an “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A possible fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4058) Do not use `Resource.role` for resources in quota request.

2016-01-15 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102498#comment-15102498
 ] 

Joris Van Remoortere commented on MESOS-4058:
-

{code}
commit 579edcc8552842ead7846ffa099de25f8c2dc367
Author: Alexander Rukletsov 
Date:   Fri Jan 15 15:57:38 2016 -0500

Quota: Ensured `QuotaInfo` is valid in registrar tests.

Resources in `QuotaInfo` protobuf must not specify role, hence
remove all occurrences of `flatten()` and add explicit validation.

Review: https://reviews.apache.org/r/41948/

commit 1f33a3e6cb5a2b6dd2b51c638833819bde5c6b5c
Author: Alexander Rukletsov 
Date:   Fri Jan 15 15:56:40 2016 -0500

Quota: Changed signature of `QuotaInfo` validation.

Review: https://reviews.apache.org/r/41947/

commit f23129e117a08032015fc1966d8ed186ef2e4f68
Author: Alexander Rukletsov 
Date:   Fri Jan 15 15:56:25 2016 -0500

Quota: Require role in set request explicitly.

A set quota request must provide a role, which now must be passed as
a top-level field in the request JSON and not in `Resource` objects.

Review: https://reviews.apache.org/r/41936/
{code}
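For illustration, a set-quota request under the new scheme would carry the 
role as a top-level field and omit it from the {{Resource}} objects. A 
plausible payload based on the commit message above (not copied from the 
docs):

{code}
{
  "role": "dev",
  "guarantee": [
    {"name": "cpus", "type": "SCALAR", "scalar": {"value": 4}},
    {"name": "mem",  "type": "SCALAR", "scalar": {"value": 2048}}
  ]
}
{code}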

> Do not use `Resource.role` for resources in quota request.
> --
>
> Key: MESOS-4058
> URL: https://issues.apache.org/jira/browse/MESOS-4058
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> To be consistent with other operator endpoints and to adhere to the principle 
> of least surprise, move the role from each {{Resource}} in the quota set 
> request to the request itself. 
> {{Resource.role}} is used for reserved resources. Since quota is not a direct 
> reservation request, to avoid confusion we shall not reuse this field for 
> communicating the role for which quota should be reserved.
> Food for thought: shall we try to keep the internal storage protobufs as 
> close as possible to the operator's JSON to provide some sort of a schema, or 
> decouple the two for the sake of flexibility?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4409) MasterTest.MaxCompletedFrameworksFlag is flaky

2016-01-15 Thread Greg Mann (JIRA)
Greg Mann created MESOS-4409:


 Summary: MasterTest.MaxCompletedFrameworksFlag is flaky
 Key: MESOS-4409
 URL: https://issues.apache.org/jira/browse/MESOS-4409
 Project: Mesos
  Issue Type: Bug
  Components: master, tests
Affects Versions: 0.26.0
 Environment: On Jenkins CI: gcc,--verbose,ubuntu:14.04,docker
Reporter: Greg Mann


Saw this failure on Jenkins CI:

{code}
[ RUN  ] MasterTest.MaxCompletedFrameworksFlag
I0115 21:24:50.344116 31507 leveldb.cpp:174] Opened db in 2.062201ms
I0115 21:24:50.344874 31507 leveldb.cpp:181] Compacted db in 716863ns
I0115 21:24:50.344923 31507 leveldb.cpp:196] Created db iterator in 19087ns
I0115 21:24:50.344949 31507 leveldb.cpp:202] Seeked to beginning of db in 1897ns
I0115 21:24:50.344965 31507 leveldb.cpp:271] Iterated through 0 keys in the db 
in 298ns
I0115 21:24:50.345012 31507 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0115 21:24:50.345432 31536 recover.cpp:447] Starting replica recovery
I0115 21:24:50.345657 31536 recover.cpp:473] Replica is in EMPTY status
I0115 21:24:50.346535 31539 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (6089)@172.17.0.4:52665
I0115 21:24:50.347028 31540 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0115 21:24:50.347554 31526 recover.cpp:564] Updating replica status to STARTING
I0115 21:24:50.348175 31540 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 433937ns
I0115 21:24:50.348215 31526 master.cpp:374] Master 
bf6ba047-245f-4e65-986c-1880cef81248 (4e6fbf10d387) started on 172.17.0.4:52665
I0115 21:24:50.349417 31540 replica.cpp:320] Persisted replica status to 
STARTING
I0115 21:24:50.349630 31536 recover.cpp:473] Replica is in STARTING status
I0115 21:24:50.349421 31526 master.cpp:376] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/2wURTY/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="0" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
--work_dir="/tmp/2wURTY/master" --zk_session_timeout="10secs"
I0115 21:24:50.349720 31526 master.cpp:421] Master only allowing authenticated 
frameworks to register
I0115 21:24:50.349737 31526 master.cpp:426] Master only allowing authenticated 
slaves to register
I0115 21:24:50.349750 31526 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/2wURTY/credentials'
I0115 21:24:50.350005 31526 master.cpp:466] Using default 'crammd5' 
authenticator
I0115 21:24:50.350132 31526 master.cpp:535] Using default 'basic' HTTP 
authenticator
I0115 21:24:50.350256 31526 master.cpp:569] Authorization enabled
I0115 21:24:50.350546 31529 hierarchical.cpp:147] Initialized hierarchical 
allocator process
I0115 21:24:50.350626 31536 whitelist_watcher.cpp:77] No whitelist given
I0115 21:24:50.350559 31538 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (6090)@172.17.0.4:52665
I0115 21:24:50.351049 31534 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0115 21:24:50.351704 31537 recover.cpp:564] Updating replica status to VOTING
I0115 21:24:50.352221 31532 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 38ns
I0115 21:24:50.352246 31532 replica.cpp:320] Persisted replica status to VOTING
I0115 21:24:50.352371 31541 recover.cpp:578] Successfully joined the Paxos group
I0115 21:24:50.352620 31541 recover.cpp:462] Recover process terminated
I0115 21:24:50.353121 31528 master.cpp:1710] The newly elected leader is 
master@172.17.0.4:52665 with id bf6ba047-245f-4e65-986c-1880cef81248
I0115 21:24:50.353152 31528 master.cpp:1723] Elected as the leading master!
I0115 21:24:50.353173 31528 master.cpp:1468] Recovering from registrar
I0115 21:24:50.353307 31527 registrar.cpp:307] Recovering registrar
I0115 21:24:50.354140 31532 log.cpp:659] Attempting to start the writer
I0115 21:24:50.355285 31533 replica.cpp:493] Replica received implicit promise 
request from (6092)@172.17.0.4:52665 with proposal 1
I0115 21:24:50.355602 31533 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 2875

[jira] [Updated] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-15 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4336:
---
Assignee: Bernd Mathiske

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Assignee: Bernd Mathiske
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
>  specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.
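As a rough illustration of the kind of check involved (the linked fetcher.cpp 
is authoritative; treat the extensions below as examples, not the exhaustive 
list):

{code}
#include <string>

// Sketch of an extension check like the one in src/launcher/fetcher.cpp.
static bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

static bool isExtractable(const std::string& path) {
  // Example extensions only; the real list lives in fetcher.cpp.
  for (const char* ext : {".tar", ".tar.gz", ".tgz", ".tar.bz2", ".zip"}) {
    if (endsWith(path, ext)) {
      return true;
    }
  }
  return false;
}
{code}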



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-15 Thread Disha Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Disha Singh updated MESOS-4336:
---
Assignee: (was: Disha Singh)

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
>  specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3608) Optionally install test binaries.

2016-01-15 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102414#comment-15102414
 ] 

James Peach commented on MESOS-3608:


No objections from dev@. All review feedback addressed.

> Optionally install test binaries.
> -
>
> Key: MESOS-3608
> URL: https://issues.apache.org/jira/browse/MESOS-3608
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, test
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>  Labels: mesosphere
>
> Many of the tests in Mesos could be described as integration tests, since 
> they have external dependencies on kernel features, installed tools, 
> permissions, etc. I'd like to be able to generate a {{mesos-tests}} RPM along 
> with my {{mesos}} RPM so that I can run the same tests in different 
> deployment environments.
> I propose a new configuration option named {{--enable-test-tools}} that will 
> install the tests into {{libexec/mesos/tests}}. I'll also need to make some 
> minor changes to tests so that helper tools can be found in this location as 
> well as in the build directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4408) For the convenience of new users the "assignment of shepherd" must be mentioned above "issuing of tickets"

2016-01-15 Thread Vaibhav Khanduja (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102409#comment-15102409
 ] 

Vaibhav Khanduja commented on MESOS-4408:
-

If shepherds are busy, the issue would remain unassigned.  This might be 
confusing, as a person might be working on an issue without it being assigned 
to him/her. 

> For the convenience of new users the "assignment of shepherd" must be 
> mentioned above "issuing of tickets" 
> ---
>
> Key: MESOS-4408
> URL: https://issues.apache.org/jira/browse/MESOS-4408
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Disha Singh
>Assignee: Disha Singh
>Priority: Minor
>  Labels: newbie
>
> New users might get confused by the information at
> http://mesos.apache.org/documentation/latest/submitting-a-patch/
>  under the section "Before you start writing code", points 4 and 6.
> The two points should be swapped for better understanding and to avoid 
> misleading users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4408) For the convenience of new users the "assignment of shepherd" must be mentioned above "issuing of tickets"

2016-01-15 Thread Disha Singh (JIRA)
Disha Singh created MESOS-4408:
--

 Summary: For the convenience of new users the "assignment of 
shepherd" must be mentioned above "issuing of tickets" 
 Key: MESOS-4408
 URL: https://issues.apache.org/jira/browse/MESOS-4408
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Disha Singh
Assignee: Disha Singh
Priority: Minor


New users might get confused by the information at
http://mesos.apache.org/documentation/latest/submitting-a-patch/
 under the section "Before you start writing code", points 4 and 6.
The two points should be swapped for better understanding and to avoid 
misleading users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4407) For the convenience of new users the "assignment of shepherd" must be mentioned above "issuing of tickets"

2016-01-15 Thread Disha Singh (JIRA)
Disha Singh created MESOS-4407:
--

 Summary: For the convenience of new users the "assignment of 
shepherd" must be mentioned above "issuing of tickets" 
 Key: MESOS-4407
 URL: https://issues.apache.org/jira/browse/MESOS-4407
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Disha Singh
Assignee: Disha Singh
Priority: Minor


New users might get confused by the information at
http://mesos.apache.org/documentation/latest/submitting-a-patch/
 under the section "Before you start writing code", points 4 and 6.
The two points should be swapped for better understanding and to avoid 
misleading users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4406) For the convenience of new users the "assignment of shepherd" must be mentioned above "issuing of tickets"

2016-01-15 Thread Disha Singh (JIRA)
Disha Singh created MESOS-4406:
--

 Summary: For the convenience of new users the "assignment of 
shepherd" must be mentioned above "issuing of tickets" 
 Key: MESOS-4406
 URL: https://issues.apache.org/jira/browse/MESOS-4406
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Disha Singh
Assignee: Disha Singh
Priority: Minor


New users might get confused by the information at
http://mesos.apache.org/documentation/latest/submitting-a-patch/
 under the section "Before you start writing code", points 4 and 6.
The two points should be swapped for better understanding and to avoid 
misleading users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4405) For the convenience of new users the "assignment of shepherd" must be mentioned above "issuing of tickets"

2016-01-15 Thread Disha Singh (JIRA)
Disha Singh created MESOS-4405:
--

 Summary: For the convenience of new users the "assignment of 
shepherd" must be mentioned above "issuing of tickets" 
 Key: MESOS-4405
 URL: https://issues.apache.org/jira/browse/MESOS-4405
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Disha Singh
Assignee: Disha Singh
Priority: Minor


New users might get confused by the information at
http://mesos.apache.org/documentation/latest/submitting-a-patch/
 under the section "Before you start writing code", points 4 and 6.
The two points should be swapped for better understanding and to avoid 
misleading users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4404) SlaveTest.HTTPSchedulerSlaveRestart is flaky

2016-01-15 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102370#comment-15102370
 ] 

Greg Mann commented on MESOS-4404:
--

May be related to this patch: https://reviews.apache.org/r/41675/

> SlaveTest.HTTPSchedulerSlaveRestart is flaky
> 
>
> Key: MESOS-4404
> URL: https://issues.apache.org/jira/browse/MESOS-4404
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, slave
>Affects Versions: 0.26.0
> Environment: From the Jenkins CI: gcc,--verbose --enable-libevent 
> --enable-ssl,centos:7,docker
>Reporter: Greg Mann
>  Labels: flaky-test, mesosphere
>
> Saw this failure on the Jenkins CI:
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0115 18:42:25.393354  1762 leveldb.cpp:174] Opened db in 3.456169ms
> I0115 18:42:25.394310  1762 leveldb.cpp:181] Compacted db in 922588ns
> I0115 18:42:25.394361  1762 leveldb.cpp:196] Created db iterator in 18529ns
> I0115 18:42:25.394378  1762 leveldb.cpp:202] Seeked to beginning of db in 
> 1933ns
> I0115 18:42:25.394390  1762 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 280ns
> I0115 18:42:25.394430  1762 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0115 18:42:25.394963  1791 recover.cpp:447] Starting replica recovery
> I0115 18:42:25.395396  1791 recover.cpp:473] Replica is in EMPTY status
> I0115 18:42:25.396589  1795 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (11302)@172.17.0.2:49129
> I0115 18:42:25.397101  1785 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0115 18:42:25.397721  1791 recover.cpp:564] Updating replica status to 
> STARTING
> I0115 18:42:25.398764  1789 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 684584ns
> I0115 18:42:25.398807  1789 replica.cpp:320] Persisted replica status to 
> STARTING
> I0115 18:42:25.398947  1795 master.cpp:374] Master 
> 544823be-76b5-47be-b326-2cd6d6a700b8 (e648fe109cb1) started on 
> 172.17.0.2:49129
> I0115 18:42:25.399209  1788 recover.cpp:473] Replica is in STARTING status
> I0115 18:42:25.398980  1795 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/BOGaaq/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/BOGaaq/master" --zk_session_timeout="10secs"
> I0115 18:42:25.399435  1795 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0115 18:42:25.399451  1795 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0115 18:42:25.399461  1795 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/BOGaaq/credentials'
> I0115 18:42:25.399884  1795 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0115 18:42:25.400060  1795 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0115 18:42:25.400254  1795 master.cpp:569] Authorization enabled
> I0115 18:42:25.400439  1785 hierarchical.cpp:147] Initialized hierarchical 
> allocator process
> I0115 18:42:25.400470  1789 whitelist_watcher.cpp:77] No whitelist given
> I0115 18:42:25.400656  1792 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (11303)@172.17.0.2:49129
> I0115 18:42:25.400943  1781 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0115 18:42:25.401612  1791 recover.cpp:564] Updating replica status to VOTING
> I0115 18:42:25.402313  1785 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 458849ns
> I0115 18:42:25.402345  1785 replica.cpp:320] Persisted replica status to 
> VOTING
> I0115 18:42:25.402510  1788 recover.cpp:578] Successfully joined the Paxos 
> group
> I0115 18:42:25.402848  1788 recover.cpp:462] Recover process terminated
> I0115 18:42:25.402997  1784 master.cpp:1710] The newly elected leader is 
> master@172.17.0.2:49129 with id 544823be-76b5-47be-b326-2cd6d6a7

[jira] [Created] (MESOS-4404) SlaveTest.HTTPSchedulerSlaveRestart is flaky

2016-01-15 Thread Greg Mann (JIRA)
Greg Mann created MESOS-4404:


 Summary: SlaveTest.HTTPSchedulerSlaveRestart is flaky
 Key: MESOS-4404
 URL: https://issues.apache.org/jira/browse/MESOS-4404
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API, slave
Affects Versions: 0.26.0
 Environment: From the Jenkins CI: gcc,--verbose --enable-libevent 
--enable-ssl,centos:7,docker
Reporter: Greg Mann


Saw this failure on the Jenkins CI:

{code}
[ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
I0115 18:42:25.393354  1762 leveldb.cpp:174] Opened db in 3.456169ms
I0115 18:42:25.394310  1762 leveldb.cpp:181] Compacted db in 922588ns
I0115 18:42:25.394361  1762 leveldb.cpp:196] Created db iterator in 18529ns
I0115 18:42:25.394378  1762 leveldb.cpp:202] Seeked to beginning of db in 1933ns
I0115 18:42:25.394390  1762 leveldb.cpp:271] Iterated through 0 keys in the db 
in 280ns
I0115 18:42:25.394430  1762 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0115 18:42:25.394963  1791 recover.cpp:447] Starting replica recovery
I0115 18:42:25.395396  1791 recover.cpp:473] Replica is in EMPTY status
I0115 18:42:25.396589  1795 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from (11302)@172.17.0.2:49129
I0115 18:42:25.397101  1785 recover.cpp:193] Received a recover response from a 
replica in EMPTY status
I0115 18:42:25.397721  1791 recover.cpp:564] Updating replica status to STARTING
I0115 18:42:25.398764  1789 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 684584ns
I0115 18:42:25.398807  1789 replica.cpp:320] Persisted replica status to 
STARTING
I0115 18:42:25.398947  1795 master.cpp:374] Master 
544823be-76b5-47be-b326-2cd6d6a700b8 (e648fe109cb1) started on 172.17.0.2:49129
I0115 18:42:25.399209  1788 recover.cpp:473] Replica is in STARTING status
I0115 18:42:25.398980  1795 master.cpp:376] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/BOGaaq/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
--work_dir="/tmp/BOGaaq/master" --zk_session_timeout="10secs"
I0115 18:42:25.399435  1795 master.cpp:421] Master only allowing authenticated 
frameworks to register
I0115 18:42:25.399451  1795 master.cpp:426] Master only allowing authenticated 
slaves to register
I0115 18:42:25.399461  1795 credentials.hpp:35] Loading credentials for 
authentication from '/tmp/BOGaaq/credentials'
I0115 18:42:25.399884  1795 master.cpp:466] Using default 'crammd5' 
authenticator
I0115 18:42:25.400060  1795 master.cpp:535] Using default 'basic' HTTP 
authenticator
I0115 18:42:25.400254  1795 master.cpp:569] Authorization enabled
I0115 18:42:25.400439  1785 hierarchical.cpp:147] Initialized hierarchical 
allocator process
I0115 18:42:25.400470  1789 whitelist_watcher.cpp:77] No whitelist given
I0115 18:42:25.400656  1792 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (11303)@172.17.0.2:49129
I0115 18:42:25.400943  1781 recover.cpp:193] Received a recover response from a 
replica in STARTING status
I0115 18:42:25.401612  1791 recover.cpp:564] Updating replica status to VOTING
I0115 18:42:25.402313  1785 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 458849ns
I0115 18:42:25.402345  1785 replica.cpp:320] Persisted replica status to VOTING
I0115 18:42:25.402510  1788 recover.cpp:578] Successfully joined the Paxos group
I0115 18:42:25.402848  1788 recover.cpp:462] Recover process terminated
I0115 18:42:25.402997  1784 master.cpp:1710] The newly elected leader is 
master@172.17.0.2:49129 with id 544823be-76b5-47be-b326-2cd6d6a700b8
I0115 18:42:25.403038  1784 master.cpp:1723] Elected as the leading master!
I0115 18:42:25.403059  1784 master.cpp:1468] Recovering from registrar
I0115 18:42:25.403267  1791 registrar.cpp:307] Recovering registrar
I0115 18:42:25.404359  1794 log.cpp:659] Attempting to start the writer
I0115 18:42:25.405777  1793 replica.cpp:493] Replica received implicit promise 
request from (11305)@172.17.0.2:49129 with proposal 1
I0115 18:42:25.406337  1793 leveldb.cpp:304] Persisting 

[jira] [Comment Edited] (MESOS-4136) Add a ContainerLogger module that restrains log sizes

2016-01-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074359#comment-15074359
 ] 

Joseph Wu edited comment on MESOS-4136 at 1/15/16 7:34 PM:
---

|| Review || Summary ||
| https://reviews.apache.org/r/42052/ | Cleanup/Refactor of {{Subprocess::IO}} |
| https://reviews.apache.org/r/41779/
https://reviews.apache.org/r/41780/ | Add non-dup option to 
{{Subprocess::IO::FD}} |
| https://reviews.apache.org/r/42358/ | Refactor {{SandboxContainerLogger}} |
| https://reviews.apache.org/r/41781/ | Add rotating logger test |
| https://reviews.apache.org/r/41782/ | Makefile and test config changes |
| https://reviews.apache.org/r/41783/ | Implement the rotating logger |


was (Author: kaysoky):
|| Review || Summary ||
| https://reviews.apache.org/r/42052/ | Cleanup/Refactor of {{Subprocess::IO}} |
| https://reviews.apache.org/r/41779/
https://reviews.apache.org/r/41780/ | Add non-dup option to 
{{Subprocess::IO::FD}} |
| https://reviews.apache.org/r/41781/ | Add rotating logger test |
| https://reviews.apache.org/r/41782/ | Makefile and test config changes |
| https://reviews.apache.org/r/41783/ | Implement the rotating logger |

> Add a ContainerLogger module that restrains log sizes
> -
>
> Key: MESOS-4136
> URL: https://issues.apache.org/jira/browse/MESOS-4136
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: logging, mesosphere
>
> One of the major problems this logger module aims to solve is overflowing 
> executor/task log files.  Log files are simply written to disk, and are not 
> managed other than via occasional garbage collection by the agent process 
> (and this only deals with terminated executors).
> We should add a {{ContainerLogger}} module that truncates logs as they reach 
> a configurable maximum size.  Additionally, we should determine whether the 
> web UI's {{pailer}} needs to be changed to deal with logs that are not 
> append-only.
> This will be a non-default module which will also serve as an example of how 
> to implement such a module.
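A minimal sketch of the size-capping idea, independent of the actual module 
API (all names hypothetical; the real module would hook the container's 
stdout/stderr streams rather than poll a path):

{code}
#include <cstdio>
#include <string>
#include <sys/stat.h>

// Rotate a log file once it exceeds maxBytes, keeping a single previous
// generation ("<path>.1" is overwritten on each rotation).
void rotateIfNeeded(const std::string& path, off_t maxBytes) {
  struct stat st;
  if (::stat(path.c_str(), &st) == 0 && st.st_size >= maxBytes) {
    std::rename(path.c_str(), (path + ".1").c_str());
  }
}
{code}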



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4296) Add docker URI fetcher plugin based on curl.

2016-01-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4296:
--
Sprint: Mesosphere Sprint 27  (was: Mesosphere Sprint 26)

> Add docker URI fetcher plugin based on curl.
> 
>
> Key: MESOS-4296
> URL: https://issues.apache.org/jira/browse/MESOS-4296
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere, unified-containerizer-mvp
>
> The existing registry client for docker assumes that Mesos is built with SSL 
> support and that SSL is enabled. That means Mesos built with libev (or with 
> SSL disabled) won't be able to use the docker registry client to provision 
> docker images.
> Now that the new URI fetcher work (MESOS-3918) has been committed, we can add 
> a new URI fetcher plugin for docker. The plugin will be based on curl so that 
> https and 3xx redirects are handled automatically. The docker registry 
> puller will just use the URI fetcher to get docker images.
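The appeal of delegating to curl is that TLS and redirect handling come for 
free. A toy version of the idea (not the plugin's real interface):

{code}
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical sketch: fetch `uri` into `path` by shelling out to curl.
// '-L' follows 3xx redirects, and curl's own TLS stack handles https,
// which is what makes this viable without building Mesos with SSL.
void fetch(const std::string& uri, const std::string& path) {
  const std::string cmd = "curl -fsSL -o '" + path + "' '" + uri + "'";
  if (std::system(cmd.c_str()) != 0) {
    throw std::runtime_error("failed to fetch " + uri);
  }
}
{code}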



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4345) Implement a network-handle manager for net_cls cgroup subsystem

2016-01-15 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-4345:
-
Description: 
As part of implementing the net_cls cgroup isolator we need a mechanism to 
manage the minor handles that will be allocated to containers when they are 
associated with a net_cls cgroup. The network-handle manager needs to provide 
the following functionality:

a) During normal operation keep track of the free and allocated network 
handles. There can be a total of 64K such network handles.
b) On startup, learn the allocated network handle by walking the net_cls cgroup 
tree for mesos and build a map of free network handles available to the agent. 

  was:
As part of implementing the net_cls cgroup isolator we need a mechanism to 
manage the minor handles that will allocated to containers when they are 
associated with a net_cls cgroup. The network-handle manager needs to provide 
the following functionality:

a) During normal operation keep track of the free and allocated network 
handles. There can be a total of 64K such network handles.
b) On startup, learn the allocated network handle by walking the net_cls cgroup 
tree for mesos and build a map of free network handles available to the agent. 


> Implement a network-handle manager for net_cls cgroup subsystem
> ---
>
> Key: MESOS-4345
> URL: https://issues.apache.org/jira/browse/MESOS-4345
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: containerizer, containers, mesosphere
>
> As part of implementing the net_cls cgroup isolator we need a mechanism to 
> manage the minor handles that will be allocated to containers when they are 
> associated with a net_cls cgroup. The network-handle manager needs to provide 
> the following functionality:
> a) During normal operation keep track of the free and allocated network 
> handles. There can be a total of 64K such network handles.
> b) On startup, learn the allocated network handle by walking the net_cls 
> cgroup tree for mesos and build a map of free network handles available to 
> the agent. 
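A toy sketch of the free/allocated bookkeeping for the 64K minor handles 
(names and structure hypothetical; on startup, {{reserve()}} would be called 
for each handle found while walking the net_cls cgroup tree):

{code}
#include <bitset>
#include <cstdint>
#include <stdexcept>

// Track the 64K possible net_cls minor handles with one bit each.
class HandleManager {
public:
  // Mark a handle as in use, e.g. for handles recovered from existing
  // cgroups on agent startup.
  void reserve(uint16_t handle) { used.set(handle); }

  // Allocate the lowest free handle.
  uint16_t allocate() {
    for (uint32_t h = 0; h < used.size(); ++h) {
      if (!used.test(h)) {
        used.set(h);
        return static_cast<uint16_t>(h);
      }
    }
    throw std::runtime_error("no free net_cls handles");
  }

  void release(uint16_t handle) { used.reset(handle); }

private:
  std::bitset<65536> used;
};
{code}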



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102311#comment-15102311
 ] 

Vinod Kone commented on MESOS-3832:
---

[~drexin] Are you ok with me committing [~anandmazumdar]'s patch in lieu of 
yours? Anand submitted a new review because he thought you were no longer 
working on it (since you assigned the ticket to me).

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Vinod Kone
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master, an “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A possible fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3193) Implement AppC image discovery.

2016-01-15 Thread Jojy Varghese (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jojy Varghese updated MESOS-3193:
-
Sprint: Mesosphere Sprint 27

> Implement AppC image discovery.
> ---
>
> Key: MESOS-3193
> URL: https://issues.apache.org/jira/browse/MESOS-3193
> Project: Mesos
>  Issue Type: Task
>Reporter: Yan Xu
>Assignee: Jojy Varghese
>  Labels: mesosphere, twitter, unified-containerizer-mvp
>
> The Appc spec specifies two image discovery mechanisms: simple and meta 
> discovery. We need an abstraction for image discovery in AppcStore. 
> For the MVP, we can implement simple discovery first.
> https://reviews.apache.org/r/34139/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4403) Check paths in DiskInfo.Source.Path exist during slave initialization.

2016-01-15 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4403:
-

 Summary: Check paths in DiskInfo.Source.Path exist during slave 
initialization.
 Key: MESOS-4403
 URL: https://issues.apache.org/jira/browse/MESOS-4403
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


We have two options here. We can either check and fail if the path does not 
exist, or create it if it does not exist, as we do for slave.work_dir.
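A minimal sketch of the "check and fail" option, assuming the agent's disk 
resources have already been parsed into Resource objects (the iteration and 
exit style are illustrative):
{code}
foreach (const Resource& resource, resources) {
  if (resource.has_disk() &&
      resource.disk().has_source() &&
      resource.disk().source().type() == Resource::DiskInfo::Source::PATH) {
    const string& root = resource.disk().source().path().root();

    // Fail fast during slave initialization if the path is missing.
    if (!os::exists(root)) {
      EXIT(EXIT_FAILURE)
        << "DiskInfo.Source.Path '" << root << "' does not exist";
    }
  }
}
{code}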



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3838) Put authorize logic for teardown into a common function

2016-01-15 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102279#comment-15102279
 ] 

Vinod Kone commented on MESOS-3838:
---

[~gyliu] Mind finding a different shepherd? My plate is pretty full right now.

> Put authorize logic for teardown into a common function
> ---
>
> Key: MESOS-3838
> URL: https://issues.apache.org/jira/browse/MESOS-3838
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.28.0
>
>
> Mesos now has {{authorizeTask}} and {{authorizeFramework}}, and may have 
> {{authorizeReserveResource}} and {{authorizeUnReserveResource}} later. 
> But {{Master::Http::teardown()}} currently embeds the authorize logic in 
> the handler itself; it would be better to move the authorize logic for 
> teardown into a common function {{authorizeTeardown()}}.
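> A hedged sketch of what such a helper could look like, mirroring the 
> existing {{authorizeXXX}} pattern (the signature and ACL field names are 
> assumptions):
> {code}
> Future<bool> Master::authorizeTeardown(
>     const FrameworkInfo& frameworkInfo,
>     const Option<string>& principal)
> {
>   if (authorizer.isNone()) {
>     return true; // Authorization is disabled; permit the teardown.
>   }
>
>   mesos::ACL::ShutdownFramework request;
>
>   // Subject: who is asking for the teardown.
>   if (principal.isSome()) {
>     request.mutable_principals()->add_values(principal.get());
>   } else {
>     request.mutable_principals()->set_type(ACL::Entity::ANY);
>   }
>
>   // Object: the principal of the framework being torn down.
>   request.mutable_framework_principals()->add_values(
>       frameworkInfo.principal());
>
>   return authorizer.get()->authorize(request);
> }
> {code}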



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4402) Update filesystem isolators to look for persistent volume directories from the correct location.

2016-01-15 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4402:
-

 Summary: Update filesystem isolators to look for persistent volume 
directories from the correct location.
 Key: MESOS-4402
 URL: https://issues.apache.org/jira/browse/MESOS-4402
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


This is related to MESOS-4400.

Since persistent volume directories can now be created from non-root disks, we 
need to adjust both the POSIX and Linux filesystem isolators to look for 
volumes in the correct location based on the information in DiskInfo.Source.

See relevant code in:
{code}
Future<Nothing> PosixFilesystemIsolatorProcess::update(...);
Future<Nothing> LinuxFilesystemIsolatorProcess::update(...);
{code}
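The path lookup could then branch on the source type, e.g. (an illustrative 
helper, not the actual API):
{code}
string getVolumePath(const string& workDir, const Resource& volume)
{
  // PATH disks root their volumes under the source directory rather
  // than under the agent's work_dir.
  const string root =
    (volume.disk().has_source() &&
     volume.disk().source().type() == Resource::DiskInfo::Source::PATH)
      ? volume.disk().source().path().root()
      : workDir;

  return paths::getPersistentVolumePath(
      root, volume.role(), volume.disk().persistence().id());
}
{code}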



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4401) Documentation for state abstraction

2016-01-15 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4401:
--

 Summary: Documentation for state abstraction
 Key: MESOS-4401
 URL: https://issues.apache.org/jira/browse/MESOS-4401
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, master
Reporter: Neil Conway
Priority: Minor


* What is it?
* How do framework developers use it?
* What caveats and common gotchas exist?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4380) Adjust Resource arithmetics for DiskInfo.Source.

2016-01-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4380:
--
Issue Type: Task  (was: Bug)

> Adjust Resource arithmetics for DiskInfo.Source.
> 
>
> Key: MESOS-4380
> URL: https://issues.apache.org/jira/browse/MESOS-4380
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> Since we added Source to DiskInfo, we need to adjust the Resource 
> arithmetic accordingly. That includes the equality, addable, and 
> subtractable checks, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4378) Add Source to Resource.DiskInfo.

2016-01-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4378:
--
Issue Type: Task  (was: Bug)

> Add Source to Resource.DiskInfo.
> 
>
> Key: MESOS-4378
> URL: https://issues.apache.org/jira/browse/MESOS-4378
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> Source is used to describe the extra information about the source of a Disk 
> resource. We will support 'PATH' type first and then 'BLOCK' later.
> {noformat}
> message Source {
>   enum Type {
>     PATH = 1;
>     BLOCK = 2;
>   }
>   message Path {
>     // Path to the folder (e.g., /mnt/raid/disk0).
>     required string root = 1;
>     required double total_size = 2;
>   }
>   message Block {
>     // Path to the device file (e.g., /dev/sda1, /dev/vg/v1).
>     // It can be a physical partition, or a logical volume (LVM).
>     required string device = 1;
>   }
>   required Type type = 1;
>   optional Path path = 2;
>   optional Block block = 3;
> }
> {noformat}
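> Example (a sketch) of how a disk resource could carry a PATH source once 
> this lands; the parse call is the existing helper, the source fields assume 
> the proto above:
> {code}
> Resource disk = Resources::parse("disk", "1024", "*").get();
>
> Resource::DiskInfo::Source* source =
>   disk.mutable_disk()->mutable_source();
>
> source->set_type(Resource::DiskInfo::Source::PATH);
> source->mutable_path()->set_root("/mnt/raid/disk0");
> {code}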



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4400) Create persistent volume directories based on DiskInfo.Source.

2016-01-15 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4400:
-

 Summary: Create persistent volume directories based on 
DiskInfo.Source.
 Key: MESOS-4400
 URL: https://issues.apache.org/jira/browse/MESOS-4400
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


Currently, we always create persistent volumes on the root disk, and the 
persistent volumes are directories. With DiskInfo.Source being added, we should 
create each persistent volume based on the information in DiskInfo.Source.

This ticket handles the case where DiskInfo.Source.type is PATH. In that case, 
we should create sub-directories under the source root, using the same layout 
as under slave.work_dir.

See the relevant code here:
{code}
void Slave::checkpointResources(...)
{
  // Creates persistent volumes that do not exist and schedules
  // releasing those persistent volumes that are no longer needed.
  //
  // TODO(jieyu): Consider introducing a volume manager once we start
  // to support multiple disks, or raw disks. Depending on the
  // DiskInfo, we may want to create either directories under a root
  // directory, or LVM volumes from a given device.
  Resources volumes = newCheckpointedResources.persistentVolumes();

  foreach (const Resource& volume, volumes) {
// This is validated in master.
CHECK_NE(volume.role(), "*");

string path = paths::getPersistentVolumePath(
flags.work_dir,
volume.role(),
volume.disk().persistence().id());

if (!os::exists(path)) {
  CHECK_SOME(os::mkdir(path, true))
<< "Failed to create persistent volume at '" << path << "'";
}
  }
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4399) ReviewBot should ignore a review chain if any of the reviews in the chain is unpublished

2016-01-15 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-4399:
-

 Summary: ReviewBot should ignore a review chain if any of the 
reviews in the chain is unpublished
 Key: MESOS-4399
 URL: https://issues.apache.org/jira/browse/MESOS-4399
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone


Observed this recently: the review bot was continuously failing on a review 
chain because one of the reviews in the chain was unpublished (by mistake).

Instead of failing, the bot should just skip such a chain and move on to other 
reviews.

{noformat}
Verifying review 42241
Dependent review: https://reviews.apache.org/api/review-requests/42240/ 
Error handling URL https://reviews.apache.org/api/review-requests/42240/: 
FORBIDDEN
git clean -fd
git reset --hard f2cf6cbb1ca9d04033e293a8b79b97b958a72df7

Build step 'Execute shell' marked build as failure
Sending e-mails to: bui...@mesos.apache.org
Finished: FAILURE

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3888) Support distinguishing revocable resources in the Resource protobuf.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3888:

Shepherd:   (was: Joris Van Remoortere)

> Support distinguishing revocable resources in the Resource protobuf.
> 
>
> Key: MESOS-3888
> URL: https://issues.apache.org/jira/browse/MESOS-3888
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Klaus Ma
>  Labels: mesosphere
>
> Add an enum type to RevocableInfo: 
> * A framework needs to set RevocableInfo when launching a task; if it is not 
> set, reserved resources are used. The framework needs to identify which 
> resources it is using.
> * For oversubscribed resources, the type is assigned by the agent (MESOS-3930).
> * Update the Oversubscription document to state that optimistic offers 
> oversubscribe the allocation slack, and recommend that QoS handle the usage 
> slack only. (MESOS-3889)
> {code}
> message Resource {
>   ...
>   message RevocableInfo {
>     enum Type {
>       // Under-utilized, allocated resources.  Controlled by
>       // oversubscription (QoSController & ResourceEstimator).
>       USAGE_SLACK = 1;
>       // Unallocated, reserved resources.
>       // Controlled by optimistic offers (Allocator).
>       ALLOCATION_SLACK = 2;
>     }
>     optional Type type = 1;
>   }
>   ...
>   optional RevocableInfo revocable = 9;
> }
> {code}
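> With the enum in place, consumers could split revocable resources by type, 
> e.g. (sketch; the {{filter}} usage is illustrative):
> {code}
> Resources usageSlack = resources.filter([](const Resource& r) {
>   return r.has_revocable() &&
>          r.revocable().has_type() &&
>          r.revocable().type() == Resource::RevocableInfo::USAGE_SLACK;
> });
> {code}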



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3887) Add a flag to master to enable optimistic offers.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3887:

Shepherd:   (was: Joris Van Remoortere)

> Add a flag to master to enable optimistic offers. 
> --
>
> Key: MESOS-3887
> URL: https://issues.apache.org/jira/browse/MESOS-3887
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Guangya Liu
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3889) Modify Oversubscription documentation to explicitly forbid the QoS Controller from killing executors running on optimistically offered resources.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3889:

Shepherd:   (was: Joris Van Remoortere)

> Modify Oversubscription documentation to explicitly forbid the QoS Controller 
> from killing executors running on optimistically offered resources.
> -
>
> Key: MESOS-3889
> URL: https://issues.apache.org/jira/browse/MESOS-3889
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Klaus Ma
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3896) Add accounting for reservation slack in the allocator.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3896:

Shepherd:   (was: Joris Van Remoortere)

> Add accounting for reservation slack in the allocator.
> --
>
> Key: MESOS-3896
> URL: https://issues.apache.org/jira/browse/MESOS-3896
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Klaus Ma
>  Labels: mesosphere
>
> MESOS-XXX: Optimistic accounter
> {code}
> class HierarchicalAllocatorProcess 
> {
>   struct Slave
>   {
> ...
> struct Optimistic 
> {
>   Resources total; // The total allocation slack resources
>   Resources allocated; // The allocated allocation slack resources
> };
> 
> Optimistic optimistic;
>   };
> }
> {code}
> MESOS-4146: flatten & allocationSlack for Optimistic Offer
> {code}
> class Resources
> {
>   // Returns a Resources object with the same amount of each resource
>   // type as these Resources, but with all Resource objects marked as
>   // the specified `RevocableInfo::Type`; other attributes are not
>   // affected.
>   Resources flatten(Resource::RevocableInfo::Type type);
>
>   // Returns a Resources object such that:
>   //   - if a role is given, that role's reserved resources are excluded.
>   //   - the resources' revocable type is `ALLOCATION_SLACK`.
>   //   - the role of the resources is set to "*".
>   Resources allocationSlack(Option<string> role = None());
> }
> {code}
> MESOS-XXX: Allocate the allocation_slack resources to framework
> {code}
> void HierarchicalAllocatorProcess::allocate(
>     const hashset<SlaveID>& slaveIds_)
> {
>   foreach slave; foreach role; foreach framework
>   {
>     Resources optimistic;
>     if (framework.revocable) {
>       Resources total =
>         slaves[slaveId].optimistic.total.allocationSlack(role);
>       optimistic = total - slaves[slaveId].optimistic.allocated;
>     }
>     ...
>     offerable[frameworkId][slaveId] += resources + optimistic;
>     ...
>     slaves[slaveId].optimistic.allocated += optimistic;
>   }
> }
> {code}
>   
> Here are some considerations about `ALLOCATION_SLACK`:
> 1. The 'old' resources (available/total) do not include ALLOCATION_SLACK.
> 2. After `Quota`, `remainingClusterResources.contains` should not check 
> ALLOCATION_SLACK; if there are not enough resources, the master can still 
> offer ALLOCATION_SLACK resources.
> 3. The sorter will not include ALLOCATION_SLACK, as those resources are 
> borrowed from another role/framework.
> 4. If either normal resources or ALLOCATION_SLACK resources are 
> allocable/unfiltered, they can be offered to a framework.
> 5. Currently, the allocator will assign all ALLOCATION_SLACK resources on an 
> agent to one framework.
> MESOS-XXX: Update ALLOCATION_SLACK for dynamic reservation (updateAllocation)
> {code}
> void HierarchicalAllocatorProcess::updateAllocation(
>     const FrameworkID& frameworkId,
>     const SlaveID& slaveId,
>     const vector<Offer::Operation>& operations)
> {
>   ...
>   Try<Resources> updatedOptimistic =
>     slaves[slaveId].optimistic.total.apply(operations);
>   CHECK_SOME(updatedOptimistic);
>
>   slaves[slaveId].optimistic.total =
>     updatedOptimistic.get().stateless().reserved().flatten(ALLOCATION_SLACK);
>   ...
> }
> {code}
> 
> MESOS-XXX: Add ALLOCATION_SLACK when an agent registers/re-registers (addSlave)
> {code}
> void HierarchicalAllocatorProcess::addSlave(
>     const SlaveID& slaveId,
>     const SlaveInfo& slaveInfo,
>     const Option<Unavailability>& unavailability,
>     const Resources& total,
>     const hashmap<FrameworkID, Resources>& used)
> {
>   ...
>   slaves[slaveId].optimistic.total =
>     total.stateless().reserved().flatten(ALLOCATION_SLACK);
>   ...
> }
> {code}
>   
> No need to handle `removeSlave`; it will remove all related info from 
> `slaves`, including `optimistic`.
> MESOS-XXX: return resources to allocator (recoverResources)
> {code}
> void HierarchicalAllocatorProcess::recoverResources(
>     const FrameworkID& frameworkId,
>     const SlaveID& slaveId,
>     const Resources& resources,
>     const Option<Filters>& filters)
> {
>   if (slaves.contains(slaveId))
>   {
> ...
> slaves[slaveId].optimistic.allocated -= resources.allocationSlack();
> ...
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3895) Update reservation slack allocator state during agent failover.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3895:

Shepherd:   (was: Joris Van Remoortere)

> Update reservation slack allocator state during agent failover.
> ---
>
> Key: MESOS-3895
> URL: https://issues.apache.org/jira/browse/MESOS-3895
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3893) Implement tests for verifying allocator resource math.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3893:

Shepherd:   (was: Joris Van Remoortere)

> Implement tests for verifying allocator resource math.
> --
>
> Key: MESOS-3893
> URL: https://issues.apache.org/jira/browse/MESOS-3893
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Guangya Liu
>  Labels: mesosphere
>
> Write a test to ensure that the allocator performs the reservation slack 
> calculations correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3590) Support cluster-wide persistent storage (shared or exclusive-owned)

2016-01-15 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-3590:

Assignee: (was: Michael Park)

> Support cluster-wide persistent storage (shared or exclusive-owned)
> ---
>
> Key: MESOS-3590
> URL: https://issues.apache.org/jira/browse/MESOS-3590
> Project: Mesos
>  Issue Type: Epic
>  Components: volumes
>Reporter: Marco Massenzio
>  Labels: external-volumes, mesosphere, persistent-volumes
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3890) Add notion of evictable task to RunTaskMessage

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3890:

Shepherd:   (was: Joris Van Remoortere)

> Add notion of evictable task to RunTaskMessage
> --
>
> Key: MESOS-3890
> URL: https://issues.apache.org/jira/browse/MESOS-3890
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Guangya Liu
>  Labels: mesosphere
>
> {code}
> message RunTaskMessage {
>   ...
>   // This list can be non-empty when a task is launched on reserved
>   // resources.  If the reserved resources are in use (as revocable
>   // resources), this list contains the executors that can be evicted
>   // to make room to run this task.
>   repeated ExecutorID evictable_executors = 5;
>   ...
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3894) Rebuild reservation slack allocator state during master failover.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3894:

Shepherd:   (was: Joris Van Remoortere)

> Rebuild reservation slack allocator state during master failover.
> -
>
> Key: MESOS-3894
> URL: https://issues.apache.org/jira/browse/MESOS-3894
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Guangya Liu
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3897) Identify and implement test cases for verifying eviction logic in the agent

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3897:

Shepherd:   (was: Joris Van Remoortere)

> Identify and implement test cases for verifying eviction logic in the agent
> ---
>
> Key: MESOS-3897
> URL: https://issues.apache.org/jira/browse/MESOS-3897
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Klaus Ma
>  Labels: mesosphere
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-15 Thread Disha Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Disha Singh reassigned MESOS-4336:
--

Assignee: Disha Singh  (was: Bernd Mathiske)

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Assignee: Disha Singh
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
>  specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4398) Synchronously handle AuthZ errors for the Scheduler endpoint.

2016-01-15 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-4398:
-

 Summary: Synchronously handle AuthZ errors for the Scheduler 
endpoint.
 Key: MESOS-4398
 URL: https://issues.apache.org/jira/browse/MESOS-4398
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
Reporter: Anand Mazumdar


Currently, any AuthZ errors for the {{/scheduler}} endpoint are handled 
asynchronously via a {{FrameworkErrorMessage}}. Here is an example:

{code}
  if (authorizationError.isSome()) {
LOG(INFO) << "Refusing subscription of framework"
  << " '" << frameworkInfo.name() << "'"
  << ": " << authorizationError.get().message;

FrameworkErrorMessage message;
message.set_message(authorizationError.get().message);
http.send(message);
http.close();
return;
  }
{code}

We would like to handle such errors synchronously when the request is received, 
similar to what other endpoints like {{/reserve}}/{{/quota}} do. We already 
have the relevant {{authorizeXXX}} functions in {{master.cpp}}. We should 
just make the requests pass through once the relevant {{Future}} from the 
{{authorizeXXX}} function is fulfilled.
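A sketch of the synchronous shape (handler and authorization call names are 
assumptions; the actual code may differ):
{code}
Future<Response> Master::Http::scheduler(const Request& request) const
{
  ...
  return master->authorizeFramework(principal, frameworkInfo)
    .then([](bool authorized) -> Future<Response> {
      if (!authorized) {
        // Reject synchronously with a 403 instead of sending an
        // asynchronous FrameworkErrorMessage over the connection.
        return Forbidden("Not authorized to subscribe");
      }

      // ... continue processing the Subscribe call ...
    });
}
{code}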



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3898) Identify and implement test cases for handling a race between optimistic lender and tenant offers.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3898:

Shepherd:   (was: Joris Van Remoortere)

> Identify and implement test cases for handling a race between optimistic 
> lender and tenant offers.
> --
>
> Key: MESOS-3898
> URL: https://issues.apache.org/jira/browse/MESOS-3898
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Klaus Ma
>  Labels: mesosphere
>
> An example is when the lender launches a task on an agent, followed by a 
> borrower launching a task on the same agent before the optimistic offer is 
> rescinded. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3955) Add helper function to get stateless resources.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3955:

Shepherd:   (was: Joris Van Remoortere)

> Add helper function to get stateless resources.
> ---
>
> Key: MESOS-3955
> URL: https://issues.apache.org/jira/browse/MESOS-3955
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> Stateless resources are those with {{No Persistent Volume}}; the allocator 
> uses them to calculate the optimistic resources 
> (reserved().stateless() - allocated()).
> {code}
> class Resources {
>   ...
>   // Tests if the given Resource object has no stateful elements.
>   static bool isStateless(const Resource& resource);
>   ...
>   // Returns the resources that do not have stateful elements.
>   Resources stateless() const;
>   ...
> }
> {code}
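> Example usage (a sketch, following the expression above): the allocation 
> slack pool for an agent would be its stateless, reserved resources minus 
> what is already allocated.
> {code}
> Resources slack = total.stateless().reserved() - allocated;
> {code}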



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3931) Do not enable task and executor run on different resources

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3931:

Shepherd:   (was: Joris Van Remoortere)

> Do not enable task and executor run on different resources
> --
>
> Key: MESOS-3931
> URL: https://issues.apache.org/jira/browse/MESOS-3931
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> Do not allow a task and its executor to run on different kinds of resources. 
> They must run on the same kind of resources: either non-revocable resources, 
> usage slack resources, or allocation slack resources.
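> A sketch of the validation (the helper name is illustrative; the real check 
> would live in the master's task validation path):
> {code}
> Option<Error> validateRevocableMatch(const TaskInfo& task)
> {
>   bool taskRevocable =
>     !Resources(task.resources()).revocable().empty();
>
>   bool executorRevocable =
>     task.has_executor() &&
>     !Resources(task.executor().resources()).revocable().empty();
>
>   if (task.has_executor() && taskRevocable != executorRevocable) {
>     return Error(
>         "Task and its executor must use the same kind of resources");
>   }
>
>   return None();
> }
> {code}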



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4282) Update isolator prepare function to use ContainerLaunchInfo

2016-01-15 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-4282:

Sprint:   (was: Mesosphere Sprint 27)

> Update isolator prepare function to use ContainerLaunchInfo
> ---
>
> Key: MESOS-4282
> URL: https://issues.apache.org/jira/browse/MESOS-4282
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: mesosphere, unified-containerizer-mvp
>
> Currently the isolator's prepare function returns the ContainerPrepareInfo 
> protobuf. We should enable a ContainerLaunchInfo (containing environment 
> variables, namespaces, etc.) to be returned instead, which will be used by 
> the Mesos containerizer to launch containers. 
> By doing this (ContainerPrepareInfo -> ContainerLaunchInfo), we can select 
> any necessary information and pass it to the launcher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4123) Added USAGE_SLACK metrics to snapshot endpoint for master/agent

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4123:

Shepherd:   (was: Joris Van Remoortere)

> Added USAGE_SLACK metrics to snapshot endpoint for master/agent
> ---
>
> Key: MESOS-4123
> URL: https://issues.apache.org/jira/browse/MESOS-4123
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The master/agent have an endpoint to get all revocable resources, but the 
> current revocable resources include both usage slack and allocation slack 
> resources.
> It is better to add usage slack metrics to the snapshot endpoint for the 
> master/agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4145) Update allocator to get allocation slack resources

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4145:

Shepherd:   (was: Joris Van Remoortere)

> Update allocator to get allocation slack resources
> --
>
> Key: MESOS-4145
> URL: https://issues.apache.org/jira/browse/MESOS-4145
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The allocator should be updated to: 
> 1) Report allocation slack resources when adding a new agent.
> 2) Allocate allocation slack resources when sending offers.
> 3) Update allocation slack resources when updating an agent, and when 
> updating dynamic reservations via either a framework or the HTTP endpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4124) Added ALLOCATION_SLACK metrics to snapshot endpoint for master/agent

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4124:

Shepherd:   (was: Joris Van Remoortere)

> Added ALLOCATION_SLACK metrics to snapshot endpoint for master/agent
> 
>
> Key: MESOS-4124
> URL: https://issues.apache.org/jira/browse/MESOS-4124
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The master/agent have an endpoint to get all revocable resources, but the 
> current revocable resources include both usage slack and allocation slack 
> resources.
> It is better to add allocation slack metrics to the snapshot endpoint for 
> the master/agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4148) Set task as REASON_RESOURCE_PREEMPTED if not enough allocation slack resources

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4148:

Shepherd:   (was: Joris Van Remoortere)

> Set task as REASON_RESOURCE_PREEMPTED if not enough allocation slack resources
> --
>
> Key: MESOS-4148
> URL: https://issues.apache.org/jira/browse/MESOS-4148
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> If a task is launched using revocable resources, the resources must not be in 
> use when launching the task. If they are in use, then the task should fail to 
> start. We need to add a new REASON for this (REASON_RESOURCE_PREEMPTED).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4146) Distinguish usage slack and allocation slack revocable resources

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4146:

Shepherd:   (was: Joris Van Remoortere)

> Distinguish usage slack and allocation slack revocable resources
> 
>
> Key: MESOS-4146
> URL: https://issues.apache.org/jira/browse/MESOS-4146
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The revocable() API now returns all revocable resources, including both 
> allocation slack and usage slack; it would be better to add two new APIs 
> that return the allocation slack and usage slack resources separately.
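> A sketch of the proposed pair (names assumed, mirroring revocable()):
> {code}
> class Resources {
>   // Returns the revocable resources whose type is USAGE_SLACK.
>   Resources usageSlack() const;
>
>   // Returns the revocable resources whose type is ALLOCATION_SLACK.
>   Resources allocationSlack() const;
> };
> {code}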



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4267) Added helper function to flatten resources.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4267:

Shepherd:   (was: Joris Van Remoortere)

> Added helper function to flatten resources.
> ---
>
> Key: MESOS-4267
> URL: https://issues.apache.org/jira/browse/MESOS-4267
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> We need two helper functions to flatten resources.
> {code}
> // Returns the reserved resources flattened by their roles. This is used
> // to calculate the initial value for the allocation slack.
> Resources flattenReserved() const;
>
> // Marks resources as SLACK; only `ALLOCATION_SLACK` and `USAGE_SLACK`
> // are supported for now.
> Resources flattenSlack(
>     const Resource::RevocableInfo::Type& type
>       = Resource::RevocableInfo::ALLOCATION_SLACK) const;
> {code}
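> Example usage (a sketch of the intended chaining, per the comments above):
> {code}
> Resources slack =
>   total.flattenReserved().flattenSlack(
>       Resource::RevocableInfo::ALLOCATION_SLACK);
> {code}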



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2016-01-15 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102187#comment-15102187
 ] 

Anand Mazumdar commented on MESOS-3832:
---

I posted another patch that addresses the comments from Vinod in the earlier 
review. It should be able to make it into 0.27.

https://reviews.apache.org/r/42341/

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Vinod Kone
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP API.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4390) Shared Volumes Design Doc

2016-01-15 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102145#comment-15102145
 ] 

Anindya Sinha commented on MESOS-4390:
--

[~adam-mesos] Just curious, what is the deliverable of this JIRA? The design doc 
is already shared on Google Docs as per the link in this JIRA, so what needs to 
be done on my side to move this JIRA from IN PROGRESS -> REVIEWABLE (since this 
JIRA is ASSIGNED to me)? Does this JIRA indicate that a .md file should be 
added to the git repo for this feature?

Also, I should be able to send out RB requests for the shared resources epic 
soon. But should I wait for this JIRA to be closed before the reviews for that 
are sent out?

Finally, I added the following 2 JIRAs (mentioned in MESOS-3421):
MESOS-4324 Allow access to shared persistent volumes as read only or read write 
by tasks
MESOS-4325 Offer shareable resources to frameworks only if it is opted in

Can I have you as a shepherd for those 2 related JIRAs as well? Thanks.

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky

2016-01-15 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3273:
--
Shepherd: Anand Mazumdar
  Labels: flaky-test mesosphere tech-debt  (was: flaky-test tech-debt 
twitter)

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: flaky-test, mesosphere, tech-debt
> Attachments: asan.log
>
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status
> I0813 19:55:17.181970 26126 master.cpp:378] Master 
> 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 
> 172.17.2.10:60249
> I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: 
> --acls="permissive: false
> register_frameworks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   roles {
> type: SOME
> values: "*"
>   }
> }
> run_tasks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   users {
> type: SOME
> values: "mesos"
>   }
> }
> " --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" 
> --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" 
> --zk_session_timeout="10secs"
> I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated 
> frameworks to register
> I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated 
> slaves to register
> I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials'
> W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials 
> file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. 
> It is recommended that your credentials file is NOT accessible by others.
> I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0813 19:55:17.184306 26126 master.cpp:469] Using default 'crammd5' 
> authenticator
> I0813 19:55:1

[jira] [Updated] (MESOS-4321) Slave total resources in master does not include ALLOCATION SLACK

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4321:

Shepherd:   (was: Joris Van Remoortere)

> Slave total resources in master does not include ALLOCATION SLACK
> -
>
> Key: MESOS-4321
> URL: https://issues.apache.org/jira/browse/MESOS-4321
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> When a slave is updated in the master, totalResources is also updated, but 
> the total resources do not include ALLOCATION SLACK. This means end users 
> cannot get ALLOCATION SLACK details via HTTP endpoints such as /master/state 
> and /master/slaves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4322) The load qos controller should use only USAGE SLACK resources.

2016-01-15 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4322:

Shepherd:   (was: Joris Van Remoortere)

> The load qos controller should use only USAGE SLACK resources.
> --
>
> Key: MESOS-4322
> URL: https://issues.apache.org/jira/browse/MESOS-4322
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> The newly added load QoS controller should only get USAGE SLACK resources, 
> not ALLOCATION SLACK resources.
> {code}
> for (const ResourceUsage::Executor& executor : usage.executors()) {
>   // Set kill correction for all revocable executors.
>   if (!Resources(executor.allocated()).revocable().empty()) {
>     QoSCorrection correction;
>
>     correction.set_type(mesos::slave::QoSCorrection_Type_KILL);
>     correction.mutable_kill()->mutable_framework_id()->CopyFrom(
>         executor.executor_info().framework_id());
>     correction.mutable_kill()->mutable_executor_id()->CopyFrom(
>         executor.executor_info().executor_id());
>
>     corrections.push_back(correction);
>   }
> }
> {code}
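> With the split APIs proposed in MESOS-4146, the check would narrow to usage 
> slack only, e.g. (sketch; {{usageSlack()}} is the proposed helper, name 
> assumed):
> {code}
> if (!Resources(executor.allocated()).usageSlack().empty()) {
>   // Only usage-slack executors are eligible for kill corrections.
>   ...
> }
> {code}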



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4397) Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.

2016-01-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4397:
--
Shepherd: Jie Yu

> Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.
> -
>
> Key: MESOS-4397
> URL: https://issues.apache.org/jira/browse/MESOS-4397
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>
> The name "ContainerPrepareInfo" does not really capture the purpose of this 
> struct. ContainerLaunchInfo better captures the purpose of this struct. 
> ContainerLaunchInfo is returned by the isolator 'prepare' function. It 
> contains information about how a container should be launched (e.g., 
> environment variables, namespaces, commands, etc.). The information will be 
> used by the Mesos Containerizer when launching the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4397) Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.

2016-01-15 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4397:
-

 Summary: Rename ContainerPrepareInfo to ContainerLaunchInfo for 
isolators.
 Key: MESOS-4397
 URL: https://issues.apache.org/jira/browse/MESOS-4397
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu


The name "ContainerPrepareInfo" does not really capture the purpose of this 
struct. ContainerLaunchInfo better captures the purpose of this struct. 
ContainerLaunchInfo is returned by the isolator 'prepare' function. It contains 
information about how a container should be launched (e.g., environment 
variables, namespaces, commands, etc.). The information will be used by the 
Mesos Containerizer when launching the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4397) Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.

2016-01-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4397:
--
Assignee: Gilbert Song

> Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.
> -
>
> Key: MESOS-4397
> URL: https://issues.apache.org/jira/browse/MESOS-4397
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>
> The name "ContainerPrepareInfo" does not really capture the purpose of this 
> struct. ContainerLaunchInfo better captures the purpose of this struct. 
> ContainerLaunchInfo is returned by the isolator 'prepare' function. It 
> contains information about how a container should be launched (e.g., 
> environment variables, namespaces, commands, etc.). The information will be 
> used by the Mesos Containerizer when launching the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4397) Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.

2016-01-15 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4397:
--
Story Points: 2

> Rename ContainerPrepareInfo to ContainerLaunchInfo for isolators.
> -
>
> Key: MESOS-4397
> URL: https://issues.apache.org/jira/browse/MESOS-4397
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> The name "ContainerPrepareInfo" does not really capture the purpose of this 
> struct. ContainerLaunchInfo better captures the purpose of this struct. 
> ContainerLaunchInfo is returned by the isolator 'prepare' function. It 
> contains information about how a container should be launched (e.g., 
> environment variables, namespaces, commands, etc.). The information will be 
> used by the Mesos Containerizer when launching the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-2017:
---
Labels: mesosphere tests  (was: mesosphere)

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: mesosphere, tests
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
> took 28437ns
> I1030 05:55:06.9

[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-2017:
---
Shepherd: Benjamin Mahler

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: mesosphere, tests
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
> took 28437ns
> I1030 05:55:06.952896 24476 leveld

[jira] [Commented] (MESOS-3987) /create-volumes, /destroy-volumes should be permissive under a master without authentication.

2016-01-15 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102119#comment-15102119
 ] 

Greg Mann commented on MESOS-3987:
--

It seems that this ticket is unnecessary, as these endpoints currently exhibit 
permissive behavior when authorization is disabled. Closing the ticket, and 
opening MESOS-4395 to create tests for this case.

> /create-volumes, /destroy-volumes should be permissive under a master without 
> authentication.
> -
>
> Key: MESOS-3987
> URL: https://issues.apache.org/jira/browse/MESOS-3987
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: authentication, mesosphere, persistent-volumes
>
> See MESOS-3940 for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-01-15 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues reassigned MESOS-2017:
--

Assignee: Kevin Klues  (was: Yan Xu)

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: twitter
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 05:55:06.952239 24476 leveldb.cpp:438] Reading position from leveldb 
> took 28437ns
> I1030 05:55:06.952896 2447
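
On the failure mode in the subject line (generic C++ background, not Mesos
code): the "pure virtual method called" abort fires when a virtual call
lands on a pure-virtual vtable slot. That typically happens when a virtual
function is reached from a base-class constructor or destructor, or when
another thread calls into an object mid-destruction; in the test-failure
case here it is plausibly a teardown-ordering issue. A minimal,
self-contained reproduction:

{code}
// Minimal reproduction of the "pure virtual method called" abort.
// By the time ~Base() runs, the Derived part is already destroyed, so the
// virtual dispatch lands on the pure-virtual slot and the runtime aborts.
struct Base
{
  virtual void cleanup() = 0;

  // Indirection through a non-virtual helper keeps the compiler from
  // resolving the call statically inside the destructor.
  void shutdown() { cleanup(); }

  virtual ~Base() { shutdown(); }
};

struct Derived : Base
{
  void cleanup() override {}
};

int main()
{
  Derived d;
  return 0;  // aborts in ~Base() with "pure virtual method called"
}
{code}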

[jira] [Updated] (MESOS-1594) SlaveRecoveryTest/0.ReconcileKillTask is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1594:
-
  Sprint: Mesosphere Sprint 27
Story Points: 2
  Labels: flaky mesosphere  (was: flaky)

> SlaveRecoveryTest/0.ReconcileKillTask is flaky
> --
>
> Key: MESOS-1594
> URL: https://issues.apache.org/jira/browse/MESOS-1594
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.0
> Environment: Ubuntu 12.10 with GCC
>Reporter: Vinod Kone
>Assignee: Greg Mann
>  Labels: flaky, mesosphere
>
> Observed this on Jenkins.
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconcileKillTask
> Using temporary directory '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG'
> I0714 15:08:43.915114 27216 leveldb.cpp:176] Opened db in 474.695188ms
> I0714 15:08:43.933645 27216 leveldb.cpp:183] Compacted db in 18.068942ms
> I0714 15:08:43.934129 27216 leveldb.cpp:198] Created db iterator in 7860ns
> I0714 15:08:43.934439 27216 leveldb.cpp:204] Seeked to beginning of db in 
> 2560ns
> I0714 15:08:43.934779 27216 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1400ns
> I0714 15:08:43.935098 27216 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0714 15:08:43.936027 27238 recover.cpp:425] Starting replica recovery
> I0714 15:08:43.936225 27238 recover.cpp:451] Replica is in EMPTY status
> I0714 15:08:43.936867 27238 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0714 15:08:43.937049 27238 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0714 15:08:43.937232 27238 recover.cpp:542] Updating replica status to 
> STARTING
> I0714 15:08:43.945600 27235 master.cpp:288] Master 
> 20140714-150843-16842879-55850-27216 (quantal) started on 127.0.1.1:55850
> I0714 15:08:43.945643 27235 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0714 15:08:43.945651 27235 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0714 15:08:43.945658 27235 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/SlaveRecoveryTest_0_ReconcileKillTask_3zJ6DG/credentials'
> I0714 15:08:43.945808 27235 master.cpp:359] Authorization enabled
> I0714 15:08:43.946369 27235 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:55850
> I0714 15:08:43.946419 27235 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0714 15:08:43.946614 27235 master.cpp:1128] The newly elected leader is 
> master@127.0.1.1:55850 with id 20140714-150843-16842879-55850-27216
> I0714 15:08:43.946630 27235 master.cpp:1141] Elected as the leading master!
> I0714 15:08:43.946637 27235 master.cpp:959] Recovering from registrar
> I0714 15:08:43.946707 27235 registrar.cpp:313] Recovering registrar
> I0714 15:08:43.957895 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20.529301ms
> I0714 15:08:43.957978 27238 replica.cpp:320] Persisted replica status to 
> STARTING
> I0714 15:08:43.958142 27238 recover.cpp:451] Replica is in STARTING status
> I0714 15:08:43.958664 27238 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0714 15:08:43.958762 27238 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0714 15:08:43.958945 27238 recover.cpp:542] Updating replica status to VOTING
> I0714 15:08:43.975685 27238 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 16.646136ms
> I0714 15:08:43.976367 27238 replica.cpp:320] Persisted replica status to 
> VOTING
> I0714 15:08:43.976824 27241 recover.cpp:556] Successfully joined the Paxos 
> group
> I0714 15:08:43.977072 27242 recover.cpp:440] Recover process terminated
> I0714 15:08:43.980590 27236 log.cpp:656] Attempting to start the writer
> I0714 15:08:43.981385 27236 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0714 15:08:43.999141 27236 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 17.705787ms
> I0714 15:08:43.999222 27236 replica.cpp:342] Persisted promised to 1
> I0714 15:08:44.004451 27240 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0714 15:08:44.004914 27240 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0714 15:08:44.021456 27240 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 16.499775ms
> I0714 15:08:44.021533 27240 replica.cpp:676] Persisted action at 0
> I0714 15:08:44.022006 27240 replica.cpp:508] Replica received write request 
> for position 0
> I0714 15:08:44.022043 27240 leveldb.cpp:438] Reading position from leveldb 
> took 2
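
Flakiness of this shape (test expectations racing against asynchronous
recovery and timer-driven work) is usually addressed in the Mesos tests by
pausing the libprocess clock and advancing virtual time deterministically
instead of sleeping. A rough sketch of the pattern, assuming the Clock
helpers from process/clock.hpp and, for the timed future, process::after
from newer libprocess (any timer-armed future serves the same purpose):

{code}
// Sketch of the pause-the-clock pattern used to de-flake timing-dependent
// libprocess tests: no wall-clock sleeps, so the timer fires exactly when
// the test says it does.
#include <gtest/gtest.h>

#include <process/after.hpp>   // process::after (assumed available)
#include <process/clock.hpp>
#include <process/future.hpp>
#include <process/gtest.hpp>

#include <stout/duration.hpp>
#include <stout/nothing.hpp>

using process::Clock;
using process::Future;

TEST(ClockPatternExample, DeterministicTimer)
{
  Clock::pause();  // stop real time; timers fire only when advanced

  // Arm a 5-second timer; with the clock paused it stays pending until
  // the test advances virtual time.
  Future<Nothing> timeout = process::after(Seconds(5));
  EXPECT_TRUE(timeout.isPending());

  Clock::advance(Seconds(5));  // fire the timer deterministically
  Clock::settle();             // drain all pending libprocess events

  AWAIT_READY(timeout);

  Clock::resume();
}
{code}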

[jira] [Updated] (MESOS-1594) SlaveRecoveryTest/0.ReconcileKillTask is flaky

2016-01-15 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-1594:
-
Shepherd: Vinod Kone

