[jira] [Commented] (MESOS-5212) Allow any principal in ReservationInfo when HTTP authentication is off

2016-05-12 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281385#comment-15281385
 ] 

Bernd Mathiske commented on MESOS-5212:
---

This patch is implementation-only (with tests), which is proper. I am assuming 
the documentation changes that go along with the new behavior will then be 
posted against MESOS-5215? IMHO it would also be OK to dedicate limited doc 
updates to this ticket.

> Allow any principal in ReservationInfo when HTTP authentication is off
> --
>
> Key: MESOS-5212
> URL: https://issues.apache.org/jira/browse/MESOS-5212
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.28.1
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Mesos currently provides no way for operators to pass their principal to HTTP 
> endpoints when HTTP authentication is off. Since we enforce that 
> {{ReservationInfo.principal}} be equal to the operator principal in requests 
> to {{/reserve}}, this means that when HTTP authentication is disabled, the 
> {{ReservationInfo.principal}} field cannot be set.
> To address this in the short-term, we should allow 
> {{ReservationInfo.principal}} to hold any value when HTTP authentication is 
> disabled.
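
For illustration, a minimal sketch of a reservation request against the 
{{/reserve}} endpoint with the principal supplied inline in 
{{ReservationInfo}}; the master address, agent ID, role, and principal below 
are placeholders, not values from this ticket:

{code}
# Hedged sketch: reserve 2 CPUs for role "ads" via the master's /reserve endpoint.
# With HTTP authentication disabled there are no credentials to derive an
# operator principal from, so the principal embedded in ReservationInfo is all
# the master sees.
curl -i -X POST http://<master_ip>:5050/reserve \
  -d slaveId=<agent_id> \
  -d resources='[
    {
      "name": "cpus",
      "type": "SCALAR",
      "scalar": { "value": 2 },
      "role": "ads",
      "reservation": { "principal": "my-principal" }
    }
  ]'
{code}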



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5212) Allow any principal in ReservationInfo when HTTP authentication is off

2016-05-09 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-5212:
--
Shepherd: Bernd Mathiske

> Allow any principal in ReservationInfo when HTTP authentication is off
> --
>
> Key: MESOS-5212
> URL: https://issues.apache.org/jira/browse/MESOS-5212
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.28.1
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Mesos currently provides no way for operators to pass their principal to HTTP 
> endpoints when HTTP authentication is off. Since we enforce that 
> {{ReservationInfo.principal}} be equal to the operator principal in requests 
> to {{/reserve}}, this means that when HTTP authentication is disabled, the 
> {{ReservationInfo.principal}} field cannot be set.
> To address this in the short-term, we should allow 
> {{ReservationInfo.principal}} to hold any value when HTTP authentication is 
> disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-05-09 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276116#comment-15276116
 ] 

Bernd Mathiske commented on MESOS-3235:
---

It seems doubtful that lengthening the wait time for task completion solves 
much, since successful runs are way shorter than the default 15 seconds, 
typically in the low single-digit second range. Machines aren't that slow, are 
they? And we also occasionally get these failures on machines that are known to 
be fast. I suspect something else is wrong here.

What I have seen in failure logs is that one task somehow has not produced 
status updates all the way up to the AWAIT statement in question - although it 
must have reached the contention barrier which asserts that all tasks have been 
launched, since the fetcher has been observed downloading every script. So one 
guess is that something is blocking/eating/delaying status updates at some 
stage - occasionally. In all the cases I have seen, the tasks are not launched 
in serial order. And that's exactly why I wrote this test: so we can see if we 
are dealing with concurrency correctly. Too bad we don't know what's failing 
yet. 

If we had a way to reproduce this behavior more often, we could switch on more 
logging and just repeat the test often enough to find something. But repeating 
the test tends to make the problem go away.

Ideas?
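
One option, as a sketch (the test binary path and the repeat count are 
assumptions, not an agreed procedure): hammer the suspect test with verbose 
logging until it trips.

{code}
# Hedged sketch: repeat the flaky test with verbose logging.
# --gtest_filter/--gtest_repeat/--gtest_break_on_failure are standard gtest
# flags; GLOG_v and --verbose turn up glog output in the Mesos test harness.
GLOG_v=2 ./bin/mesos-tests.sh \
  --gtest_filter="FetcherCacheHttpTest.HttpCachedConcurrent" \
  --gtest_repeat=1000 \
  --gtest_break_on_failure \
  --verbose
{code}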

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, tests
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: haosdent
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: fetchercache_log_centos_6.txt
>
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> 

[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-05-03 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268446#comment-15268446
 ] 

Bernd Mathiske commented on MESOS-3235:
---

Looks like task 0 never got started at all and therefore waiting for it fails.

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, tests
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: haosdent
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: fetchercache_log_centos_6.txt
>
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 0x7fff950901e5 _pthread_start
> @ 0x7fff9508e41d thread_start
> Failed to synchronize with slave (it's probably exited)
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {code}
> This was 

[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-04-28 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3235:
--
Sprint: Mesosphere Sprint 20, Mesosphere Sprint 26, Mesosphere Sprint 33  
(was: Mesosphere Sprint 20, Mesosphere Sprint 26, Mesosphere Sprint 33, 
Mesosphere Sprint 34)

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, tests
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: Bernd Mathiske
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: fetchercache_log_centos_6.txt
>
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 0x7fff950901e5 _pthread_start
> @ 0x7fff9508e41d thread_start
> Failed to synchronize with slave (it's probably exited)
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> 

[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-04-26 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257751#comment-15257751
 ] 

Bernd Mathiske commented on MESOS-3235:
---

So far I have not been able to reproduce the behavior. Also, a few weeks ago I 
still saw this test failing on several occasions, but lately it has been stable 
with no failures. 

Looking at the log, it seems that all tasks got executed normally. The only 
thing that looks a bit strange is that TASK_KILLED is mentioned after 
TASK_FINISHED. I'll look into that, but on the back burner.

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, tests
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: Bernd Mathiske
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: fetchercache_log_centos_6.txt
>
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 

[jira] [Commented] (MESOS-3367) Mesos fetcher does not extract archives for URI with parameters

2016-04-25 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256141#comment-15256141
 ] 

Bernd Mathiske commented on MESOS-3367:
---

Sorry, I wasn't following this because I was OOO. Just FYI, I agree with the 
resolution. :-)

> Mesos fetcher does not extract archives for URI with parameters
> ---
>
> Key: MESOS-3367
> URL: https://issues.apache.org/jira/browse/MESOS-3367
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 0.22.1, 0.23.0
> Environment: DCOS 1.1
>Reporter: Renat Zubairov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> I'm deploying using marathon applications with sources served from S3. I'm 
> using a signed URL to give only temporary access to the S3 resources, so URL 
> of the resource have some query parameters.
> So URI is 'https://foo.com/file.tgz?hasi' and fetcher stores it in the file 
> with the name 'file.tgz?hasi', then it thinks that extension 'hasi' is not 
> tgz hence extraction is skipped, despite the fact that MIME Type of the HTTP 
> resource is 'application/x-tar'.
> Workaround - add additional parameter like '=.tgz'
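
As a sketch of the underlying naming problem (plain shell for illustration, not 
the actual fetcher code): stripping the query component before deriving the 
local file name would restore the extension check.

{code}
# Hedged sketch: drop everything from the first '?' so the extension check
# sees 'file.tgz' rather than 'file.tgz?hasi'.
uri='https://foo.com/file.tgz?hasi'
name=$(basename "${uri%%\?*}")   # -> file.tgz
case "$name" in
  *.tgz|*.tar.gz|*.zip) echo "would extract $name" ;;
  *)                    echo "would skip extraction of $name" ;;
esac
{code}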



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256068#comment-15256068
 ] 

Bernd Mathiske commented on MESOS-4760:
---

 [~mrbrowning], I am not aware of near-term plans for injection of the fetcher 
process into the slave object. If you want to take this on, I am happy to 
shepherd it.

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.
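
Once exposed, such gauges would presumably appear in the agent's existing 
metrics snapshot; a sketch of how an operator would read them (only the 
endpoint is existing behavior, any fetcher-specific metric names would be new):

{code}
# Hedged sketch: pull the agent metrics snapshot and filter for fetcher entries.
# 5051 is the default agent port; fetcher cache gauges do not exist yet, so
# today this would simply return nothing.
curl -s http://<agent_ip>:5051/metrics/snapshot | python -m json.tool | grep -i fetcher
{code}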



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-25 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256068#comment-15256068
 ] 

Bernd Mathiske edited comment on MESOS-4760 at 4/25/16 8:28 AM:


 [~mrbrowning], I am not aware of near-term plans for injection of the fetcher 
into the slave object. If you want to take this on, I am happy to shepherd it.


was (Author: bernd-mesos):
 [~mrbrowning], I am not aware of near-term plans for injection of the fetcher 
process into the slave object. If you want to take this on, I am happy to 
shepherd it.

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Assignee: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5010) Installation of mesos python package is incomplete

2016-04-11 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-5010:
--
   Sprint: Mesosphere Sprint 32
Fix Version/s: 0.29.0

> Installation of mesos python package is incomplete
> --
>
> Key: MESOS-5010
> URL: https://issues.apache.org/jira/browse/MESOS-5010
> Project: Mesos
>  Issue Type: Bug
>  Components: python api
>Affects Versions: 0.26.0, 0.28.0, 0.27.2, 0.29.0
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
> Fix For: 0.29.0
>
>
> The installation of mesos python package is incomplete, i.e., the files 
> {{cli.py}}, {{futures.py}}, and {{http.py}} are not installed.
> {code}
> % ../configure --enable-python
> % make install DESTDIR=$PWD/D
> % PYTHONPATH=$PWD/D/usr/local/lib/python2.7/site-packages:$PYTHONPATH python 
> -c 'from mesos import http'
> Traceback (most recent call last):
>   File "", line 1, in 
> ImportError: cannot import name http
> {code}
> This appears to be first broken with {{d1d70b9}} (MESOS-3969, [Upgraded 
> bundled pip to 7.1.2.|https://reviews.apache.org/r/40630]). Bisecting in 
> {{pip}}-land shows that our install becomes broken for {{pip-6.0.1}} and 
> later (we are using {{pip-7.1.2}}).
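
A quick way to confirm which modules made it into the installed package, using 
the same DESTDIR as above (a sketch; the exact on-disk layout may differ, and 
the expected file list comes from the report):

{code}
# Hedged sketch: list what actually got installed; per the report, cli.py,
# futures.py, and http.py are missing after the pip upgrade.
ls $PWD/D/usr/local/lib/python2.7/site-packages/mesos/
{code}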



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-04-11 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3235:
--
Sprint: Mesosphere Sprint 20, Mesosphere Sprint 26, Mesosphere Sprint 33  
(was: Mesosphere Sprint 20, Mesosphere Sprint 26)

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, tests
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: Bernd Mathiske
>  Labels: mesosphere
> Fix For: 0.27.0
>
> Attachments: fetchercache_log_centos_6.txt
>
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 0x7fff950901e5 _pthread_start
> @ 0x7fff9508e41d thread_start
> Failed to synchronize with slave (it's probably exited)
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {code}
> 

[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-07 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230864#comment-15230864
 ] 

Bernd Mathiske commented on MESOS-4760:
---

Alright - let's do this! :-) Thanks!

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate

2016-04-05 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225929#comment-15225929
 ] 

Bernd Mathiske commented on MESOS-4760:
---

In principle I would shepherd this, but it seems to have low priority. Right?

> Expose metrics and gauges for fetcher cache usage and hit rate
> --
>
> Key: MESOS-4760
> URL: https://issues.apache.org/jira/browse/MESOS-4760
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher, statistics
>Reporter: Michael Browning
>Priority: Minor
>  Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the 
> fetcher_cache_size flag, it would be useful to have metrics and gauges on 
> agents that expose operational statistics like cache hit rate, occupied cache 
> size, and time spent downloading resources that were not present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4053:
--
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 31  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}
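
As a sketch of the cleanup the error message asks for (mount points taken from 
the failure output above; commands are illustrative and need root):

{code}
# Hedged sketch: find and unmount the stale test hierarchy left by a previous run.
grep mesos_test_cgroup /proc/mounts
# Unmount each subsystem still mounted under the test hierarchy; directories
# that are not mounts will simply fail harmlessly.
for m in /tmp/mesos_test_cgroup/*; do umount "$m" 2>/dev/null || true; done
{code}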



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4912:
--
Sprint: Mesosphere Sprint 31

> LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
> --
>
> Key: MESOS-4912
> URL: https://issues.apache.org/jira/browse/MESOS-4912
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.28.0
> Environment: CenOS 7, SSL
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Observed on our CI:
> {noformat}
> [09:34:15] :   [Step 11/11] [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_MultipleContainers
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.906719  2357 linux.cpp:81] Making 
> '/tmp/MLVLnv' a shared mount
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.923548  2357 
> linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.924705  2376 
> containerizer.cpp:666] Starting container 
> 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of 
> framework ''
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.925355  2371 provisioner.cpp:285] 
> Provisioning image rootfs 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0'
>  for container da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.925881  2377 copy.cpp:127] Copying 
> layer path '/tmp/MLVLnv/test_image1' to rootfs 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0'
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.835127  2376 linux.cpp:355] Bind 
> mounting work directory from 
> '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b'
>  to 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox'
>  for container da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.835392  2376 linux.cpp:683] 
> Changing the ownership of the persistent volume at 
> '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid 0
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.840425  2376 linux.cpp:723] 
> Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume'
>  for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of 
> container da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.843878  2374 
> linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.848302  2371 
> containerizer.cpp:666] Starting container 
> 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of 
> framework ''
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.848758  2371 
> containerizer.cpp:1392] Destroying container 
> 'da610f7f-a709-4de8-94d3-74f4a520619b'
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.848865  2373 provisioner.cpp:285] 
> Provisioning image rootfs 
> '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917'
>  for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.849449  2375 copy.cpp:127] Copying 
> layer path '/tmp/MLVLnv/test_image2' to rootfs 
> '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917'
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.854038  2374 cgroups.cpp:2427] 
> Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.856693  2372 cgroups.cpp:1409] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 
> 2.608128ms
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.859237  2377 cgroups.cpp:2445] 
> Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.861454  2377 cgroups.cpp:1438] 
> Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.934608  2378 
> containerizer.cpp:1608] Executor for container 
> 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.937692  2372 linux.cpp:798] 
> Unmounting volume 
> 

[jira] [Updated] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4912:
--
Labels: mesosphere  (was: )

> LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
> --
>
> Key: MESOS-4912
> URL: https://issues.apache.org/jira/browse/MESOS-4912
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.28.0
> Environment: CenOS 7, SSL
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Observed on our CI:
> {noformat}
> [09:34:15] :   [Step 11/11] [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_MultipleContainers
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.906719  2357 linux.cpp:81] Making 
> '/tmp/MLVLnv' a shared mount
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.923548  2357 
> linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.924705  2376 
> containerizer.cpp:666] Starting container 
> 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of 
> framework ''
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.925355  2371 provisioner.cpp:285] 
> Provisioning image rootfs 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0'
>  for container da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:19]W:   [Step 11/11] I0309 09:34:19.925881  2377 copy.cpp:127] Copying 
> layer path '/tmp/MLVLnv/test_image1' to rootfs 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0'
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.835127  2376 linux.cpp:355] Bind 
> mounting work directory from 
> '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b'
>  to 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox'
>  for container da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.835392  2376 linux.cpp:683] 
> Changing the ownership of the persistent volume at 
> '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid 0
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.840425  2376 linux.cpp:723] 
> Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to 
> '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume'
>  for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of 
> container da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.843878  2374 
> linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.848302  2371 
> containerizer.cpp:666] Starting container 
> 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of 
> framework ''
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.848758  2371 
> containerizer.cpp:1392] Destroying container 
> 'da610f7f-a709-4de8-94d3-74f4a520619b'
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.848865  2373 provisioner.cpp:285] 
> Provisioning image rootfs 
> '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917'
>  for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.849449  2375 copy.cpp:127] Copying 
> layer path '/tmp/MLVLnv/test_image2' to rootfs 
> '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917'
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.854038  2374 cgroups.cpp:2427] 
> Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.856693  2372 cgroups.cpp:1409] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 
> 2.608128ms
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.859237  2377 cgroups.cpp:2445] 
> Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.861454  2377 cgroups.cpp:1438] 
> Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.934608  2378 
> containerizer.cpp:1608] Executor for container 
> 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited
> [09:34:30]W:   [Step 11/11] I0309 09:34:30.937692  2372 linux.cpp:798] 
> Unmounting volume 
> 

[jira] [Updated] (MESOS-4835) CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4835:
--
Labels: flaky mesosphere test  (was: flaky test)

> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky
> -
>
> Key: MESOS-4835
> URL: https://issues.apache.org/jira/browse/MESOS-4835
> Project: Mesos
>  Issue Type: Bug
> Environment: Seen on Ubuntu 15 & Debian 8, GCC 4.9
>Reporter: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> Verbose logs: 
> {code}
> [ RUN  ] 
> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess
> I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test
> I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test after 139.46496ms
> I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test
> I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test after 141.811968ms
> ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure
> Value of: ::waitpid(pid, , 0)
>   Actual: 23809
> Expected: -1
> ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure
> Value of: (*__errno_location ())
>   Actual: 0
> Expected: 10
> [  FAILED  ] 
> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4835) CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4835:
--
Sprint: Mesosphere Sprint 31

> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky
> -
>
> Key: MESOS-4835
> URL: https://issues.apache.org/jira/browse/MESOS-4835
> Project: Mesos
>  Issue Type: Bug
> Environment: Seen on Ubuntu 15 & Debian 8, GCC 4.9
>Reporter: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> Verbose logs: 
> {code}
> [ RUN  ] 
> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess
> I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup 
> /sys/fs/cgroup/freezer/mesos_test
> I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos_test after 139.46496ms
> I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup 
> /sys/fs/cgroup/freezer/mesos_test
> I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup 
> /sys/fs/cgroup/freezer/mesos_test after 141.811968ms
> ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure
> Value of: ::waitpid(pid, , 0)
>   Actual: 23809
> Expected: -1
> ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure
> Value of: (*__errno_location ())
>   Actual: 0
> Expected: 10
> [  FAILED  ] 
> CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4810:
--
Labels: docker mesosphere test  (was: docker test)

> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
> --
>
> Key: MESOS-4810
> URL: https://issues.apache.org/jira/browse/MESOS-4810
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.28.0
> Environment: CentOS 7 on AWS, both with or without SSL.
>Reporter: Bernd Mathiske
>Assignee: Jie Yu
>  Labels: docker, mesosphere, test
>
> {noformat}
> [09:46:46] :   [Step 11/11] [ RUN  ] 
> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.628413  1166 leveldb.cpp:174] 
> Opened db in 4.242882ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629926  1166 leveldb.cpp:181] 
> Compacted db in 1.483621ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629966  1166 leveldb.cpp:196] 
> Created db iterator in 15498ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629977  1166 leveldb.cpp:202] 
> Seeked to beginning of db in 1405ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629984  1166 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 239ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630015  1166 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630470  1183 recover.cpp:447] 
> Starting replica recovery
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630702  1180 recover.cpp:473] 
> Replica is in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.631767  1182 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (14567)@172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632115  1183 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632450  1186 recover.cpp:564] 
> Updating replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633476  1186 master.cpp:375] 
> Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) 
> started on 172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633491  1186 master.cpp:377] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" 
> --zk_session_timeout="10secs"
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633677  1186 master.cpp:422] 
> Master only allowing authenticated frameworks to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633685  1186 master.cpp:427] 
> Master only allowing authenticated slaves to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633692  1186 credentials.hpp:35] 
> Loading credentials for authentication from '/tmp/4UxXoW/credentials'
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633851  1183 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 1.191043ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633873  1183 replica.cpp:320] 
> Persisted replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633894  1186 master.cpp:467] Using 
> default 'crammd5' authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634003  1186 master.cpp:536] Using 
> default 'basic' HTTP authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634062  1184 recover.cpp:473] 
> Replica is in STARTING status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634109  1186 master.cpp:570] 
> Authorization enabled
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634249  1187 
> whitelist_watcher.cpp:77] No whitelist given
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634255  1184 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634884  1187 replica.cpp:673] 
> Replica in STARTING 

[jira] [Updated] (MESOS-4794) Add documentation around using the docker containerizer on CentOS 6.

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4794:
--
Labels: containerizer docker documentation mesosphere  (was: containerizer 
docker documentation)

> Add documentation around using the docker containerizer on CentOS 6.
> 
>
> Key: MESOS-4794
> URL: https://issues.apache.org/jira/browse/MESOS-4794
> Project: Mesos
>  Issue Type: Documentation
>  Components: docker, documentation
>Affects Versions: 0.28.0
>Reporter: Joseph Wu
>  Labels: containerizer, docker, documentation, mesosphere
>
> Support for persistent volumes was added to the docker containerizer in 
> [MESOS-3413].  However, this does not work on CentOS 6.
> On CentOS 6, the same {{docker run -v ...}} operation does not perform a 
> recursive bind, whereas on every other OS supported by Mesos, docker does a 
> recursive bind.
> Docker has already [dropped support for CentOS 
> 6|https://github.com/docker/docker/issues/14365], so we should add 
> precautionary documentation in case anyone tries to use the docker 
> containerizer on CentOS 6.
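
A sketch of how the difference can be observed (image name and paths are 
placeholders; the nested mount must already exist on the host):

{code}
# Hedged sketch: /host/outer has a nested mount at /host/outer/inner on the host.
# Where 'docker run -v' binds recursively, the nested mount shows up inside the
# container; on CentOS 6 it does not.
docker run --rm -v /host/outer:/data busybox \
  sh -c 'grep /data /proc/self/mountinfo'
{code}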



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4794) Add documentation around using the docker containerizer on CentOS 6.

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4794:
--
Sprint: Mesosphere Sprint 31

> Add documentation around using the docker containerizer on CentOS 6.
> 
>
> Key: MESOS-4794
> URL: https://issues.apache.org/jira/browse/MESOS-4794
> Project: Mesos
>  Issue Type: Documentation
>  Components: docker, documentation
>Affects Versions: 0.28.0
>Reporter: Joseph Wu
>  Labels: containerizer, docker, documentation
>
> Support for persistent volumes was added to the docker containerizer in 
> [MESOS-3413].  However, this does not work on CentOS 6.
> On CentOS 6, the same {{docker run -v ...}} operation does not perform a 
> recursive bind, whereas on every other OS supported by Mesos, docker does a 
> recursive bind.
> Docker has already [dropped support for CentOS 
> 6|https://github.com/docker/docker/issues/14365], so we should add 
> precautionary documentation in case anyone tries to use the docker 
> containerizer on CentOS 6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4736) DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on CentOS 6

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4736:
--
Sprint: Mesosphere Sprint 31

> DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on 
> CentOS 6
> -
>
> Key: MESOS-4736
> URL: https://issues.apache.org/jira/browse/MESOS-4736
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
> Environment: CentOS 6 + GCC 4.9 on AWS 
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test
>
> This test passes consistently on other OSes, but fails consistently on CentOS 
> 6.
> Verbose logs from test failure:
> {code}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes
> I0222 18:16:12.327957 26681 leveldb.cpp:174] Opened db in 7.466102ms
> I0222 18:16:12.330528 26681 leveldb.cpp:181] Compacted db in 2.540139ms
> I0222 18:16:12.330580 26681 leveldb.cpp:196] Created db iterator in 16908ns
> I0222 18:16:12.330592 26681 leveldb.cpp:202] Seeked to beginning of db in 
> 1403ns
> I0222 18:16:12.330600 26681 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 315ns
> I0222 18:16:12.330634 26681 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0222 18:16:12.331082 26698 recover.cpp:447] Starting replica recovery
> I0222 18:16:12.331289 26698 recover.cpp:473] Replica is in EMPTY status
> I0222 18:16:12.332162 26703 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (13761)@172.30.2.148:35274
> I0222 18:16:12.332701 26701 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0222 18:16:12.333230 26699 recover.cpp:564] Updating replica status to 
> STARTING
> I0222 18:16:12.334102 26698 master.cpp:376] Master 
> 652149b4-3932-4d8b-ba6f-8c9d9045be70 (ip-172-30-2-148.mesosphere.io) started 
> on 172.30.2.148:35274
> I0222 18:16:12.334116 26698 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/QEhLBS/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/QEhLBS/master" 
> --zk_session_timeout="10secs"
> I0222 18:16:12.334354 26698 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0222 18:16:12.334363 26698 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0222 18:16:12.334369 26698 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/QEhLBS/credentials'
> I0222 18:16:12.335366 26698 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0222 18:16:12.335492 26698 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0222 18:16:12.335623 26698 master.cpp:571] Authorization enabled
> I0222 18:16:12.335752 26703 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 2.314693ms
> I0222 18:16:12.335769 26700 whitelist_watcher.cpp:77] No whitelist given
> I0222 18:16:12.335778 26703 replica.cpp:320] Persisted replica status to 
> STARTING
> I0222 18:16:12.335821 26697 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0222 18:16:12.335965 26701 recover.cpp:473] Replica is in STARTING status
> I0222 18:16:12.336771 26703 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (13763)@172.30.2.148:35274
> I0222 18:16:12.337191 26696 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0222 18:16:12.337635 26700 recover.cpp:564] Updating replica status to VOTING
> I0222 18:16:12.337671 26703 master.cpp:1712] The newly elected leader is 
> master@172.30.2.148:35274 with id 652149b4-3932-4d8b-ba6f-8c9d9045be70
> I0222 18:16:12.337698 26703 master.cpp:1725] Elected as the leading master!
> I0222 18:16:12.337713 26703 master.cpp:1470] Recovering from registrar
> I0222 18:16:12.337828 26696 registrar.cpp:307] Recovering registrar
> I0222 18:16:12.339972 26702 leveldb.cpp:304] Persisting metadata 

[jira] [Updated] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2016-03-14 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-2858:
--
Sprint: Mesosphere Sprint 31

> FetcherCacheHttpTest.HttpMixed is flaky.
> 
>
> Key: MESOS-2858
> URL: https://issues.apache.org/jira/browse/MESOS-2858
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheHttpTest.HttpMixed
> Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
> I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
> I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
> I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
> I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
> 2112ns
> I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 392ns
> I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
> I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
> I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
> STARTING
> I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 590673ns
> I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
> STARTING
> I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
> I0611 00:40:28.214774 26061 master.cpp:363] Master 
> 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
> 172.17.0.116:33349
> I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" 
> --zk_session_timeout="10secs"
> I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
> I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
> I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
> I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
> I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 374189ns
> I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
> VOTING
> I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
> group
> I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
> master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
> I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
> I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
> I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar
> I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated
> I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer
> I0611 00:40:28.219391 26067 replica.cpp:477] Replica received implicit 
> promise request with proposal 1
> I0611 

[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2016-03-14 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193281#comment-15193281
 ] 

Bernd Mathiske commented on MESOS-2858:
---

FetcherCacheHttpTest.HttpCachedConcurrent exposes the same flaky behavior.

> FetcherCacheHttpTest.HttpMixed is flaky.
> 
>
> Key: MESOS-2858
> URL: https://issues.apache.org/jira/browse/MESOS-2858
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheHttpTest.HttpMixed
> Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
> I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
> I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
> I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
> I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
> 2112ns
> I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 392ns
> I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
> I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
> I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
> STARTING
> I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 590673ns
> I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
> STARTING
> I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
> I0611 00:40:28.214774 26061 master.cpp:363] Master 
> 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
> 172.17.0.116:33349
> I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" 
> --zk_session_timeout="10secs"
> I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
> I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
> I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
> I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
> I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 374189ns
> I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
> VOTING
> I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
> group
> I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
> master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
> I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
> I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
> I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar
> I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated
> I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer
> I0611 00:40:28.219391 26067 

[jira] [Created] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.

2016-03-10 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4912:
-

 Summary: LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
 Key: MESOS-4912
 URL: https://issues.apache.org/jira/browse/MESOS-4912
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.28.0
Environment: CentOS 7, SSL
Reporter: Bernd Mathiske


Observed on our CI:
{noformat}
[09:34:15] : [Step 11/11] [ RUN  ] 
LinuxFilesystemIsolatorTest.ROOT_MultipleContainers
[09:34:19]W: [Step 11/11] I0309 09:34:19.906719  2357 linux.cpp:81] Making 
'/tmp/MLVLnv' a shared mount
[09:34:19]W: [Step 11/11] I0309 09:34:19.923548  2357 
linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
[09:34:19]W: [Step 11/11] I0309 09:34:19.924705  2376 
containerizer.cpp:666] Starting container 
'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of 
framework ''
[09:34:19]W: [Step 11/11] I0309 09:34:19.925355  2371 provisioner.cpp:285] 
Provisioning image rootfs 
'/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0'
 for container da610f7f-a709-4de8-94d3-74f4a520619b
[09:34:19]W: [Step 11/11] I0309 09:34:19.925881  2377 copy.cpp:127] Copying 
layer path '/tmp/MLVLnv/test_image1' to rootfs 
'/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0'
[09:34:30]W: [Step 11/11] I0309 09:34:30.835127  2376 linux.cpp:355] Bind 
mounting work directory from 
'/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b'
 to 
'/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox'
 for container da610f7f-a709-4de8-94d3-74f4a520619b
[09:34:30]W: [Step 11/11] I0309 09:34:30.835392  2376 linux.cpp:683] 
Changing the ownership of the persistent volume at 
'/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid 0
[09:34:30]W: [Step 11/11] I0309 09:34:30.840425  2376 linux.cpp:723] 
Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to 
'/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume'
 for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of 
container da610f7f-a709-4de8-94d3-74f4a520619b
[09:34:30]W: [Step 11/11] I0309 09:34:30.843878  2374 
linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS
[09:34:30]W: [Step 11/11] I0309 09:34:30.848302  2371 
containerizer.cpp:666] Starting container 
'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of 
framework ''
[09:34:30]W: [Step 11/11] I0309 09:34:30.848758  2371 
containerizer.cpp:1392] Destroying container 
'da610f7f-a709-4de8-94d3-74f4a520619b'
[09:34:30]W: [Step 11/11] I0309 09:34:30.848865  2373 provisioner.cpp:285] 
Provisioning image rootfs 
'/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917'
 for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087
[09:34:30]W: [Step 11/11] I0309 09:34:30.849449  2375 copy.cpp:127] Copying 
layer path '/tmp/MLVLnv/test_image2' to rootfs 
'/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917'
[09:34:30]W: [Step 11/11] I0309 09:34:30.854038  2374 cgroups.cpp:2427] 
Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b
[09:34:30]W: [Step 11/11] I0309 09:34:30.856693  2372 cgroups.cpp:1409] 
Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 
2.608128ms
[09:34:30]W: [Step 11/11] I0309 09:34:30.859237  2377 cgroups.cpp:2445] 
Thawing cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b
[09:34:30]W: [Step 11/11] I0309 09:34:30.861454  2377 cgroups.cpp:1438] 
Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us
[09:34:30]W: [Step 11/11] I0309 09:34:30.934608  2378 
containerizer.cpp:1608] Executor for container 
'da610f7f-a709-4de8-94d3-74f4a520619b' has exited
[09:34:30]W: [Step 11/11] I0309 09:34:30.937692  2372 linux.cpp:798] 
Unmounting volume 
'/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume'
 for container da610f7f-a709-4de8-94d3-74f4a520619b
[09:34:30]W: [Step 11/11] I0309 09:34:30.937742  2372 linux.cpp:817] 
Unmounting sandbox/work directory 

[jira] [Updated] (MESOS-4750) Document: Mesos Executor expects all SSL_* environment variables to be set

2016-03-10 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4750:
--
Shepherd: Adam B

> Document: Mesos Executor expects all SSL_* environment variables to be set
> --
>
> Key: MESOS-4750
> URL: https://issues.apache.org/jira/browse/MESOS-4750
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, general, slave
>Affects Versions: 0.26.0
>Reporter: pawan
>Assignee: Jan Schlicht
>  Labels: documentation, mesosphere, ssl
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I was trying to run Docker containers in a fully SSL-ized Mesos cluster but 
> ran into problems because the executor was failing with a "Failed to shutdown 
> socket with fd 10: Transport endpoint is not connected".
> My understanding is that this happens because the executor tries to report 
> its status to the Mesos slave over HTTPS, but doesn't have the appropriate 
> certs/env set up inside the executor.
> (Thanks to mslackbot/joseph for helping me figure this out on #mesos)
> It turns out the executor expects all SSL_* variables to be set inside 
> `CommandInfo.environment`, which the executor picks up so that it can 
> successfully report its status to the slave.
> This part of __the executor needing all the SSL_* variables to be set in its 
> environment__ is missing in the Mesos SSL transitioning guide. Please add 
> this vital information to the doc.
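For illustration, here is a minimal sketch of what setting the SSL_* variables in
{{CommandInfo.environment}} could look like from a framework, assuming the
{{mesos.interface}} Python protobuf bindings are available (an editorial
assumption, not part of the original report). The variable names shown are
examples; the full set depends on how SSL was configured for the cluster.

{code:title=ssl_env_sketch.py}
#!/usr/bin/env python
# Sketch: forward the agent's SSL_* settings into the executor's environment
# via CommandInfo.environment. Variable names are illustrative.
import os

from mesos.interface import mesos_pb2

command = mesos_pb2.CommandInfo()
command.value = "echo 'task payload goes here'"  # placeholder task command

for name in ("SSL_ENABLED", "SSL_KEY_FILE", "SSL_CERT_FILE"):
    variable = command.environment.variables.add()
    variable.name = name
    variable.value = os.environ.get(name, "")

# The CommandInfo would then be attached to the TaskInfo/ExecutorInfo that
# the framework launches.
print(command)
{code}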



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2016-03-01 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173700#comment-15173700
 ] 

Bernd Mathiske commented on MESOS-2858:
---

Thanks! Having looked through this log once, I have not found the culprit yet. 
According to the sandbox dumps, the 3 tasks run as intended, but the 
TASK_FINISHED status updates somehow get stuck somewhere on their way to an 
AWAIT. Investigation to be continued...

> FetcherCacheHttpTest.HttpMixed is flaky.
> 
>
> Key: MESOS-2858
> URL: https://issues.apache.org/jira/browse/MESOS-2858
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Bernd Mathiske
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheHttpTest.HttpMixed
> Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
> I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
> I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
> I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
> I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
> 2112ns
> I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 392ns
> I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
> I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
> I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
> STARTING
> I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 590673ns
> I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
> STARTING
> I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
> I0611 00:40:28.214774 26061 master.cpp:363] Master 
> 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
> 172.17.0.116:33349
> I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" 
> --zk_session_timeout="10secs"
> I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
> I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
> I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
> I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
> I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 374189ns
> I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
> VOTING
> I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
> group
> I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
> master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
> I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
> I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
> I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering 

[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-02-29 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3937:
--
Shepherd: Till Toenshoff  (was: Bernd Mathiske)

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] 

[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.

2016-02-29 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3937:
--
Assignee: Jan Schlicht

> Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
> ---
>
> Key: MESOS-3937
> URL: https://issues.apache.org/jira/browse/MESOS-3937
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.26.0
> Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2
> 8 CPUs, 16 GB memory
> Vagrant, libvirt/Virtual Box or VMware
>Reporter: Bernd Mathiske
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> {noformat}
> ../configure
> make check
> sudo ./bin/mesos-tests.sh 
> --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose
> {noformat}
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from DockerContainerizerTest
> I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms
> I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms
> I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns
> I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in 
> 4927ns
> I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 1605ns
> I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery
> I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status
> I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (4)@10.0.2.15:50088
> I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to 
> STARTING
> I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.016098ms
> I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to 
> STARTING
> I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status
> I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (5)@10.0.2.15:50088
> I1117 15:08:09.282552 26400 master.cpp:367] Master 
> 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on 
> 10.0.2.15:50088
> I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/40AlT8/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" 
> --zk_session_timeout="10secs"
> I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/40AlT8/credentials'
> I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING
> I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.075466ms
> I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to 
> VOTING
> I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos 
> group
> I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated
> I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL
> I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled
> I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is 
> master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a
> I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading 

[jira] [Created] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.

2016-02-29 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4810:
-

 Summary: 
ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
 Key: MESOS-4810
 URL: https://issues.apache.org/jira/browse/MESOS-4810
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.28.0
 Environment: CentOS 7 on AWS, both with and without SSL.
Reporter: Bernd Mathiske


{noformat}
[09:46:46] : [Step 11/11] [ RUN  ] 
ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
[09:46:46]W: [Step 11/11] I0229 09:46:46.628413  1166 leveldb.cpp:174] 
Opened db in 4.242882ms
[09:46:46]W: [Step 11/11] I0229 09:46:46.629926  1166 leveldb.cpp:181] 
Compacted db in 1.483621ms
[09:46:46]W: [Step 11/11] I0229 09:46:46.629966  1166 leveldb.cpp:196] 
Created db iterator in 15498ns
[09:46:46]W: [Step 11/11] I0229 09:46:46.629977  1166 leveldb.cpp:202] 
Seeked to beginning of db in 1405ns
[09:46:46]W: [Step 11/11] I0229 09:46:46.629984  1166 leveldb.cpp:271] 
Iterated through 0 keys in the db in 239ns
[09:46:46]W: [Step 11/11] I0229 09:46:46.630015  1166 replica.cpp:779] 
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[09:46:46]W: [Step 11/11] I0229 09:46:46.630470  1183 recover.cpp:447] 
Starting replica recovery
[09:46:46]W: [Step 11/11] I0229 09:46:46.630702  1180 recover.cpp:473] 
Replica is in EMPTY status
[09:46:46]W: [Step 11/11] I0229 09:46:46.631767  1182 replica.cpp:673] 
Replica in EMPTY status received a broadcasted recover request from 
(14567)@172.30.2.124:37431
[09:46:46]W: [Step 11/11] I0229 09:46:46.632115  1183 recover.cpp:193] 
Received a recover response from a replica in EMPTY status
[09:46:46]W: [Step 11/11] I0229 09:46:46.632450  1186 recover.cpp:564] 
Updating replica status to STARTING
[09:46:46]W: [Step 11/11] I0229 09:46:46.633476  1186 master.cpp:375] 
Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) 
started on 172.30.2.124:37431
[09:46:46]W: [Step 11/11] I0229 09:46:46.633491  1186 master.cpp:377] Flags 
at startup: --acls="" --allocation_interval="1secs" 
--allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" 
--authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" 
--zk_session_timeout="10secs"
[09:46:46]W: [Step 11/11] I0229 09:46:46.633677  1186 master.cpp:422] 
Master only allowing authenticated frameworks to register
[09:46:46]W: [Step 11/11] I0229 09:46:46.633685  1186 master.cpp:427] 
Master only allowing authenticated slaves to register
[09:46:46]W: [Step 11/11] I0229 09:46:46.633692  1186 credentials.hpp:35] 
Loading credentials for authentication from '/tmp/4UxXoW/credentials'
[09:46:46]W: [Step 11/11] I0229 09:46:46.633851  1183 leveldb.cpp:304] 
Persisting metadata (8 bytes) to leveldb took 1.191043ms
[09:46:46]W: [Step 11/11] I0229 09:46:46.633873  1183 replica.cpp:320] 
Persisted replica status to STARTING
[09:46:46]W: [Step 11/11] I0229 09:46:46.633894  1186 master.cpp:467] Using 
default 'crammd5' authenticator
[09:46:46]W: [Step 11/11] I0229 09:46:46.634003  1186 master.cpp:536] Using 
default 'basic' HTTP authenticator
[09:46:46]W: [Step 11/11] I0229 09:46:46.634062  1184 recover.cpp:473] 
Replica is in STARTING status
[09:46:46]W: [Step 11/11] I0229 09:46:46.634109  1186 master.cpp:570] 
Authorization enabled
[09:46:46]W: [Step 11/11] I0229 09:46:46.634249  1187 
whitelist_watcher.cpp:77] No whitelist given
[09:46:46]W: [Step 11/11] I0229 09:46:46.634255  1184 hierarchical.cpp:144] 
Initialized hierarchical allocator process
[09:46:46]W: [Step 11/11] I0229 09:46:46.634884  1187 replica.cpp:673] 
Replica in STARTING status received a broadcasted recover request from 
(14569)@172.30.2.124:37431
[09:46:46]W: [Step 11/11] I0229 09:46:46.635278  1181 recover.cpp:193] 
Received a recover response from a replica in STARTING status
[09:46:46]W: [Step 11/11] I0229 09:46:46.635742  1187 recover.cpp:564] 
Updating replica status to VOTING
[09:46:46]W: [Step 11/11] I0229 09:46:46.636391  1180 master.cpp:1711] The 
newly 

[jira] [Updated] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.

2016-02-29 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4810:
--
Labels: docker test  (was: )

> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
> --
>
> Key: MESOS-4810
> URL: https://issues.apache.org/jira/browse/MESOS-4810
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.28.0
> Environment: CentOS 7 on AWS, both with and without SSL.
>Reporter: Bernd Mathiske
>  Labels: docker, test
>
> {noformat}
> [09:46:46] :   [Step 11/11] [ RUN  ] 
> ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.628413  1166 leveldb.cpp:174] 
> Opened db in 4.242882ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629926  1166 leveldb.cpp:181] 
> Compacted db in 1.483621ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629966  1166 leveldb.cpp:196] 
> Created db iterator in 15498ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629977  1166 leveldb.cpp:202] 
> Seeked to beginning of db in 1405ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.629984  1166 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 239ns
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630015  1166 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630470  1183 recover.cpp:447] 
> Starting replica recovery
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.630702  1180 recover.cpp:473] 
> Replica is in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.631767  1182 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (14567)@172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632115  1183 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.632450  1186 recover.cpp:564] 
> Updating replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633476  1186 master.cpp:375] 
> Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) 
> started on 172.30.2.124:37431
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633491  1186 master.cpp:377] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" 
> --zk_session_timeout="10secs"
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633677  1186 master.cpp:422] 
> Master only allowing authenticated frameworks to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633685  1186 master.cpp:427] 
> Master only allowing authenticated slaves to register
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633692  1186 credentials.hpp:35] 
> Loading credentials for authentication from '/tmp/4UxXoW/credentials'
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633851  1183 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 1.191043ms
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633873  1183 replica.cpp:320] 
> Persisted replica status to STARTING
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.633894  1186 master.cpp:467] Using 
> default 'crammd5' authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634003  1186 master.cpp:536] Using 
> default 'basic' HTTP authenticator
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634062  1184 recover.cpp:473] 
> Replica is in STARTING status
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634109  1186 master.cpp:570] 
> Authorization enabled
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634249  1187 
> whitelist_watcher.cpp:77] No whitelist given
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634255  1184 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [09:46:46]W:   [Step 11/11] I0229 09:46:46.634884  1187 replica.cpp:673] 
> Replica in STARTING status received a broadcasted recover request from 
> 

[jira] [Commented] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky

2016-02-29 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171670#comment-15171670
 ] 

Bernd Mathiske commented on MESOS-4047:
---

https://reviews.apache.org/r/43799/

> MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky
> ---
>
> Key: MESOS-4047
> URL: https://issues.apache.org/jira/browse/MESOS-4047
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
> Environment: Ubuntu 14, gcc 4.8.4
>Reporter: Joseph Wu
>Assignee: Alexander Rojas
>  Labels: flaky, flaky-test
> Fix For: 0.28.0
>
>
> {code:title=Output from passed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:14.319327  5062 exec.cpp:134] Version: 0.27.0
> I1202 11:09:14.17  5079 exec.cpp:208] Executor registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Registered executor on ubuntu
> Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 5085
> I1202 11:09:14.391739  5077 exec.cpp:254] Received reconnect request from 
> slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> I1202 11:09:14.398598  5082 exec.cpp:231] Executor re-registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Re-registered executor on ubuntu
> Shutting down
> Sending SIGTERM to process tree at pid 5085
> Killing the following process trees:
> [ 
> -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done 
>  \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp 
> ]
> [   OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms)
> {code}
> {code:title=Output from failed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:15.509950  5109 exec.cpp:134] Version: 0.27.0
> I1202 11:09:15.568183  5123 exec.cpp:208] Executor registered on slave 
> 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> Registered executor on ubuntu
> Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6
> Forked command at 5132
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> I1202 11:09:15.665498  5129 exec.cpp:254] Received reconnect request from 
> slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> I1202 11:09:15.670995  5123 exec.cpp:381] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 5132
> ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure
> (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913
> *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are 
> using GNU date ***
> {code}
> Notice that in the failed test, the executor is asked to shut down when it 
> tries to reconnect to the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky

2016-02-25 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4047:
--
Fix Version/s: (was: 0.27.0)

> MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky
> ---
>
> Key: MESOS-4047
> URL: https://issues.apache.org/jira/browse/MESOS-4047
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
> Environment: Ubuntu 14, gcc 4.8.4
>Reporter: Joseph Wu
>Assignee: Alexander Rojas
>  Labels: flaky, flaky-test
> Fix For: 0.28.0
>
>
> {code:title=Output from passed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:14.319327  5062 exec.cpp:134] Version: 0.27.0
> I1202 11:09:14.17  5079 exec.cpp:208] Executor registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Registered executor on ubuntu
> Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> Forked command at 5085
> I1202 11:09:14.391739  5077 exec.cpp:254] Received reconnect request from 
> slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> I1202 11:09:14.398598  5082 exec.cpp:231] Executor re-registered on slave 
> bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0
> Re-registered executor on ubuntu
> Shutting down
> Sending SIGTERM to process tree at pid 5085
> Killing the following process trees:
> [ 
> -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done 
>  \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp 
> ]
> [   OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms)
> {code}
> {code:title=Output from failed test}
> [--] 1 test from MemoryPressureMesosTest
> 1+0 records in
> 1+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> I1202 11:09:15.509950  5109 exec.cpp:134] Version: 0.27.0
> I1202 11:09:15.568183  5123 exec.cpp:208] Executor registered on slave 
> 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> Registered executor on ubuntu
> Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6
> Forked command at 5132
> sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done'
> I1202 11:09:15.665498  5129 exec.cpp:254] Received reconnect request from 
> slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0
> I1202 11:09:15.670995  5123 exec.cpp:381] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 5132
> ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure
> (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913
> *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are 
> using GNU date ***
> {code}
> Notice that in the failed test, the executor is asked to shut down when it 
> tries to reconnect to the agent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-23 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4676:
--
Sprint:   (was: Mesosphere Sprint 29)

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
> elected leader is master@172.30.2.239:39785 with id 
> 112363e2-c680-4946-8fee-d0626ed8b21e
> [18:06:25][Step 8/8] I0215 17:06:25.264044  1757 master.cpp:1725] Elected as 
> the leading master!
> [18:06:25][Step 8/8] I0215 17:06:25.264061  1757 master.cpp:1470] Recovering 
> from registrar
> [18:06:25][Step 8/8] I0215 17:06:25.264117  1760 replica.cpp:673] Replica in 
> STARTING status received a broadcasted recover 

[jira] [Commented] (MESOS-4547) Introduce TASK_KILLING state.

2016-02-22 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156719#comment-15156719
 ] 

Bernd Mathiske commented on MESOS-4547:
---

The RR for tests (https://reviews.apache.org/r/43490/) has been discarded. Are 
there going to be tests and documentation for this feature?

> Introduce TASK_KILLING state.
> -
>
> Key: MESOS-4547
> URL: https://issues.apache.org/jira/browse/MESOS-4547
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>Assignee: Abhishek Dasgupta
>  Labels: mesosphere
> Fix For: 0.28.0
>
>
> Currently there is no state to express that a task is being killed, but is 
> not yet killed (see MESOS-4140). In a similar way to how we have 
> TASK_STARTING to indicate the task is starting but not yet running, a 
> TASK_KILLING state would indicate the task is being killed but is not yet 
> killed.
> This would need to be guarded by a framework capability to protect old 
> frameworks that cannot understand the TASK_KILLING state.
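
A minimal sketch (plain C++; the enum, function, and capability flag are made up for illustration and are not the actual Mesos executor API) of the intended update sequence: frameworks that advertise the capability see the intermediate TASK_KILLING update, older frameworks jump straight to TASK_KILLED.
{code}
// Simplified state model: only capability-aware frameworks receive the
// intermediate TASK_KILLING update before TASK_KILLED.
#include <iostream>
#include <vector>

enum class TaskState { TASK_RUNNING, TASK_KILLING, TASK_KILLED };

std::vector<TaskState> killTask(bool frameworkHasKillingCapability) {
  std::vector<TaskState> updates;

  if (frameworkHasKillingCapability) {
    updates.push_back(TaskState::TASK_KILLING);  // being killed, not yet killed
  }

  // ... the actual kill / cleanup would happen here ...

  updates.push_back(TaskState::TASK_KILLED);
  return updates;
}

int main() {
  for (TaskState s : killTask(true)) {
    std::cout << static_cast<int>(s) << '\n';  // prints 1 (KILLING), 2 (KILLED)
  }
}
{code}
The capability guard is the important part: without it, an old framework would receive a state it cannot interpret.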



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1992) Support launching executors with configured systemd

2016-02-18 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-1992:
--
Shepherd:   (was: Bernd Mathiske)

> Support launching executors with configured systemd 
> 
>
> Key: MESOS-1992
> URL: https://issues.apache.org/jira/browse/MESOS-1992
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Timothy Chen
>  Labels: mesosphere
>
> Currently, when running mesos-slave in docker with systemd, the mesos-slave 
> container cannot be upgraded while keeping all the tasks running, since 
> killing the docker container will kill all the processes that are launched 
> with the mesos containerizer.
> If we can let the executor be launched with systemd outside of the docker 
> container, then we can let the tasks remain running and recover them when the 
> slave is upgraded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4692) FetcherCacheHttpTest.HttpCachedSerialized flaky again.

2016-02-17 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150628#comment-15150628
 ] 

Bernd Mathiske commented on MESOS-4692:
---

If so, then it is not likely due to changes in the fetcher code or the fetcher 
cache test code. This code has been stable except for how many tasks get run, 
and running fewer tasks should not make it more flaky. No idea yet what is 
causing it this time, though.

> FetcherCacheHttpTest.HttpCachedSerialized flaky again.
> --
>
> Key: MESOS-4692
> URL: https://issues.apache.org/jira/browse/MESOS-4692
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
> Environment: CentOS 7, plain
>Reporter: Bernd Mathiske
>Priority: Minor
>  Labels: flaky, test
>
> {noformat}
> [12:20:50] :   [Step 8/8] [ RUN  ] 
> FetcherCacheHttpTest.HttpCachedSerialized
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.842162 32498 leveldb.cpp:174] Opened 
> db in 4.973489ms
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.843670 32498 leveldb.cpp:181] 
> Compacted db in 1.48087ms
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.843709 32498 leveldb.cpp:196] 
> Created db iterator in 15661ns
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.843720 32498 leveldb.cpp:202] Seeked 
> to beginning of db in 1401ns
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.843729 32498 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 357ns
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.843760 32498 replica.cpp:779] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.844228 32513 recover.cpp:447] 
> Starting replica recovery
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.844411 32513 recover.cpp:473] 
> Replica is in EMPTY status
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.845355 32516 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> (2089)@172.30.2.21:33004
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.845825 32518 recover.cpp:193] 
> Received a recover response from a replica in EMPTY status
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.846307 32517 recover.cpp:564] 
> Updating replica status to STARTING
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.846789 32518 master.cpp:374] Master 
> 0941887d-60f1-4ff3-85f0-5d19ffee8005 (ip-172-30-2-21.mesosphere.io) started 
> on 172.30.2.21:33004
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.846810 32518 master.cpp:376] Flags 
> at startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/YFwdSN/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/YFwdSN/master" 
> --zk_session_timeout="10secs"
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847057 32518 master.cpp:421] Master 
> only allowing authenticated frameworks to register
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847066 32518 master.cpp:426] Master 
> only allowing authenticated slaves to register
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847072 32518 credentials.hpp:35] 
> Loading credentials for authentication from '/tmp/YFwdSN/credentials'
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847286 32518 master.cpp:466] Using 
> default 'crammd5' authenticator
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847395 32518 master.cpp:535] Using 
> default 'basic' HTTP authenticator
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847511 32518 master.cpp:569] 
> Authorization enabled
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847642 32517 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847646 32519 
> whitelist_watcher.cpp:77] No whitelist given
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847795 32514 leveldb.cpp:304] 
> Persisting metadata (8 bytes) to leveldb took 1.368308ms
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.847825 32514 replica.cpp:320] 
> Persisted replica status to STARTING
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.848002 32512 recover.cpp:473] 
> Replica is in STARTING status
> [12:20:50]W:   [Step 8/8] I0217 12:20:50.849025 

[jira] [Created] (MESOS-4692) FetcherCacheHttpTest.HttpCachedSerialized flaky again.

2016-02-17 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4692:
-

 Summary: FetcherCacheHttpTest.HttpCachedSerialized flaky again.
 Key: MESOS-4692
 URL: https://issues.apache.org/jira/browse/MESOS-4692
 Project: Mesos
  Issue Type: Bug
  Components: fetcher, test
 Environment: CentOS 7, plain
Reporter: Bernd Mathiske
Priority: Minor


{noformat}
[12:20:50] : [Step 8/8] [ RUN  ] 
FetcherCacheHttpTest.HttpCachedSerialized
[12:20:50]W: [Step 8/8] I0217 12:20:50.842162 32498 leveldb.cpp:174] Opened 
db in 4.973489ms
[12:20:50]W: [Step 8/8] I0217 12:20:50.843670 32498 leveldb.cpp:181] 
Compacted db in 1.48087ms
[12:20:50]W: [Step 8/8] I0217 12:20:50.843709 32498 leveldb.cpp:196] 
Created db iterator in 15661ns
[12:20:50]W: [Step 8/8] I0217 12:20:50.843720 32498 leveldb.cpp:202] Seeked 
to beginning of db in 1401ns
[12:20:50]W: [Step 8/8] I0217 12:20:50.843729 32498 leveldb.cpp:271] 
Iterated through 0 keys in the db in 357ns
[12:20:50]W: [Step 8/8] I0217 12:20:50.843760 32498 replica.cpp:779] 
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[12:20:50]W: [Step 8/8] I0217 12:20:50.844228 32513 recover.cpp:447] 
Starting replica recovery
[12:20:50]W: [Step 8/8] I0217 12:20:50.844411 32513 recover.cpp:473] 
Replica is in EMPTY status
[12:20:50]W: [Step 8/8] I0217 12:20:50.845355 32516 replica.cpp:673] 
Replica in EMPTY status received a broadcasted recover request from 
(2089)@172.30.2.21:33004
[12:20:50]W: [Step 8/8] I0217 12:20:50.845825 32518 recover.cpp:193] 
Received a recover response from a replica in EMPTY status
[12:20:50]W: [Step 8/8] I0217 12:20:50.846307 32517 recover.cpp:564] 
Updating replica status to STARTING
[12:20:50]W: [Step 8/8] I0217 12:20:50.846789 32518 master.cpp:374] Master 
0941887d-60f1-4ff3-85f0-5d19ffee8005 (ip-172-30-2-21.mesosphere.io) started on 
172.30.2.21:33004
[12:20:50]W: [Step 8/8] I0217 12:20:50.846810 32518 master.cpp:376] Flags 
at startup: --acls="" --allocation_interval="1secs" 
--allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" 
--authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/YFwdSN/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="25secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/YFwdSN/master" 
--zk_session_timeout="10secs"
[12:20:50]W: [Step 8/8] I0217 12:20:50.847057 32518 master.cpp:421] Master 
only allowing authenticated frameworks to register
[12:20:50]W: [Step 8/8] I0217 12:20:50.847066 32518 master.cpp:426] Master 
only allowing authenticated slaves to register
[12:20:50]W: [Step 8/8] I0217 12:20:50.847072 32518 credentials.hpp:35] 
Loading credentials for authentication from '/tmp/YFwdSN/credentials'
[12:20:50]W: [Step 8/8] I0217 12:20:50.847286 32518 master.cpp:466] Using 
default 'crammd5' authenticator
[12:20:50]W: [Step 8/8] I0217 12:20:50.847395 32518 master.cpp:535] Using 
default 'basic' HTTP authenticator
[12:20:50]W: [Step 8/8] I0217 12:20:50.847511 32518 master.cpp:569] 
Authorization enabled
[12:20:50]W: [Step 8/8] I0217 12:20:50.847642 32517 hierarchical.cpp:144] 
Initialized hierarchical allocator process
[12:20:50]W: [Step 8/8] I0217 12:20:50.847646 32519 
whitelist_watcher.cpp:77] No whitelist given
[12:20:50]W: [Step 8/8] I0217 12:20:50.847795 32514 leveldb.cpp:304] 
Persisting metadata (8 bytes) to leveldb took 1.368308ms
[12:20:50]W: [Step 8/8] I0217 12:20:50.847825 32514 replica.cpp:320] 
Persisted replica status to STARTING
[12:20:50]W: [Step 8/8] I0217 12:20:50.848002 32512 recover.cpp:473] 
Replica is in STARTING status
[12:20:50]W: [Step 8/8] I0217 12:20:50.849025 32516 master.cpp:1710] The 
newly elected leader is master@172.30.2.21:33004 with id 
0941887d-60f1-4ff3-85f0-5d19ffee8005
[12:20:50]W: [Step 8/8] I0217 12:20:50.849047 32516 master.cpp:1723] 
Elected as the leading master!
[12:20:50]W: [Step 8/8] I0217 12:20:50.849061 32516 master.cpp:1468] 
Recovering from registrar
[12:20:50]W: [Step 8/8] I0217 12:20:50.849055 32515 replica.cpp:673] 
Replica in STARTING status received a broadcasted recover request from 
(2091)@172.30.2.21:33004
[12:20:50]W: [Step 8/8] I0217 12:20:50.849172 32518 registrar.cpp:307] 
Recovering 

[jira] [Updated] (MESOS-4615) ContainerLoggerTest.DefaultToSandbox is flaky

2016-02-16 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4615:
--
Shepherd: Bernd Mathiske

> ContainerLoggerTest.DefaultToSandbox is flaky
> -
>
> Key: MESOS-4615
> URL: https://issues.apache.org/jira/browse/MESOS-4615
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.27.0
> Environment: CentOS 7, gcc, libevent & SSL enabled
>Reporter: Greg Mann
>Assignee: Joseph Wu
>  Labels: flaky-test, logger, mesosphere
>
> Just saw this failure on the ASF CI:
> {code}
> [ RUN  ] ContainerLoggerTest.DefaultToSandbox
> I0206 01:25:03.766458  2824 leveldb.cpp:174] Opened db in 72.979786ms
> I0206 01:25:03.811712  2824 leveldb.cpp:181] Compacted db in 45.162067ms
> I0206 01:25:03.811810  2824 leveldb.cpp:196] Created db iterator in 26090ns
> I0206 01:25:03.811828  2824 leveldb.cpp:202] Seeked to beginning of db in 
> 3173ns
> I0206 01:25:03.811839  2824 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 497ns
> I0206 01:25:03.811900  2824 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0206 01:25:03.812785  2849 recover.cpp:447] Starting replica recovery
> I0206 01:25:03.813043  2849 recover.cpp:473] Replica is in EMPTY status
> I0206 01:25:03.814668  2854 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (371)@172.17.0.8:37843
> I0206 01:25:03.815210  2849 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0206 01:25:03.815732  2854 recover.cpp:564] Updating replica status to 
> STARTING
> I0206 01:25:03.819664  2857 master.cpp:376] Master 
> 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de (74ef606c4063) started on 
> 172.17.0.8:37843
> I0206 01:25:03.819703  2857 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/h5vu5I/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/h5vu5I/master" --zk_session_timeout="10secs"
> I0206 01:25:03.820241  2857 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0206 01:25:03.820257  2857 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0206 01:25:03.820269  2857 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/h5vu5I/credentials'
> I0206 01:25:03.821110  2857 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0206 01:25:03.821311  2857 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0206 01:25:03.821636  2857 master.cpp:571] Authorization enabled
> I0206 01:25:03.821979  2846 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0206 01:25:03.822057  2846 whitelist_watcher.cpp:77] No whitelist given
> I0206 01:25:03.825460  2847 master.cpp:1712] The newly elected leader is 
> master@172.17.0.8:37843 with id 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de
> I0206 01:25:03.825512  2847 master.cpp:1725] Elected as the leading master!
> I0206 01:25:03.825533  2847 master.cpp:1470] Recovering from registrar
> I0206 01:25:03.825835  2847 registrar.cpp:307] Recovering registrar
> I0206 01:25:03.848212  2854 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 32.226093ms
> I0206 01:25:03.848299  2854 replica.cpp:320] Persisted replica status to 
> STARTING
> I0206 01:25:03.848702  2854 recover.cpp:473] Replica is in STARTING status
> I0206 01:25:03.850728  2858 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (373)@172.17.0.8:37843
> I0206 01:25:03.851230  2854 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0206 01:25:03.852018  2854 recover.cpp:564] Updating replica status to VOTING
> I0206 01:25:03.881681  2854 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 29.184163ms
> I0206 01:25:03.881772  2854 replica.cpp:320] Persisted replica status to 
> VOTING
> I0206 01:25:03.882058 

[jira] [Commented] (MESOS-4631) Document how to use custom authentication modules

2016-02-16 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149081#comment-15149081
 ] 

Bernd Mathiske commented on MESOS-4631:
---

Till is on vacation this week.

> Document how to use custom authentication modules
> -
>
> Key: MESOS-4631
> URL: https://issues.apache.org/jira/browse/MESOS-4631
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Priority: Minor
>  Labels: authentication, documentation, mesosphere
>
> The authentication doc page talks about custom authentication modules a bit, 
> but doesn't give enough information. For example:
> * What interface does a custom authentication module need to satisfy?
> * Can multiple authentication modules be used?
> * How do I implement a framework that authenticates with a master that uses a 
> non-default authentication module, e.g., one that doesn't use credentials?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4676:
--
Sprint: Mesosphere Sprint 29

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
> elected leader is master@172.30.2.239:39785 with id 
> 112363e2-c680-4946-8fee-d0626ed8b21e
> [18:06:25][Step 8/8] I0215 17:06:25.264044  1757 master.cpp:1725] Elected as 
> the leading master!
> [18:06:25][Step 8/8] I0215 17:06:25.264061  1757 master.cpp:1470] Recovering 
> from registrar
> [18:06:25][Step 8/8] I0215 17:06:25.264117  1760 replica.cpp:673] Replica in 
> STARTING status received a broadcasted recover request 

[jira] [Created] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.

2016-02-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4677:
-

 Summary: LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is 
flaky.
 Key: MESOS-4677
 URL: https://issues.apache.org/jira/browse/MESOS-4677
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.27
Reporter: Bernd Mathiske


This test fails very often when run on CentOS 7, but may also fail elsewhere 
sometimes. Unfortunately, it tends to only fail when --verbose is not set. The 
output is this:
{noformat}
[21:45:21][Step 8/8] [ RUN  ] 
LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
[21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: 
Failure
[21:45:21][Step 8/8] Value of: usage.get().threads()
[21:45:21][Step 8/8]   Actual: 0
[21:45:21][Step 8/8] Expected: 1U
[21:45:21][Step 8/8] Which is: 1
[21:45:21][Step 8/8] [  FAILED  ] 
LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4676:
--
Environment: CentOS 7 with SSL.  (was: CentOS 6 with SSL.)

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is 
> in STARTING status
> [18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
> elected leader is master@172.30.2.239:39785 with id 
> 112363e2-c680-4946-8fee-d0626ed8b21e
> [18:06:25][Step 8/8] I0215 17:06:25.264044  1757 master.cpp:1725] Elected as 
> the leading master!
> [18:06:25][Step 8/8] I0215 17:06:25.264061  1757 master.cpp:1470] Recovering 
> from registrar
> [18:06:25][Step 8/8] I0215 17:06:25.264117  1760 replica.cpp:673] Replica in 
> STARTING status received a 

[jira] [Created] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4676:
-

 Summary: ROOT_DOCKER_Logs is flaky.
 Key: MESOS-4676
 URL: https://issues.apache.org/jira/browse/MESOS-4676
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27
 Environment: CentOS 6 with SSL.
Reporter: Bernd Mathiske


{noformat}
[18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
[18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db in 
6.548327ms
[18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted db 
in 1.837816ms
[18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
iterator in 22044ns
[18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
beginning of db in 2347ns
[18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
through 0 keys in the db in 571ns
[18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
[18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
replica recovery
[18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is in 
EMPTY status
[18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
EMPTY status received a broadcasted recover request from 
(13608)@172.30.2.239:39785
[18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
recover response from a replica in EMPTY status
[18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started on 
172.30.2.239:39785
[18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
replica status to STARTING
[18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
--zk_session_timeout="10secs"
[18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
allowing authenticated frameworks to register
[18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
allowing authenticated slaves to register
[18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
credentials for authentication from '/tmp/HncLLj/credentials'
[18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using default 
'crammd5' authenticator
[18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using default 
'basic' HTTP authenticator
[18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] Authorization 
enabled
[18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
Initialized hierarchical allocator process
[18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
whitelist given
[18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
metadata (8 bytes) to leveldb took 1.517992ms
[18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
replica status to STARTING
[18:06:25][Step 8/8] I0215 17:06:25.263144  1754 recover.cpp:473] Replica is in 
STARTING status
[18:06:25][Step 8/8] I0215 17:06:25.264010  1757 master.cpp:1712] The newly 
elected leader is master@172.30.2.239:39785 with id 
112363e2-c680-4946-8fee-d0626ed8b21e
[18:06:25][Step 8/8] I0215 17:06:25.264044  1757 master.cpp:1725] Elected as 
the leading master!
[18:06:25][Step 8/8] I0215 17:06:25.264061  1757 master.cpp:1470] Recovering 
from registrar
[18:06:25][Step 8/8] I0215 17:06:25.264117  1760 replica.cpp:673] Replica in 
STARTING status received a broadcasted recover request from 
(13610)@172.30.2.239:39785
[18:06:25][Step 8/8] I0215 17:06:25.264197  1758 registrar.cpp:307] Recovering 
registrar
[18:06:25][Step 8/8] I0215 17:06:25.264827  1756 recover.cpp:193] Received a 
recover response from a replica in STARTING status
[18:06:25][Step 8/8] I0215 17:06:25.265219  1757 recover.cpp:564] Updating 
replica status to VOTING
[18:06:25][Step 8/8] 

[jira] [Created] (MESOS-4674) Linux filesystem isolator tests are flaky.

2016-02-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4674:
-

 Summary: Linux filesystem isolator tests are flaky.
 Key: MESOS-4674
 URL: https://issues.apache.org/jira/browse/MESOS-4674
 Project: Mesos
  Issue Type: Bug
  Components: testing, flaky
Affects Versions: 0.27
 Environment: CentOS 7 (directly on an AWS instance)

Reporter: Bernd Mathiske


LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem sometimes 
fails on CentOS 7 with this kind of output:
{noformat}
../../src/tests/containerizer/filesystem_isolator_tests.cpp:1054: Failure
Failed to wait 2mins for launch
{noformat}

LinuxFilesystemIsolatorTest.ROOT_MultipleContainers often has this output:
{noformat}
../../src/tests/containerizer/filesystem_isolator_tests.cpp:1138: Failure
Failed to wait 1mins for launch1
{noformat}

Whether SSL is configured makes no difference.

This test may also fail on other platforms, but more rarely.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.

2016-02-03 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130038#comment-15130038
 ] 

Bernd Mathiske commented on MESOS-4075:
---

We can only estimate what will be more work in the long run:
- patching test exclusion lists and restarting tests, etc.
- fixing the test system once
My bet is on the latter.

> Continue test suite execution across crashing tests.
> 
>
> Key: MESOS-4075
> URL: https://issues.apache.org/jira/browse/MESOS-4075
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Currently, mesos-tests.sh exits when a test crashes. This is inconvenient 
> when trying to find out all tests that fail. 
> mesos-tests.sh should rate a test that crashes as failed and continue the 
> same way as if the test merely returned with a failure result and exited 
> properly.
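
A minimal sketch (assumes a gtest-style ./mesos-tests binary and a hand-written filter list; this is not the actual mesos-tests.sh logic) of the proposed behavior: run each test in its own child process, treat a crash like any other failure, and keep going.
{code}
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <string>
#include <vector>

int main() {
  // Hypothetical list of test filters; the real list would come from gtest.
  std::vector<std::string> tests = {"FooTest.*", "BarTest.*"};

  int failures = 0;

  for (const std::string& test : tests) {
    pid_t pid = fork();
    if (pid == 0) {
      std::string filter = "--gtest_filter=" + test;
      execl("./mesos-tests", "mesos-tests", filter.c_str(), (char*) nullptr);
      _exit(127);  // exec failed
    }

    int status = 0;
    waitpid(pid, &status, 0);

    // A crash (killed by a signal) or a nonzero exit both count as a failure,
    // but neither aborts the loop.
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
      printf("FAILED: %s\n", test.c_str());
      failures++;
    }
  }

  return failures == 0 ? 0 : 1;
}
{code}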



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3552) CHECK failure due to floating point precision on reservation request

2016-02-03 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3552:
--
  Sprint: Mesosphere Sprint 28
Story Points: 3

> CHECK failure due to floating point precision on reservation request
> 
>
> Key: MESOS-3552
> URL: https://issues.apache.org/jira/browse/MESOS-3552
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Mandeep Chadha
>Assignee: Mandeep Chadha
>  Labels: mesosphere, tech-debt
> Fix For: 0.26.0
>
>
> The result.cpus() == cpus() check is failing due to a ( double == double ) 
> comparison problem. 
> Root cause: 
> The framework requested a 0.1 cpu reservation for the first task. So far so 
> good. The next Reserve operation led to double arithmetic resulting in the 
> following values:
>  results.cpus() : 23.9964472863211995 cpus() : 24
> So the check ( result.cpus() == cpus() ) failed: the double arithmetic 
> caused results.cpus() to be 23.9964472863211995, and hence 
> ( 23.9964472863211995 == 24 ) is false.
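
A minimal sketch (values chosen purely for illustration) of why an exact double comparison breaks down after repeated fractional arithmetic, and what a tolerance-based check looks like instead:
{code}
#include <cmath>
#include <iostream>

int main() {
  double result = 0.0;

  // 240 reservations of 0.1 cpus "should" add up to exactly 24 cpus...
  for (int i = 0; i < 240; i++) {
    result += 0.1;
  }

  // ...but 0.1 is not exactly representable as a double, so rounding error
  // accumulates and the exact comparison typically fails.
  std::cout << (result == 24.0) << std::endl;                   // usually 0

  // A tolerance-based comparison absorbs the rounding error.
  std::cout << (std::fabs(result - 24.0) < 1e-9) << std::endl;  // 1
}
{code}
Comparing against a small epsilon (or doing the bookkeeping in integral units such as millicpus) is the usual way to make such checks robust.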



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1790) Add "chown" option to CommandInfo.URI

2016-02-01 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-1790:
--
Sprint:   (was: Mesosphere Sprint 27)

> Add "chown" option to CommandInfo.URI
> -
>
> Key: MESOS-1790
> URL: https://issues.apache.org/jira/browse/MESOS-1790
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Jim Klucar
>  Labels: myriad, newbie
> Attachments: 
> 0001-MESOS-1790-Adds-chown-option-to-CommandInfo.URI.patch
>
>
> The Mesos fetcher always chown()s the extracted executor URIs to the executor 
> user, but sometimes this is not desirable, e.g., the "setuid" bit gets lost 
> during chown() if the slave/fetcher is running as root. 
> It would be nice to give frameworks the ability to skip the chown.
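
A minimal sketch (POSIX C++, needs to run as root; /tmp/setuid-demo and the uid/gid 1000 are made-up values) demonstrating that chown() drops the setuid bit on Linux, which is why the fetcher's unconditional chown can break setuid executors:
{code}
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
  const char* path = "/tmp/setuid-demo";

  FILE* f = fopen(path, "w");
  if (f == nullptr) { perror("fopen"); return 1; }
  fclose(f);

  chmod(path, 04755);  // executable with the setuid bit set

  struct stat s;
  stat(path, &s);
  printf("before chown: setuid=%d\n", (s.st_mode & S_ISUID) != 0);  // 1

  // Hand the file to an unprivileged uid/gid, as the fetcher does for the
  // executor user; on Linux the kernel clears the setuid bit here, even when
  // the chown is performed by root.
  chown(path, 1000, 1000);

  stat(path, &s);
  printf("after chown:  setuid=%d\n", (s.st_mode & S_ISUID) != 0);  // 0
}
{code}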



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3568) The State (/state) endpoint should be documented

2016-02-01 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3568:
--
Assignee: Kevin Klues  (was: Michael Park)

> The State (/state) endpoint should be documented
> 
>
> Key: MESOS-3568
> URL: https://issues.apache.org/jira/browse/MESOS-3568
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: James Fisher
>Assignee: Kevin Klues
>  Labels: documentation, mesosphere, newbie, tech-debt
>
> Our tests are using a resource `/state.json` hosted by the Mesos master. I 
> have searched for the documentation for this resource but have been unable to 
> find anything.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-02-01 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4368:
--
Assignee: (was: Jan Schlicht)

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-02-01 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126230#comment-15126230
 ] 

Bernd Mathiske commented on MESOS-4368:
---

Postponed?

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-02-01 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4368:
--
Assignee: Jan Schlicht

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3787) Expand environment variables through the Docker executor.

2016-02-01 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3787:
--
Sprint: Mesosphere Sprint 26  (was: Mesosphere Sprint 26, Mesosphere Sprint 
27)

> Expand environment variables through the Docker executor.
> -
>
> Key: MESOS-3787
> URL: https://issues.apache.org/jira/browse/MESOS-3787
> Project: Mesos
>  Issue Type: Wish
>Reporter: John Garcia
>Assignee: Adam B
>  Labels: mesosphere
> Attachments: mesos.patch, test-example.json
>
>
> We'd like to have expanded variables usable in [the json files used to create 
> a Marathon app, hence] the Task's CommandInfo, so that the executor is able 
> to detect the correct values at runtime.
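
A minimal sketch (illustration only, not the Docker executor's actual code; the ${NAME} syntax and the expand() helper are assumptions) of what expanding environment variables in a task's command string could look like at runtime:
{code}
#include <cstdlib>
#include <iostream>
#include <regex>
#include <string>

// Replaces every ${NAME} occurrence with the value of NAME in the executor's
// environment (or the empty string if it is unset).
std::string expand(const std::string& command) {
  std::regex pattern(R"(\$\{([A-Za-z_][A-Za-z0-9_]*)\})");
  std::string result;

  size_t last = 0;
  for (auto it = std::sregex_iterator(command.begin(), command.end(), pattern);
       it != std::sregex_iterator();
       ++it) {
    result += command.substr(last, it->position() - last);
    const char* value = std::getenv((*it)[1].str().c_str());
    result += (value != nullptr) ? value : "";
    last = it->position() + it->length();
  }
  result += command.substr(last);
  return result;
}

int main() {
  // Hypothetical task command referencing a variable set by the agent.
  std::cout << expand("echo ${MESOS_SANDBOX}/app.tar.gz") << std::endl;
}
{code}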



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4334) Add documentation for the registry

2016-02-01 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4334:
--
Sprint:   (was: Mesosphere Sprint 27)

> Add documentation for the registry
> --
>
> Key: MESOS-4334
> URL: https://issues.apache.org/jira/browse/MESOS-4334
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: Neil Conway
>  Labels: documentation, mesosphere, registry
>
> What information does the master store in the registry? What do operators 
> need to know about managing the registry?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4156) Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*

2016-01-22 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112113#comment-15112113
 ] 

Bernd Mathiske commented on MESOS-4156:
---

Sure thing.

> Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*
> --
>
> Key: MESOS-4156
> URL: https://issues.apache.org/jira/browse/MESOS-4156
> Project: Mesos
>  Issue Type: Epic
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> FetcherCacheTest.LocalUncached (2417 ms)
> FetcherCacheTest.LocalCached (2476 ms)
> FetcherCacheTest.LocalUncachedExtract (2496 ms)
> FetcherCacheTest.LocalCachedExtract (2471 ms)
> FetcherCacheTest.SimpleEviction (4451 ms)
> FetcherCacheTest.FallbackFromEviction (2483 ms)
> FetcherCacheTest.RemoveLRUCacheEntries (3422 ms)
> FetcherCacheHttpTest.HttpCachedSerialized (2490 ms)
> FetcherCacheHttpTest.HttpCachedConcurrent (1032 ms)
> FetcherCacheHttpTest.HttpMixed (1022 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4156) Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*

2016-01-22 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4156:
--
Shepherd: Bernd Mathiske

> Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*
> --
>
> Key: MESOS-4156
> URL: https://issues.apache.org/jira/browse/MESOS-4156
> Project: Mesos
>  Issue Type: Epic
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> FetcherCacheTest.LocalUncached (2417 ms)
> FetcherCacheTest.LocalCached (2476 ms)
> FetcherCacheTest.LocalUncachedExtract (2496 ms)
> FetcherCacheTest.LocalCachedExtract (2471 ms)
> FetcherCacheTest.SimpleEviction (4451 ms)
> FetcherCacheTest.FallbackFromEviction (2483 ms)
> FetcherCacheTest.RemoveLRUCacheEntries (3422 ms)
> FetcherCacheHttpTest.HttpCachedSerialized (2490 ms)
> FetcherCacheHttpTest.HttpCachedConcurrent (1032 ms)
> FetcherCacheHttpTest.HttpMixed (1022 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface

2016-01-21 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3854:
--
Description: 
Finalize the structure of ACLs and achieve consensus on the design doc proposed 
in MESOS-2949.

https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit

  was:Finalize the structure of ACLs and achieve consensus on the design doc 
proposed in MESOS-2949.


> Finalize design for generalized Authorizer interface
> 
>
> Key: MESOS-3854
> URL: https://issues.apache.org/jira/browse/MESOS-3854
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: authorization, mesosphere
>
> Finalize the structure of ACLs and achieve consensus on the design doc 
> proposed in MESOS-2949.
> https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4156) Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*

2016-01-21 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1557#comment-1557
 ] 

Bernd Mathiske commented on MESOS-4156:
---

OK for LocalUncached, but LocalCached requires at least 2 rounds to verify the 
caching works as expected.

> Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*
> --
>
> Key: MESOS-4156
> URL: https://issues.apache.org/jira/browse/MESOS-4156
> Project: Mesos
>  Issue Type: Epic
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> FetcherCacheTest.LocalUncached (2417 ms)
> FetcherCacheTest.LocalCached (2476 ms)
> FetcherCacheTest.LocalUncachedExtract (2496 ms)
> FetcherCacheTest.LocalCachedExtract (2471 ms)
> FetcherCacheTest.SimpleEviction (4451 ms)
> FetcherCacheTest.FallbackFromEviction (2483 ms)
> FetcherCacheTest.RemoveLRUCacheEntries (3422 ms)
> FetcherCacheHttpTest.HttpCachedSerialized (2490 ms)
> FetcherCacheHttpTest.HttpCachedConcurrent (1032 ms)
> FetcherCacheHttpTest.HttpMixed (1022 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4417) Prevent allocator from crashing on successful recovery.

2016-01-20 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4417:
--
Description: 
There might be a bug that may crash the master as pointed out by [~bmahler] in 
https://reviews.apache.org/r/4/:
{noformat}
It looks like if we trip the resume call in addSlave, this delayed resume will 
crash the master due to the CHECK(paused) that currently resides in resume.
{noformat}

  was:There might be a bug that may crash the master as pointed out by 
[~bmahler] in https://reviews.apache.org/r/4/.


> Prevent allocator from crashing on successful recovery.
> ---
>
> Key: MESOS-4417
> URL: https://issues.apache.org/jira/browse/MESOS-4417
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> There might be a bug that may crash the master as pointed out by [~bmahler] 
> in https://reviews.apache.org/r/4/:
> {noformat}
> It looks like if we trip the resume call in addSlave, this delayed resume 
> will crash the master due to the CHECK(paused) that currently resides in 
> resume.
> {noformat}
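
A minimal sketch (plain C++ with assert standing in for CHECK; not the actual allocator code) of the suspected race: if addSlave already triggered a resume, the delayed resume later fires on an allocator that is no longer paused and the check fails.
{code}
#include <cassert>

class Allocator {
public:
  void pause() { paused = true; }

  void resume() {
    assert(paused);  // stands in for the CHECK(paused) mentioned above
    paused = false;
  }

private:
  bool paused = false;
};

int main() {
  Allocator allocator;
  allocator.pause();
  allocator.resume();  // e.g. triggered early from addSlave()
  allocator.resume();  // the delayed resume fires later -> assertion fails
}
{code}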



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4417) Prevent allocator from crashing on successful recovery.

2016-01-20 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4417:
--
Description: 
There might be a bug that may crash the master as pointed out by [~bmahler] in 
https://reviews.apache.org/r/4/:
{noformat}
It looks like if we trip the resume call in addSlave, this delayed resume will 
crash the master 
due to the CHECK(paused) that currently resides in resume.
{noformat}

  was:
There might be a bug that may crash the master as pointed out by [~bmahler] in 
https://reviews.apache.org/r/4/:
{noformat}
It looks like if we trip the resume call in addSlave, this delayed resume will 
crash the master due to the CHECK(paused) that currently resides in resume.
{noformat}


> Prevent allocator from crashing on successful recovery.
> ---
>
> Key: MESOS-4417
> URL: https://issues.apache.org/jira/browse/MESOS-4417
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> There might be a bug that may crash the master as pointed out by [~bmahler] 
> in https://reviews.apache.org/r/4/:
> {noformat}
> It looks like if we trip the resume call in addSlave, this delayed resume 
> will crash the master 
> due to the CHECK(paused) that currently resides in resume.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-18 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105367#comment-15105367
 ] 

Bernd Mathiske commented on MESOS-4392:
---

Good points/questions: Should we allow resources beyond the limit as long as 
they are revocable? Should resources up to the limit be non-revocable by 
default? 

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-18 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105066#comment-15105066
 ] 

Bernd Mathiske commented on MESOS-4392:
---

Yes, I should clarify what I am trying to suggest here. Revoking a SINGLE 
accepted oversubscription offer for a resource cannot make the resource 
non-revocable, because another task may be holding on to its actual physical 
assets. Only revoking the "regular" offer that claims the same resource would 
create a clear enough picture to assign the resource as non-revocable to a new 
primary "owner". We'd then rely on the QoS mechanism to satisfy the needs of 
the latter in case a third, revocable offer were currently using the resource.

It is unclear to me from this doc whether oversubscription can only occur when 
there is also one "regular" offer for the same resource:
https://github.com/nqn/mesos/blob/niklas/oversubscription-user-doc/docs/oversubscription.md

My guess would be that you can also have revocable resources only and still 
achieve oversubscription. In any case, once we make revocable the default, this 
will be the case. Then the situation above changes slightly: without a go-to 
"regular owner" offer that determines whether the resource is actually 
available, we can immediately hand out a non-revocable offer. QoS should then 
make the physical resource available on demand. Alternatively, maybe as a 
booster option to speed things up, we could provide a clean slate by revoking 
the entire oversubscription set.


> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-18 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105028#comment-15105028
 ] 

Bernd Mathiske commented on MESOS-4392:
---

Except for your last sentence, which I do not understand, I agree. It makes 
sense if a framework uses only non-revocable resources up to its quota. Note 
that if it does not set a quota limit, it can still use resources beyond its 
guarantee, and we now want those resources to be revocable by default.

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4130) Document how the fetcher can reach across a proxy connection.

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4130:
--
Attachment: signature.asc

Submitted.




> Document how the fetcher can reach across a proxy connection.
> -
>
> Key: MESOS-4130
> URL: https://issues.apache.org/jira/browse/MESOS-4130
> Project: Mesos
>  Issue Type: Documentation
>  Components: fetcher
>Reporter: Bernd Mathiske
>Assignee: Shuai Lin
>  Labels: mesosphere, newbie
> Attachments: signature.asc
>
>
> The fetcher uses libcurl for downloading content from HTTP, HTTPS, etc. There 
> is no source code in the pertinent parts of "net.hpp" that deals with proxy 
> settings. However, libcurl automatically picks up certain environment 
> variables and adjusts its settings accordingly. See "man libcurl-tutorial" 
> for details. See section "Proxies", subsection "Environment Variables". If 
> you follow this recipe in your Mesos agent startup script, you can use a 
> proxy. 
> We should document this in the fetcher (cache) doc 
> (http://mesos.apache.org/documentation/latest/fetcher/).
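
For anyone following that recipe, here is a minimal sketch of exporting the 
proxy variables libcurl honors before launching the agent. The proxy host, 
master address, and agent invocation are placeholders, not part of the patch:

{code}
# Hedged sketch: libcurl reads these variables from the environment (see
# "man libcurl-tutorial", section "Proxies"). Addresses are placeholders.
import os
import subprocess

env = dict(os.environ)
env["http_proxy"] = "http://proxy.example.com:3128"
env["https_proxy"] = "http://proxy.example.com:3128"
env["no_proxy"] = "localhost,127.0.0.1"  # hosts the fetcher should reach directly

# The fetcher is forked by the agent, so setting the variables in the agent's
# environment is enough for its downloads to go through the proxy.
subprocess.check_call(
    ["mesos-slave", "--master=master.example.com:5050"], env=env)
{code}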



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4363:
--
  Sprint: Mesosphere Sprint 27
Story Points: 1

> Add a roles field to FrameworkInfo
> --
>
> Key: MESOS-4363
> URL: https://issues.apache.org/jira/browse/MESOS-4363
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> To represent multiple roles per framework a new repeated string field for 
> roles is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4363:
--
Component/s: master
 framework

> Add a roles field to FrameworkInfo
> --
>
> Key: MESOS-4363
> URL: https://issues.apache.org/jira/browse/MESOS-4363
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> To represent multiple roles per framework a new repeated string field for 
> roles is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4392:
--
Epic Name: Revocable by default

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4392:
--
Issue Type: Epic  (was: Improvement)

> Balance quota frameworks with non-quota, greedy frameworks.
> ---
>
> Key: MESOS-4392
> URL: https://issues.apache.org/jira/browse/MESOS-4392
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, master
>Reporter: Bernd Mathiske
>Assignee: Alexander Rukletsov
>  Labels: mesosphere
>
> Maximize resource utilization and minimize starvation risk for both quota 
> frameworks and non-quota, greedy frameworks when competing with each other.
> A greedy analytics batch system wants to use as much of the cluster as 
> possible to maximize computational throughput. When a competing web service 
> with fixed task size starts up, there must be sufficient resources to run it 
> immediately. The operator can reserve these resources by setting quota. 
> However, if these resources are kept idle until the service is in use, this 
> is wasteful from the analytics job's point of view. On the other hand, the 
> analytics job should hand back reserved resources to the service when needed 
> to avoid starvation of the latter.
> We can assume that often, the resources needed by the service will be of the 
> non-revocable variety. Here we need to introduce clearer distinctions between 
> oversubscribed and revocable resources that are not oversubscribed. An 
> oversubscribed resource cannot be converted into a non-revocable resource, 
> not even by preemption. In contrast, a non-oversubscribed, revocable resource 
> can be converted into a non-revocable resource.
> Another related topic is optimistic offers. The pertinent aspect in this 
> context is again whether resources are oversubscribed or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4393) Draft design document for resource revocability by default.

2016-01-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4393:
-

 Summary: Draft design document for resource revocability by 
default.
 Key: MESOS-4393
 URL: https://issues.apache.org/jira/browse/MESOS-4393
 Project: Mesos
  Issue Type: Task
  Components: allocation, master
Reporter: Bernd Mathiske
Assignee: Alexander Rukletsov


Create a design document for setting offered resources as "revocable by 
default". Greedy frameworks can then temporarily use resources set aside to 
satisfy quota.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.

2016-01-15 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4392:
-

 Summary: Balance quota frameworks with non-quota, greedy 
frameworks.
 Key: MESOS-4392
 URL: https://issues.apache.org/jira/browse/MESOS-4392
 Project: Mesos
  Issue Type: Improvement
  Components: allocation, master
Reporter: Bernd Mathiske
Assignee: Alexander Rukletsov


Maximize resource utilization and minimize starvation risk for both quota 
frameworks and non-quota, greedy frameworks when competing with each other.

A greedy analytics batch system wants to use as much of the cluster as possible 
to maximize computational throughput. When a competing web service with fixed 
task size starts up, there must be sufficient resources to run it immediately. 
The operator can reserve these resources by setting quota. However, if these 
resources are kept idle until the service is in use, this is wasteful from the 
analytics job's point of view. On the other hand, the analytics job should hand 
back reserved resources to the service when needed to avoid starvation of the 
latter.

We can assume that often, the resources needed by the service will be of the 
non-revocable variety. Here we need to introduce clearer distinctions between 
oversubscribed and revocable resources that are not oversubscribed. An 
oversubscribed resource cannot be converted into a non-revocable resource, not 
even by preemption. In contrast, a non-oversubscribed, revocable resource can 
be converted into a non-revocable resource.

Another related topic is optimistic offers. The pertinent aspect in this 
context is again whether resources are oversubscribed or not.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.

2016-01-13 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095903#comment-15095903
 ] 

Bernd Mathiske commented on MESOS-4304:
---

Roger.

> hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
> 
>
> Key: MESOS-4304
> URL: https://issues.apache.org/jira/browse/MESOS-4304
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.26.0
>Reporter: James Cunningham
>
> This bug was resolved for the hdfs protocol for MESOS-3602 but since the 
> process checks for the "hdfs" protocol at the beginning of the URI, the fix 
> does not extend itself to non-hdfs hadoop clients.
> {code}
> I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started!
> I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: 
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"}
> I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the 
> sandbox directory
> I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop 
> client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'
> copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at 
> index 7: maprfs:
> Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc]  ]
> E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''
>  failed; this is the output:
> Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal 
> failed: Failed to execute 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'';
>  the command was either not found or exited with a non-zero exit status: 255
> Failed to synchronize with slave (it's probably exited)
> {code}
> After a brief chat with [~jieyu], it was recommended to fix the current hdfs 
> client code because the new hadoop fetcher plugin is slated to use it.
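
To illustrate the point of the report (this is not the Mesos patch), here is a 
sketch of routing on the URI scheme instead of the literal "hdfs" prefix, and 
of passing the URI through unmodified so schemes such as maprfs:// keep 
working. The scheme set is an assumption made for the example:

{code}
# Hedged sketch, not the actual fetcher code: pick the Hadoop client for any
# Hadoop-compatible scheme and never prepend "/" to the URI.
from urllib.parse import urlparse

HADOOP_SCHEMES = {"hdfs", "maprfs", "hftp", "s3", "s3n"}  # illustrative set

def uses_hadoop_client(uri):
    return urlparse(uri).scheme.lower() in HADOOP_SCHEMES

def copy_to_local_command(uri, target):
    # Prepending "/" (as in '/maprfs:///...') is what breaks non-hdfs clients
    # in the log above; the URI must be passed through as-is.
    return ["hadoop", "fs", "-copyToLocal", uri, target]

assert uses_hadoop_client("maprfs:///mesos/storm-mesos-0.9.3.tgz")
assert not uses_hadoop_client("http://example.com/artifact.tgz")
{code}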



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.

2016-01-13 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4304:
--
Shepherd: Jie Yu

> hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
> 
>
> Key: MESOS-4304
> URL: https://issues.apache.org/jira/browse/MESOS-4304
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.26.0
>Reporter: James Cunningham
>Assignee: Bernd Mathiske
>
> This bug was resolved for the hdfs protocol for MESOS-3602 but since the 
> process checks for the "hdfs" protocol at the beginning of the URI, the fix 
> does not extend itself to non-hdfs hadoop clients.
> {code}
> I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started!
> I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: 
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"}
> I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the 
> sandbox directory
> I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop 
> client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'
> copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at 
> index 7: maprfs:
> Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc]  ]
> E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''
>  failed; this is the output:
> Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal 
> failed: Failed to execute 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'';
>  the command was either not found or exited with a non-zero exit status: 255
> Failed to synchronize with slave (it's probably exited)
> {code}
> After a brief chat with [~jieyu], it was recommended to fix the current hdfs 
> client code because the new hadoop fetcher plugin is slated to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.

2016-01-13 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske reassigned MESOS-4304:
-

Assignee: Bernd Mathiske

> hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
> 
>
> Key: MESOS-4304
> URL: https://issues.apache.org/jira/browse/MESOS-4304
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.26.0
>Reporter: James Cunningham
>Assignee: Bernd Mathiske
>
> This bug was resolved for the hdfs protocol for MESOS-3602 but since the 
> process checks for the "hdfs" protocol at the beginning of the URI, the fix 
> does not extend itself to non-hdfs hadoop clients.
> {code}
> I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started!
> I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: 
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"}
> I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the 
> sandbox directory
> I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI 
> 'maprfs:///mesos/storm-mesos-0.9.3.tgz'
> I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop 
> client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'
> copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at 
> index 7: maprfs:
> Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc]  ]
> E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''
>  failed; this is the output:
> Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal 
> failed: Failed to execute 'hadoop fs -copyToLocal 
> '/maprfs:///mesos/storm-mesos-0.9.3.tgz' 
> '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'';
>  the command was either not found or exited with a non-zero exit status: 255
> Failed to synchronize with slave (it's probably exited)
> {code}
> After a brief chat with [~jieyu], it was recommended to fix the current hdfs 
> client code because the new hadoop fetcher plugin is slated to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4075) Continue test suite execution across crashing tests.

2016-01-12 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4075:
--
Target Version/s:   (was: 0.27.0)

> Continue test suite execution across crashing tests.
> 
>
> Key: MESOS-4075
> URL: https://issues.apache.org/jira/browse/MESOS-4075
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Currently, mesos-tests.sh exits when a test crashes. This is inconvenient 
> when trying to find out all tests that fail. 
> mesos-tests.sh should rate a test that crashes as failed and continue the 
> same way as if the test merely returned with a failure result and exited 
> properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.

2016-01-12 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095784#comment-15095784
 ] 

Bernd Mathiske commented on MESOS-4075:
---

I thought solving this would make your life as release master easier, but so 
would fixing any remaining crashes, which need to be worked on anyhow if they 
occur. See above. I just changed the target version for this ticket to 
indeterminate. 

Long-term, I suggest we solve it, even if tests then run more slowly and the 
feature can only be used optionally. Getting a full assessment of what does and 
does not work in one swoop expedites testing. Repeatedly rerunning the tests 
with an incrementally updated list of test exclusions starts getting 
inefficient once there is more than one crash involved. (This cost us a lot of 
time in 0.26.0.) On second thought, maybe the latter procedure could be 
automated as a band-aid?
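
A rough sketch of such a band-aid (purely illustrative; the test runner 
invocation and the gtest log format it parses are assumptions): rerun the 
suite, and whenever it dies mid-test, exclude that test via a negative gtest 
filter and start over.

{code}
# Hedged sketch of the band-aid: keep rerunning mesos-tests.sh, excluding any
# test that crashed the binary, until a run completes.
import re
import subprocess

excluded = []

while True:
    cmd = ["./bin/mesos-tests.sh"]
    if excluded:
        cmd.append("--gtest_filter=-" + ":".join(excluded))
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        break  # clean pass with the current exclusions
    started = re.findall(r"\[ RUN\s+\]\s+(\S+)", proc.stdout)
    finished = re.findall(r"\[\s+(?:OK|FAILED)\s+\]\s+(\S+)", proc.stdout)
    crashed = [t for t in started if t not in finished]
    if not crashed:
        break  # only ordinary failures remain; nothing more to exclude
    excluded.append(crashed[-1])  # the test that was running when it crashed

print("Tests excluded due to crashes:", excluded)
{code}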

> Continue test suite execution across crashing tests.
> 
>
> Key: MESOS-4075
> URL: https://issues.apache.org/jira/browse/MESOS-4075
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Currently, mesos-tests.sh exits when a test crashes. This is inconvenient 
> when trying to find out all tests that fail. 
> mesos-tests.sh should rate a test that crashes as failed and continue the 
> same way as if the test merely returned with a failure result and exited 
> properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4336) Document supported file types for archive extraction by fetcher

2016-01-12 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4336:
--
Story Points: 1
  Labels: documentation mesosphere newbie  (was: documentation 
mesosphere)
Priority: Trivial  (was: Minor)
 Summary: Document supported file types for archive extraction by 
fetcher  (was: Document supported file types for fetcher)

> Document supported file types for archive extraction by fetcher
> ---
>
> Key: MESOS-4336
> URL: https://issues.apache.org/jira/browse/MESOS-4336
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, fetcher
>Reporter: Sunil Shah
>Priority: Trivial
>  Labels: documentation, mesosphere, newbie
>
> The Mesos fetcher extracts specified URIs if requested to do so by the 
> scheduler. However, the documentation at 
> http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file 
> types/extensions that will be extracted by the fetcher.
> [The relevant 
> code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63]
> specifies an exhaustive list of extensions that will be extracted; the 
> documentation should be updated to match.
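
Purely as an illustration of what the documentation would describe (the suffix 
list below is an assumption; the authoritative list is the one in fetcher.cpp 
linked above):

{code}
# Hedged sketch: suffix-based check for archives the fetcher would extract.
ARCHIVE_SUFFIXES = (
    ".tar", ".tar.gz", ".tar.bz2", ".tar.xz",
    ".tgz", ".tbz2", ".txz", ".gz", ".zip",
)

def would_extract(uri):
    return uri.lower().endswith(ARCHIVE_SUFFIXES)

assert would_extract("storm-mesos-0.9.3.tgz")
assert not would_extract("storm.yaml")
{code}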



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3208) Fetch checksum files to inform fetcher cache use

2016-01-11 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091688#comment-15091688
 ] 

Bernd Mathiske commented on MESOS-3208:
---

Discarded for now, until this project becomes a priority again.

> Fetch checksum files to inform fetcher cache use
> 
>
> Key: MESOS-3208
> URL: https://issues.apache.org/jira/browse/MESOS-3208
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Reporter: Bernd Mathiske
>Priority: Minor
>
> This is the first part of phase 1 as described in the comments for 
> MESOS-2073. We add a field to CommandInfo::URI that contains the URI of a 
> checksum file. When this file has new content, the contents of the 
> associated value URI need to be refreshed in the fetcher cache. 
> In this implementation step, we just add the above basic functionality 
> (download, checksum comparison). In later steps, we will add more control 
> flow to cover corner cases and thus make this feature more useful.
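
A minimal sketch of that basic control flow (download the checksum file, 
compare, refresh on change); the download helper and cache layout are 
hypothetical, not the proposed CommandInfo::URI change itself:

{code}
# Hedged sketch: refresh a cached URI only when its companion checksum file
# has changed since the last fetch.
import urllib.request

def download(uri):
    with urllib.request.urlopen(uri) as response:
        return response.read()

known_checksums = {}  # value URI -> last seen content of its checksum URI

def fetch_with_cache(value_uri, checksum_uri, cache):
    checksum = download(checksum_uri)
    if value_uri not in cache or known_checksums.get(value_uri) != checksum:
        cache[value_uri] = download(value_uri)  # refresh the cache entry
        known_checksums[value_uri] = checksum
    return cache[value_uri]
{code}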



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-01-11 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3235:
--
 Sprint: Mesosphere Sprint 20, Mesosphere Sprint 26  (was: Mesosphere 
Sprint 20)
Component/s: fetcher
 tests

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, tests
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: Bernd Mathiske
>  Labels: mesosphere
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 0x7fff950901e5 _pthread_start
> @ 0x7fff9508e41d thread_start
> Failed to synchronize with slave (it's probably exited)
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {code}
> This was encountered just once out of 3+ {{make check}}s.



--
This message 

[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2016-01-11 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092126#comment-15092126
 ] 

Bernd Mathiske commented on MESOS-3235:
---

Thanks! This confirms nicely what Alexander found before: task 3 never starts, 
then waiting for all tasks fails. This should not crash anything, though. 
That's new.

> FetcherCacheHttpTest.HttpCachedSerialized and 
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> -
>
> Key: MESOS-3235
> URL: https://issues.apache.org/jira/browse/MESOS-3235
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Joseph Wu
>Assignee: Bernd Mathiske
>  Labels: mesosphere
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [--] 3 tests from FetcherCacheHttpTest
> [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
> 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
> Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
> using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @0x113862f9d 
> mesos::internal::slave::MesosContainerizerProcess::fetch()
> @0x1138f1b5d 
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @0x1138f18cf 
> _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
> @0x1143768cf std::__1::function<>::operator()()
> @0x11435ca7f process::ProcessBase::visit()
> @0x1143ed6fe process::DispatchEvent::visit()
> @0x11271 process::ProcessBase::serve()
> @0x114343b4e process::ProcessManager::resume()
> @0x1143431ca process::internal::schedule()
> @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
> @ 0x7fff95090268 _pthread_body
> @ 0x7fff950901e5 _pthread_start
> @ 0x7fff9508e41d thread_start
> Failed to synchronize with slave (it's probably exited)
> make[3]: *** [check-local] Segmentation fault: 11
> make[2]: *** [check-am] Error 2
> make[1]: *** [check] Error 2
> make: *** [check-recursive] Error 1
> {code}
> This was encountered just once out of 3+ 

[jira] [Commented] (MESOS-4181) Change port range logging to different logging level.

2016-01-05 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082684#comment-15082684
 ] 

Bernd Mathiske commented on MESOS-4181:
---

Good ideas. This short-term fix is part of an epic that takes the broader view 
you are alluding to: https://issues.apache.org/jira/browse/MESOS-4233...
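
For illustration only (not the fix that landed here or in MESOS-4233), one way 
to keep such messages bounded is to log a summary of a ports resource instead 
of every range:

{code}
# Hedged sketch: summarize [(begin, end), ...] port ranges so the log line
# stays small regardless of how fragmented the allocation is.
def summarize_ports(ranges):
    total = sum(end - begin + 1 for begin, end in ranges)
    lo = min(begin for begin, _ in ranges)
    hi = max(end for _, end in ranges)
    return "ports: %d ports in %d ranges within [%d-%d]" % (
        total, len(ranges), lo, hi)

print(summarize_ports([(1025, 2180), (2182, 3887), (3889, 5049)]))
# -> ports: 4023 ports in 3 ranges within [1025-5049]
{code}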

> Change port range logging to different logging level.
> -
>
> Key: MESOS-4181
> URL: https://issues.apache.org/jira/browse/MESOS-4181
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.25.0
>Reporter: Cody Maloney
>Assignee: Joerg Schad
>  Labels: mesosphere, newbie
>
> Transforming from Mesos' internal port range representation to text is 
> non-linear in the number of bytes output. We end up with a massive amount of 
> log data like the following:
> {noformat}
> Dec 15 23:54:08 ip-10-0-7-60.us-west-2.compute.internal mesos-master[15919]: 
> I1215 23:51:58.891165 15925 hierarchical.hpp:1103] Recovered cpus(*):1e-05; 
> mem(*):10; ports(*):[5565-5565] (total: ports(*):[1025-2180, 2182-3887, 
> 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; 
> disk(*):32541, allocated: cpus(*):0.01815; ports(*):[1050-1050, 1092-1092, 
> 1094-1094, 1129-1129, 1132-1132, 1140-1140, 1177-1178, 1180-1180, 1192-1192, 
> 1205-1205, 1221-1221, 1308-1308, 1311-1311, 1323-1323, 1326-1326, 1335-1335, 
> 1365-1365, 1404-1404, 1412-1412, 1436-1436, 1455-1455, 1459-1459, 1472-1472, 
> 1477-1477, 1482-1482, 1491-1491, 1510-1510, 1551-1551, 1553-1553, 1559-1559, 
> 1573-1573, 1590-1590, 1592-1592, 1619-1619, 1635-1636, 1678-1678, 1738-1738, 
> 1742-1742, 1752-1752, 1770-1770, 1780-1782, 1790-1790, 1792-1792, 1799-1799, 
> 1804-1804, 1844-1844, 1852-1852, 1867-1867, 1899-1899, 1936-1936, 1945-1945, 
> 1954-1954, 2046-2046, 2055-2055, 2063-2063, 2070-2070, 2089-2089, 2104-2104, 
> 2117-2117, 2132-2132, 2173-2173, 2178-2178, 2188-2188, 2200-2200, 2218-2218, 
> 2223-2223, 2244-2244, 2248-2248, 2250-2250, 2270-2270, 2286-2286, 2302-2302, 
> 2332-2332, 2377-2377, 2397-2397, 2423-2423, 2435-2435, 2442-2442, 2448-2448, 
> 2477-2477, 2482-2482, 2522-2522, 2586-2586, 2594-2594, 2600-2600, 2602-2602, 
> 2643-2643, 2648-2648, 2659-2659, 2691-2691, 2716-2716, 2739-2739, 2794-2794, 
> 2802-2802, 2823-2823, 2831-2831, 2840-2840, 2848-2848, 2876-2876, 2894-2895, 
> 2900-2900, 2904-2904, 2912-2912, 2983-2983, 2991-2991, 2999-2999, 3011-3011, 
> 3025-3025, 3036-3036, 3041-3041, 3051-3051, 3074-3074, 3097-3097, 3107-3107, 
> 3121-3121, 3171-3171, 3176-3176, 3195-3195, 3197-3197, 3210-3210, 3221-3221, 
> 3234-3234, 3245-3245, 3250-3251, 3255-3255, 3270-3270, 3293-3293, 3298-3298, 
> 3312-3312, 3318-3318, 3325-3325, 3368-3368, 3379-3379, 3391-3391, 3412-3412, 
> 3414-3414, 3420-3420, 3492-3492, 3501-3501, 3538-3538, 3579-3579, 3631-3631, 
> 3680-3680, 3684-3684, 3695-3695, 3699-3699, 3738-3738, 3758-3758, 3793-3793, 
> 3808-3808, 3817-3817, 3854-3854, 3856-3856, 3900-3900, 3906-3906, 3909-3909, 
> 3912-3912, 3946-3946, 3956-3956, 3959-3959, 3963-3963, 3974-
> Dec 15 23:54:09 ip-10-0-7-60.us-west-2.compute.internal mesos-master[15919]: 
> 3974, 3981-3981, 3985-3985, 4134-4134, 4178-4178, 4206-4206, 4223-4223, 
> 4239-4239, 4245-4245, 4251-4251, 4262-4263, 4271-4271, 4308-4308, 4323-4323, 
> 4329-4329, 4368-4368, 4385-4385, 4404-4404, 4419-4419, 4430-4430, 4448-4448, 
> 4464-4464, 4481-4481, 4494-4494, 4499-4499, 4510-4510, 4534-4534, 4543-4543, 
> 4555-4555, 4561-4562, 4577-4577, 4601-4601, 4675-4675, 4722-4722, 4739-4739, 
> 4748-4748, 4752-4752, 4764-4764, 4771-4771, 4787-4787, 4827-4827, 4830-4830, 
> 4837-4837, 4848-4848, 4853-4853, 4879-4879, 4883-4883, 4897-4897, 4902-4902, 
> 4911-4911, 4940-4940, 4946-4946, 4957-4957, 4994-4994, 4996-4996, 5008-5008, 
> 5019-5019, 5043-5043, 5059-5059, 5109-5109, 5134-5135, 5157-5157, 5172-5172, 
> 5192-5192, 5211-5211, 5215-5215, 5234-5234, 5237-5237, 5246-5246, 5255-5255, 
> 5268-5268, 5311-5311, 5314-5314, 5316-5316, 5348-5348, 5391-5391, 5407-5407, 
> 5433-5433, 5446-5447, 5454-5454, 5456-5456, 5482-5482, 5514-5515, 5517-5517, 
> 5525-5525, 5542-5542, 5554-5554, 5581-5581, 5624-5624, 5647-5647, 5695-5695, 
> 5700-5700, 5703-5703, 5743-5743, 5747-5747, 5793-5793, 5850-5850, 5856-5856, 
> 5858-5858, 5899-5899, 5901-5901, 5940-5940, 5958-5958, 5962-5962, 5974-5974, 
> 5995-5995, 6000-6001, 6037-6037, 6053-6053, 6066-6066, 6078-6078, 6129-6129, 
> 6139-6139, 6160-6160, 6174-6174, 6193-6193, 6234-6234, 6263-6263, 6276-6276, 
> 6287-6287, 6292-6292, 6294-6294, 6296-6296, 6306-6307, 6333-6333, 6343-6343, 
> 6349-6349, 6377-6377, 6418-6418, 6454-6454, 6484-6484, 6496-6496, 6504-6504, 
> 6518-6518, 6589-6589, 6592-6592, 6606-6606, 6640-6640, 6713-6713, 6717-6717, 
> 6738-6738, 6757-6757, 6765-6765, 6778-6778, 6792-6792, 6798-6798, 

[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.

2016-01-05 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083302#comment-15083302
 ] 

Bernd Mathiske commented on MESOS-4075:
---

Indeed, we are focussing on fixing crashes first and foremost. Yet it would be 
nice if any new crashes would not hinder us when running test suites (on CI).

> Continue test suite execution across crashing tests.
> 
>
> Key: MESOS-4075
> URL: https://issues.apache.org/jira/browse/MESOS-4075
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Currently, mesos-tests.sh exits when a test crashes. This is inconvenient 
> when trying to find out all tests that fail. 
> mesos-tests.sh should rate a test that crashes as failed and continue the 
> same way as if the test merely returned with a failure result and exited 
> properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo

2016-01-04 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-1763:
--
Issue Type: Epic  (was: Task)

> Add support for multiple roles to be specified in FrameworkInfo
> ---
>
> Key: MESOS-1763
> URL: https://issues.apache.org/jira/browse/MESOS-1763
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Vinod Kone
>Assignee: Timothy Chen
>  Labels: mesosphere, roles
>
> Currently frameworks have the ability to set only one (resource) role in 
> FrameworkInfo. It would be nice to let frameworks specify multiple roles so 
> that they can do more fine grained resource accounting per role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4284) Draft design doc for multi-role frameworks

2016-01-04 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4284:
-

 Summary: Draft design doc for multi-role frameworks
 Key: MESOS-4284
 URL: https://issues.apache.org/jira/browse/MESOS-4284
 Project: Mesos
  Issue Type: Story
  Components: master
Reporter: Bernd Mathiske
Assignee: Benjamin Bannier


Create a document that describes the problems with having only single-role 
frameworks and proposes an MVP solution and implementation approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo

2016-01-04 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-1763:
--
 Assignee: (was: Timothy Chen)
Epic Name: multi-role frameworks

> Add support for multiple roles to be specified in FrameworkInfo
> ---
>
> Key: MESOS-1763
> URL: https://issues.apache.org/jira/browse/MESOS-1763
> Project: Mesos
>  Issue Type: Epic
>  Components: master
>Reporter: Vinod Kone
>  Labels: mesosphere, roles
>
> Currently frameworks have the ability to set only one (resource) role in 
> FrameworkInfo. It would be nice to let frameworks specify multiple roles so 
> that they can do more fine grained resource accounting per role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3809) Expose advertise_ip and advertise_port as command line options in mesos slave

2016-01-04 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080861#comment-15080861
 ] 

Bernd Mathiske commented on MESOS-3809:
---

Unfortunately, this commit was indeed not cherry-picked into 0.26.0, but should 
have been, yet the ticket shows up in the CHANGELOG. I'll update the CHANGELOG 
for 0.26.0, removing MESOS-3809 from it, and set the target version for this 
ticket to 0.27.0.

> Expose advertise_ip and advertise_port as command line options in mesos slave
> -
>
> Key: MESOS-3809
> URL: https://issues.apache.org/jira/browse/MESOS-3809
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.25.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> advertise_ip and advertise_port are exposed as mesos master command line args 
> (MESOS-809). But the following use case makes it a candidate for adding as 
> command line args in mesos slave as well.
> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang  wrote:
> It works! Thanks a lot.
> From: haosdent 
> Reply-To: "u...@mesos.apache.org" 
> Date: Wednesday, October 28, 2015, 10:23 AM
> To: user 
> Subject: Re: How to tell master which ip to connect.
> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and 
> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang  wrote:
> Hi teams:
> My scenario is like this:
> My master nodes are deployed in AWS and my slaves are in Azure, so they 
> communicate via public IPs.
> I ran into trouble when the slaves tried to register with the master. 
> The slaves can get the master's public IP address and can send a register 
> request, but they can only send their private IP to the master (because they 
> don't know their public IP, they cannot bind a public IP via the --ip flag), 
> so the masters can't connect to the slaves. How can a slave tell the master 
> which IP it should connect to? (I can't find any flag like --advertise_ip in 
> the master.)
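
A minimal sketch of that workaround (addresses and the agent invocation are 
placeholders), setting the libprocess advertise variables in the slave's 
environment:

{code}
# Hedged sketch: advertise a public address to the master while the slave
# still binds its private interface locally. Values are placeholders.
import os
import subprocess

env = dict(os.environ)
env["LIBPROCESS_ADVERTISE_IP"] = "203.0.113.10"  # the slave's public IP
env["LIBPROCESS_ADVERTISE_PORT"] = "5051"

subprocess.check_call(
    ["mesos-slave", "--master=master.example.com:5050"], env=env)
{code}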



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3809) Expose advertise_ip and advertise_port as command line options in mesos slave

2016-01-04 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3809:
--
Target Version/s: 0.27.0
   Fix Version/s: (was: 0.26.0)

> Expose advertise_ip and advertise_port as command line options in mesos slave
> -
>
> Key: MESOS-3809
> URL: https://issues.apache.org/jira/browse/MESOS-3809
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.25.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: mesosphere
>
> advertise_ip and advertise_port are exposed as mesos master command line args 
> (MESOS-809). But the following use case makes it a candidate for adding as 
> command line args in mesos slave as well.
> On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang  wrote:
> It works! Thanks a lot.
> From: haosdent 
> Reply-To: "u...@mesos.apache.org" 
> Date: Wednesday, October 28, 2015, 10:23 AM
> To: user 
> Subject: Re: How to tell master which ip to connect.
> Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and 
> `LIBPROCESS_ADVERTISE_PORT` when starting the slave?
> On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang  wrote:
> Hi teams:
> My scenario is like this:
> My master nodes are deployed in AWS and my slaves are in Azure, so they 
> communicate via public IPs.
> I ran into trouble when the slaves tried to register with the master. 
> The slaves can get the master's public IP address and can send a register 
> request, but they can only send their private IP to the master (because they 
> don't know their public IP, they cannot bind a public IP via the --ip flag), 
> so the masters can't connect to the slaves. How can a slave tell the master 
> which IP it should connect to? (I can't find any flag like --advertise_ip in 
> the master.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4075) Continue test suite execution across crashing tests.

2016-01-04 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4075:
--
Assignee: (was: Bernd Mathiske)

> Continue test suite execution across crashing tests.
> 
>
> Key: MESOS-4075
> URL: https://issues.apache.org/jira/browse/MESOS-4075
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Bernd Mathiske
>  Labels: mesosphere
>
> Currently, mesos-tests.sh exits when a test crashes. This is inconvenient 
> when trying to find out all tests that fail. 
> mesos-tests.sh should rate a test that crashes as failed and continue the 
> same way as if the test merely returned with a failure result and exited 
> properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3370) Deprecate the external containerizer

2015-12-23 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069437#comment-15069437
 ] 

Bernd Mathiske commented on MESOS-3370:
---

commit 43420dd0a27cd4adf1b2c929262f96e86d647acf
Author: Joerg Schad 
Date:   Wed Dec 23 10:41:38 2015 +0100

Added links to individual containerizers in containerizer-internal.md.

Review: https://reviews.apache.org/r/41683/

> Deprecate the external containerizer
> 
>
> Key: MESOS-3370
> URL: https://issues.apache.org/jira/browse/MESOS-3370
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>
> To our knowledge, no one is using the external containerizer and we could 
> clean up code paths in the slave and containerizer interface (the dual 
> launch() signatures)
> In a deprecation cycle, we can move this code into a module (dependent on 
> containerizer modules landing) and from there, move it into its own repo



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3370) Deprecate the external containerizer

2015-12-23 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069440#comment-15069440
 ] 

Bernd Mathiske commented on MESOS-3370:
---

commit 3c40d2d27d792c4baa927271414c4541f59069bd
Author: Joerg Schad 
Date:   Wed Dec 23 10:43:33 2015 +0100

Reflected deprecation of external containerizer in documentation.

Review: https://reviews.apache.org/r/41682/

> Deprecate the external containerizer
> 
>
> Key: MESOS-3370
> URL: https://issues.apache.org/jira/browse/MESOS-3370
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>
> To our knowledge, no one is using the external containerizer and we could 
> clean up code paths in the slave and containerizer interface (the dual 
> launch() signatures)
> In a deprecation cycle, we can move this code into a module (dependent on 
> containerizer modules landing) and from there, move it into its own repo



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4113) Docker Executor should not set container IP during bridged mode

2015-12-23 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069446#comment-15069446
 ] 

Bernd Mathiske commented on MESOS-4113:
---

@scalp42, thanks for being persistent! I don't see how MESOS-4064 can be 
viewed as a duplicate of this issue either. I suspect it was closed assuming the 
"duplicate" link is correct. Further indication for this is that AFAICT none of 
the code in the reviews posted for MESOS-4064 addresses MESOS-4113. @hartem, 
can you confirm this view?

Reopening this ticket. 

@scalp42, it would be great if you could check whether the output from current 
master is the same as from 0.26.0. I suspect it is.


> Docker Executor should not set container IP during bridged mode
> ---
>
> Key: MESOS-4113
> URL: https://issues.apache.org/jira/browse/MESOS-4113
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0, 0.26.0
>Reporter: Sargun Dhillon
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
>
> The docker executor currently sets the IP address of the container into 
> ContainerStatus.NetworkInfo.IPAddresses. This isn't a good thing, because 
> during bridged mode execution, it makes it so that that IP address is 
> useless, since it's behind the Docker NAT. I would like a flag that disables 
> filling the IP address in, and allows it to fall back to the agent IP. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-12-21 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-2857:
--
Sprint: Mesosphere Sprint 23  (was: Mesosphere Sprint 23, Mesosphere Sprint 
24)

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 master.cpp:1489] Elected as the leading master!
> I0610 20:04:48.607481 24594 master.cpp:1259] Recovering from registrar
> I0610 20:04:48.607712 24594 registrar.cpp:313] Recovering registrar
> I0610 20:04:48.608543 24588 log.cpp:661] Attempting to start the writer
> I0610 20:04:48.610231 24588 

[jira] [Commented] (MESOS-3552) CHECK failure due to floating point precision on reservation request

2015-12-18 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063799#comment-15063799
 ] 

Bernd Mathiske commented on MESOS-3552:
---

commit 7a57b0c6c403d3c5dd6b67087f8727d1b348b625
Author: Bernd Mathiske 
Date:   Fri Dec 18 10:59:52 2015 +0100

Ported approx. Option CPU resource number comparison to v1.

Review: https://reviews.apache.org/r/40903/

> CHECK failure due to floating point precision on reservation request
> 
>
> Key: MESOS-3552
> URL: https://issues.apache.org/jira/browse/MESOS-3552
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Mandeep Chadha
>Assignee: Mandeep Chadha
>  Labels: mesosphere, tech-debt
> Fix For: 0.26.0
>
>
> The result.cpus() == cpus() check is failing because of an exact 
> (double == double) comparison. 
> Root cause: 
> The framework requested a 0.1 cpu reservation for the first task. So far so good. 
> The next Reserve operation led to floating-point arithmetic that produced the 
> following values: 
>  result.cpus() : 23.9964472863211995, cpus() : 24 
> So the check (result.cpus() == cpus()) failed, because 23.9964472863211995 is 
> not exactly equal to 24.
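
For illustration, a tolerance-based comparison avoids this kind of failure. The 
helper below is only a sketch; the name and the tolerance value are illustrative 
and not the actual helper introduced by the fix.

{noformat}
#include <cmath>

// Compare two resource scalars within a tolerance instead of using ==, so
// that accumulated floating-point error does not break exact equality checks.
bool nearlyEqual(double left, double right, double tolerance)
{
  return std::fabs(left - right) <= tolerance;
}

// Example: nearlyEqual(23.9964472863211995, 24.0, 0.01) is true, whereas the
// exact comparison 23.9964472863211995 == 24.0 from the failing check is false.
{noformat}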



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4130) Document how the fetcher can reach across a proxy connection.

2015-12-11 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4130:
-

 Summary: Document how the fetcher can reach across a proxy 
connection.
 Key: MESOS-4130
 URL: https://issues.apache.org/jira/browse/MESOS-4130
 Project: Mesos
  Issue Type: Documentation
  Components: fetcher
Reporter: Bernd Mathiske


The fetcher uses libcurl for downloading content from HTTP, HTTPS, etc. There 
is no source code in the pertinent parts of "net.hpp" that deals with proxy 
settings. However, libcurl automatically picks up certain environment variables 
and adjusts its settings accordingly. See "man libcurl-tutorial", section 
"Proxies", subsection "Environment Variables", for details. If you set these 
variables in your Mesos agent startup script, the fetcher can reach across a proxy. 

We should document this in the fetcher (cache) doc 
(http://mesos.apache.org/documentation/latest/fetcher/).
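
As a rough illustration of which variables libcurl consults (the proxy address 
is a placeholder; in a real deployment these would be exported in the agent 
startup script rather than set programmatically):

{noformat}
#include <cstdlib>  // setenv (POSIX)

int main()
{
  // Lowercase http_proxy is the form libcurl documents for plain HTTP.
  setenv("http_proxy", "http://proxy.example.com:3128", 1);
  setenv("HTTPS_PROXY", "http://proxy.example.com:3128", 1);

  // Hosts that should bypass the proxy, e.g. the local agent and master.
  setenv("NO_PROXY", "localhost,127.0.0.1", 1);

  // ... launch the agent / fetcher here; libcurl picks these up automatically.
  return 0;
}
{noformat}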




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4120) Make DiscoveryInfo dynamically updatable

2015-12-11 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4120:
--
Affects Version/s: 0.26.0
   0.25.0
 Target Version/s: 0.27.0

> Make DiscoveryInfo dynamically updatable
> 
>
> Key: MESOS-4120
> URL: https://issues.apache.org/jira/browse/MESOS-4120
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.25.0, 0.26.0
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>
> K8s tasks can dynamically update what they expose for discovery by the 
> cluster. Unfortunately, all DiscoveryInfo in the cluster is immutable once a 
> task has started. 
> We would like to make DiscoveryInfo dynamically updatable, so that executors 
> can change what they advertise based on their internal state, instead of 
> requiring DiscoveryInfo to be known before the tasks start. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4113) Docker Executor should not set container IP during bridged mode

2015-12-11 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-4113:
--
Affects Version/s: 0.26.0

> Docker Executor should not set container IP during bridged mode
> ---
>
> Key: MESOS-4113
> URL: https://issues.apache.org/jira/browse/MESOS-4113
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0, 0.26.0
>Reporter: Sargun Dhillon
>  Labels: mesosphere
>
> The docker executor currently sets the IP address of the container in 
> ContainerStatus.NetworkInfo.IPAddresses. This is problematic during bridged 
> mode execution, because that IP address is behind the Docker NAT and therefore 
> unusable. I would like a flag that disables filling in the IP address and 
> allows it to fall back to the agent IP. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4119) Add support for enabling --3way to apply-reviews.py.

2015-12-11 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052943#comment-15052943
 ] 

Bernd Mathiske commented on MESOS-4119:
---

Since you marked this "newbie", please explain for newcomers what you mean by 
--3way and what apply-reviews.py is in general.

> Add support for enabling --3way to apply-reviews.py.
> 
>
> Key: MESOS-4119
> URL: https://issues.apache.org/jira/browse/MESOS-4119
> Project: Mesos
>  Issue Type: Task
>Reporter: Artem Harutyunyan
>  Labels: beginner, mesosphere, newbie
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4080) Clean up HTTP authentication in quota endpoints

2015-12-11 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052930#comment-15052930
 ] 

Bernd Mathiske commented on MESOS-4080:
---

Can you please be more specific about the tech debt mentioned?

> Clean up HTTP authentication in quota endpoints
> ---
>
> Key: MESOS-4080
> URL: https://issues.apache.org/jira/browse/MESOS-4080
> Project: Mesos
>  Issue Type: Task
>  Components: HTTP API, master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>Priority: Critical
>  Labels: mesosphere, quota, tech-debt
>
> The authentication of quota requests introduces some technical debt that 
> will be resolved by the refactored HTTP-based authentication. This ticket 
> tracks the work related to cleaning up the quota handling to use the new HTTP 
> API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3086) Create cgroups TasksKiller for non freeze subsystems.

2015-12-07 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3086:
--
Sprint: Mesosphere Sprint 15, Mesosphere Sprint 16, Mesosphere Sprint 17, 
Mesosphere Sprint 18, Mesosphere Sprint 19, Mesosphere Sprint 20, Mesosphere 
Sprint 21, Mesosphere Sprint 22  (was: Mesosphere Sprint 15, Mesosphere Sprint 
16, Mesosphere Sprint 17, Mesosphere Sprint 18, Mesosphere Sprint 19, 
Mesosphere Sprint 20, Mesosphere Sprint 21, Mesosphere Sprint 22, Mesosphere 
Sprint 23)

> Create cgroups TasksKiller for non freeze subsystems.
> -
>
> Key: MESOS-3086
> URL: https://issues.apache.org/jira/browse/MESOS-3086
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> We have a number of test issues when we cannot remove cgroups (in case there 
> are still related tasks running) and the freezer subsystem is not available. 
> In the current code 
> (https://github.com/apache/mesos/blob/0.22.1/src/linux/cgroups.cpp#L1728) we 
> fall back to a very simple mechanism of recursively trying to remove the 
> cgroups, which fails if there are still tasks running. 
> Therefore we need an additional (NonFreeze)TasksKiller that does not rely on 
> the freezer subsystem.
> This problem caused issues when running 'sudo make check' during 0.23 release 
> testing, where BenH already provided a better error message with 
> b1a23d6a52c31b8c5c840ab01902dbe00cb1feef / https://reviews.apache.org/r/36604.
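
A rough sketch of what a non-freezer TasksKiller could do for a single 
(non-nested) cgroup: read the pids from cgroup.procs, SIGKILL each one, and only 
then try to remove the cgroup directory. Error handling, retries, and recursion 
into child cgroups are omitted; this is not the actual Mesos cgroups code.

{noformat}
#include <signal.h>     // kill (POSIX)
#include <sys/types.h>  // pid_t
#include <unistd.h>     // rmdir

#include <fstream>
#include <string>

bool killTasksAndRemove(const std::string& cgroupPath)
{
  // Every line in cgroup.procs is the pid of a process in this cgroup.
  std::ifstream procs(cgroupPath + "/cgroup.procs");

  pid_t pid;
  while (procs >> pid) {
    kill(pid, SIGKILL);
  }

  // rmdir only succeeds once all tasks in the cgroup have actually exited,
  // so a real implementation would wait and retry before giving up.
  return rmdir(cgroupPath.c_str()) == 0;
}
{noformat}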



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4075) Continue test suite execution across crashing tests.

2015-12-07 Thread Bernd Mathiske (JIRA)
Bernd Mathiske created MESOS-4075:
-

 Summary: Continue test suite execution across crashing tests.
 Key: MESOS-4075
 URL: https://issues.apache.org/jira/browse/MESOS-4075
 Project: Mesos
  Issue Type: Improvement
  Components: test
Affects Versions: 0.26.0
Reporter: Bernd Mathiske


Currently, mesos-tests.sh exits when a test crashes. This is inconvenient when 
trying to find out all of the tests that fail. 

mesos-tests.sh should treat a test that crashes as failed and continue the same 
way as if the test had merely returned a failure result and exited properly.
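
A minimal sketch of the desired driver behavior, assuming each test can be run 
as a separate binary (the test names are placeholders, and mesos-tests.sh itself 
is a shell script, so this only illustrates the idea of counting a crash as a 
failure and continuing):

{noformat}
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <string>
#include <vector>

int main()
{
  std::vector<std::string> tests = {"./test-a", "./test-b", "./test-c"};
  int failures = 0;

  for (const std::string& test : tests) {
    pid_t pid = fork();
    if (pid == 0) {
      execl(test.c_str(), test.c_str(), (char*) nullptr);
      _exit(127);  // exec failed
    }

    int status = 0;
    waitpid(pid, &status, 0);

    // A crash (terminating signal) counts as a failure, but we keep going.
    if (WIFSIGNALED(status) || (WIFEXITED(status) && WEXITSTATUS(status) != 0)) {
      ++failures;
      std::printf("FAILED: %s\n", test.c_str());
    }
  }

  return failures == 0 ? 0 : 1;
}
{noformat}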



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3208) Fetch checksum files to inform fetcher cache use

2015-12-07 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3208:
--
Assignee: (was: Bernd Mathiske)

> Fetch checksum files to inform fetcher cache use
> 
>
> Key: MESOS-3208
> URL: https://issues.apache.org/jira/browse/MESOS-3208
> Project: Mesos
>  Issue Type: Improvement
>  Components: fetcher
>Reporter: Bernd Mathiske
>Priority: Minor
>
> This is the first part of phase 1 as described in the comments for 
> MESOS-2073. We add a field to CommandInfo::URI that contains the URI of a 
> checksum file. When this file has new content, the contents of the associated 
> value URI need to be refreshed in the fetcher cache. 
> In this implementation step, we just add the basic functionality above 
> (download, checksum comparison). In later steps, we will add more control 
> flow to cover corner cases and make this feature more useful.
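
A self-contained sketch of the proposed control flow, where the value URI is 
re-fetched only when the contents of its checksum URI change. The Cache type and 
the URIs/checksums below are placeholders, not the real fetcher cache interface.

{noformat}
#include <map>
#include <string>

struct Cache
{
  std::map<std::string, std::string> checksums;  // value URI -> last checksum

  // Returns true if the value URI needs to be (re-)downloaded.
  bool needsRefresh(const std::string& valueUri, const std::string& checksum)
  {
    auto it = checksums.find(valueUri);
    if (it != checksums.end() && it->second == checksum) {
      return false;  // checksum unchanged: keep the cached copy
    }
    checksums[valueUri] = checksum;  // remember the new checksum
    return true;
  }
};

int main()
{
  Cache cache;

  // First sight of the checksum: the artifact must be downloaded.
  bool first = cache.needsRefresh("http://example.org/app.tgz", "abc123");

  // Same checksum again: the cached copy is still valid.
  bool second = cache.needsRefresh("http://example.org/app.tgz", "abc123");

  return (first && !second) ? 0 : 1;
}
{noformat}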



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

