[jira] [Commented] (MESOS-5212) Allow any principal in ReservationInfo when HTTP authentication is off
[ https://issues.apache.org/jira/browse/MESOS-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281385#comment-15281385 ]

Bernd Mathiske commented on MESOS-5212:
---------------------------------------

This patch is implementation-only (with tests), which is proper. I am assuming the documentation changes that go along with the new behavior will then be posted against MESOS-5215? IMHO it would also be OK to dedicate limited doc updates to the ticket here.

> Allow any principal in ReservationInfo when HTTP authentication is off
> ----------------------------------------------------------------------
>
>                 Key: MESOS-5212
>                 URL: https://issues.apache.org/jira/browse/MESOS-5212
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.28.1
>            Reporter: Greg Mann
>            Assignee: Greg Mann
>              Labels: mesosphere
>             Fix For: 0.29.0
>
> Mesos currently provides no way for operators to pass their principal to HTTP
> endpoints when HTTP authentication is off. Since we enforce that
> {{ReservationInfo.principal}} be equal to the operator principal in requests
> to {{/reserve}}, this means that when HTTP authentication is disabled, the
> {{ReservationInfo.principal}} field cannot be set.
> To address this in the short-term, we should allow
> {{ReservationInfo.principal}} to hold any value when HTTP authentication is
> disabled.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
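The short-term behavior proposed in the ticket can be sketched as follows. This is an illustrative Python model, not the actual Mesos C++ validation code; the function name and return convention are hypothetical:

```python
def validate_reservation_principal(reservation_principal,
                                   operator_principal,
                                   http_authentication_enabled):
    """Hypothetical sketch of the relaxed /reserve validation.

    With HTTP authentication on, ReservationInfo.principal must equal the
    authenticated operator principal. With authentication off, any value
    (or no value) is accepted, since there is no authenticated operator
    principal to compare against.
    """
    if http_authentication_enabled:
        if reservation_principal != operator_principal:
            return "ReservationInfo.principal must match operator principal"
    return None  # None means the request is valid.
```

Under this sketch, a cluster without HTTP authentication can still record a meaningful principal in reservations, which is exactly the short-term goal described above.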
[jira] [Updated] (MESOS-5212) Allow any principal in ReservationInfo when HTTP authentication is off
[ https://issues.apache.org/jira/browse/MESOS-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Mathiske updated MESOS-5212:
----------------------------------
    Shepherd: Bernd Mathiske

> Allow any principal in ReservationInfo when HTTP authentication is off
> ----------------------------------------------------------------------
>
>                 Key: MESOS-5212
>                 URL: https://issues.apache.org/jira/browse/MESOS-5212
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.28.1
>            Reporter: Greg Mann
>            Assignee: Greg Mann
>              Labels: mesosphere
>             Fix For: 0.29.0
[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276116#comment-15276116 ]

Bernd Mathiske commented on MESOS-3235:
---------------------------------------

It seems doubtful that lengthening the wait time for task completion solves much, since successful runs are far shorter than the default 15 seconds, typically in the low single-digit second range. Machines aren't that slow, are they? And we occasionally get these failures on machines that are known to be fast, too. I suspect something else is wrong here.

What I have seen in failure logs is that one task somehow has not produced status updates all the way up to the AWAIT statement in question, although it must have reached the contention barrier that asserts all tasks have been launched, since the fetcher has been observed downloading every script. So one guess is that something is blocking/eating/delaying status updates at some stage, occasionally.

In all the cases I have seen, the tasks are not launched in serial order. And that's exactly why I wrote this test: so we can see whether we are dealing with concurrency correctly. Too bad we don't know what's failing yet. If we had a way to reproduce this behavior more often, we could switch on more logging and just repeat the test often enough to find something. But repeating the test tends to make the problem go away. Ideas?
> FetcherCacheHttpTest.HttpCachedSerialized and
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> ---------------------------------------------------
>
>                 Key: MESOS-3235
>                 URL: https://issues.apache.org/jira/browse/MESOS-3235
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher, tests
>    Affects Versions: 0.23.0
>            Reporter: Joseph Wu
>            Assignee: haosdent
>              Labels: mesosphere
>             Fix For: 0.27.0
>
>         Attachments: fetchercache_log_centos_6.txt
>
> On OSX, {{make clean && make -j8 V=0 check}}:
> {code}
> [----------] 3 tests from FetcherCacheHttpTest
> [ RUN      ] FetcherCacheHttpTest.HttpCachedSerialized
> HTTP/1.1 200 OK
> Date: Fri, 07 Aug 2015 17:23:05 GMT
> Content-Length: 30
> I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 0
> Forked command at 54363
> sh -c './mesos-fetcher-test-cmd 0'
> E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Command exited with status 0 (pid: 54363)
> E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
> E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 20150807-102305-139395082-52338-52313-S0
> E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Registered executor on 10.0.79.8
> Starting task 1
> Forked command at 54411
> sh -c './mesos-fetcher-test-cmd 1'
> E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> Command exited with status 0 (pid: 54411)
> E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: Socket is not connected [57]
> ../../src/tests/fetcher_cache_tests.cpp:860: Failure
> Failed to wait 15secs for awaitFinished(task.get())
> *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are using GNU date ***
> [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
> [ RUN      ] FetcherCacheHttpTest.HttpCachedConcurrent
> PC: @ 0x113723618 process::Owned<>::get()
> *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
> @ 0x7fff8fcacf1a _sigtramp
> @ 0x7f9bc3109710 (unknown)
> @ 0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
> @ 0x113862f9d mesos::internal::slave::MesosContainerizerProcess::fetch()
> @ 0x1138f1b5d _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
> @ 0x1138f18cf
[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268446#comment-15268446 ]

Bernd Mathiske commented on MESOS-3235:
---------------------------------------

Looks like task 0 never got started at all and therefore waiting for it fails.

> FetcherCacheHttpTest.HttpCachedSerialized and
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> ---------------------------------------------------
>
>                 Key: MESOS-3235
>                 URL: https://issues.apache.org/jira/browse/MESOS-3235
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher, tests
>    Affects Versions: 0.23.0
>            Reporter: Joseph Wu
>            Assignee: haosdent
>              Labels: mesosphere
>             Fix For: 0.27.0
>
>         Attachments: fetchercache_log_centos_6.txt
[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Mathiske updated MESOS-3235:
----------------------------------
    Sprint: Mesosphere Sprint 20, Mesosphere Sprint 26, Mesosphere Sprint 33  (was: Mesosphere Sprint 20, Mesosphere Sprint 26, Mesosphere Sprint 33, Mesosphere Sprint 34)

> FetcherCacheHttpTest.HttpCachedSerialized and
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> ---------------------------------------------------
>
>                 Key: MESOS-3235
>                 URL: https://issues.apache.org/jira/browse/MESOS-3235
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher, tests
>    Affects Versions: 0.23.0
>            Reporter: Joseph Wu
>            Assignee: Bernd Mathiske
>              Labels: mesosphere
>             Fix For: 0.27.0
>
>         Attachments: fetchercache_log_centos_6.txt
[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257751#comment-15257751 ]

Bernd Mathiske commented on MESOS-3235:
---------------------------------------

So far I could not reproduce the behavior. Also, a few weeks ago I still saw this test failing on several occasions, but lately it has been stable with no failures. Looking at the log, it seems that all tasks got executed normally. The only thing that looks a bit strange is that TASK_KILLED is mentioned after TASK_FINISHED. I'll look into that, but on the backburner.

> FetcherCacheHttpTest.HttpCachedSerialized and
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> ---------------------------------------------------
>
>                 Key: MESOS-3235
>                 URL: https://issues.apache.org/jira/browse/MESOS-3235
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher, tests
>    Affects Versions: 0.23.0
>            Reporter: Joseph Wu
>            Assignee: Bernd Mathiske
>              Labels: mesosphere
>             Fix For: 0.27.0
>
>         Attachments: fetchercache_log_centos_6.txt
[jira] [Commented] (MESOS-3367) Mesos fetcher does not extract archives for URI with parameters
[ https://issues.apache.org/jira/browse/MESOS-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256141#comment-15256141 ]

Bernd Mathiske commented on MESOS-3367:
---------------------------------------

Sorry, I wasn't following this, because I was OOO. Just FYI I agree with the resolution. :-)

> Mesos fetcher does not extract archives for URI with parameters
> ---------------------------------------------------------------
>
>                 Key: MESOS-3367
>                 URL: https://issues.apache.org/jira/browse/MESOS-3367
>             Project: Mesos
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.22.1, 0.23.0
>         Environment: DCOS 1.1
>            Reporter: Renat Zubairov
>            Assignee: haosdent
>            Priority: Minor
>              Labels: mesosphere
>             Fix For: 0.29.0
>
> I'm deploying Marathon applications with sources served from S3. I'm
> using a signed URL to give only temporary access to the S3 resources, so the
> URL of the resource has some query parameters.
> So the URI is 'https://foo.com/file.tgz?hasi' and the fetcher stores it in a file
> with the name 'file.tgz?hasi'. It then thinks that the extension 'hasi' is not
> tgz, hence extraction is skipped, despite the fact that the MIME type of the
> HTTP resource is 'application/x-tar'.
> Workaround: add an additional parameter like '=.tgz'.
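The fix the issue asks for amounts to deriving the local file name (and hence the archive extension) from the URI path alone, ignoring the query string. A minimal Python sketch of that logic, with hypothetical function names and an illustrative (non-exhaustive) extension list:

```python
from urllib.parse import urlparse
import posixpath

# Illustrative subset of archive extensions the fetcher might recognize.
ARCHIVE_EXTENSIONS = (".tgz", ".tar.gz", ".tbz2", ".tar.bz2", ".zip")


def basename_without_query(uri):
    # Take the basename of the URI *path* only, so a query string such as
    # "?hasi" never becomes part of the file name or its extension.
    return posixpath.basename(urlparse(uri).path)


def should_extract(uri):
    # Extension check now operates on "file.tgz", not "file.tgz?hasi".
    return basename_without_query(uri).endswith(ARCHIVE_EXTENSIONS)
```

With this, 'https://foo.com/file.tgz?hasi' is stored as 'file.tgz' and recognized as extractable, and the '=.tgz' workaround becomes unnecessary.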
[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate
[ https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256068#comment-15256068 ]

Bernd Mathiske commented on MESOS-4760:
---------------------------------------

[~mrbrowning], I am not aware of near-term plans for injection of the fetcher process into the slave object. If you want to take this on, I am happy to shepherd it.

> Expose metrics and gauges for fetcher cache usage and hit rate
> --------------------------------------------------------------
>
>                 Key: MESOS-4760
>                 URL: https://issues.apache.org/jira/browse/MESOS-4760
>             Project: Mesos
>          Issue Type: Improvement
>          Components: fetcher, statistics
>            Reporter: Michael Browning
>            Assignee: Michael Browning
>            Priority: Minor
>              Labels: features, fetcher, statistics, uber
>
> To evaluate the fetcher cache and calibrate the value of the
> fetcher_cache_size flag, it would be useful to have metrics and gauges on
> agents that expose operational statistics like cache hit rate, occupied cache
> size, and time spent downloading resources that were not present.
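The statistics listed in the description can be modeled with a small counter object. This is a Python sketch with hypothetical names; in Mesos these would actually be implemented as libprocess counters/gauges in C++:

```python
class FetcherCacheMetrics:
    """Illustrative counters for the gauges proposed in MESOS-4760:
    hit rate, occupied cache size, and time spent on cache misses."""

    def __init__(self, cache_size_bytes):
        self.cache_size_bytes = cache_size_bytes
        self.occupied_bytes = 0
        self.hits = 0
        self.misses = 0
        self.miss_download_seconds = 0.0

    def record_fetch(self, cached, downloaded_bytes=0, download_seconds=0.0):
        # Called once per fetched URI: 'cached' indicates a cache hit.
        if cached:
            self.hits += 1
        else:
            self.misses += 1
            self.miss_download_seconds += download_seconds
            # Occupancy cannot exceed the configured cache size.
            self.occupied_bytes = min(self.cache_size_bytes,
                                      self.occupied_bytes + downloaded_bytes)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Exposing these as agent metrics would let operators calibrate the fetcher_cache_size flag against observed hit rates, as the ticket suggests.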
[jira] [Comment Edited] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate
[ https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256068#comment-15256068 ]

Bernd Mathiske edited comment on MESOS-4760 at 4/25/16 8:28 AM:
----------------------------------------------------------------

[~mrbrowning], I am not aware of near-term plans for injection of the fetcher into the slave object. If you want to take this on, I am happy to shepherd it.

was (Author: bernd-mesos):
[~mrbrowning], I am not aware of near-term plans for injection of the fetcher process into the slave object. If you want to take this on, I am happy to shepherd it.

> Expose metrics and gauges for fetcher cache usage and hit rate
> --------------------------------------------------------------
>
>                 Key: MESOS-4760
>                 URL: https://issues.apache.org/jira/browse/MESOS-4760
>             Project: Mesos
>          Issue Type: Improvement
>          Components: fetcher, statistics
>            Reporter: Michael Browning
>            Assignee: Michael Browning
>            Priority: Minor
>              Labels: features, fetcher, statistics, uber
[jira] [Updated] (MESOS-5010) Installation of mesos python package is incomplete
[ https://issues.apache.org/jira/browse/MESOS-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Mathiske updated MESOS-5010:
----------------------------------
    Sprint: Mesosphere Sprint 32
    Fix Version/s: 0.29.0

> Installation of mesos python package is incomplete
> --------------------------------------------------
>
>                 Key: MESOS-5010
>                 URL: https://issues.apache.org/jira/browse/MESOS-5010
>             Project: Mesos
>          Issue Type: Bug
>          Components: python api
>    Affects Versions: 0.26.0, 0.28.0, 0.27.2, 0.29.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>             Fix For: 0.29.0
>
> The installation of the mesos python package is incomplete, i.e., the files
> {{cli.py}}, {{futures.py}}, and {{http.py}} are not installed.
> {code}
> % ../configure --enable-python
> % make install DESTDIR=$PWD/D
> % PYTHONPATH=$PWD/D/usr/local/lib/python2.7/site-packages:$PYTHONPATH python -c 'from mesos import http'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> ImportError: cannot import name http
> {code}
> This appears to be first broken with {{d1d70b9}} (MESOS-3969, [Upgraded
> bundled pip to 7.1.2.|https://reviews.apache.org/r/40630]). Bisecting in
> {{pip}}-land shows that our install becomes broken with {{pip-6.0.1}} and
> later (we are using {{pip-7.1.2}}).
[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bernd Mathiske updated MESOS-3235:
----------------------------------
    Sprint: Mesosphere Sprint 20, Mesosphere Sprint 26, Mesosphere Sprint 33  (was: Mesosphere Sprint 20, Mesosphere Sprint 26)

> FetcherCacheHttpTest.HttpCachedSerialized and
> FetcherCacheHttpTest.HttpCachedConcurrent are flaky
> ---------------------------------------------------
>
>                 Key: MESOS-3235
>                 URL: https://issues.apache.org/jira/browse/MESOS-3235
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher, tests
>    Affects Versions: 0.23.0
>            Reporter: Joseph Wu
>            Assignee: Bernd Mathiske
>              Labels: mesosphere
>             Fix For: 0.27.0
>
>         Attachments: fetchercache_log_centos_6.txt
[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate
[ https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230864#comment-15230864 ]

Bernd Mathiske commented on MESOS-4760:
---------------------------------------

Alright - let's do this! :-) Thanks!

> Expose metrics and gauges for fetcher cache usage and hit rate
> --------------------------------------------------------------
>
>                 Key: MESOS-4760
>                 URL: https://issues.apache.org/jira/browse/MESOS-4760
>             Project: Mesos
>          Issue Type: Improvement
>          Components: fetcher, statistics
>            Reporter: Michael Browning
>            Priority: Minor
>              Labels: features, fetcher, statistics, uber
[jira] [Commented] (MESOS-4760) Expose metrics and gauges for fetcher cache usage and hit rate
[ https://issues.apache.org/jira/browse/MESOS-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225929#comment-15225929 ] Bernd Mathiske commented on MESOS-4760: --- In principle I would shepherd this, but it seems to have low priority. Right? > Expose metrics and gauges for fetcher cache usage and hit rate > -- > > Key: MESOS-4760 > URL: https://issues.apache.org/jira/browse/MESOS-4760 > Project: Mesos > Issue Type: Improvement > Components: fetcher, statistics >Reporter: Michael Browning >Priority: Minor > Labels: features, fetcher, statistics, uber > > To evaluate the fetcher cache and calibrate the value of the > fetcher_cache_size flag, it would be useful to have metrics and gauges on > agents that expose operational statistics like cache hit rate, occupied cache > size, and time spent downloading resources that were not present. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4053: -- Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 31 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > MemoryPressureMesosTest tests fail on CentOS 6.6 > > > Key: MESOS-4053 > URL: https://issues.apache.org/jira/browse/MESOS-4053 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann >Assignee: Benjamin Hindman > Labels: mesosphere, test-failure > > {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and > {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It > seems that mounted cgroups are not properly cleaned up after previous tests, > so multiple hierarchies are detected and thus an error is produced: > {code} > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. > - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms) > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. 
> - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
[ https://issues.apache.org/jira/browse/MESOS-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4912: -- Sprint: Mesosphere Sprint 31 > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails. > -- > > Key: MESOS-4912 > URL: https://issues.apache.org/jira/browse/MESOS-4912 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.28.0 > Environment: CenOS 7, SSL >Reporter: Bernd Mathiske > Labels: mesosphere > > Observed on our CI: > {noformat} > [09:34:15] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers > [09:34:19]W: [Step 11/11] I0309 09:34:19.906719 2357 linux.cpp:81] Making > '/tmp/MLVLnv' a shared mount > [09:34:19]W: [Step 11/11] I0309 09:34:19.923548 2357 > linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [09:34:19]W: [Step 11/11] I0309 09:34:19.924705 2376 > containerizer.cpp:666] Starting container > 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of > framework '' > [09:34:19]W: [Step 11/11] I0309 09:34:19.925355 2371 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:19]W: [Step 11/11] I0309 09:34:19.925881 2377 copy.cpp:127] Copying > layer path '/tmp/MLVLnv/test_image1' to rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > [09:34:30]W: [Step 11/11] I0309 09:34:30.835127 2376 linux.cpp:355] Bind > mounting work directory from > '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b' > to > 
'/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.835392 2376 linux.cpp:683] > Changing the ownership of the persistent volume at > '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid > 0 > [09:34:30]W: [Step 11/11] I0309 09:34:30.840425 2376 linux.cpp:723] > Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' > for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of > container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.843878 2374 > linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS > [09:34:30]W: [Step 11/11] I0309 09:34:30.848302 2371 > containerizer.cpp:666] Starting container > 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of > framework '' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848758 2371 > containerizer.cpp:1392] Destroying container > 'da610f7f-a709-4de8-94d3-74f4a520619b' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848865 2373 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087 > [09:34:30]W: [Step 11/11] I0309 09:34:30.849449 2375 copy.cpp:127] Copying > layer path '/tmp/MLVLnv/test_image2' to rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > [09:34:30]W: [Step 11/11] I0309 09:34:30.854038 2374 cgroups.cpp:2427] > Freezing cgroup > 
/sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.856693 2372 cgroups.cpp:1409] > Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after > 2.608128ms > [09:34:30]W: [Step 11/11] I0309 09:34:30.859237 2377 cgroups.cpp:2445] > Thawing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.861454 2377 cgroups.cpp:1438] > Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us > [09:34:30]W: [Step 11/11] I0309 09:34:30.934608 2378 > containerizer.cpp:1608] Executor for container > 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited > [09:34:30]W: [Step 11/11] I0309 09:34:30.937692 2372 linux.cpp:798] > Unmounting volume >
[jira] [Updated] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
[ https://issues.apache.org/jira/browse/MESOS-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4912: -- Labels: mesosphere (was: ) > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails. > -- > > Key: MESOS-4912 > URL: https://issues.apache.org/jira/browse/MESOS-4912 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.28.0 > Environment: CenOS 7, SSL >Reporter: Bernd Mathiske > Labels: mesosphere > > Observed on our CI: > {noformat} > [09:34:15] : [Step 11/11] [ RUN ] > LinuxFilesystemIsolatorTest.ROOT_MultipleContainers > [09:34:19]W: [Step 11/11] I0309 09:34:19.906719 2357 linux.cpp:81] Making > '/tmp/MLVLnv' a shared mount > [09:34:19]W: [Step 11/11] I0309 09:34:19.923548 2357 > linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy > for the Linux launcher > [09:34:19]W: [Step 11/11] I0309 09:34:19.924705 2376 > containerizer.cpp:666] Starting container > 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of > framework '' > [09:34:19]W: [Step 11/11] I0309 09:34:19.925355 2371 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:19]W: [Step 11/11] I0309 09:34:19.925881 2377 copy.cpp:127] Copying > layer path '/tmp/MLVLnv/test_image1' to rootfs > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' > [09:34:30]W: [Step 11/11] I0309 09:34:30.835127 2376 linux.cpp:355] Bind > mounting work directory from > '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b' > to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' 
> for container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.835392 2376 linux.cpp:683] > Changing the ownership of the persistent volume at > '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with uid 0 and gid > 0 > [09:34:30]W: [Step 11/11] I0309 09:34:30.840425 2376 linux.cpp:723] > Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to > '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' > for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of > container da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.843878 2374 > linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS > [09:34:30]W: [Step 11/11] I0309 09:34:30.848302 2371 > containerizer.cpp:666] Starting container > 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of > framework '' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848758 2371 > containerizer.cpp:1392] Destroying container > 'da610f7f-a709-4de8-94d3-74f4a520619b' > [09:34:30]W: [Step 11/11] I0309 09:34:30.848865 2373 provisioner.cpp:285] > Provisioning image rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087 > [09:34:30]W: [Step 11/11] I0309 09:34:30.849449 2375 copy.cpp:127] Copying > layer path '/tmp/MLVLnv/test_image2' to rootfs > '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' > [09:34:30]W: [Step 11/11] I0309 09:34:30.854038 2374 cgroups.cpp:2427] > Freezing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.856693 2372 cgroups.cpp:1409] > Successfully froze cgroup > 
/sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after > 2.608128ms > [09:34:30]W: [Step 11/11] I0309 09:34:30.859237 2377 cgroups.cpp:2445] > Thawing cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b > [09:34:30]W: [Step 11/11] I0309 09:34:30.861454 2377 cgroups.cpp:1438] > Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us > [09:34:30]W: [Step 11/11] I0309 09:34:30.934608 2378 > containerizer.cpp:1608] Executor for container > 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited > [09:34:30]W: [Step 11/11] I0309 09:34:30.937692 2372 linux.cpp:798] > Unmounting volume >
[jira] [Updated] (MESOS-4835) CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky
[ https://issues.apache.org/jira/browse/MESOS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4835: -- Labels: flaky mesosphere test (was: flaky test) > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky > - > > Key: MESOS-4835 > URL: https://issues.apache.org/jira/browse/MESOS-4835 > Project: Mesos > Issue Type: Bug > Environment: Seen on Ubuntu 15 & Debian 8, GCC 4.9 >Reporter: Joseph Wu > Labels: flaky, mesosphere, test > > Verbose logs: > {code} > [ RUN ] > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess > I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup > /sys/fs/cgroup/freezer/mesos_test > I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos_test after 139.46496ms > I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup > /sys/fs/cgroup/freezer/mesos_test > I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos_test after 141.811968ms > ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure > Value of: ::waitpid(pid, , 0) > Actual: 23809 > Expected: -1 > ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure > Value of: (*__errno_location ()) > Actual: 0 > Expected: 10 > [ FAILED ] > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4835) CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky
[ https://issues.apache.org/jira/browse/MESOS-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4835: -- Sprint: Mesosphere Sprint 31 > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess is flaky > - > > Key: MESOS-4835 > URL: https://issues.apache.org/jira/browse/MESOS-4835 > Project: Mesos > Issue Type: Bug > Environment: Seen on Ubuntu 15 & Debian 8, GCC 4.9 >Reporter: Joseph Wu > Labels: flaky, mesosphere, test > > Verbose logs: > {code} > [ RUN ] > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess > I0302 00:43:14.127846 11755 cgroups.cpp:2427] Freezing cgroup > /sys/fs/cgroup/freezer/mesos_test > I0302 00:43:14.267411 11758 cgroups.cpp:1409] Successfully froze cgroup > /sys/fs/cgroup/freezer/mesos_test after 139.46496ms > I0302 00:43:14.409395 11751 cgroups.cpp:2445] Thawing cgroup > /sys/fs/cgroup/freezer/mesos_test > I0302 00:43:14.551304 11751 cgroups.cpp:1438] Successfullly thawed cgroup > /sys/fs/cgroup/freezer/mesos_test after 141.811968ms > ../../src/tests/containerizer/cgroups_tests.cpp:949: Failure > Value of: ::waitpid(pid, , 0) > Actual: 23809 > Expected: -1 > ../../src/tests/containerizer/cgroups_tests.cpp:950: Failure > Value of: (*__errno_location ()) > Actual: 0 > Expected: 10 > [ FAILED ] > CgroupsAnyHierarchyWithFreezerTest.ROOT_CGROUPS_DestroyTracedProcess (1055 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
[ https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4810: -- Labels: docker mesosphere test (was: docker test) > ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails. > -- > > Key: MESOS-4810 > URL: https://issues.apache.org/jira/browse/MESOS-4810 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.28.0 > Environment: CentOS 7 on AWS, both with or without SSL. >Reporter: Bernd Mathiske >Assignee: Jie Yu > Labels: docker, mesosphere, test > > {noformat} > [09:46:46] : [Step 11/11] [ RUN ] > ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand > [09:46:46]W: [Step 11/11] I0229 09:46:46.628413 1166 leveldb.cpp:174] > Opened db in 4.242882ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.629926 1166 leveldb.cpp:181] > Compacted db in 1.483621ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.629966 1166 leveldb.cpp:196] > Created db iterator in 15498ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.629977 1166 leveldb.cpp:202] > Seeked to beginning of db in 1405ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.629984 1166 leveldb.cpp:271] > Iterated through 0 keys in the db in 239ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.630015 1166 replica.cpp:779] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [09:46:46]W: [Step 11/11] I0229 09:46:46.630470 1183 recover.cpp:447] > Starting replica recovery > [09:46:46]W: [Step 11/11] I0229 09:46:46.630702 1180 recover.cpp:473] > Replica is in EMPTY status > [09:46:46]W: [Step 11/11] I0229 09:46:46.631767 1182 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > (14567)@172.30.2.124:37431 > [09:46:46]W: [Step 11/11] I0229 09:46:46.632115 1183 recover.cpp:193] > Received a recover response from a replica in EMPTY status > [09:46:46]W: [Step 11/11] I0229 09:46:46.632450 1186 recover.cpp:564] > Updating replica status to STARTING > 
[09:46:46]W: [Step 11/11] I0229 09:46:46.633476 1186 master.cpp:375] > Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) > started on 172.30.2.124:37431 > [09:46:46]W: [Step 11/11] I0229 09:46:46.633491 1186 master.cpp:377] Flags > at startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" > --zk_session_timeout="10secs" > [09:46:46]W: [Step 11/11] I0229 09:46:46.633677 1186 master.cpp:422] > Master only allowing authenticated frameworks to register > [09:46:46]W: [Step 11/11] I0229 09:46:46.633685 1186 master.cpp:427] > Master only allowing authenticated slaves to register > [09:46:46]W: [Step 11/11] I0229 09:46:46.633692 1186 credentials.hpp:35] > Loading credentials for authentication from '/tmp/4UxXoW/credentials' > [09:46:46]W: [Step 11/11] I0229 09:46:46.633851 1183 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 1.191043ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.633873 1183 replica.cpp:320] > Persisted replica status to STARTING > [09:46:46]W: [Step 11/11] I0229 09:46:46.633894 1186 master.cpp:467] Using > default 'crammd5' 
authenticator > [09:46:46]W: [Step 11/11] I0229 09:46:46.634003 1186 master.cpp:536] Using > default 'basic' HTTP authenticator > [09:46:46]W: [Step 11/11] I0229 09:46:46.634062 1184 recover.cpp:473] > Replica is in STARTING status > [09:46:46]W: [Step 11/11] I0229 09:46:46.634109 1186 master.cpp:570] > Authorization enabled > [09:46:46]W: [Step 11/11] I0229 09:46:46.634249 1187 > whitelist_watcher.cpp:77] No whitelist given > [09:46:46]W: [Step 11/11] I0229 09:46:46.634255 1184 hierarchical.cpp:144] > Initialized hierarchical allocator process > [09:46:46]W: [Step 11/11] I0229 09:46:46.634884 1187 replica.cpp:673] > Replica in STARTING
[jira] [Updated] (MESOS-4794) Add documentation around using the docker containerizer on CentOS 6.
[ https://issues.apache.org/jira/browse/MESOS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4794: -- Labels: containerizer docker documentation mesosphere (was: containerizer docker documentation) > Add documentation around using the docker containerizer on CentOS 6. > > > Key: MESOS-4794 > URL: https://issues.apache.org/jira/browse/MESOS-4794 > Project: Mesos > Issue Type: Documentation > Components: docker, documentation >Affects Versions: 0.28.0 >Reporter: Joseph Wu > Labels: containerizer, docker, documentation, mesosphere > > Support for persistent volumes was added to the docker containerizer in > [MESOS-3413]. However, this does not work on CentOS 6. > On CentOS 6, the same {{docker run -v ...}} operation does not perform a > recursive bind, whereas on every other OS supported by Mesos, docker does a > recursive bind. > Docker has already [dropped support for CentOS > 6|https://github.com/docker/docker/issues/14365], so we should add > precautionary documentation in case anyone tries to use the docker > containerizer on CentOS 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4794) Add documentation around using the docker containerizer on CentOS 6.
[ https://issues.apache.org/jira/browse/MESOS-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4794: -- Sprint: Mesosphere Sprint 31 > Add documentation around using the docker containerizer on CentOS 6. > > > Key: MESOS-4794 > URL: https://issues.apache.org/jira/browse/MESOS-4794 > Project: Mesos > Issue Type: Documentation > Components: docker, documentation >Affects Versions: 0.28.0 >Reporter: Joseph Wu > Labels: containerizer, docker, documentation > > Support for persistent volumes was added to the docker containerizer in > [MESOS-3413]. However, this does not work on CentOS 6. > On CentOS 6, the same {{docker run -v ...}} operation does not perform a > recursive bind, whereas on every other OS supported by Mesos, docker does a > recursive bind. > Docker has already [dropped support for CentOS > 6|https://github.com/docker/docker/issues/14365], so we should add > precautionary documentation in case anyone tries to use the docker > containerizer on CentOS 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4736) DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on CentOS 6
[ https://issues.apache.org/jira/browse/MESOS-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4736: -- Sprint: Mesosphere Sprint 31 > DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes fails on > CentOS 6 > - > > Key: MESOS-4736 > URL: https://issues.apache.org/jira/browse/MESOS-4736 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.0 > Environment: Centos6 + GCC 4.9 on AWS >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: flaky, mesosphere, test > > This test passes consistently on other OS's, but fails consistently on CentOS > 6. > Verbose logs from test failure: > {code} > [ RUN ] DockerContainerizerTest.ROOT_DOCKER_LaunchWithPersistentVolumes > I0222 18:16:12.327957 26681 leveldb.cpp:174] Opened db in 7.466102ms > I0222 18:16:12.330528 26681 leveldb.cpp:181] Compacted db in 2.540139ms > I0222 18:16:12.330580 26681 leveldb.cpp:196] Created db iterator in 16908ns > I0222 18:16:12.330592 26681 leveldb.cpp:202] Seeked to beginning of db in > 1403ns > I0222 18:16:12.330600 26681 leveldb.cpp:271] Iterated through 0 keys in the > db in 315ns > I0222 18:16:12.330634 26681 replica.cpp:779] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0222 18:16:12.331082 26698 recover.cpp:447] Starting replica recovery > I0222 18:16:12.331289 26698 recover.cpp:473] Replica is in EMPTY status > I0222 18:16:12.332162 26703 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from (13761)@172.30.2.148:35274 > I0222 18:16:12.332701 26701 recover.cpp:193] Received a recover response from > a replica in EMPTY status > I0222 18:16:12.333230 26699 recover.cpp:564] Updating replica status to > STARTING > I0222 18:16:12.334102 26698 master.cpp:376] Master > 652149b4-3932-4d8b-ba6f-8c9d9045be70 (ip-172-30-2-148.mesosphere.io) started > on 172.30.2.148:35274 > I0222 18:16:12.334116 26698 master.cpp:378] Flags at startup: --acls="" > 
--allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/QEhLBS/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/QEhLBS/master" > --zk_session_timeout="10secs" > I0222 18:16:12.334354 26698 master.cpp:423] Master only allowing > authenticated frameworks to register > I0222 18:16:12.334363 26698 master.cpp:428] Master only allowing > authenticated slaves to register > I0222 18:16:12.334369 26698 credentials.hpp:35] Loading credentials for > authentication from '/tmp/QEhLBS/credentials' > I0222 18:16:12.335366 26698 master.cpp:468] Using default 'crammd5' > authenticator > I0222 18:16:12.335492 26698 master.cpp:537] Using default 'basic' HTTP > authenticator > I0222 18:16:12.335623 26698 master.cpp:571] Authorization enabled > I0222 18:16:12.335752 26703 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 2.314693ms > I0222 18:16:12.335769 26700 whitelist_watcher.cpp:77] No whitelist given > I0222 18:16:12.335778 26703 replica.cpp:320] Persisted replica status to > STARTING > I0222 18:16:12.335821 26697 hierarchical.cpp:144] Initialized hierarchical > allocator process > I0222 18:16:12.335965 26701 recover.cpp:473] Replica is in STARTING status > I0222 
18:16:12.336771 26703 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from (13763)@172.30.2.148:35274 > I0222 18:16:12.337191 26696 recover.cpp:193] Received a recover response from > a replica in STARTING status > I0222 18:16:12.337635 26700 recover.cpp:564] Updating replica status to VOTING > I0222 18:16:12.337671 26703 master.cpp:1712] The newly elected leader is > master@172.30.2.148:35274 with id 652149b4-3932-4d8b-ba6f-8c9d9045be70 > I0222 18:16:12.337698 26703 master.cpp:1725] Elected as the leading master! > I0222 18:16:12.337713 26703 master.cpp:1470] Recovering from registrar > I0222 18:16:12.337828 26696 registrar.cpp:307] Recovering registrar > I0222 18:16:12.339972 26702 leveldb.cpp:304] Persisting metadata
[jira] [Updated] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2858: -- Sprint: Mesosphere Sprint 31 > FetcherCacheHttpTest.HttpMixed is flaky. > > > Key: MESOS-2858 > URL: https://issues.apache.org/jira/browse/MESOS-2858 > Project: Mesos > Issue Type: Bug > Components: fetcher, test >Reporter: Benjamin Mahler >Assignee: Bernd Mathiske > Labels: flaky-test, mesosphere > > From jenkins: > {noformat} > [ RUN ] FetcherCacheHttpTest.HttpMixed > Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC' > I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms > I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns > I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns > I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in > 2112ns > I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the > db in 392ns > I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery > I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status > I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to > STARTING > I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 590673ns > I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to > STARTING > I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status > I0611 00:40:28.214774 26061 master.cpp:363] Master > 20150611-004028-1946161580-33349-26042 (658ddc752264) started on > 172.17.0.116:33349 > I0611 
00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" > --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" > --zk_session_timeout="10secs" > I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing > authenticated frameworks to register > I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing > authenticated slaves to register > I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for > authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials' > I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' > authenticator > I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled > I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from > a replica in STARTING status > I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given > I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical > allocator process > I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING > I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 374189ns > I0611 00:40:28.216931 26070 
replica.cpp:323] Persisted replica status to > VOTING > I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos > group > I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is > master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042 > I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master! > I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar > I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar > I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated > I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer > I0611 00:40:28.219391 26067 replica.cpp:477] Replica received implicit > promise request with proposal 1 > I0611
[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193281#comment-15193281 ] Bernd Mathiske commented on MESOS-2858: --- FetcherCacheHttpTest.HttpCachedConcurrent exposes the same flaky behavior. > FetcherCacheHttpTest.HttpMixed is flaky. > > > Key: MESOS-2858 > URL: https://issues.apache.org/jira/browse/MESOS-2858 > Project: Mesos > Issue Type: Bug > Components: fetcher, test >Reporter: Benjamin Mahler >Assignee: Bernd Mathiske > Labels: flaky-test, mesosphere > > From jenkins: > {noformat} > [ RUN ] FetcherCacheHttpTest.HttpMixed > Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC' > I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms > I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns > I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns > I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in > 2112ns > I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the > db in 392ns > I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery > I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status > I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to > STARTING > I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 590673ns > I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to > STARTING > I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status > I0611 00:40:28.214774 26061 master.cpp:363] Master > 
20150611-004028-1946161580-33349-26042 (658ddc752264) started on > 172.17.0.116:33349 > I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" > --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" > --zk_session_timeout="10secs" > I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing > authenticated frameworks to register > I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing > authenticated slaves to register > I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for > authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials' > I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' > authenticator > I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled > I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from > a replica in STARTING status > I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given > I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical > allocator process > I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING > I0611 00:40:28.216909 26070 
leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 374189ns > I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to > VOTING > I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos > group > I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is > master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042 > I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master! > I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar > I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar > I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated > I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer > I0611 00:40:28.219391 26067
[jira] [Created] (MESOS-4912) LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails.
Bernd Mathiske created MESOS-4912: - Summary: LinuxFilesystemIsolatorTest.ROOT_MultipleContainers fails. Key: MESOS-4912 URL: https://issues.apache.org/jira/browse/MESOS-4912 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.28.0 Environment: CentOS 7, SSL Reporter: Bernd Mathiske Observed on our CI: {noformat} [09:34:15] : [Step 11/11] [ RUN ] LinuxFilesystemIsolatorTest.ROOT_MultipleContainers [09:34:19]W: [Step 11/11] I0309 09:34:19.906719 2357 linux.cpp:81] Making '/tmp/MLVLnv' a shared mount [09:34:19]W: [Step 11/11] I0309 09:34:19.923548 2357 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [09:34:19]W: [Step 11/11] I0309 09:34:19.924705 2376 containerizer.cpp:666] Starting container 'da610f7f-a709-4de8-94d3-74f4a520619b' for executor 'test_executor1' of framework '' [09:34:19]W: [Step 11/11] I0309 09:34:19.925355 2371 provisioner.cpp:285] Provisioning image rootfs '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:19]W: [Step 11/11] I0309 09:34:19.925881 2377 copy.cpp:127] Copying layer path '/tmp/MLVLnv/test_image1' to rootfs '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0' [09:34:30]W: [Step 11/11] I0309 09:34:30.835127 2376 linux.cpp:355] Bind mounting work directory from '/tmp/MLVLnv/slaves/test_slave/frameworks/executors/test_executor1/runs/da610f7f-a709-4de8-94d3-74f4a520619b' to '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.835392 2376 linux.cpp:683] Changing the ownership of the persistent volume at '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' with 
uid 0 and gid 0 [09:34:30]W: [Step 11/11] I0309 09:34:30.840425 2376 linux.cpp:723] Mounting '/tmp/MLVLnv/volumes/roles/test_role/persistent_volume_id' to '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' for persistent volume disk(test_role)[persistent_volume_id:volume]:32 of container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.843878 2374 linux_launcher.cpp:304] Cloning child process with flags = CLONE_NEWNS [09:34:30]W: [Step 11/11] I0309 09:34:30.848302 2371 containerizer.cpp:666] Starting container 'fe4729c5-1e63-4cc6-a2e3-fe5006ffe087' for executor 'test_executor2' of framework '' [09:34:30]W: [Step 11/11] I0309 09:34:30.848758 2371 containerizer.cpp:1392] Destroying container 'da610f7f-a709-4de8-94d3-74f4a520619b' [09:34:30]W: [Step 11/11] I0309 09:34:30.848865 2373 provisioner.cpp:285] Provisioning image rootfs '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' for container fe4729c5-1e63-4cc6-a2e3-fe5006ffe087 [09:34:30]W: [Step 11/11] I0309 09:34:30.849449 2375 copy.cpp:127] Copying layer path '/tmp/MLVLnv/test_image2' to rootfs '/tmp/MLVLnv/provisioner/containers/fe4729c5-1e63-4cc6-a2e3-fe5006ffe087/backends/copy/rootfses/518b2464-43dd-47b0-9648-e78aedde6917' [09:34:30]W: [Step 11/11] I0309 09:34:30.854038 2374 cgroups.cpp:2427] Freezing cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.856693 2372 cgroups.cpp:1409] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2.608128ms [09:34:30]W: [Step 11/11] I0309 09:34:30.859237 2377 cgroups.cpp:2445] Thawing cgroup /sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.861454 2377 cgroups.cpp:1438] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos/da610f7f-a709-4de8-94d3-74f4a520619b after 2176us [09:34:30]W: [Step 11/11] I0309 09:34:30.934608 2378 containerizer.cpp:1608] Executor for container 'da610f7f-a709-4de8-94d3-74f4a520619b' has exited [09:34:30]W: [Step 11/11] I0309 09:34:30.937692 2372 linux.cpp:798] Unmounting volume '/tmp/MLVLnv/provisioner/containers/da610f7f-a709-4de8-94d3-74f4a520619b/backends/copy/rootfses/0d7e047a-50f1-490b-bb58-00e9c49628d0/mnt/mesos/sandbox/volume' for container da610f7f-a709-4de8-94d3-74f4a520619b [09:34:30]W: [Step 11/11] I0309 09:34:30.937742 2372 linux.cpp:817] Unmounting sandbox/work directory
[jira] [Updated] (MESOS-4750) Document: Mesos Executor expects all SSL_* environment variables to be set
[ https://issues.apache.org/jira/browse/MESOS-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4750: -- Shepherd: Adam B > Document: Mesos Executor expects all SSL_* environment variables to be set > -- > > Key: MESOS-4750 > URL: https://issues.apache.org/jira/browse/MESOS-4750 > Project: Mesos > Issue Type: Documentation > Components: documentation, general, slave >Affects Versions: 0.26.0 >Reporter: pawan >Assignee: Jan Schlicht > Labels: documentation, mesosphere, ssl > Original Estimate: 2h > Remaining Estimate: 2h > > I was trying to run Docker containers in a fully SSL-ized Mesos cluster but > ran into problems because the executor was failing with a "Failed to shutdown > socket with fd 10: Transport endpoint is not connected". > My understanding is that this happens because the executor was trying > to report its status to the Mesos slave over HTTPS, but doesn't have the > appropriate certs/env setup inside the executor. > (Thanks to mslackbot/joseph for helping me figure this out on #mesos) > It turns out, the executor expects all SSL_* variables to be set inside > `CommandInfo.environment`, which get picked up by the executor to > successfully report its status to the slave. > This part of __executor needing all the SSL_* variables to be set in its > environment__ is missing from the Mesos SSL transitioning guide. I request you > to please add this vital information to the doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
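For illustration, setting the SSL_* variables in {{CommandInfo.environment}} could look like the following fragment of a task definition. This is only a sketch: {{SSL_ENABLED}}, {{SSL_CERT_FILE}}, and {{SSL_KEY_FILE}} are among libprocess's SSL_* settings, while the executor command and the file paths are placeholder assumptions, and a real cluster may need additional SSL_* variables to match the agent's configuration.

```json
{
  "command": {
    "value": "run-my-executor",
    "environment": {
      "variables": [
        {"name": "SSL_ENABLED",   "value": "true"},
        {"name": "SSL_CERT_FILE", "value": "/etc/mesos/ssl/cert.pem"},
        {"name": "SSL_KEY_FILE",  "value": "/etc/mesos/ssl/key.pem"}
      ]
    }
  }
}
```

With such variables present in {{CommandInfo.environment}}, the launched executor inherits the SSL configuration and can report its status to the slave over HTTPS.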
[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173700#comment-15173700 ] Bernd Mathiske commented on MESOS-2858: --- Thanks! Having looked through this log once, I have not found the culprit yet. According to the sandbox dumps, the 3 tasks run as intended, but somehow signaling the TASK_FINISHED status updates gets hung somewhere along the way to an AWAIT. Investigation to be continued... > FetcherCacheHttpTest.HttpMixed is flaky. > > > Key: MESOS-2858 > URL: https://issues.apache.org/jira/browse/MESOS-2858 > Project: Mesos > Issue Type: Bug > Components: fetcher, test >Reporter: Benjamin Mahler >Assignee: Bernd Mathiske > Labels: flaky-test, mesosphere > > From jenkins: > {noformat} > [ RUN ] FetcherCacheHttpTest.HttpMixed > Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC' > I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms > I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns > I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns > I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in > 2112ns > I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the > db in 392ns > I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery > I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status > I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to > STARTING > I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 590673ns > I0611 00:40:28.214095 26073 
replica.cpp:323] Persisted replica status to > STARTING > I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status > I0611 00:40:28.214774 26061 master.cpp:363] Master > 20150611-004028-1946161580-33349-26042 (658ddc752264) started on > 172.17.0.116:33349 > I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --credentials="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" > --work_dir="/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master" > --zk_session_timeout="10secs" > I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing > authenticated frameworks to register > I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing > authenticated slaves to register > I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for > authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials' > I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' > authenticator > I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled > I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from > a replica in STARTING status > I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given > I0611 00:40:28.216310 26066 
hierarchical.hpp:309] Initialized hierarchical > allocator process > I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING > I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 374189ns > I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to > VOTING > I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos > group > I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is > master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042 > I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master! > I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar > I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering
[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3937: -- Shepherd: Till Toenshoff (was: Bernd Mathiske) > Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails. > --- > > Key: MESOS-3937 > URL: https://issues.apache.org/jira/browse/MESOS-3937 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.26.0 > Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2 > 8 CPUs, 16 GB memory > Vagrant, libvirt/Virtual Box or VMware >Reporter: Bernd Mathiske >Assignee: Jan Schlicht > Labels: mesosphere > Fix For: 0.26.0 > > > {noformat} > ../configure > make check > sudo ./bin/mesos-tests.sh > --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose > {noformat} > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from DockerContainerizerTest > I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms > I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms > I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns > I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in > 4927ns > I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the > db in 1605ns > I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery > I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status > I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (4)@10.0.2.15:50088 > I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to > STARTING > I1117 15:08:09.279613 26396 leveldb.cpp:306] 
Persisting metadata (8 bytes) to > leveldb took 1.016098ms > I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to > STARTING > I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status > I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (5)@10.0.2.15:50088 > I1117 15:08:09.282552 26400 master.cpp:367] Master > 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on > 10.0.2.15:50088 > I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/40AlT8/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" > --zk_session_timeout="10secs" > I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing > authenticated frameworks to register > I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing > authenticated slaves to register > I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for > authentication from '/tmp/40AlT8/credentials' > I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING > I1117 15:08:09.285539 26400 master.cpp:458] Using default 
'crammd5' > authenticator > I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.075466ms > I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to > VOTING > I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos > group > I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated > I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL > I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled > I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is > master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a > I1117 15:08:09.296115 26399 master.cpp:1619]
[jira] [Updated] (MESOS-3937) Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails.
[ https://issues.apache.org/jira/browse/MESOS-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3937: -- Assignee: Jan Schlicht > Test DockerContainerizerTest.ROOT_DOCKER_Launch_Executor fails. > --- > > Key: MESOS-3937 > URL: https://issues.apache.org/jira/browse/MESOS-3937 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.26.0 > Environment: Ubuntu 14.04, gcc 4.8.4, Docker version 1.6.2 > 8 CPUs, 16 GB memory > Vagrant, libvirt/Virtual Box or VMware >Reporter: Bernd Mathiske >Assignee: Jan Schlicht > Labels: mesosphere > Fix For: 0.26.0 > > > {noformat} > ../configure > make check > sudo ./bin/mesos-tests.sh > --gtest_filter="DockerContainerizerTest.ROOT_DOCKER_Launch_Executor" --verbose > {noformat} > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from DockerContainerizerTest > I1117 15:08:09.265943 26380 leveldb.cpp:176] Opened db in 3.199666ms > I1117 15:08:09.267761 26380 leveldb.cpp:183] Compacted db in 1.684873ms > I1117 15:08:09.267902 26380 leveldb.cpp:198] Created db iterator in 58313ns > I1117 15:08:09.267966 26380 leveldb.cpp:204] Seeked to beginning of db in > 4927ns > I1117 15:08:09.267997 26380 leveldb.cpp:273] Iterated through 0 keys in the > db in 1605ns > I1117 15:08:09.268156 26380 replica.cpp:780] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1117 15:08:09.270148 26396 recover.cpp:449] Starting replica recovery > I1117 15:08:09.272105 26396 recover.cpp:475] Replica is in EMPTY status > I1117 15:08:09.275640 26396 replica.cpp:676] Replica in EMPTY status received > a broadcasted recover request from (4)@10.0.2.15:50088 > I1117 15:08:09.276578 26399 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1117 15:08:09.277600 26397 recover.cpp:566] Updating replica status to > STARTING > I1117 15:08:09.279613 26396 leveldb.cpp:306] Persisting metadata (8 
bytes) to > leveldb took 1.016098ms > I1117 15:08:09.279731 26396 replica.cpp:323] Persisted replica status to > STARTING > I1117 15:08:09.280306 26399 recover.cpp:475] Replica is in STARTING status > I1117 15:08:09.282181 26400 replica.cpp:676] Replica in STARTING status > received a broadcasted recover request from (5)@10.0.2.15:50088 > I1117 15:08:09.282552 26400 master.cpp:367] Master > 59c600f1-92ff-4926-9c84-073d9b81f68a (vagrant-ubuntu-trusty-64) started on > 10.0.2.15:50088 > I1117 15:08:09.283021 26400 master.cpp:369] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --authorizers="local" --credentials="/tmp/40AlT8/credentials" > --framework_sorter="drf" --help="false" --hostname_lookup="true" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/40AlT8/master" > --zk_session_timeout="10secs" > I1117 15:08:09.283920 26400 master.cpp:414] Master only allowing > authenticated frameworks to register > I1117 15:08:09.283972 26400 master.cpp:419] Master only allowing > authenticated slaves to register > I1117 15:08:09.284032 26400 credentials.hpp:37] Loading credentials for > authentication from '/tmp/40AlT8/credentials' > I1117 15:08:09.282944 26401 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1117 15:08:09.284639 26401 recover.cpp:566] Updating replica status to VOTING > I1117 15:08:09.285539 26400 master.cpp:458] Using default 'crammd5' > 
authenticator > I1117 15:08:09.285995 26401 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.075466ms > I1117 15:08:09.286062 26401 replica.cpp:323] Persisted replica status to > VOTING > I1117 15:08:09.286200 26401 recover.cpp:580] Successfully joined the Paxos > group > I1117 15:08:09.286471 26401 recover.cpp:464] Recover process terminated > I1117 15:08:09.287303 26400 authenticator.cpp:520] Initializing server SASL > I1117 15:08:09.289371 26400 master.cpp:495] Authorization enabled > I1117 15:08:09.296018 26399 master.cpp:1606] The newly elected leader is > master@10.0.2.15:50088 with id 59c600f1-92ff-4926-9c84-073d9b81f68a > I1117 15:08:09.296115 26399 master.cpp:1619] Elected as the leading
[jira] [Created] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
Bernd Mathiske created MESOS-4810: - Summary: ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails. Key: MESOS-4810 URL: https://issues.apache.org/jira/browse/MESOS-4810 Project: Mesos Issue Type: Bug Components: docker Affects Versions: 0.28.0 Environment: CentOS 7 on AWS, both with and without SSL. Reporter: Bernd Mathiske {noformat} [09:46:46] : [Step 11/11] [ RUN ] ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand [09:46:46]W: [Step 11/11] I0229 09:46:46.628413 1166 leveldb.cpp:174] Opened db in 4.242882ms [09:46:46]W: [Step 11/11] I0229 09:46:46.629926 1166 leveldb.cpp:181] Compacted db in 1.483621ms [09:46:46]W: [Step 11/11] I0229 09:46:46.629966 1166 leveldb.cpp:196] Created db iterator in 15498ns [09:46:46]W: [Step 11/11] I0229 09:46:46.629977 1166 leveldb.cpp:202] Seeked to beginning of db in 1405ns [09:46:46]W: [Step 11/11] I0229 09:46:46.629984 1166 leveldb.cpp:271] Iterated through 0 keys in the db in 239ns [09:46:46]W: [Step 11/11] I0229 09:46:46.630015 1166 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [09:46:46]W: [Step 11/11] I0229 09:46:46.630470 1183 recover.cpp:447] Starting replica recovery [09:46:46]W: [Step 11/11] I0229 09:46:46.630702 1180 recover.cpp:473] Replica is in EMPTY status [09:46:46]W: [Step 11/11] I0229 09:46:46.631767 1182 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (14567)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.632115 1183 recover.cpp:193] Received a recover response from a replica in EMPTY status [09:46:46]W: [Step 11/11] I0229 09:46:46.632450 1186 recover.cpp:564] Updating replica status to STARTING [09:46:46]W: [Step 11/11] I0229 09:46:46.633476 1186 master.cpp:375] Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) started on 172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.633491 1186 master.cpp:377] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" --zk_session_timeout="10secs" [09:46:46]W: [Step 11/11] I0229 09:46:46.633677 1186 master.cpp:422] Master only allowing authenticated frameworks to register [09:46:46]W: [Step 11/11] I0229 09:46:46.633685 1186 master.cpp:427] Master only allowing authenticated slaves to register [09:46:46]W: [Step 11/11] I0229 09:46:46.633692 1186 credentials.hpp:35] Loading credentials for authentication from '/tmp/4UxXoW/credentials' [09:46:46]W: [Step 11/11] I0229 09:46:46.633851 1183 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.191043ms [09:46:46]W: [Step 11/11] I0229 09:46:46.633873 1183 replica.cpp:320] Persisted replica status to STARTING [09:46:46]W: [Step 11/11] I0229 09:46:46.633894 1186 master.cpp:467] Using default 'crammd5' authenticator [09:46:46]W: [Step 11/11] I0229 09:46:46.634003 1186 master.cpp:536] Using default 'basic' HTTP authenticator [09:46:46]W: [Step 11/11] I0229 09:46:46.634062 1184 recover.cpp:473] Replica is in STARTING status [09:46:46]W: [Step 11/11] I0229 09:46:46.634109 1186 master.cpp:570] Authorization enabled [09:46:46]W: [Step 
11/11] I0229 09:46:46.634249 1187 whitelist_watcher.cpp:77] No whitelist given [09:46:46]W: [Step 11/11] I0229 09:46:46.634255 1184 hierarchical.cpp:144] Initialized hierarchical allocator process [09:46:46]W: [Step 11/11] I0229 09:46:46.634884 1187 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (14569)@172.30.2.124:37431 [09:46:46]W: [Step 11/11] I0229 09:46:46.635278 1181 recover.cpp:193] Received a recover response from a replica in STARTING status [09:46:46]W: [Step 11/11] I0229 09:46:46.635742 1187 recover.cpp:564] Updating replica status to VOTING [09:46:46]W: [Step 11/11] I0229 09:46:46.636391 1180 master.cpp:1711] The newly
[jira] [Updated] (MESOS-4810) ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails.
[ https://issues.apache.org/jira/browse/MESOS-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4810: -- Labels: docker test (was: ) > ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand fails. > -- > > Key: MESOS-4810 > URL: https://issues.apache.org/jira/browse/MESOS-4810 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.28.0 > Environment: CentOS 7 on AWS, both with and without SSL. >Reporter: Bernd Mathiske > Labels: docker, test > > {noformat} > [09:46:46] : [Step 11/11] [ RUN ] > ProvisionerDockerRegistryPullerTest.ROOT_INTERNET_CURL_ShellCommand > [09:46:46]W: [Step 11/11] I0229 09:46:46.628413 1166 leveldb.cpp:174] > Opened db in 4.242882ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.629926 1166 leveldb.cpp:181] > Compacted db in 1.483621ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.629966 1166 leveldb.cpp:196] > Created db iterator in 15498ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.629977 1166 leveldb.cpp:202] > Seeked to beginning of db in 1405ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.629984 1166 leveldb.cpp:271] > Iterated through 0 keys in the db in 239ns > [09:46:46]W: [Step 11/11] I0229 09:46:46.630015 1166 replica.cpp:779] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [09:46:46]W: [Step 11/11] I0229 09:46:46.630470 1183 recover.cpp:447] > Starting replica recovery > [09:46:46]W: [Step 11/11] I0229 09:46:46.630702 1180 recover.cpp:473] > Replica is in EMPTY status > [09:46:46]W: [Step 11/11] I0229 09:46:46.631767 1182 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > (14567)@172.30.2.124:37431 > [09:46:46]W: [Step 11/11] I0229 09:46:46.632115 1183 recover.cpp:193] > Received a recover response from a replica in EMPTY status > [09:46:46]W: [Step 11/11] I0229 09:46:46.632450 1186 recover.cpp:564] > Updating replica status to STARTING > [09:46:46]W: [Step 11/11] I0229 09:46:46.633476 
1186 master.cpp:375] > Master 3fbb2fb0-4f18-498b-a440-9acbf6923a13 (ip-172-30-2-124.mesosphere.io) > started on 172.30.2.124:37431 > [09:46:46]W: [Step 11/11] I0229 09:46:46.633491 1186 master.cpp:377] Flags > at startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/4UxXoW/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/4UxXoW/master" > --zk_session_timeout="10secs" > [09:46:46]W: [Step 11/11] I0229 09:46:46.633677 1186 master.cpp:422] > Master only allowing authenticated frameworks to register > [09:46:46]W: [Step 11/11] I0229 09:46:46.633685 1186 master.cpp:427] > Master only allowing authenticated slaves to register > [09:46:46]W: [Step 11/11] I0229 09:46:46.633692 1186 credentials.hpp:35] > Loading credentials for authentication from '/tmp/4UxXoW/credentials' > [09:46:46]W: [Step 11/11] I0229 09:46:46.633851 1183 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 1.191043ms > [09:46:46]W: [Step 11/11] I0229 09:46:46.633873 1183 replica.cpp:320] > Persisted replica status to STARTING > [09:46:46]W: [Step 11/11] I0229 09:46:46.633894 1186 master.cpp:467] Using > default 'crammd5' authenticator > [09:46:46]W: [Step 11/11] I0229 
09:46:46.634003 1186 master.cpp:536] Using > default 'basic' HTTP authenticator > [09:46:46]W: [Step 11/11] I0229 09:46:46.634062 1184 recover.cpp:473] > Replica is in STARTING status > [09:46:46]W: [Step 11/11] I0229 09:46:46.634109 1186 master.cpp:570] > Authorization enabled > [09:46:46]W: [Step 11/11] I0229 09:46:46.634249 1187 > whitelist_watcher.cpp:77] No whitelist given > [09:46:46]W: [Step 11/11] I0229 09:46:46.634255 1184 hierarchical.cpp:144] > Initialized hierarchical allocator process > [09:46:46]W: [Step 11/11] I0229 09:46:46.634884 1187 replica.cpp:673] > Replica in STARTING status received a broadcasted recover request from >
[jira] [Commented] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky
[ https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171670#comment-15171670 ] Bernd Mathiske commented on MESOS-4047: --- https://reviews.apache.org/r/43799/ > MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky > --- > > Key: MESOS-4047 > URL: https://issues.apache.org/jira/browse/MESOS-4047 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: Ubuntu 14, gcc 4.8.4 >Reporter: Joseph Wu >Assignee: Alexander Rojas > Labels: flaky, flaky-test > Fix For: 0.28.0 > > > {code:title=Output from passed test} > [--] 1 test from MemoryPressureMesosTest > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > I1202 11:09:14.319327 5062 exec.cpp:134] Version: 0.27.0 > I1202 11:09:14.17 5079 exec.cpp:208] Executor registered on slave > bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > Registered executor on ubuntu > Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 5085 > I1202 11:09:14.391739 5077 exec.cpp:254] Received reconnect request from > slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > I1202 11:09:14.398598 5082 exec.cpp:231] Executor re-registered on slave > bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > Re-registered executor on ubuntu > Shutting down > Sending SIGTERM to process tree at pid 5085 > Killing the following process trees: > [ > -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done > \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp > ] > [ OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms) > {code} > {code:title=Output from failed test} > [--] 1 test from MemoryPressureMesosTest > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s > [ RUN ] 
MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > I1202 11:09:15.509950 5109 exec.cpp:134] Version: 0.27.0 > I1202 11:09:15.568183 5123 exec.cpp:208] Executor registered on slave > 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 > Registered executor on ubuntu > Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6 > Forked command at 5132 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > I1202 11:09:15.665498 5129 exec.cpp:254] Received reconnect request from > slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 > I1202 11:09:15.670995 5123 exec.cpp:381] Executor asked to shutdown > Shutting down > Sending SIGTERM to process tree at pid 5132 > ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure > (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913 > *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are > using GNU date *** > {code} > Notice that in the failed test, the executor is asked to shutdown when it > tries to reconnect to the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
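[Editor's note] The workload that drives memory pressure in this test is the shell loop visible in the logs above (`while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done`). A bounded Python rendition, purely for illustration (the actual test shells out to `dd` and loops forever; the function name and sizes here are the editor's own):

```python
import os

# 1 MiB of zeros, analogous to dd's bs=1M with if=/dev/zero.
BLOCK = b"\x00" * (1 << 20)

def dirty_pages(path="temp", count=4):
    """Write `count` 1 MiB zero blocks to `path`, like
    `dd count=<count> bs=1M if=/dev/zero of=<path>`.
    Repeated writes like this dirty pages and drive memory pressure,
    which is what the test's infinite shell loop exploits."""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(BLOCK)
    return os.path.getsize(path)

print(dirty_pages(count=4))  # 4 blocks * 1 MiB = 4194304 bytes
```

The original loop never terminates by design: the task must still be producing pressure when the agent restarts and the executor reconnects, which is exactly the window where the failed run shows the executor being shut down instead.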
[jira] [Updated] (MESOS-4047) MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky
[ https://issues.apache.org/jira/browse/MESOS-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4047: -- Fix Version/s: (was: 0.27.0) > MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery is flaky > --- > > Key: MESOS-4047 > URL: https://issues.apache.org/jira/browse/MESOS-4047 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.26.0 > Environment: Ubuntu 14, gcc 4.8.4 >Reporter: Joseph Wu >Assignee: Alexander Rojas > Labels: flaky, flaky-test > Fix For: 0.28.0 > > > {code:title=Output from passed test} > [--] 1 test from MemoryPressureMesosTest > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.000430889 s, 2.4 GB/s > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > I1202 11:09:14.319327 5062 exec.cpp:134] Version: 0.27.0 > I1202 11:09:14.17 5079 exec.cpp:208] Executor registered on slave > bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > Registered executor on ubuntu > Starting task 4e62294c-cfcf-4a13-b699-c6a4b7ac5162 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 5085 > I1202 11:09:14.391739 5077 exec.cpp:254] Received reconnect request from > slave bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > I1202 11:09:14.398598 5082 exec.cpp:231] Executor re-registered on slave > bea15b35-9aa1-4b57-96fb-29b5f70638ac-S0 > Re-registered executor on ubuntu > Shutting down > Sending SIGTERM to process tree at pid 5085 > Killing the following process trees: > [ > -+- 5085 sh -c while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done > \--- 5086 dd count=512 bs=1M if=/dev/zero of=./temp > ] > [ OK ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (1096 ms) > {code} > {code:title=Output from failed test} > [--] 1 test from MemoryPressureMesosTest > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.000404489 s, 2.6 GB/s > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > I1202 11:09:15.509950 5109 
exec.cpp:134] Version: 0.27.0 > I1202 11:09:15.568183 5123 exec.cpp:208] Executor registered on slave > 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 > Registered executor on ubuntu > Starting task 14b6bab9-9f60-4130-bdc4-44efba262bc6 > Forked command at 5132 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > I1202 11:09:15.665498 5129 exec.cpp:254] Received reconnect request from > slave 88734acc-718e-45b0-95b9-d8f07cea8a9e-S0 > I1202 11:09:15.670995 5123 exec.cpp:381] Executor asked to shutdown > Shutting down > Sending SIGTERM to process tree at pid 5132 > ../../src/tests/containerizer/memory_pressure_tests.cpp:283: Failure > (usage).failure(): Unknown container: ebe90e15-72fa-4519-837b-62f43052c913 > *** Aborted at 1449083355 (unix time) try "date -d @1449083355" if you are > using GNU date *** > {code} > Notice that in the failed test, the executor is asked to shutdown when it > tries to reconnect to the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4676: -- Sprint: (was: Mesosphere Sprint 29) > ROOT_DOCKER_Logs is flaky. > -- > > Key: MESOS-4676 > URL: https://issues.apache.org/jira/browse/MESOS-4676 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27 > Environment: CentOS 7 with SSL. >Reporter: Bernd Mathiske > Labels: flaky, mesosphere, test > > {noformat} > [18:06:25][Step 8/8] [ RUN ] DockerContainerizerTest.ROOT_DOCKER_Logs > [18:06:25][Step 8/8] I0215 17:06:25.256103 1740 leveldb.cpp:174] Opened db > in 6.548327ms > [18:06:25][Step 8/8] I0215 17:06:25.258002 1740 leveldb.cpp:181] Compacted > db in 1.837816ms > [18:06:25][Step 8/8] I0215 17:06:25.258059 1740 leveldb.cpp:196] Created db > iterator in 22044ns > [18:06:25][Step 8/8] I0215 17:06:25.258076 1740 leveldb.cpp:202] Seeked to > beginning of db in 2347ns > [18:06:25][Step 8/8] I0215 17:06:25.258091 1740 leveldb.cpp:271] Iterated > through 0 keys in the db in 571ns > [18:06:25][Step 8/8] I0215 17:06:25.258152 1740 replica.cpp:779] Replica > recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [18:06:25][Step 8/8] I0215 17:06:25.258936 1758 recover.cpp:447] Starting > replica recovery > [18:06:25][Step 8/8] I0215 17:06:25.259177 1758 recover.cpp:473] Replica is > in EMPTY status > [18:06:25][Step 8/8] I0215 17:06:25.260327 1757 replica.cpp:673] Replica in > EMPTY status received a broadcasted recover request from > (13608)@172.30.2.239:39785 > [18:06:25][Step 8/8] I0215 17:06:25.260545 1758 recover.cpp:193] Received a > recover response from a replica in EMPTY status > [18:06:25][Step 8/8] I0215 17:06:25.261065 1757 master.cpp:376] Master > 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started > on 172.30.2.239:39785 > [18:06:25][Step 8/8] I0215 17:06:25.261209 1761 recover.cpp:564] Updating > replica status to STARTING > [18:06:25][Step 8/8] I0215 
17:06:25.261086 1757 master.cpp:378] Flags at > startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" > --zk_session_timeout="10secs" > [18:06:25][Step 8/8] I0215 17:06:25.261446 1757 master.cpp:423] Master only > allowing authenticated frameworks to register > [18:06:25][Step 8/8] I0215 17:06:25.261456 1757 master.cpp:428] Master only > allowing authenticated slaves to register > [18:06:25][Step 8/8] I0215 17:06:25.261462 1757 credentials.hpp:35] Loading > credentials for authentication from '/tmp/HncLLj/credentials' > [18:06:25][Step 8/8] I0215 17:06:25.261723 1757 master.cpp:468] Using > default 'crammd5' authenticator > [18:06:25][Step 8/8] I0215 17:06:25.261855 1757 master.cpp:537] Using > default 'basic' HTTP authenticator > [18:06:25][Step 8/8] I0215 17:06:25.262022 1757 master.cpp:571] > Authorization enabled > [18:06:25][Step 8/8] I0215 17:06:25.262177 1755 hierarchical.cpp:144] > Initialized hierarchical allocator process > [18:06:25][Step 8/8] I0215 17:06:25.262177 1758 whitelist_watcher.cpp:77] No > whitelist given > [18:06:25][Step 8/8] I0215 17:06:25.262899 1760 
leveldb.cpp:304] Persisting > metadata (8 bytes) to leveldb took 1.517992ms > [18:06:25][Step 8/8] I0215 17:06:25.262924 1760 replica.cpp:320] Persisted > replica status to STARTING > [18:06:25][Step 8/8] I0215 17:06:25.263144 1754 recover.cpp:473] Replica is > in STARTING status > [18:06:25][Step 8/8] I0215 17:06:25.264010 1757 master.cpp:1712] The newly > elected leader is master@172.30.2.239:39785 with id > 112363e2-c680-4946-8fee-d0626ed8b21e > [18:06:25][Step 8/8] I0215 17:06:25.264044 1757 master.cpp:1725] Elected as > the leading master! > [18:06:25][Step 8/8] I0215 17:06:25.264061 1757 master.cpp:1470] Recovering > from registrar > [18:06:25][Step 8/8] I0215 17:06:25.264117 1760 replica.cpp:673] Replica in > STARTING status received a broadcasted recover
[jira] [Commented] (MESOS-4547) Introduce TASK_KILLING state.
[ https://issues.apache.org/jira/browse/MESOS-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156719#comment-15156719 ] Bernd Mathiske commented on MESOS-4547: --- The RR for tests (https://reviews.apache.org/r/43490/) has been discarded. Are there going to be tests and documentation for this feature? > Introduce TASK_KILLING state. > - > > Key: MESOS-4547 > URL: https://issues.apache.org/jira/browse/MESOS-4547 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler >Assignee: Abhishek Dasgupta > Labels: mesosphere > Fix For: 0.28.0 > > > Currently there is no state to express that a task is being killed, but is > not yet killed (see MESOS-4140). In a similar way to how we have > TASK_STARTING to indicate the task is starting but not yet running, a > TASK_KILLING state would indicate the task is being killed but is not yet > killed. > This would need to be guarded by a framework capability to protect old > frameworks that cannot understand the TASK_KILLING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
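[Editor's note] The capability guard described in this ticket can be sketched as follows. This is a hypothetical illustration by the editor, not the actual Mesos master code; the capability name `TASK_KILLING_STATE` and the function shape are assumptions:

```python
# Hypothetical sketch of the guard MESOS-4547 describes: only frameworks
# that advertise the capability receive the intermediate TASK_KILLING
# state; legacy frameworks see no intermediate update and simply get
# TASK_KILLED once the kill completes.
def status_on_kill_initiated(framework_capabilities):
    if "TASK_KILLING_STATE" in framework_capabilities:
        return "TASK_KILLING"
    # Old framework: suppress the unknown state rather than confuse it.
    return None

print(status_on_kill_initiated({"TASK_KILLING_STATE"}))  # TASK_KILLING
print(status_on_kill_initiated(set()))                   # None
```

This mirrors how TASK_STARTING precedes TASK_RUNNING: an optional, opt-in intermediate state that unaware frameworks never observe.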
[jira] [Updated] (MESOS-1992) Support launching executors with configured systemd
[ https://issues.apache.org/jira/browse/MESOS-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-1992: -- Shepherd: (was: Bernd Mathiske) > Support launching executors with configured systemd > > > Key: MESOS-1992 > URL: https://issues.apache.org/jira/browse/MESOS-1992 > Project: Mesos > Issue Type: Improvement > Components: slave >Reporter: Timothy Chen > Labels: mesosphere > > When running mesos-slave in Docker with systemd, the mesos-slave > container currently cannot be upgraded while keeping all the tasks running, since > killing the Docker container will kill all the processes that are launched > with the Mesos containerizer. > If we can let the executor be launched with systemd outside of the Docker > container, then we can let the tasks remain running and recover them when the > slave is upgraded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4692) FetcherCacheHttpTest.HttpCachedSerialized flaky again.
[ https://issues.apache.org/jira/browse/MESOS-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150628#comment-15150628 ] Bernd Mathiske commented on MESOS-4692: --- If so, then it is not likely because of changes in fetcher code or fetcher cache test code. This code has been stable except for how many tasks get run. Running fewer tasks should not make this more flaky. No idea yet what is causing it this time, though. > FetcherCacheHttpTest.HttpCachedSerialized flaky again. > -- > > Key: MESOS-4692 > URL: https://issues.apache.org/jira/browse/MESOS-4692 > Project: Mesos > Issue Type: Bug > Components: fetcher, test > Environment: CentOS 7, plain >Reporter: Bernd Mathiske >Priority: Minor > Labels: flaky, test > > {noformat} > [12:20:50] : [Step 8/8] [ RUN ] > FetcherCacheHttpTest.HttpCachedSerialized > [12:20:50]W: [Step 8/8] I0217 12:20:50.842162 32498 leveldb.cpp:174] Opened > db in 4.973489ms > [12:20:50]W: [Step 8/8] I0217 12:20:50.843670 32498 leveldb.cpp:181] > Compacted db in 1.48087ms > [12:20:50]W: [Step 8/8] I0217 12:20:50.843709 32498 leveldb.cpp:196] > Created db iterator in 15661ns > [12:20:50]W: [Step 8/8] I0217 12:20:50.843720 32498 leveldb.cpp:202] Seeked > to beginning of db in 1401ns > [12:20:50]W: [Step 8/8] I0217 12:20:50.843729 32498 leveldb.cpp:271] > Iterated through 0 keys in the db in 357ns > [12:20:50]W: [Step 8/8] I0217 12:20:50.843760 32498 replica.cpp:779] > Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [12:20:50]W: [Step 8/8] I0217 12:20:50.844228 32513 recover.cpp:447] > Starting replica recovery > [12:20:50]W: [Step 8/8] I0217 12:20:50.844411 32513 recover.cpp:473] > Replica is in EMPTY status > [12:20:50]W: [Step 8/8] I0217 12:20:50.845355 32516 replica.cpp:673] > Replica in EMPTY status received a broadcasted recover request from > (2089)@172.30.2.21:33004 > [12:20:50]W: [Step 8/8] I0217 12:20:50.845825 32518 recover.cpp:193] > Received a recover response from a replica in EMPTY 
status > [12:20:50]W: [Step 8/8] I0217 12:20:50.846307 32517 recover.cpp:564] > Updating replica status to STARTING > [12:20:50]W: [Step 8/8] I0217 12:20:50.846789 32518 master.cpp:374] Master > 0941887d-60f1-4ff3-85f0-5d19ffee8005 (ip-172-30-2-21.mesosphere.io) started > on 172.30.2.21:33004 > [12:20:50]W: [Step 8/8] I0217 12:20:50.846810 32518 master.cpp:376] Flags > at startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/YFwdSN/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/YFwdSN/master" > --zk_session_timeout="10secs" > [12:20:50]W: [Step 8/8] I0217 12:20:50.847057 32518 master.cpp:421] Master > only allowing authenticated frameworks to register > [12:20:50]W: [Step 8/8] I0217 12:20:50.847066 32518 master.cpp:426] Master > only allowing authenticated slaves to register > [12:20:50]W: [Step 8/8] I0217 12:20:50.847072 32518 credentials.hpp:35] > Loading credentials for authentication from '/tmp/YFwdSN/credentials' > [12:20:50]W: [Step 8/8] I0217 12:20:50.847286 32518 master.cpp:466] Using > default 'crammd5' authenticator > [12:20:50]W: [Step 8/8] I0217 12:20:50.847395 32518 master.cpp:535] Using > default 'basic' HTTP authenticator > 
[12:20:50]W: [Step 8/8] I0217 12:20:50.847511 32518 master.cpp:569] > Authorization enabled > [12:20:50]W: [Step 8/8] I0217 12:20:50.847642 32517 hierarchical.cpp:144] > Initialized hierarchical allocator process > [12:20:50]W: [Step 8/8] I0217 12:20:50.847646 32519 > whitelist_watcher.cpp:77] No whitelist given > [12:20:50]W: [Step 8/8] I0217 12:20:50.847795 32514 leveldb.cpp:304] > Persisting metadata (8 bytes) to leveldb took 1.368308ms > [12:20:50]W: [Step 8/8] I0217 12:20:50.847825 32514 replica.cpp:320] > Persisted replica status to STARTING > [12:20:50]W: [Step 8/8] I0217 12:20:50.848002 32512 recover.cpp:473] > Replica is in STARTING status > [12:20:50]W: [Step 8/8] I0217 12:20:50.849025
[jira] [Created] (MESOS-4692) FetcherCacheHttpTest.HttpCachedSerialized flaky again.
Bernd Mathiske created MESOS-4692: - Summary: FetcherCacheHttpTest.HttpCachedSerialized flaky again. Key: MESOS-4692 URL: https://issues.apache.org/jira/browse/MESOS-4692 Project: Mesos Issue Type: Bug Components: fetcher, test Environment: CentOS 7, plain Reporter: Bernd Mathiske Priority: Minor {noformat} [12:20:50] : [Step 8/8] [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized [12:20:50]W: [Step 8/8] I0217 12:20:50.842162 32498 leveldb.cpp:174] Opened db in 4.973489ms [12:20:50]W: [Step 8/8] I0217 12:20:50.843670 32498 leveldb.cpp:181] Compacted db in 1.48087ms [12:20:50]W: [Step 8/8] I0217 12:20:50.843709 32498 leveldb.cpp:196] Created db iterator in 15661ns [12:20:50]W: [Step 8/8] I0217 12:20:50.843720 32498 leveldb.cpp:202] Seeked to beginning of db in 1401ns [12:20:50]W: [Step 8/8] I0217 12:20:50.843729 32498 leveldb.cpp:271] Iterated through 0 keys in the db in 357ns [12:20:50]W: [Step 8/8] I0217 12:20:50.843760 32498 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [12:20:50]W: [Step 8/8] I0217 12:20:50.844228 32513 recover.cpp:447] Starting replica recovery [12:20:50]W: [Step 8/8] I0217 12:20:50.844411 32513 recover.cpp:473] Replica is in EMPTY status [12:20:50]W: [Step 8/8] I0217 12:20:50.845355 32516 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (2089)@172.30.2.21:33004 [12:20:50]W: [Step 8/8] I0217 12:20:50.845825 32518 recover.cpp:193] Received a recover response from a replica in EMPTY status [12:20:50]W: [Step 8/8] I0217 12:20:50.846307 32517 recover.cpp:564] Updating replica status to STARTING [12:20:50]W: [Step 8/8] I0217 12:20:50.846789 32518 master.cpp:374] Master 0941887d-60f1-4ff3-85f0-5d19ffee8005 (ip-172-30-2-21.mesosphere.io) started on 172.30.2.21:33004 [12:20:50]W: [Step 8/8] I0217 12:20:50.846810 32518 master.cpp:376] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" 
--authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/YFwdSN/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="25secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/YFwdSN/master" --zk_session_timeout="10secs" [12:20:50]W: [Step 8/8] I0217 12:20:50.847057 32518 master.cpp:421] Master only allowing authenticated frameworks to register [12:20:50]W: [Step 8/8] I0217 12:20:50.847066 32518 master.cpp:426] Master only allowing authenticated slaves to register [12:20:50]W: [Step 8/8] I0217 12:20:50.847072 32518 credentials.hpp:35] Loading credentials for authentication from '/tmp/YFwdSN/credentials' [12:20:50]W: [Step 8/8] I0217 12:20:50.847286 32518 master.cpp:466] Using default 'crammd5' authenticator [12:20:50]W: [Step 8/8] I0217 12:20:50.847395 32518 master.cpp:535] Using default 'basic' HTTP authenticator [12:20:50]W: [Step 8/8] I0217 12:20:50.847511 32518 master.cpp:569] Authorization enabled [12:20:50]W: [Step 8/8] I0217 12:20:50.847642 32517 hierarchical.cpp:144] Initialized hierarchical allocator process [12:20:50]W: [Step 8/8] I0217 12:20:50.847646 32519 whitelist_watcher.cpp:77] No whitelist given [12:20:50]W: [Step 8/8] I0217 12:20:50.847795 32514 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.368308ms [12:20:50]W: [Step 8/8] I0217 12:20:50.847825 32514 replica.cpp:320] Persisted replica status to STARTING [12:20:50]W: [Step 8/8] 
I0217 12:20:50.848002 32512 recover.cpp:473] Replica is in STARTING status [12:20:50]W: [Step 8/8] I0217 12:20:50.849025 32516 master.cpp:1710] The newly elected leader is master@172.30.2.21:33004 with id 0941887d-60f1-4ff3-85f0-5d19ffee8005 [12:20:50]W: [Step 8/8] I0217 12:20:50.849047 32516 master.cpp:1723] Elected as the leading master! [12:20:50]W: [Step 8/8] I0217 12:20:50.849061 32516 master.cpp:1468] Recovering from registrar [12:20:50]W: [Step 8/8] I0217 12:20:50.849055 32515 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (2091)@172.30.2.21:33004 [12:20:50]W: [Step 8/8] I0217 12:20:50.849172 32518 registrar.cpp:307] Recovering
[jira] [Updated] (MESOS-4615) ContainerLoggerTest.DefaultToSandbox is flaky
[ https://issues.apache.org/jira/browse/MESOS-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4615: -- Shepherd: Bernd Mathiske > ContainerLoggerTest.DefaultToSandbox is flaky > - > > Key: MESOS-4615 > URL: https://issues.apache.org/jira/browse/MESOS-4615 > Project: Mesos > Issue Type: Bug > Components: tests >Affects Versions: 0.27.0 > Environment: CentOS 7, gcc, libevent & SSL enabled >Reporter: Greg Mann >Assignee: Joseph Wu > Labels: flaky-test, logger, mesosphere > > Just saw this failure on the ASF CI: > {code} > [ RUN ] ContainerLoggerTest.DefaultToSandbox > I0206 01:25:03.766458 2824 leveldb.cpp:174] Opened db in 72.979786ms > I0206 01:25:03.811712 2824 leveldb.cpp:181] Compacted db in 45.162067ms > I0206 01:25:03.811810 2824 leveldb.cpp:196] Created db iterator in 26090ns > I0206 01:25:03.811828 2824 leveldb.cpp:202] Seeked to beginning of db in > 3173ns > I0206 01:25:03.811839 2824 leveldb.cpp:271] Iterated through 0 keys in the > db in 497ns > I0206 01:25:03.811900 2824 replica.cpp:779] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0206 01:25:03.812785 2849 recover.cpp:447] Starting replica recovery > I0206 01:25:03.813043 2849 recover.cpp:473] Replica is in EMPTY status > I0206 01:25:03.814668 2854 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from (371)@172.17.0.8:37843 > I0206 01:25:03.815210 2849 recover.cpp:193] Received a recover response from > a replica in EMPTY status > I0206 01:25:03.815732 2854 recover.cpp:564] Updating replica status to > STARTING > I0206 01:25:03.819664 2857 master.cpp:376] Master > 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de (74ef606c4063) started on > 172.17.0.8:37843 > I0206 01:25:03.819703 2857 master.cpp:378] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" 
--authorizers="local" > --credentials="/tmp/h5vu5I/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" > --work_dir="/tmp/h5vu5I/master" --zk_session_timeout="10secs" > I0206 01:25:03.820241 2857 master.cpp:423] Master only allowing > authenticated frameworks to register > I0206 01:25:03.820257 2857 master.cpp:428] Master only allowing > authenticated slaves to register > I0206 01:25:03.820269 2857 credentials.hpp:35] Loading credentials for > authentication from '/tmp/h5vu5I/credentials' > I0206 01:25:03.821110 2857 master.cpp:468] Using default 'crammd5' > authenticator > I0206 01:25:03.821311 2857 master.cpp:537] Using default 'basic' HTTP > authenticator > I0206 01:25:03.821636 2857 master.cpp:571] Authorization enabled > I0206 01:25:03.821979 2846 hierarchical.cpp:144] Initialized hierarchical > allocator process > I0206 01:25:03.822057 2846 whitelist_watcher.cpp:77] No whitelist given > I0206 01:25:03.825460 2847 master.cpp:1712] The newly elected leader is > master@172.17.0.8:37843 with id 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de > I0206 01:25:03.825512 2847 master.cpp:1725] Elected as the leading master! 
> I0206 01:25:03.825533 2847 master.cpp:1470] Recovering from registrar > I0206 01:25:03.825835 2847 registrar.cpp:307] Recovering registrar > I0206 01:25:03.848212 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 32.226093ms > I0206 01:25:03.848299 2854 replica.cpp:320] Persisted replica status to > STARTING > I0206 01:25:03.848702 2854 recover.cpp:473] Replica is in STARTING status > I0206 01:25:03.850728 2858 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from (373)@172.17.0.8:37843 > I0206 01:25:03.851230 2854 recover.cpp:193] Received a recover response from > a replica in STARTING status > I0206 01:25:03.852018 2854 recover.cpp:564] Updating replica status to VOTING > I0206 01:25:03.881681 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 29.184163ms > I0206 01:25:03.881772 2854 replica.cpp:320] Persisted replica status to > VOTING > I0206 01:25:03.882058
[jira] [Commented] (MESOS-4631) Document how to use custom authentication modules
[ https://issues.apache.org/jira/browse/MESOS-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149081#comment-15149081 ] Bernd Mathiske commented on MESOS-4631: --- Till is on vacation this week. > Document how to use custom authentication modules > - > > Key: MESOS-4631 > URL: https://issues.apache.org/jira/browse/MESOS-4631 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Priority: Minor > Labels: authentication, documentation, mesosphere > > The authentication doc page talks about custom authentication modules a bit, > but doesn't give enough information. For example: > * What interface does a custom authentication module need to satisfy? > * Can multiple authentication modules be used? > * How do I implement a framework that authenticates with a master that uses a > non-default authentication module, e.g., one that doesn't use credentials? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4676: -- Sprint: Mesosphere Sprint 29 > ROOT_DOCKER_Logs is flaky. > -- > > Key: MESOS-4676 > URL: https://issues.apache.org/jira/browse/MESOS-4676 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27 > Environment: CentOS 7 with SSL. >Reporter: Bernd Mathiske > Labels: flaky, mesosphere, test > > {noformat} > [18:06:25][Step 8/8] [ RUN ] DockerContainerizerTest.ROOT_DOCKER_Logs > [18:06:25][Step 8/8] I0215 17:06:25.256103 1740 leveldb.cpp:174] Opened db > in 6.548327ms > [18:06:25][Step 8/8] I0215 17:06:25.258002 1740 leveldb.cpp:181] Compacted > db in 1.837816ms > [18:06:25][Step 8/8] I0215 17:06:25.258059 1740 leveldb.cpp:196] Created db > iterator in 22044ns > [18:06:25][Step 8/8] I0215 17:06:25.258076 1740 leveldb.cpp:202] Seeked to > beginning of db in 2347ns > [18:06:25][Step 8/8] I0215 17:06:25.258091 1740 leveldb.cpp:271] Iterated > through 0 keys in the db in 571ns > [18:06:25][Step 8/8] I0215 17:06:25.258152 1740 replica.cpp:779] Replica > recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [18:06:25][Step 8/8] I0215 17:06:25.258936 1758 recover.cpp:447] Starting > replica recovery > [18:06:25][Step 8/8] I0215 17:06:25.259177 1758 recover.cpp:473] Replica is > in EMPTY status > [18:06:25][Step 8/8] I0215 17:06:25.260327 1757 replica.cpp:673] Replica in > EMPTY status received a broadcasted recover request from > (13608)@172.30.2.239:39785 > [18:06:25][Step 8/8] I0215 17:06:25.260545 1758 recover.cpp:193] Received a > recover response from a replica in EMPTY status > [18:06:25][Step 8/8] I0215 17:06:25.261065 1757 master.cpp:376] Master > 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started > on 172.30.2.239:39785 > [18:06:25][Step 8/8] I0215 17:06:25.261209 1761 recover.cpp:564] Updating > replica status to STARTING > [18:06:25][Step 8/8] I0215 17:06:25.261086 1757 
master.cpp:378] Flags at > startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" > --zk_session_timeout="10secs" > [18:06:25][Step 8/8] I0215 17:06:25.261446 1757 master.cpp:423] Master only > allowing authenticated frameworks to register > [18:06:25][Step 8/8] I0215 17:06:25.261456 1757 master.cpp:428] Master only > allowing authenticated slaves to register > [18:06:25][Step 8/8] I0215 17:06:25.261462 1757 credentials.hpp:35] Loading > credentials for authentication from '/tmp/HncLLj/credentials' > [18:06:25][Step 8/8] I0215 17:06:25.261723 1757 master.cpp:468] Using > default 'crammd5' authenticator > [18:06:25][Step 8/8] I0215 17:06:25.261855 1757 master.cpp:537] Using > default 'basic' HTTP authenticator > [18:06:25][Step 8/8] I0215 17:06:25.262022 1757 master.cpp:571] > Authorization enabled > [18:06:25][Step 8/8] I0215 17:06:25.262177 1755 hierarchical.cpp:144] > Initialized hierarchical allocator process > [18:06:25][Step 8/8] I0215 17:06:25.262177 1758 whitelist_watcher.cpp:77] No > whitelist given > [18:06:25][Step 8/8] I0215 17:06:25.262899 1760 leveldb.cpp:304] 
Persisting > metadata (8 bytes) to leveldb took 1.517992ms > [18:06:25][Step 8/8] I0215 17:06:25.262924 1760 replica.cpp:320] Persisted > replica status to STARTING > [18:06:25][Step 8/8] I0215 17:06:25.263144 1754 recover.cpp:473] Replica is > in STARTING status > [18:06:25][Step 8/8] I0215 17:06:25.264010 1757 master.cpp:1712] The newly > elected leader is master@172.30.2.239:39785 with id > 112363e2-c680-4946-8fee-d0626ed8b21e > [18:06:25][Step 8/8] I0215 17:06:25.264044 1757 master.cpp:1725] Elected as > the leading master! > [18:06:25][Step 8/8] I0215 17:06:25.264061 1757 master.cpp:1470] Recovering > from registrar > [18:06:25][Step 8/8] I0215 17:06:25.264117 1760 replica.cpp:673] Replica in > STARTING status received a broadcasted recover request
[jira] [Created] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
Bernd Mathiske created MESOS-4677: - Summary: LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky. Key: MESOS-4677 URL: https://issues.apache.org/jira/browse/MESOS-4677 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.27 Reporter: Bernd Mathiske This test fails very often when run on CentOS 7, but may also fail elsewhere sometimes. Unfortunately, it tends to only fail when --verbose is not set. The output is this: {noformat} [21:45:21][Step 8/8] [ RUN ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: Failure [21:45:21][Step 8/8] Value of: usage.get().threads() [21:45:21][Step 8/8] Actual: 0 [21:45:21][Step 8/8] Expected: 1U [21:45:21][Step 8/8] Which is: 1 [21:45:21][Step 8/8] [ FAILED ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4676) ROOT_DOCKER_Logs is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4676: -- Environment: CentOS 7 with SSL. (was: CentOS 6 with SSL.) > ROOT_DOCKER_Logs is flaky. > -- > > Key: MESOS-4676 > URL: https://issues.apache.org/jira/browse/MESOS-4676 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27 > Environment: CentOS 7 with SSL. >Reporter: Bernd Mathiske > Labels: flaky, mesosphere, test > > {noformat} > [18:06:25][Step 8/8] [ RUN ] DockerContainerizerTest.ROOT_DOCKER_Logs > [18:06:25][Step 8/8] I0215 17:06:25.256103 1740 leveldb.cpp:174] Opened db > in 6.548327ms > [18:06:25][Step 8/8] I0215 17:06:25.258002 1740 leveldb.cpp:181] Compacted > db in 1.837816ms > [18:06:25][Step 8/8] I0215 17:06:25.258059 1740 leveldb.cpp:196] Created db > iterator in 22044ns > [18:06:25][Step 8/8] I0215 17:06:25.258076 1740 leveldb.cpp:202] Seeked to > beginning of db in 2347ns > [18:06:25][Step 8/8] I0215 17:06:25.258091 1740 leveldb.cpp:271] Iterated > through 0 keys in the db in 571ns > [18:06:25][Step 8/8] I0215 17:06:25.258152 1740 replica.cpp:779] Replica > recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > [18:06:25][Step 8/8] I0215 17:06:25.258936 1758 recover.cpp:447] Starting > replica recovery > [18:06:25][Step 8/8] I0215 17:06:25.259177 1758 recover.cpp:473] Replica is > in EMPTY status > [18:06:25][Step 8/8] I0215 17:06:25.260327 1757 replica.cpp:673] Replica in > EMPTY status received a broadcasted recover request from > (13608)@172.30.2.239:39785 > [18:06:25][Step 8/8] I0215 17:06:25.260545 1758 recover.cpp:193] Received a > recover response from a replica in EMPTY status > [18:06:25][Step 8/8] I0215 17:06:25.261065 1757 master.cpp:376] Master > 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started > on 172.30.2.239:39785 > [18:06:25][Step 8/8] I0215 17:06:25.261209 1761 recover.cpp:564] Updating > replica status to STARTING > [18:06:25][Step 
8/8] I0215 17:06:25.261086 1757 master.cpp:378] Flags at > startup: --acls="" --allocation_interval="1secs" > --allocator="HierarchicalDRF" --authenticate="true" > --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" > --zk_session_timeout="10secs" > [18:06:25][Step 8/8] I0215 17:06:25.261446 1757 master.cpp:423] Master only > allowing authenticated frameworks to register > [18:06:25][Step 8/8] I0215 17:06:25.261456 1757 master.cpp:428] Master only > allowing authenticated slaves to register > [18:06:25][Step 8/8] I0215 17:06:25.261462 1757 credentials.hpp:35] Loading > credentials for authentication from '/tmp/HncLLj/credentials' > [18:06:25][Step 8/8] I0215 17:06:25.261723 1757 master.cpp:468] Using > default 'crammd5' authenticator > [18:06:25][Step 8/8] I0215 17:06:25.261855 1757 master.cpp:537] Using > default 'basic' HTTP authenticator > [18:06:25][Step 8/8] I0215 17:06:25.262022 1757 master.cpp:571] > Authorization enabled > [18:06:25][Step 8/8] I0215 17:06:25.262177 1755 hierarchical.cpp:144] > Initialized hierarchical allocator process > [18:06:25][Step 8/8] I0215 17:06:25.262177 1758 whitelist_watcher.cpp:77] No > whitelist given > [18:06:25][Step 8/8] I0215 
17:06:25.262899 1760 leveldb.cpp:304] Persisting > metadata (8 bytes) to leveldb took 1.517992ms > [18:06:25][Step 8/8] I0215 17:06:25.262924 1760 replica.cpp:320] Persisted > replica status to STARTING > [18:06:25][Step 8/8] I0215 17:06:25.263144 1754 recover.cpp:473] Replica is > in STARTING status > [18:06:25][Step 8/8] I0215 17:06:25.264010 1757 master.cpp:1712] The newly > elected leader is master@172.30.2.239:39785 with id > 112363e2-c680-4946-8fee-d0626ed8b21e > [18:06:25][Step 8/8] I0215 17:06:25.264044 1757 master.cpp:1725] Elected as > the leading master! > [18:06:25][Step 8/8] I0215 17:06:25.264061 1757 master.cpp:1470] Recovering > from registrar > [18:06:25][Step 8/8] I0215 17:06:25.264117 1760 replica.cpp:673] Replica in > STARTING status received a
[jira] [Created] (MESOS-4676) ROOT_DOCKER_Logs is flaky.
Bernd Mathiske created MESOS-4676: - Summary: ROOT_DOCKER_Logs is flaky. Key: MESOS-4676 URL: https://issues.apache.org/jira/browse/MESOS-4676 Project: Mesos Issue Type: Bug Affects Versions: 0.27 Environment: CentOS 6 with SSL. Reporter: Bernd Mathiske {noformat} [18:06:25][Step 8/8] [ RUN ] DockerContainerizerTest.ROOT_DOCKER_Logs [18:06:25][Step 8/8] I0215 17:06:25.256103 1740 leveldb.cpp:174] Opened db in 6.548327ms [18:06:25][Step 8/8] I0215 17:06:25.258002 1740 leveldb.cpp:181] Compacted db in 1.837816ms [18:06:25][Step 8/8] I0215 17:06:25.258059 1740 leveldb.cpp:196] Created db iterator in 22044ns [18:06:25][Step 8/8] I0215 17:06:25.258076 1740 leveldb.cpp:202] Seeked to beginning of db in 2347ns [18:06:25][Step 8/8] I0215 17:06:25.258091 1740 leveldb.cpp:271] Iterated through 0 keys in the db in 571ns [18:06:25][Step 8/8] I0215 17:06:25.258152 1740 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned [18:06:25][Step 8/8] I0215 17:06:25.258936 1758 recover.cpp:447] Starting replica recovery [18:06:25][Step 8/8] I0215 17:06:25.259177 1758 recover.cpp:473] Replica is in EMPTY status [18:06:25][Step 8/8] I0215 17:06:25.260327 1757 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (13608)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.260545 1758 recover.cpp:193] Received a recover response from a replica in EMPTY status [18:06:25][Step 8/8] I0215 17:06:25.261065 1757 master.cpp:376] Master 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started on 172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.261209 1761 recover.cpp:564] Updating replica status to STARTING [18:06:25][Step 8/8] I0215 17:06:25.261086 1757 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" 
--credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" --zk_session_timeout="10secs" [18:06:25][Step 8/8] I0215 17:06:25.261446 1757 master.cpp:423] Master only allowing authenticated frameworks to register [18:06:25][Step 8/8] I0215 17:06:25.261456 1757 master.cpp:428] Master only allowing authenticated slaves to register [18:06:25][Step 8/8] I0215 17:06:25.261462 1757 credentials.hpp:35] Loading credentials for authentication from '/tmp/HncLLj/credentials' [18:06:25][Step 8/8] I0215 17:06:25.261723 1757 master.cpp:468] Using default 'crammd5' authenticator [18:06:25][Step 8/8] I0215 17:06:25.261855 1757 master.cpp:537] Using default 'basic' HTTP authenticator [18:06:25][Step 8/8] I0215 17:06:25.262022 1757 master.cpp:571] Authorization enabled [18:06:25][Step 8/8] I0215 17:06:25.262177 1755 hierarchical.cpp:144] Initialized hierarchical allocator process [18:06:25][Step 8/8] I0215 17:06:25.262177 1758 whitelist_watcher.cpp:77] No whitelist given [18:06:25][Step 8/8] I0215 17:06:25.262899 1760 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 1.517992ms [18:06:25][Step 8/8] I0215 17:06:25.262924 1760 replica.cpp:320] Persisted replica status to STARTING [18:06:25][Step 8/8] I0215 17:06:25.263144 1754 recover.cpp:473] Replica is in STARTING status [18:06:25][Step 8/8] I0215 17:06:25.264010 1757 
master.cpp:1712] The newly elected leader is master@172.30.2.239:39785 with id 112363e2-c680-4946-8fee-d0626ed8b21e [18:06:25][Step 8/8] I0215 17:06:25.264044 1757 master.cpp:1725] Elected as the leading master! [18:06:25][Step 8/8] I0215 17:06:25.264061 1757 master.cpp:1470] Recovering from registrar [18:06:25][Step 8/8] I0215 17:06:25.264117 1760 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (13610)@172.30.2.239:39785 [18:06:25][Step 8/8] I0215 17:06:25.264197 1758 registrar.cpp:307] Recovering registrar [18:06:25][Step 8/8] I0215 17:06:25.264827 1756 recover.cpp:193] Received a recover response from a replica in STARTING status [18:06:25][Step 8/8] I0215 17:06:25.265219 1757 recover.cpp:564] Updating replica status to VOTING [18:06:25][Step 8/8]
[jira] [Created] (MESOS-4674) Linux filesystem isolator tests are flaky.
Bernd Mathiske created MESOS-4674: - Summary: Linux filesystem isolator tests are flaky. Key: MESOS-4674 URL: https://issues.apache.org/jira/browse/MESOS-4674 Project: Mesos Issue Type: Bug Components: testing, flaky Affects Versions: 0.27 Environment: CentOS 7 (directly on an AWS instance) Reporter: Bernd Mathiske LinuxFilesystemIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem sometimes fails on CentOS 7 with this kind of output: {noformat} ../../src/tests/containerizer/filesystem_isolator_tests.cpp:1054: Failure Failed to wait 2mins for launch {noformat} LinuxFilesystemIsolatorTest.ROOT_MultipleContainers often has this output: {noformat} ../../src/tests/containerizer/filesystem_isolator_tests.cpp:1138: Failure Failed to wait 1mins for launch1 {noformat} Whether SSL is configured makes no difference. These tests may also fail on other platforms, but more rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.
[ https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130038#comment-15130038 ] Bernd Mathiske commented on MESOS-4075: --- We can only estimate what will be more work in the long run: - patching test exclusion lists and restarting tests, etc. - fixing the test system once My bets are on the latter. > Continue test suite execution across crashing tests. > > > Key: MESOS-4075 > URL: https://issues.apache.org/jira/browse/MESOS-4075 > Project: Mesos > Issue Type: Improvement > Components: test >Affects Versions: 0.26.0 >Reporter: Bernd Mathiske > Labels: mesosphere > > Currently, mesos-tests.sh exits when a test crashes. This is inconvenient > when trying to find out all tests that fail. > mesos-tests.sh should rate a test that crashes as failed and continue the > same way as if the test merely returned with a failure result and exited > properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3552) CHECK failure due to floating point precision on reservation request
[ https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3552: -- Sprint: Mesosphere Sprint 28 Story Points: 3 > CHECK failure due to floating point precision on reservation request > > > Key: MESOS-3552 > URL: https://issues.apache.org/jira/browse/MESOS-3552 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Mandeep Chadha >Assignee: Mandeep Chadha > Labels: mesosphere, tech-debt > Fix For: 0.26.0 > > > result.cpus() == cpus() check is failing due to ( double == double ) > comparison problem. > Root Cause : > Framework requested 0.1 cpu reservation for the first task. So far so good. > Next Reserve operation — lead to double operations resulting in following > double values : > results.cpus() : 23.9964472863211995 cpus() : 24 > And the check ( result.cpus() == cpus() ) failed. > The double arithmetic operations caused results.cpus() value to be : > 23.9964472863211995 and hence ( 23.9964472863211995 > == 24 ) failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1790) Add "chown" option to CommandInfo.URI
[ https://issues.apache.org/jira/browse/MESOS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-1790: -- Sprint: (was: Mesosphere Sprint 27) > Add "chown" option to CommandInfo.URI > - > > Key: MESOS-1790 > URL: https://issues.apache.org/jira/browse/MESOS-1790 > Project: Mesos > Issue Type: Improvement >Reporter: Vinod Kone >Assignee: Jim Klucar > Labels: myriad, newbie > Attachments: > 0001-MESOS-1790-Adds-chown-option-to-CommandInfo.URI.patch > > > Mesos fetcher always chown()s the extracted executor URIs as the executor > user but sometimes this is not desirable, e.g., "setuid" bit gets lost during > chown() if slave/fetcher is running as root. > It would be nice to give frameworks the ability to skip the chown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3568) The State (/state) endpoint should be documented
[ https://issues.apache.org/jira/browse/MESOS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3568: -- Assignee: Kevin Klues (was: Michael Park) > The State (/state) endpoint should be documented > > > Key: MESOS-3568 > URL: https://issues.apache.org/jira/browse/MESOS-3568 > Project: Mesos > Issue Type: Documentation > Components: documentation, master >Reporter: James Fisher >Assignee: Kevin Klues > Labels: documentation, mesosphere, newbie, tech-debt > > Our tests are using a resource `/state.json` hosted by the Mesos master. I > have searched for the documentation for this resource but have been unable to > find anything. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation
[ https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4368: -- Assignee: (was: Jan Schlicht) > Make HierarchicalAllocatorProcess set a Resource's active role during > allocation > > > Key: MESOS-4368 > URL: https://issues.apache.org/jira/browse/MESOS-4368 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier > Labels: mesosphere > > The concrete implementation here depends on the implementation strategy used > to solve MESOS-4367. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation
[ https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126230#comment-15126230 ] Bernd Mathiske commented on MESOS-4368: --- Postponed? > Make HierarchicalAllocatorProcess set a Resource's active role during > allocation > > > Key: MESOS-4368 > URL: https://issues.apache.org/jira/browse/MESOS-4368 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier >Assignee: Jan Schlicht > Labels: mesosphere > > The concrete implementation here depends on the implementation strategy used > to solve MESOS-4367. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation
[ https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4368: -- Assignee: Jan Schlicht > Make HierarchicalAllocatorProcess set a Resource's active role during > allocation > > > Key: MESOS-4368 > URL: https://issues.apache.org/jira/browse/MESOS-4368 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier >Assignee: Jan Schlicht > Labels: mesosphere > > The concrete implementation here depends on the implementation strategy used > to solve MESOS-4367. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3787) Expand environment variables through the Docker executor.
[ https://issues.apache.org/jira/browse/MESOS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3787: -- Sprint: Mesosphere Sprint 26 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Expand environment variables through the Docker executor. > - > > Key: MESOS-3787 > URL: https://issues.apache.org/jira/browse/MESOS-3787 > Project: Mesos > Issue Type: Wish >Reporter: John Garcia >Assignee: Adam B > Labels: mesosphere > Attachments: mesos.patch, test-example.json > > > We'd like to have expanded variables usable in [the json files used to create > a Marathon app, hence] the Task's CommandInfo, so that the executor is able > to detect the correct values at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4334) Add documentation for the registry
[ https://issues.apache.org/jira/browse/MESOS-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4334: -- Sprint: (was: Mesosphere Sprint 27) > Add documentation for the registry > -- > > Key: MESOS-4334 > URL: https://issues.apache.org/jira/browse/MESOS-4334 > Project: Mesos > Issue Type: Documentation > Components: documentation, master >Reporter: Neil Conway > Labels: documentation, mesosphere, registry > > What information does the master store in the registry? What do operators > need to know about managing the registry? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4156) Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*
[ https://issues.apache.org/jira/browse/MESOS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112113#comment-15112113 ] Bernd Mathiske commented on MESOS-4156: --- Sure thing. > Speed up FetcherCacheTest.* and FetcherCacheHttpTest.* > -- > > Key: MESOS-4156 > URL: https://issues.apache.org/jira/browse/MESOS-4156 > Project: Mesos > Issue Type: Epic > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > Execution times on Mac OS 10.10.4: > {code} > FetcherCacheTest.LocalUncached (2417 ms) > FetcherCacheTest.LocalCached (2476 ms) > FetcherCacheTest.LocalUncachedExtract (2496 ms) > FetcherCacheTest.LocalCachedExtract (2471 ms) > FetcherCacheTest.SimpleEviction (4451 ms) > FetcherCacheTest.FallbackFromEviction (2483 ms) > FetcherCacheTest.RemoveLRUCacheEntries (3422 ms) > FetcherCacheHttpTest.HttpCachedSerialized (2490 ms) > FetcherCacheHttpTest.HttpCachedConcurrent (1032 ms) > FetcherCacheHttpTest.HttpMixed (1022 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4156) Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*
[ https://issues.apache.org/jira/browse/MESOS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4156: -- Shepherd: Bernd Mathiske > Speed up FetcherCacheTest.* and FetcherCacheHttpTest.* > -- > > Key: MESOS-4156 > URL: https://issues.apache.org/jira/browse/MESOS-4156 > Project: Mesos > Issue Type: Epic > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > Execution times on Mac OS 10.10.4: > {code} > FetcherCacheTest.LocalUncached (2417 ms) > FetcherCacheTest.LocalCached (2476 ms) > FetcherCacheTest.LocalUncachedExtract (2496 ms) > FetcherCacheTest.LocalCachedExtract (2471 ms) > FetcherCacheTest.SimpleEviction (4451 ms) > FetcherCacheTest.FallbackFromEviction (2483 ms) > FetcherCacheTest.RemoveLRUCacheEntries (3422 ms) > FetcherCacheHttpTest.HttpCachedSerialized (2490 ms) > FetcherCacheHttpTest.HttpCachedConcurrent (1032 ms) > FetcherCacheHttpTest.HttpMixed (1022 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface
[ https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3854: -- Description: Finalize the structure of ACLs and achieve consensus on the design doc proposed in MESOS-2949. https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit was:Finalize the structure of ACLs and achieve consensus on the design doc proposed in MESOS-2949. > Finalize design for generalized Authorizer interface > > > Key: MESOS-3854 > URL: https://issues.apache.org/jira/browse/MESOS-3854 > Project: Mesos > Issue Type: Task > Components: security >Reporter: Bernd Mathiske >Assignee: Alexander Rojas > Labels: authorization, mesosphere > > Finalize the structure of ACLs and achieve consensus on the design doc > proposed in MESOS-2949. > https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4156) Speed up FetcherCacheTest.* and FetcherCacheHttpTest.*
[ https://issues.apache.org/jira/browse/MESOS-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1557#comment-1557 ] Bernd Mathiske commented on MESOS-4156: --- OK for LocalUncached, but LocalCached requires at least 2 rounds to verify the caching works as expected. > Speed up FetcherCacheTest.* and FetcherCacheHttpTest.* > -- > > Key: MESOS-4156 > URL: https://issues.apache.org/jira/browse/MESOS-4156 > Project: Mesos > Issue Type: Epic > Components: technical debt, test >Reporter: Alexander Rukletsov >Assignee: haosdent >Priority: Minor > Labels: mesosphere, newbie++, tech-debt > > Execution times on Mac OS 10.10.4: > {code} > FetcherCacheTest.LocalUncached (2417 ms) > FetcherCacheTest.LocalCached (2476 ms) > FetcherCacheTest.LocalUncachedExtract (2496 ms) > FetcherCacheTest.LocalCachedExtract (2471 ms) > FetcherCacheTest.SimpleEviction (4451 ms) > FetcherCacheTest.FallbackFromEviction (2483 ms) > FetcherCacheTest.RemoveLRUCacheEntries (3422 ms) > FetcherCacheHttpTest.HttpCachedSerialized (2490 ms) > FetcherCacheHttpTest.HttpCachedConcurrent (1032 ms) > FetcherCacheHttpTest.HttpMixed (1022 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4417) Prevent allocator from crashing on successful recovery.
[ https://issues.apache.org/jira/browse/MESOS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4417: -- Description: There might be a bug that may crash the master as pointed out by [~bmahler] in https://reviews.apache.org/r/4/: {noformat} It looks like if we trip the resume call in addSlave, this delayed resume will crash the master due to the CHECK(paused) that currently resides in resume. {noformat} was:There might be a bug that may crash the master as pointed out by [~bmahler] in https://reviews.apache.org/r/4/. > Prevent allocator from crashing on successful recovery. > --- > > Key: MESOS-4417 > URL: https://issues.apache.org/jira/browse/MESOS-4417 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Blocker > Labels: mesosphere > > There might be a bug that may crash the master as pointed out by [~bmahler] > in https://reviews.apache.org/r/4/: > {noformat} > It looks like if we trip the resume call in addSlave, this delayed resume > will crash the master due to the CHECK(paused) that currently resides in > resume. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4417) Prevent allocator from crashing on successful recovery.
[ https://issues.apache.org/jira/browse/MESOS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4417: -- Description: There might be a bug that may crash the master as pointed out by [~bmahler] in https://reviews.apache.org/r/4/: {noformat} It looks like if we trip the resume call in addSlave, this delayed resume will crash the master due to the CHECK(paused) that currently resides in resume. {noformat} was: There might be a bug that may crash the master as pointed out by [~bmahler] in https://reviews.apache.org/r/4/: {noformat} It looks like if we trip the resume call in addSlave, this delayed resume will crash the master due to the CHECK(paused) that currently resides in resume. {noformat} > Prevent allocator from crashing on successful recovery. > --- > > Key: MESOS-4417 > URL: https://issues.apache.org/jira/browse/MESOS-4417 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov >Priority: Blocker > Labels: mesosphere > > There might be a bug that may crash the master as pointed out by [~bmahler] > in https://reviews.apache.org/r/4/: > {noformat} > It looks like if we trip the resume call in addSlave, this delayed resume > will crash the master > due to the CHECK(paused) that currently resides in resume. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.
[ https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105367#comment-15105367 ] Bernd Mathiske commented on MESOS-4392: --- Good point/questions: Should we allow resources beyond the limit as long as they are revocable? Should resources up to the limit be non-revocable by default? > Balance quota frameworks with non-quota, greedy frameworks. > --- > > Key: MESOS-4392 > URL: https://issues.apache.org/jira/browse/MESOS-4392 > Project: Mesos > Issue Type: Epic > Components: allocation, master >Reporter: Bernd Mathiske >Assignee: Alexander Rukletsov > Labels: mesosphere > > Maximize resource utilization and minimize starvation risk for both quota > frameworks and non-quota, greedy frameworks when competing with each other. > A greedy analytics batch system wants to use as much of the cluster as > possible to maximize computational throughput. When a competing web service > with fixed task size starts up, there must be sufficient resources to run it > immediately. The operator can reserve these resources by setting quota. > However, if these resources are kept idle until the service is in use, this > is wasteful from the analytics job's point of view. On the other hand, the > analytics job should hand back reserved resources to the service when needed > to avoid starvation of the latter. > We can assume that often, the resources needed by the service will be of the > non-revocable variety. Here we need to introduce clearer distinctions between > oversubscribed and revocable resources that are not oversubscribed. An > oversubscribed resource cannot be converted into a non-revocable resource, > not even by preemption. In contrast, a non-oversubscribed, revocable resource > can be converted into a non-revocable resource. > Another related topic is optimistic offers. The pertinent aspect in this > context is again whether resources are oversubscribed or not. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.
[ https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105066#comment-15105066 ] Bernd Mathiske commented on MESOS-4392: --- Yes, I should clarify what I am trying to suggest here. Revoking a SINGLE accepted oversubscription offer for a resource cannot make the resource non-revocable, because another task may be holding on to its actual physical assets. Only revoking the "regular" offer that claims the same resource would create a clear enough picture to assign the resource as non-revocable to a new primary "owner". We'd then rely on the QoS mechanism to satisfy the needs of the latter in case a third, revocable offer were currently using the resource. It is unclear to me from this doc whether oversubscription can only occur when there is also one "regular" offer for the same resource: https://github.com/nqn/mesos/blob/niklas/oversubscription-user-doc/docs/oversubscription.md My guess would be that you can also have revocable resources only and still achieve oversubscription. In any case, once we make revocable the default this will be the case. Then the situation above will change slightly. Without a go-to "regular owner" offer that determines whether the resource is actually available, we can immediately hand out a non-revocable offer. QoS should then make the physical resource available on demand. Alternatively, maybe as a booster option to speed things up, we could provide a clean slate by revoking the entire oversubscription set. > Balance quota frameworks with non-quota, greedy frameworks. > --- > > Key: MESOS-4392 > URL: https://issues.apache.org/jira/browse/MESOS-4392 > Project: Mesos > Issue Type: Epic > Components: allocation, master >Reporter: Bernd Mathiske >Assignee: Alexander Rukletsov > Labels: mesosphere > > Maximize resource utilization and minimize starvation risk for both quota > frameworks and non-quota, greedy frameworks when competing with each other. 
> A greedy analytics batch system wants to use as much of the cluster as > possible to maximize computational throughput. When a competing web service > with fixed task size starts up, there must be sufficient resources to run it > immediately. The operator can reserve these resources by setting quota. > However, if these resources are kept idle until the service is in use, this > is wasteful from the analytics job's point of view. On the other hand, the > analytics job should hand back reserved resources to the service when needed > to avoid starvation of the latter. > We can assume that often, the resources needed by the service will be of the > non-revocable variety. Here we need to introduce clearer distinctions between > oversubscribed and revocable resources that are not oversubscribed. An > oversubscribed resource cannot be converted into a non-revocable resource, > not even by preemption. In contrast, a non-oversubscribed, revocable resource > can be converted into a non-revocable resource. > Another related topic is optimistic offers. The pertinent aspect in this > context is again whether resources are oversubscribed or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.
[ https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105028#comment-15105028 ] Bernd Mathiske commented on MESOS-4392: --- Except for your last sentence, which I do not understand, I agree. It makes sense if a framework uses only non-revocable resources up to its quota. Note that if it does not set a quota limit, it can still use resources beyond its guarantee, and we now want those resources to be revocable by default. > Balance quota frameworks with non-quota, greedy frameworks. > --- > > Key: MESOS-4392 > URL: https://issues.apache.org/jira/browse/MESOS-4392 > Project: Mesos > Issue Type: Epic > Components: allocation, master >Reporter: Bernd Mathiske >Assignee: Alexander Rukletsov > Labels: mesosphere > > Maximize resource utilization and minimize starvation risk for both quota > frameworks and non-quota, greedy frameworks when competing with each other. > A greedy analytics batch system wants to use as much of the cluster as > possible to maximize computational throughput. When a competing web service > with fixed task size starts up, there must be sufficient resources to run it > immediately. The operator can reserve these resources by setting quota. > However, if these resources are kept idle until the service is in use, this > is wasteful from the analytics job's point of view. On the other hand, the > analytics job should hand back reserved resources to the service when needed > to avoid starvation of the latter. > We can assume that often, the resources needed by the service will be of the > non-revocable variety. Here we need to introduce clearer distinctions between > oversubscribed and revocable resources that are not oversubscribed. An > oversubscribed resource cannot be converted into a non-revocable resource, > not even by preemption. In contrast, a non-oversubscribed, revocable resource > can be converted into a non-revocable resource. > Another related topic is optimistic offers. 
The pertinent aspect in this > context is again whether resources are oversubscribed or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4130) Document how the fetcher can reach across a proxy connection.
[ https://issues.apache.org/jira/browse/MESOS-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4130: -- Attachment: signature.asc Submitted. > Document how the fetcher can reach across a proxy connection. > - > > Key: MESOS-4130 > URL: https://issues.apache.org/jira/browse/MESOS-4130 > Project: Mesos > Issue Type: Documentation > Components: fetcher >Reporter: Bernd Mathiske >Assignee: Shuai Lin > Labels: mesosphere, newbie > Attachments: signature.asc > > > The fetcher uses libcurl for downloading content from HTTP, HTTPS, etc. There > is no source code in the pertinent parts of "net.hpp" that deals with proxy > settings. However, libcurl automatically picks up certain environment > variables and adjusts its settings accordingly. See "man libcurl-tutorial" > for details. See section "Proxies", subsection "Environment Variables". If > you follow this recipe in your Mesos agent startup script, you can use a > proxy. > We should document this in the fetcher (cache) doc > (http://mesos.apache.org/documentation/latest/fetcher/). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
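[Editor's note] The recipe from "man libcurl-tutorial" amounts to exporting the proxy environment variables in the agent's startup script, so the fetcher's libcurl picks them up. A minimal sketch of such a wrapper — the proxy host, port, and NO_PROXY entries are placeholders, not values from the ticket:

```shell
# Hypothetical agent startup wrapper; libcurl reads these variables
# automatically (see "man libcurl-tutorial", section "Proxies",
# subsection "Environment Variables").
export http_proxy="http://proxy.example.com:3128"
export HTTPS_PROXY="http://proxy.example.com:3128"
export NO_PROXY="localhost,127.0.0.1"

# The fetcher inherits the variables when the agent is launched from
# this same shell, e.g.:
# exec /usr/sbin/mesos-slave --master=zk://zk1.example.com:2181/mesos
```

The variables must be set in the environment of the agent process itself, since the fetcher runs as its child.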
[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4363: -- Sprint: Mesosphere Sprint 27 Story Points: 1 > Add a roles field to FrameworkInfo > -- > > Key: MESOS-4363 > URL: https://issues.apache.org/jira/browse/MESOS-4363 > Project: Mesos > Issue Type: Improvement > Components: framework, master >Reporter: Benjamin Bannier >Assignee: Qian Zhang > Labels: mesosphere > > To represent multiple roles per framework a new repeated string field for > roles is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4363: -- Component/s: master framework > Add a roles field to FrameworkInfo > -- > > Key: MESOS-4363 > URL: https://issues.apache.org/jira/browse/MESOS-4363 > Project: Mesos > Issue Type: Improvement > Components: framework, master >Reporter: Benjamin Bannier >Assignee: Qian Zhang > Labels: mesosphere > > To represent multiple roles per framework a new repeated string field for > roles is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.
[ https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4392: -- Epic Name: Revocable by default > Balance quota frameworks with non-quota, greedy frameworks. > --- > > Key: MESOS-4392 > URL: https://issues.apache.org/jira/browse/MESOS-4392 > Project: Mesos > Issue Type: Epic > Components: allocation, master >Reporter: Bernd Mathiske >Assignee: Alexander Rukletsov > Labels: mesosphere > > Maximize resource utilization and minimize starvation risk for both quota > frameworks and non-quota, greedy frameworks when competing with each other. > A greedy analytics batch system wants to use as much of the cluster as > possible to maximize computational throughput. When a competing web service > with fixed task size starts up, there must be sufficient resources to run it > immediately. The operator can reserve these resources by setting quota. > However, if these resources are kept idle until the service is in use, this > is wasteful from the analytics job's point of view. On the other hand, the > analytics job should hand back reserved resources to the service when needed > to avoid starvation of the latter. > We can assume that often, the resources needed by the service will be of the > non-revocable variety. Here we need to introduce clearer distinctions between > oversubscribed and revocable resources that are not oversubscribed. An > oversubscribed resource cannot be converted into a non-revocable resource, > not even by preemption. In contrast, a non-oversubscribed, revocable resource > can be converted into a non-revocable resource. > Another related topic is optimistic offers. The pertinent aspect in this > context is again whether resources are oversubscribed or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.
[ https://issues.apache.org/jira/browse/MESOS-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4392: -- Issue Type: Epic (was: Improvement) > Balance quota frameworks with non-quota, greedy frameworks. > --- > > Key: MESOS-4392 > URL: https://issues.apache.org/jira/browse/MESOS-4392 > Project: Mesos > Issue Type: Epic > Components: allocation, master >Reporter: Bernd Mathiske >Assignee: Alexander Rukletsov > Labels: mesosphere > > Maximize resource utilization and minimize starvation risk for both quota > frameworks and non-quota, greedy frameworks when competing with each other. > A greedy analytics batch system wants to use as much of the cluster as > possible to maximize computational throughput. When a competing web service > with fixed task size starts up, there must be sufficient resources to run it > immediately. The operator can reserve these resources by setting quota. > However, if these resources are kept idle until the service is in use, this > is wasteful from the analytics job's point of view. On the other hand, the > analytics job should hand back reserved resources to the service when needed > to avoid starvation of the latter. > We can assume that often, the resources needed by the service will be of the > non-revocable variety. Here we need to introduce clearer distinctions between > oversubscribed and revocable resources that are not oversubscribed. An > oversubscribed resource cannot be converted into a non-revocable resource, > not even by preemption. In contrast, a non-oversubscribed, revocable resource > can be converted into a non-revocable resource. > Another related topic is optimistic offers. The pertinent aspect in this > context is again whether resources are oversubscribed or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4393) Draft design document for resource revocability by default.
Bernd Mathiske created MESOS-4393: - Summary: Draft design document for resource revocability by default. Key: MESOS-4393 URL: https://issues.apache.org/jira/browse/MESOS-4393 Project: Mesos Issue Type: Task Components: allocation, master Reporter: Bernd Mathiske Assignee: Alexander Rukletsov Create a design document for setting offered resources as "revocable by default". Greedy frameworks can then temporarily use resources set aside to satisfy quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4392) Balance quota frameworks with non-quota, greedy frameworks.
Bernd Mathiske created MESOS-4392: - Summary: Balance quota frameworks with non-quota, greedy frameworks. Key: MESOS-4392 URL: https://issues.apache.org/jira/browse/MESOS-4392 Project: Mesos Issue Type: Improvement Components: allocation, master Reporter: Bernd Mathiske Assignee: Alexander Rukletsov Maximize resource utilization and minimize starvation risk for both quota frameworks and non-quota, greedy frameworks when competing with each other. A greedy analytics batch system wants to use as much of the cluster as possible to maximize computational throughput. When a competing web service with fixed task size starts up, there must be sufficient resources to run it immediately. The operator can reserve these resources by setting quota. However, if these resources are kept idle until the service is in use, this is wasteful from the analytics job's point of view. On the other hand, the analytics job should hand back reserved resources to the service when needed to avoid starvation of the latter. We can assume that often, the resources needed by the service will be of the non-revocable variety. Here we need to introduce clearer distinctions between oversubscribed and revocable resources that are not oversubscribed. An oversubscribed resource cannot be converted into a non-revocable resource, not even by preemption. In contrast, a non-oversubscribed, revocable resource can be converted into a non-revocable resource. Another related topic is optimistic offers. The pertinent aspect in this context is again whether resources are oversubscribed or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
[ https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095903#comment-15095903 ] Bernd Mathiske commented on MESOS-4304: --- Roger. > hdfs operations fail due to prepended / on path for non-hdfs hadoop clients. > > > Key: MESOS-4304 > URL: https://issues.apache.org/jira/browse/MESOS-4304 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 0.26.0 >Reporter: James Cunningham > > This bug was resolved for the hdfs protocol for MESOS-3602 but since the > process checks for the "hdfs" protocol at the beginning of the URI, the fix > does not extend itself to non-hdfs hadoop clients. > {code} > I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started! > I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: > {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"} > I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI > 'maprfs:///mesos/storm-mesos-0.9.3.tgz' > I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the > sandbox directory > I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI > 'maprfs:///mesos/storm-mesos-0.9.3.tgz' > I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop > client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to > 
'/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz' > copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at > index 7: maprfs: > Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc] ] > E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal > '/maprfs:///mesos/storm-mesos-0.9.3.tgz' > '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'' > failed; this is the output: > Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal > failed: Failed to execute 'hadoop fs -copyToLocal > '/maprfs:///mesos/storm-mesos-0.9.3.tgz' > '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''; > the command was either not found or exited with a non-zero exit status: 255 > Failed to synchronize with slave (it's probably exited) > {code} > After a brief chat with [~jieyu], it was recommended to fix the current hdfs > client code because the new hadoop fetcher plugin is slated to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
[ https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4304: -- Shepherd: Jie Yu > hdfs operations fail due to prepended / on path for non-hdfs hadoop clients. > > > Key: MESOS-4304 > URL: https://issues.apache.org/jira/browse/MESOS-4304 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 0.26.0 >Reporter: James Cunningham >Assignee: Bernd Mathiske > > This bug was resolved for the hdfs protocol for MESOS-3602 but since the > process checks for the "hdfs" protocol at the beginning of the URI, the fix > does not extend itself to non-hdfs hadoop clients. > {code} > I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started! > I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: > {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"} > I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI > 'maprfs:///mesos/storm-mesos-0.9.3.tgz' > I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the > sandbox directory > I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI > 'maprfs:///mesos/storm-mesos-0.9.3.tgz' > I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop > client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to > 
'/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz' > copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at > index 7: maprfs: > Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc] ] > E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal > '/maprfs:///mesos/storm-mesos-0.9.3.tgz' > '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'' > failed; this is the output: > Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal > failed: Failed to execute 'hadoop fs -copyToLocal > '/maprfs:///mesos/storm-mesos-0.9.3.tgz' > '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''; > the command was either not found or exited with a non-zero exit status: 255 > Failed to synchronize with slave (it's probably exited) > {code} > After a brief chat with [~jieyu], it was recommended to fix the current hdfs > client code because the new hadoop fetcher plugin is slated to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4304) hdfs operations fail due to prepended / on path for non-hdfs hadoop clients.
[ https://issues.apache.org/jira/browse/MESOS-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske reassigned MESOS-4304: - Assignee: Bernd Mathiske > hdfs operations fail due to prepended / on path for non-hdfs hadoop clients. > > > Key: MESOS-4304 > URL: https://issues.apache.org/jira/browse/MESOS-4304 > Project: Mesos > Issue Type: Bug > Components: fetcher >Affects Versions: 0.26.0 >Reporter: James Cunningham >Assignee: Bernd Mathiske > > This bug was resolved for the hdfs protocol for MESOS-3602 but since the > process checks for the "hdfs" protocol at the beginning of the URI, the fix > does not extend itself to non-hdfs hadoop clients. > {code} > I0107 01:22:01.259490 17678 logging.cpp:172] INFO level logging started! > I0107 01:22:01.259856 17678 fetcher.cpp:422] Fetcher Info: > {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"maprfs:\/\/\/mesos\/storm-mesos-0.9.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/s0121.stag.urbanairship.com:36373\/conf\/storm.yaml"}}],"sandbox_directory":"\/mnt\/data\/mesos\/slaves\/530dda5a-481a-4117-8154-3aee637d3b38-S3\/frameworks\/530dda5a-481a-4117-8154-3aee637d3b38-\/executors\/word-count-1-1452129714\/runs\/4443d5ac-d034-49b3-bf12-08fb9b0d92d0","user":"root"} > I0107 01:22:01.262171 17678 fetcher.cpp:377] Fetching URI > 'maprfs:///mesos/storm-mesos-0.9.3.tgz' > I0107 01:22:01.262212 17678 fetcher.cpp:248] Fetching directly into the > sandbox directory > I0107 01:22:01.262243 17678 fetcher.cpp:185] Fetching URI > 'maprfs:///mesos/storm-mesos-0.9.3.tgz' > I0107 01:22:01.671777 17678 fetcher.cpp:110] Downloading resource with Hadoop > client from 'maprfs:///mesos/storm-mesos-0.9.3.tgz' to > 
'/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz' > copyToLocal: java.net.URISyntaxException: Expected scheme-specific part at > index 7: maprfs: > Usage: java FsShell [-copyToLocal [-ignoreCrc] [-crc] ] > E0107 01:22:02.435556 17678 shell.hpp:90] Command 'hadoop fs -copyToLocal > '/maprfs:///mesos/storm-mesos-0.9.3.tgz' > '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz'' > failed; this is the output: > Failed to fetch 'maprfs:///mesos/storm-mesos-0.9.3.tgz': HDFS copyToLocal > failed: Failed to execute 'hadoop fs -copyToLocal > '/maprfs:///mesos/storm-mesos-0.9.3.tgz' > '/mnt/data/mesos/slaves/530dda5a-481a-4117-8154-3aee637d3b38-S3/frameworks/530dda5a-481a-4117-8154-3aee637d3b38-/executors/word-count-1-1452129714/runs/4443d5ac-d034-49b3-bf12-08fb9b0d92d0/storm-mesos-0.9.3.tgz''; > the command was either not found or exited with a non-zero exit status: 255 > Failed to synchronize with slave (it's probably exited) > {code} > After a brief chat with [~jieyu], it was recommended to fix the current hdfs > client code because the new hadoop fetcher plugin is slated to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
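[Editor's note] The failure mode is visible in the log above: only URIs beginning with "hdfs" are passed through untouched, so "maprfs:///mesos/..." gets a "/" prepended and becomes the malformed "/maprfs:///mesos/...". A shell stand-in for the intended generalization (the function name is invented; the real fix belongs in the C++ hdfs client code):

```shell
# Sketch: pass through any URI that carries a scheme, and only prepend
# "/" to bare paths — instead of special-casing "hdfs://".
normalize_uri() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;        # any scheme (hdfs, maprfs, ...): unchanged
    *)     printf '/%s\n' "${1#/}" ;;   # bare path: ensure one leading '/'
  esac
}

normalize_uri "maprfs:///mesos/storm-mesos-0.9.3.tgz"  # -> maprfs:///mesos/storm-mesos-0.9.3.tgz
normalize_uri "mesos/archive.tgz"                      # -> /mesos/archive.tgz
```

With this check, the `hadoop fs -copyToLocal` invocation would receive the maprfs URI intact rather than with the spurious leading slash that triggers the URISyntaxException.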
[jira] [Updated] (MESOS-4075) Continue test suite execution across crashing tests.
[ https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4075: -- Target Version/s: (was: 0.27.0) > Continue test suite execution across crashing tests. > > > Key: MESOS-4075 > URL: https://issues.apache.org/jira/browse/MESOS-4075 > Project: Mesos > Issue Type: Improvement > Components: test >Affects Versions: 0.26.0 >Reporter: Bernd Mathiske > Labels: mesosphere > > Currently, mesos-tests.sh exits when a test crashes. This is inconvenient > when trying to find out all tests that fail. > mesos-tests.sh should rate a test that crashes as failed and continue the > same way as if the test merely returned with a failure result and exited > properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.
[ https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095784#comment-15095784 ] Bernd Mathiske commented on MESOS-4075: --- I thought solving this would make your life as release master easier, but so would fixing any remaining crashes, which need to be worked on anyhow if they occur. See above. I just changed the target version for this ticket to indeterminate. Long-term, I suggest we should solve it, even if tests run more slowly then and you can only use this feature optionally. Getting a full assessment of what does and does not work in one swoop expedites testing. Repeatedly rerunning the tests with an incrementally updated list of test exclusions starts getting inefficient once there is more than one crash involved. (This cost us a lot of time in 0.26.0.) On second thought, maybe the latter procedure could be automated as a band-aid? > Continue test suite execution across crashing tests. > > > Key: MESOS-4075 > URL: https://issues.apache.org/jira/browse/MESOS-4075 > Project: Mesos > Issue Type: Improvement > Components: test >Affects Versions: 0.26.0 >Reporter: Bernd Mathiske > Labels: mesosphere > > Currently, mesos-tests.sh exits when a test crashes. This is inconvenient > when trying to find out all tests that fail. > mesos-tests.sh should rate a test that crashes as failed and continue the > same way as if the test merely returned with a failure result and exited > properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
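[Editor's note] The rerun-with-exclusions procedure suggested as a band-aid can be sketched as a loop that grows an exclusion list until a run completes without a crash. `run_suite` below is an invented stand-in for mesos-tests.sh (a real gtest binary would take the exclusions via `--gtest_filter=*-TestA:TestB`); the crashing test names are placeholders.

```shell
# Stand-in suite runner: prints the first non-excluded "crasher" and
# exits with a crash-like status; exits 0 once all crashers are excluded.
CRASHERS="FetcherTest.A QuotaTest.B"   # placeholder crashing tests

run_suite() {
  for t in $CRASHERS; do
    case ":$1:" in *":$t:"*) continue ;; esac   # already excluded
    echo "$t"
    return 139                                  # simulate SIGSEGV exit
  done
  return 0
}

# Band-aid loop: on each crash, add the crasher to the exclusion list
# and rerun, until a run finishes cleanly.
EXCLUDED=""
while :; do
  crashed=$(run_suite "$EXCLUDED") && break
  EXCLUDED="$EXCLUDED:$crashed"
done
echo "excluded:$EXCLUDED"
```

In a real harness, the crashed test would instead be parsed from the last "[ RUN ]" line of the suite's log before the crash.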
[jira] [Updated] (MESOS-4336) Document supported file types for archive extraction by fetcher
[ https://issues.apache.org/jira/browse/MESOS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4336: -- Story Points: 1 Labels: documentation mesosphere newbie (was: documentation mesosphere) Priority: Trivial (was: Minor) Summary: Document supported file types for archive extraction by fetcher (was: Document supported file types for fetcher) > Document supported file types for archive extraction by fetcher > --- > > Key: MESOS-4336 > URL: https://issues.apache.org/jira/browse/MESOS-4336 > Project: Mesos > Issue Type: Documentation > Components: documentation, fetcher >Reporter: Sunil Shah >Priority: Trivial > Labels: documentation, mesosphere, newbie > > The Mesos fetcher extracts specified URIs if requested to do so by the > scheduler. However, the documentation at > http://mesos.apache.org/documentation/latest/fetcher/ doesn't list the file > types /extensions that will be extracted by the fetcher. > [The relevant > code|https://github.com/apache/mesos/blob/master/src/launcher/fetcher.cpp#L63] > specifies an exhaustive list of extensions that will be extracted, the > documentation should be updated to match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3208) Fetch checksum files to inform fetcher cache use
[ https://issues.apache.org/jira/browse/MESOS-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091688#comment-15091688 ] Bernd Mathiske commented on MESOS-3208: --- Discarded for now, until this project becomes a priority again. > Fetch checksum files to inform fetcher cache use > > > Key: MESOS-3208 > URL: https://issues.apache.org/jira/browse/MESOS-3208 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Bernd Mathiske >Priority: Minor > > This is the first part of phase 1 as described in the comments for > MESOS-2073. We add a field to CommandInfo::URI that contains the URI of a > checksum file. When this file has new content, then the contents of the > associated value URI needs to be refreshed in the fetcher cache. > In this implementation step, we just add the above basic functionality > (download, checksum comparison). In later steps, we will add more control > flow to cover corner cases and thus make this feature more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3235: -- Sprint: Mesosphere Sprint 20, Mesosphere Sprint 26 (was: Mesosphere Sprint 20) Component/s: fetcher tests > FetcherCacheHttpTest.HttpCachedSerialized and > FetcherCacheHttpTest.HttpCachedConcurrent are flaky > - > > Key: MESOS-3235 > URL: https://issues.apache.org/jira/browse/MESOS-3235 > Project: Mesos > Issue Type: Bug > Components: fetcher, tests >Affects Versions: 0.23.0 >Reporter: Joseph Wu >Assignee: Bernd Mathiske > Labels: mesosphere > > On OSX, {{make clean && make -j8 V=0 check}}: > {code} > [--] 3 tests from FetcherCacheHttpTest > [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized > HTTP/1.1 200 OK > Date: Fri, 07 Aug 2015 17:23:05 GMT > Content-Length: 30 > I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 0 > Forked command at 54363 > sh -c './mesos-fetcher-test-cmd 0' > E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54363) > E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected 
[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092126#comment-15092126 ] Bernd Mathiske commented on MESOS-3235: --- Thanks! This confirms nicely what Alexander found before: task 3 never starts, then waiting for all tasks fails. This should not crash anything, though. That's new. > FetcherCacheHttpTest.HttpCachedSerialized and > FetcherCacheHttpTest.HttpCachedConcurrent are flaky > - > > Key: MESOS-3235 > URL: https://issues.apache.org/jira/browse/MESOS-3235 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Joseph Wu >Assignee: Bernd Mathiske > Labels: mesosphere > > On OSX, {{make clean && make -j8 V=0 check}}: > {code} > [--] 3 tests from FetcherCacheHttpTest > [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized > HTTP/1.1 200 OK > Date: Fri, 07 Aug 2015 17:23:05 GMT > Content-Length: 30 > I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 0 > Forked command at 54363 > sh -c './mesos-fetcher-test-cmd 0' > E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54363) > E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:06.597995 355938304 
socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 1 > Forked command at 54411 > sh -c './mesos-fetcher-test-cmd 1' > E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54411) > E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > ../../src/tests/fetcher_cache_tests.cpp:860: Failure > Failed to wait 15secs for awaitFinished(task.get()) > *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are > using GNU date *** > [ FAILED ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms) > [ RUN ] FetcherCacheHttpTest.HttpCachedConcurrent > PC: @0x113723618 process::Owned<>::get() > *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: *** > @ 0x7fff8fcacf1a _sigtramp > @ 0x7f9bc3109710 (unknown) > @0x1136f07e2 mesos::internal::slave::Fetcher::fetch() > @0x113862f9d > mesos::internal::slave::MesosContainerizerProcess::fetch() > @0x1138f1b5d > _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_ > @0x1138f18cf > _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_ > @0x1143768cf std::__1::function<>::operator()() > @0x11435ca7f process::ProcessBase::visit() > @0x1143ed6fe process::DispatchEvent::visit() > @0x11271 
process::ProcessBase::serve() > @0x114343b4e process::ProcessManager::resume() > @0x1143431ca process::internal::schedule() > @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_ > @ 0x7fff95090268 _pthread_body > @ 0x7fff950901e5 _pthread_start > @ 0x7fff9508e41d thread_start > Failed to synchronize with slave (it's probably exited) > make[3]: *** [check-local] Segmentation fault: 11 > make[2]: *** [check-am] Error 2 > make[1]: *** [check] Error 2 > make: *** [check-recursive] Error 1 > {code} > This was encountered just once out of 3+ {{make check}}s. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4181) Change port range logging to different logging level.
[ https://issues.apache.org/jira/browse/MESOS-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082684#comment-15082684 ] Bernd Mathiske commented on MESOS-4181: --- Good ideas. This short-term fix is part of an epic that takes the broader view you are alluding to: https://issues.apache.org/jira/browse/MESOS-4233... > Change port range logging to different logging level. > - > > Key: MESOS-4181 > URL: https://issues.apache.org/jira/browse/MESOS-4181 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 0.25.0 >Reporter: Cody Maloney >Assignee: Joerg Schad > Labels: mesosphere, newbie > > Transforming from mesos' internal port range representation -> text is > non-linear in the number of bytest output. We end up with a massive amount of > log data like the following: > {noformat} > Dec 15 23:54:08 ip-10-0-7-60.us-west-2.compute.internal mesos-master[15919]: > I1215 23:51:58.891165 15925 hierarchical.hpp:1103] Recovered cpus(*):1e-05; > mem(*):10; ports(*):[5565-5565] (total: ports(*):[1025-2180, 2182-3887, > 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; > disk(*):32541, allocated: cpus(*):0.01815; ports(*):[1050-1050, 1092-1092, > 1094-1094, 1129-1129, 1132-1132, 1140-1140, 1177-1178, 1180-1180, 1192-1192, > 1205-1205, 1221-1221, 1308-1308, 1311-1311, 1323-1323, 1326-1326, 1335-1335, > 1365-1365, 1404-1404, 1412-1412, 1436-1436, 1455-1455, 1459-1459, 1472-1472, > 1477-1477, 1482-1482, 1491-1491, 1510-1510, 1551-1551, 1553-1553, 1559-1559, > 1573-1573, 1590-1590, 1592-1592, 1619-1619, 1635-1636, 1678-1678, 1738-1738, > 1742-1742, 1752-1752, 1770-1770, 1780-1782, 1790-1790, 1792-1792, 1799-1799, > 1804-1804, 1844-1844, 1852-1852, 1867-1867, 1899-1899, 1936-1936, 1945-1945, > 1954-1954, 2046-2046, 2055-2055, 2063-2063, 2070-2070, 2089-2089, 2104-2104, > 2117-2117, 2132-2132, 2173-2173, 2178-2178, 2188-2188, 2200-2200, 2218-2218, > 2223-2223, 2244-2244, 2248-2248, 2250-2250, 2270-2270, 2286-2286, 
2302-2302, > 2332-2332, 2377-2377, 2397-2397, 2423-2423, 2435-2435, 2442-2442, 2448-2448, > 2477-2477, 2482-2482, 2522-2522, 2586-2586, 2594-2594, 2600-2600, 2602-2602, > 2643-2643, 2648-2648, 2659-2659, 2691-2691, 2716-2716, 2739-2739, 2794-2794, > 2802-2802, 2823-2823, 2831-2831, 2840-2840, 2848-2848, 2876-2876, 2894-2895, > 2900-2900, 2904-2904, 2912-2912, 2983-2983, 2991-2991, 2999-2999, 3011-3011, > 3025-3025, 3036-3036, 3041-3041, 3051-3051, 3074-3074, 3097-3097, 3107-3107, > 3121-3121, 3171-3171, 3176-3176, 3195-3195, 3197-3197, 3210-3210, 3221-3221, > 3234-3234, 3245-3245, 3250-3251, 3255-3255, 3270-3270, 3293-3293, 3298-3298, > 3312-3312, 3318-3318, 3325-3325, 3368-3368, 3379-3379, 3391-3391, 3412-3412, > 3414-3414, 3420-3420, 3492-3492, 3501-3501, 3538-3538, 3579-3579, 3631-3631, > 3680-3680, 3684-3684, 3695-3695, 3699-3699, 3738-3738, 3758-3758, 3793-3793, > 3808-3808, 3817-3817, 3854-3854, 3856-3856, 3900-3900, 3906-3906, 3909-3909, > 3912-3912, 3946-3946, 3956-3956, 3959-3959, 3963-3963, 3974- > Dec 15 23:54:09 ip-10-0-7-60.us-west-2.compute.internal mesos-master[15919]: > 3974, 3981-3981, 3985-3985, 4134-4134, 4178-4178, 4206-4206, 4223-4223, > 4239-4239, 4245-4245, 4251-4251, 4262-4263, 4271-4271, 4308-4308, 4323-4323, > 4329-4329, 4368-4368, 4385-4385, 4404-4404, 4419-4419, 4430-4430, 4448-4448, > 4464-4464, 4481-4481, 4494-4494, 4499-4499, 4510-4510, 4534-4534, 4543-4543, > 4555-4555, 4561-4562, 4577-4577, 4601-4601, 4675-4675, 4722-4722, 4739-4739, > 4748-4748, 4752-4752, 4764-4764, 4771-4771, 4787-4787, 4827-4827, 4830-4830, > 4837-4837, 4848-4848, 4853-4853, 4879-4879, 4883-4883, 4897-4897, 4902-4902, > 4911-4911, 4940-4940, 4946-4946, 4957-4957, 4994-4994, 4996-4996, 5008-5008, > 5019-5019, 5043-5043, 5059-5059, 5109-5109, 5134-5135, 5157-5157, 5172-5172, > 5192-5192, 5211-5211, 5215-5215, 5234-5234, 5237-5237, 5246-5246, 5255-5255, > 5268-5268, 5311-5311, 5314-5314, 5316-5316, 5348-5348, 5391-5391, 5407-5407, > 5433-5433, 5446-5447, 5454-5454, 
5456-5456, 5482-5482, 5514-5515, 5517-5517, > 5525-5525, 5542-5542, 5554-5554, 5581-5581, 5624-5624, 5647-5647, 5695-5695, > 5700-5700, 5703-5703, 5743-5743, 5747-5747, 5793-5793, 5850-5850, 5856-5856, > 5858-5858, 5899-5899, 5901-5901, 5940-5940, 5958-5958, 5962-5962, 5974-5974, > 5995-5995, 6000-6001, 6037-6037, 6053-6053, 6066-6066, 6078-6078, 6129-6129, > 6139-6139, 6160-6160, 6174-6174, 6193-6193, 6234-6234, 6263-6263, 6276-6276, > 6287-6287, 6292-6292, 6294-6294, 6296-6296, 6306-6307, 6333-6333, 6343-6343, > 6349-6349, 6377-6377, 6418-6418, 6454-6454, 6484-6484, 6496-6496, 6504-6504, > 6518-6518, 6589-6589, 6592-6592, 6606-6606, 6640-6640, 6713-6713, 6717-6717, > 6738-6738, 6757-6757, 6765-6765, 6778-6778, 6792-6792, 6798-6798,
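Independent of lowering the logging level, log volume like the above can be cut by emitting a summary of the range set rather than every range. The helper below is a hypothetical illustration, not Mesos code; the sample ranges are taken from the log above.

```shell
# Condense a ports resource like "[1050-1050, 1092-1092, ...]" into one short
# summary line instead of printing every range. Hypothetical helper; the
# range list is a shortened sample from the ticket's log output.
ranges="1050-1050, 1092-1092, 1094-1094, 1129-1129, 1132-1132"
echo "$ranges" | awk -F', *' '{
  total = 0
  for (i = 1; i <= NF; i++) {       # each field is "lo-hi"
    split($i, r, "-")
    total += r[2] - r[1] + 1        # number of ports covered by this range
  }
  printf "ports: %d ranges covering %d ports\n", NF, total
}'
```

A one-line summary keeps the log linear in the number of allocations rather than in the size of the range set.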
[jira] [Commented] (MESOS-4075) Continue test suite execution across crashing tests.
[ https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083302#comment-15083302 ] Bernd Mathiske commented on MESOS-4075: --- Indeed, we are focussing on fixing crashes first and foremost. Yet it would be nice if any new crashes would not hinder us when running test suites (on CI). > Continue test suite execution across crashing tests. > > > Key: MESOS-4075 > URL: https://issues.apache.org/jira/browse/MESOS-4075 > Project: Mesos > Issue Type: Improvement > Components: test >Affects Versions: 0.26.0 >Reporter: Bernd Mathiske > Labels: mesosphere > > Currently, mesos-tests.sh exits when a test crashes. This is inconvenient > when trying to find out all tests that fail. > mesos-tests.sh should rate a test that crashes as failed and continue the > same way as if the test merely returned with a failure result and exited > properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
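The behavior the ticket asks for can be approximated by isolating each test in a child process, so a crash is recorded as an ordinary failure and the run continues. The runner and the two stand-in tests below are illustrative assumptions, not Mesos's actual mesos-tests.sh:

```shell
#!/bin/sh
# Run a "test" in a child process so that a crash (e.g. SIGSEGV) fails only
# that one test instead of aborting the whole suite. Illustrative sketch.
run_isolated() {
  # a subshell confines the crash; any non-zero exit counts as a failure
  if ( "$@" ); then echo "PASS: $1"; else echo "FAIL: $1"; fi
}

crashing_test() { sh -c 'kill -SEGV $$'; }  # stand-in for a segfaulting test
passing_test()  { true; }                   # stand-in for a healthy test

# The shell may print a "Segmentation fault" notice on stderr, but the loop
# keeps going and later tests still run:
run_isolated crashing_test
run_isolated passing_test
```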
[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-1763: -- Issue Type: Epic (was: Task) > Add support for multiple roles to be specified in FrameworkInfo > --- > > Key: MESOS-1763 > URL: https://issues.apache.org/jira/browse/MESOS-1763 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Vinod Kone >Assignee: Timothy Chen > Labels: mesosphere, roles > > Currently frameworks have the ability to set only one (resource) role in > FrameworkInfo. It would be nice to let frameworks specify multiple roles so > that they can do more fine grained resource accounting per role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4284) Draft design doc for multi-role frameworks
Bernd Mathiske created MESOS-4284: - Summary: Draft design doc for multi-role frameworks Key: MESOS-4284 URL: https://issues.apache.org/jira/browse/MESOS-4284 Project: Mesos Issue Type: Story Components: master Reporter: Bernd Mathiske Assignee: Benjamin Bannier Create a document that describes the problems with having only single-role frameworks and proposes an MVP solution and implementation approach. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1763) Add support for multiple roles to be specified in FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-1763: -- Assignee: (was: Timothy Chen) Epic Name: multi-role frameworks > Add support for multiple roles to be specified in FrameworkInfo > --- > > Key: MESOS-1763 > URL: https://issues.apache.org/jira/browse/MESOS-1763 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Vinod Kone > Labels: mesosphere, roles > > Currently frameworks have the ability to set only one (resource) role in > FrameworkInfo. It would be nice to let frameworks specify multiple roles so > that they can do more fine grained resource accounting per role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3809) Expose advertise_ip and advertise_port as command line options in mesos slave
[ https://issues.apache.org/jira/browse/MESOS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080861#comment-15080861 ] Bernd Mathiske commented on MESOS-3809: --- Unfortunately, this commit was indeed not cherry-picked into 0.26.0, but should have been, and the ticket shows up in the CHANGELOG. I'll update the CHANGELOG for 0.26.0, removing MESOS-3809 from it, and set the target version for this ticket to 0.27.0. > Expose advertise_ip and advertise_port as command line options in mesos slave > - > > Key: MESOS-3809 > URL: https://issues.apache.org/jira/browse/MESOS-3809 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.25.0 >Reporter: Anindya Sinha >Assignee: Anindya Sinha >Priority: Minor > Labels: mesosphere > Fix For: 0.26.0 > > > advertise_ip and advertise_port are exposed as mesos master command line args > (MESOS-809). But the following use case makes them candidates for adding as > command line args in the mesos slave as well. > On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang wrote: > It works! Thanks a lot. > From: haosdent > Reply-To: "u...@mesos.apache.org" > Date: Wednesday, October 28, 2015, 10:23 AM > To: user > Subject: Re: How to tell master which ip to connect. > Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and > `LIBPROCESS_ADVERTISE_PORT` when starting the slave? > On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang wrote: > Hi teams: > My scenario is like this: > My master nodes were deployed in AWS. My slaves were in Azure, so they > communicate via public IP. > I ran into trouble when the slaves tried to register with the master. > The slaves can get the master's public IP address and can send the register > request, but they can only send their private IP to the master. (Because they don't > know their public IP, they cannot bind a public IP via the --ip flag.) > Thus the master can't connect to the slaves. How can a slave tell the master which IP > the master should connect to? (I can't find any flag like --advertise_ip on the master.)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
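Until such agent flags exist, the workaround suggested by haosdent in the quoted thread is to export the libprocess environment variables before launching the slave. The addresses below are placeholder values; only the variable names come from the thread:

```shell
# libprocess reads these variables at startup, so the agent advertises the
# public address to the master while still binding its private one.
# 203.0.113.10 and 5051 are placeholder values for illustration.
export LIBPROCESS_ADVERTISE_IP=203.0.113.10
export LIBPROCESS_ADVERTISE_PORT=5051
# mesos-slave --master=<master>:5050 --ip=10.0.0.5   # agent started afterwards
echo "advertising ${LIBPROCESS_ADVERTISE_IP}:${LIBPROCESS_ADVERTISE_PORT}"
```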
[jira] [Updated] (MESOS-3809) Expose advertise_ip and advertise_port as command line options in mesos slave
[ https://issues.apache.org/jira/browse/MESOS-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3809: -- Target Version/s: 0.27.0 Fix Version/s: (was: 0.26.0) > Expose advertise_ip and advertise_port as command line options in mesos slave > - > > Key: MESOS-3809 > URL: https://issues.apache.org/jira/browse/MESOS-3809 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.25.0 >Reporter: Anindya Sinha >Assignee: Anindya Sinha >Priority: Minor > Labels: mesosphere > > advertise_ip and advertise_port are exposed as mesos master command line args > (MESOS-809). But the following use case makes them candidates for adding as > command line args in the mesos slave as well. > On Tue, Oct 27, 2015 at 7:43 PM, Xiaodong Zhang wrote: > It works! Thanks a lot. > From: haosdent > Reply-To: "u...@mesos.apache.org" > Date: Wednesday, October 28, 2015, 10:23 AM > To: user > Subject: Re: How to tell master which ip to connect. > Did you try `export LIBPROCESS_ADVERTISE_IP=xxx` and > `LIBPROCESS_ADVERTISE_PORT` when starting the slave? > On Wed, Oct 28, 2015 at 10:16 AM, Xiaodong Zhang wrote: > Hi teams: > My scenario is like this: > My master nodes were deployed in AWS. My slaves were in Azure, so they > communicate via public IP. > I ran into trouble when the slaves tried to register with the master. > The slaves can get the master's public IP address and can send the register > request, but they can only send their private IP to the master. (Because they don't > know their public IP, they cannot bind a public IP via the --ip flag.) > Thus the master can't connect to the slaves. How can a slave tell the master which IP > the master should connect to? (I can't find any flag like --advertise_ip on the master.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4075) Continue test suite execution across crashing tests.
[ https://issues.apache.org/jira/browse/MESOS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4075: -- Assignee: (was: Bernd Mathiske) > Continue test suite execution across crashing tests. > > > Key: MESOS-4075 > URL: https://issues.apache.org/jira/browse/MESOS-4075 > Project: Mesos > Issue Type: Improvement > Components: test >Affects Versions: 0.26.0 >Reporter: Bernd Mathiske > Labels: mesosphere > > Currently, mesos-tests.sh exits when a test crashes. This is inconvenient > when trying to find out all tests that fail. > mesos-tests.sh should rate a test that crashes as failed and continue the > same way as if the test merely returned with a failure result and exited > properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3370) Deprecate the external containerizer
[ https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069437#comment-15069437 ] Bernd Mathiske commented on MESOS-3370: --- commit 43420dd0a27cd4adf1b2c929262f96e86d647acf Author: Joerg Schad Date: Wed Dec 23 10:41:38 2015 +0100 Added links to individual containerizers in containerizer-internal.md. Review: https://reviews.apache.org/r/41683/ > Deprecate the external containerizer > > > Key: MESOS-3370 > URL: https://issues.apache.org/jira/browse/MESOS-3370 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To our knowledge, no one is using the external containerizer, and we could > clean up code paths in the slave and containerizer interface (the dual > launch() signatures). > In a deprecation cycle, we can move this code into a module (dependent on > containerizer modules landing) and from there, move it into its own repo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3370) Deprecate the external containerizer
[ https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069440#comment-15069440 ] Bernd Mathiske commented on MESOS-3370: --- commit 3c40d2d27d792c4baa927271414c4541f59069bd Author: Joerg Schad Date: Wed Dec 23 10:43:33 2015 +0100 Reflected deprecation of external containerizer in documentation. Review: https://reviews.apache.org/r/41682/ > Deprecate the external containerizer > > > Key: MESOS-3370 > URL: https://issues.apache.org/jira/browse/MESOS-3370 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen > > To our knowledge, no one is using the external containerizer, and we could > clean up code paths in the slave and containerizer interface (the dual > launch() signatures). > In a deprecation cycle, we can move this code into a module (dependent on > containerizer modules landing) and from there, move it into its own repo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4113) Docker Executor should not set container IP during bridged mode
[ https://issues.apache.org/jira/browse/MESOS-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069446#comment-15069446 ] Bernd Mathiske commented on MESOS-4113: --- @scalp42, thanks for being persistent! Like you, I don't see how MESOS-4064 can be viewed as a duplicate of this issue. I suspect it was closed on the assumption that the "duplicate" link is correct. A further indication for this is that AFAICT none of the code in the reviews posted for MESOS-4064 addresses MESOS-4113. @hartem, can you confirm this view? Reopening this ticket. @scalp42, it would be great if you could check whether the output from the current master is the same as from 0.26.0. I suspect it is. > Docker Executor should not set container IP during bridged mode > --- > > Key: MESOS-4113 > URL: https://issues.apache.org/jira/browse/MESOS-4113 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0, 0.26.0 >Reporter: Sargun Dhillon >Assignee: Artem Harutyunyan > Labels: mesosphere > > The docker executor currently sets the IP address of the container into > ContainerStatus.NetworkInfo.IPAddresses. This isn't a good thing, because > during bridged-mode execution it makes that IP address > useless, since it's behind the Docker NAT. I would like a flag that disables > filling the IP address in and allows it to fall back to the agent IP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-2857: -- Sprint: Mesosphere Sprint 23 (was: Mesosphere Sprint 23, Mesosphere Sprint 24) > FetcherCacheTest.LocalCachedExtract is flaky. > - > > Key: MESOS-2857 > URL: https://issues.apache.org/jira/browse/MESOS-2857 > Project: Mesos > Issue Type: Bug > Components: fetcher, test >Reporter: Benjamin Mahler >Assignee: Benjamin Bannier > Labels: flaky-test, mesosphere > > From jenkins: > {noformat} > [ RUN ] FetcherCacheTest.LocalCachedExtract > Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj' > I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms > I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns > I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns > I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in > 8967ns > I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the > db in 7762ns > I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery > I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status > I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to > STARTING > I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 717888ns > I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to > STARTING > I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status > I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status > received a 
broadcasted recover request > I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from > a replica in STARTING status > I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING > I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 432335ns > I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to > VOTING > I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos > group > I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated > I0610 20:04:48.602905 24594 master.cpp:363] Master > 20150610-200448-3875541420-32907-24561 (dbade881e927) started on > 172.17.0.231:32907 > I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" > --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" > --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" > --zk_session_timeout="10secs" > I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing > authenticated frameworks to register > I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing > authenticated slaves to register > I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for > authentication from > '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials' > I0610 20:04:48.603751 24594 
master.cpp:454] Using default 'crammd5' > authenticator > I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled > I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical > allocator process > I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given > I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is > master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561 > I0610 20:04:48.607466 24594 master.cpp:1489] Elected as the leading master! > I0610 20:04:48.607481 24594 master.cpp:1259] Recovering from registrar > I0610 20:04:48.607712 24594 registrar.cpp:313] Recovering registrar > I0610 20:04:48.608543 24588 log.cpp:661] Attempting to start the writer > I0610 20:04:48.610231 24588
[jira] [Commented] (MESOS-3552) CHECK failure due to floating point precision on reservation request
[ https://issues.apache.org/jira/browse/MESOS-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063799#comment-15063799 ] Bernd Mathiske commented on MESOS-3552: --- commit 7a57b0c6c403d3c5dd6b67087f8727d1b348b625 Author: Bernd Mathiske Date: Fri Dec 18 10:59:52 2015 +0100 Ported approx. Option CPU resource number comparison to v1. Review: https://reviews.apache.org/r/40903/ > CHECK failure due to floating point precision on reservation request > > > Key: MESOS-3552 > URL: https://issues.apache.org/jira/browse/MESOS-3552 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Mandeep Chadha >Assignee: Mandeep Chadha > Labels: mesosphere, tech-debt > Fix For: 0.26.0 > > > The result.cpus() == cpus() check is failing due to a (double == double) > comparison problem. > Root cause: > The framework requested a 0.1 CPU reservation for the first task. So far so good. > The next Reserve operation led to double arithmetic resulting in the following > double values: > results.cpus(): 23.9964472863211995 cpus(): 24 > And the check (result.cpus() == cpus()) failed. > The double arithmetic operations caused the results.cpus() value to be > 23.9964472863211995, and hence (23.9964472863211995 == 24) failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
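The failure mode and the approximate-comparison fix can be reproduced with the exact values from the ticket. The 0.01 tolerance below is illustrative only, not the epsilon Mesos actually uses:

```shell
# Exact values from the ticket: result.cpus() vs cpus().
a=23.9964472863211995
b=24
# Exact equality on doubles fails, which is what tripped the CHECK:
awk -v a="$a" -v b="$b" 'BEGIN { exit (a == b) ? 0 : 1 }' \
  && echo "equal" || echo "not equal"
# Comparing within a tolerance succeeds (0.01 is an illustrative epsilon):
awk -v a="$a" -v b="$b" \
  'BEGIN { d = a - b; if (d < 0) d = -d; exit (d <= 0.01) ? 0 : 1 }' \
  && echo "close enough"
```

This is the general shape of the fix referenced in the commit above: compare CPU resource numbers approximately rather than with `==`.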
[jira] [Created] (MESOS-4130) Document how the fetcher can reach across a proxy connection.
Bernd Mathiske created MESOS-4130: - Summary: Document how the fetcher can reach across a proxy connection. Key: MESOS-4130 URL: https://issues.apache.org/jira/browse/MESOS-4130 Project: Mesos Issue Type: Documentation Components: fetcher Reporter: Bernd Mathiske The fetcher uses libcurl for downloading content from HTTP, HTTPS, etc. There is no source code in the pertinent parts of "net.hpp" that deals with proxy settings. However, libcurl automatically picks up certain environment variables and adjusts its settings accordingly. See "man libcurl-tutorial" for details. See section "Proxies", subsection "Environment Variables". If you follow this recipe in your Mesos agent startup script, you can use a proxy. We should document this in the fetcher (cache) doc (http://mesos.apache.org/documentation/latest/fetcher/). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
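Concretely, the recipe amounts to exporting the proxy variables that libcurl recognizes before launching the agent. The proxy host, port, and bypass list below are placeholder values:

```shell
# libcurl picks these up automatically (see "man libcurl-tutorial", section
# "Proxies", subsection "Environment Variables"), so fetcher downloads go
# through the proxy. proxy.example.com:3128 is a placeholder.
export http_proxy="http://proxy.example.com:3128"
export https_proxy="http://proxy.example.com:3128"
export no_proxy="localhost,127.0.0.1"   # hosts to reach directly
# exec mesos-slave --master=<master>:5050 ...   # agent started afterwards
echo "fetcher downloads will use ${http_proxy}"
```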
[jira] [Updated] (MESOS-4120) Make DiscoveryInfo dynamically updatable
[ https://issues.apache.org/jira/browse/MESOS-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4120: -- Affects Version/s: 0.26.0 0.25.0 Target Version/s: 0.27.0 > Make DiscoveryInfo dynamically updatable > > > Key: MESOS-4120 > URL: https://issues.apache.org/jira/browse/MESOS-4120 > Project: Mesos > Issue Type: Improvement >Affects Versions: 0.25.0, 0.26.0 >Reporter: Sargun Dhillon >Priority: Critical > Labels: mesosphere > > K8s tasks can dynamically update what they expose to make it discoverable by the > cluster. Unfortunately, all DiscoveryInfo in the cluster is immutable, fixed at the > time of task start. > We would like to enable DiscoveryInfo to be dynamically updatable, so that > executors can change what they're advertising based on their internal state, > versus requiring DiscoveryInfo to be known prior to starting the tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4113) Docker Executor should not set container IP during bridged mode
[ https://issues.apache.org/jira/browse/MESOS-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-4113: -- Affects Version/s: 0.26.0 > Docker Executor should not set container IP during bridged mode > --- > > Key: MESOS-4113 > URL: https://issues.apache.org/jira/browse/MESOS-4113 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0, 0.26.0 >Reporter: Sargun Dhillon > Labels: mesosphere > > The docker executor currently sets the IP address of the container into > ContainerStatus.NetworkInfo.IPAddresses. This isn't a good thing, because > during bridged mode execution, it makes it so that that IP address is > useless, since it's behind the Docker NAT. I would like a flag that disables > filling the IP address in, and allows it to fall back to the agent IP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4119) Add support for enabling --3way to apply-reviews.py.
[ https://issues.apache.org/jira/browse/MESOS-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052943#comment-15052943 ] Bernd Mathiske commented on MESOS-4119: --- Since you marked this "newbie", please explain to newbies what you mean by --3way and what apply-reviews is in general. > Add support for enabling --3way to apply-reviews.py. > > > Key: MESOS-4119 > URL: https://issues.apache.org/jira/browse/MESOS-4119 > Project: Mesos > Issue Type: Task >Reporter: Artem Harutyunyan > Labels: beginner, mesosphere, newbie > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4080) Clean up HTTP authentication in quota endpoints
[ https://issues.apache.org/jira/browse/MESOS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052930#comment-15052930 ] Bernd Mathiske commented on MESOS-4080: --- Can you please be more specific about the tech debt mentioned? > Clean up HTTP authentication in quota endpoints > --- > > Key: MESOS-4080 > URL: https://issues.apache.org/jira/browse/MESOS-4080 > Project: Mesos > Issue Type: Task > Components: HTTP API, master >Reporter: Jan Schlicht >Assignee: Jan Schlicht >Priority: Critical > Labels: mesosphere, quota, tech-debt > > The authentication of quota requests introduces some technical debt that > will be resolved by the refactored HTTP-based authentication. This ticket > tracks the work related to cleaning up the quota handling to use the new HTTP > API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3086) Create cgroups TasksKiller for non freeze subsystems.
[ https://issues.apache.org/jira/browse/MESOS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3086: -- Sprint: Mesosphere Sprint 15, Mesosphere Sprint 16, Mesosphere Sprint 17, Mesosphere Sprint 18, Mesosphere Sprint 19, Mesosphere Sprint 20, Mesosphere Sprint 21, Mesosphere Sprint 22 (was: Mesosphere Sprint 15, Mesosphere Sprint 16, Mesosphere Sprint 17, Mesosphere Sprint 18, Mesosphere Sprint 19, Mesosphere Sprint 20, Mesosphere Sprint 21, Mesosphere Sprint 22, Mesosphere Sprint 23) > Create cgroups TasksKiller for non freeze subsystems. > - > > Key: MESOS-3086 > URL: https://issues.apache.org/jira/browse/MESOS-3086 > Project: Mesos > Issue Type: Bug >Reporter: Joerg Schad >Assignee: Joerg Schad > Labels: mesosphere > > We have a number of test issues when we cannot remove cgroups (in case there > are still related tasks running) in cases where the freezer subsystem is not > available. > In the current code > (https://github.com/apache/mesos/blob/0.22.1/src/linux/cgroups.cpp#L1728) we > will fall back to a very simple mechanism of recursively trying to remove the > cgroups, which fails if there are still tasks running. > Therefore we need an additional (NonFreeze)TasksKiller which doesn't rely > on the freezer subsystem. > This problem caused issues when running 'sudo make check' during 0.23 release > testing, where BenH already provided a better error message with > b1a23d6a52c31b8c5c840ab01902dbe00cb1feef / https://reviews.apache.org/r/36604.
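The fallback described above (kill the tasks listed in the cgroup, then remove it) can be sketched as follows. This is an illustrative Python sketch of the idea, not the Mesos C++ implementation; the function name `kill_cgroup_tasks` is hypothetical:

```python
import os
import signal


def kill_cgroup_tasks(cgroup_path):
    """Kill every task currently listed in a cgroup's cgroup.procs file.

    A real (NonFreeze)TasksKiller must repeat this until the file stays
    empty, because tasks can fork between the read and the kill -- exactly
    the race the freezer subsystem exists to prevent.
    """
    procs_file = os.path.join(cgroup_path, "cgroup.procs")
    with open(procs_file) as f:
        pids = [int(line) for line in f if line.strip()]
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the task exited on its own between read and kill
    return pids
```

Once the cgroup is empty, a plain recursive `rmdir()` of the hierarchy would succeed.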
[jira] [Created] (MESOS-4075) Continue test suite execution across crashing tests.
Bernd Mathiske created MESOS-4075: - Summary: Continue test suite execution across crashing tests. Key: MESOS-4075 URL: https://issues.apache.org/jira/browse/MESOS-4075 Project: Mesos Issue Type: Improvement Components: test Affects Versions: 0.26.0 Reporter: Bernd Mathiske Currently, mesos-tests.sh exits when a test crashes. This is inconvenient when trying to identify all failing tests. mesos-tests.sh should rate a test that crashes as failed and continue the same way as if the test merely returned with a failure result and exited properly.
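The requested behavior (treat a crash like an ordinary failure and keep going) could be sketched with a small driver that runs each test in its own process and classifies signal deaths as failures. This is a hypothetical Python sketch, not the actual mesos-tests.sh; on POSIX, `subprocess` reports a signal-killed child via a negative return code:

```python
import subprocess
import sys


def run_test(cmd):
    """Run one test command and classify the outcome instead of aborting
    the whole suite when the test crashes."""
    proc = subprocess.run(cmd)
    if proc.returncode == 0:
        return "PASSED"
    if proc.returncode < 0:
        # A negative return code means the child was killed by a signal
        # (e.g. SIGSEGV on a crash) -- rate it as failed and continue.
        return "CRASHED"
    return "FAILED"


def run_suite(test_cmds):
    """Run every test even if some crash, collecting per-test results."""
    return {" ".join(cmd): run_test(cmd) for cmd in test_cmds}


if __name__ == "__main__":
    # Each entry stands in for one gtest invocation, e.g.
    # ["./mesos-tests", "--gtest_filter=SomeTest.someCase"].
    suite = [
        [sys.executable, "-c", "pass"],                 # passes
        [sys.executable, "-c", "raise SystemExit(1)"],  # fails
        [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"],  # crashes
    ]
    for name, status in run_suite(suite).items():
        print(status, name)
```

The suite then reports all failures at once rather than stopping at the first crash.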
[jira] [Updated] (MESOS-3208) Fetch checksum files to inform fetcher cache use
[ https://issues.apache.org/jira/browse/MESOS-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske updated MESOS-3208: -- Assignee: (was: Bernd Mathiske) > Fetch checksum files to inform fetcher cache use > > > Key: MESOS-3208 > URL: https://issues.apache.org/jira/browse/MESOS-3208 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Bernd Mathiske >Priority: Minor > > This is the first part of phase 1 as described in the comments for > MESOS-2073. We add a field to CommandInfo::URI that contains the URI of a > checksum file. When this file has new content, then the contents of the > associated value URI need to be refreshed in the fetcher cache. > In this implementation step, we just add the above basic functionality > (download, checksum comparison). In later steps, we will add more control > flow to cover corner cases and thus make this feature more useful.
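The basic functionality described for this first step (download the checksum file, compare it with the cached copy, refresh the value URI on change) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Mesos fetcher (which is C++); `cached_fetch` and the `.checksum` stamp-file layout are hypothetical:

```python
import hashlib
import os
import shutil
import urllib.request


def fetch(uri, dest):
    """Download `uri` (file:// or http(s)://) to the local path `dest`."""
    with urllib.request.urlopen(uri) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def cached_fetch(value_uri, checksum_uri, cache_dir):
    """Re-download the value URI only when the checksum file's content has
    changed since the last fetch; otherwise serve the cached copy."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(value_uri.encode()).hexdigest()
    value_path = os.path.join(cache_dir, key)
    stamp_path = value_path + ".checksum"

    # The checksum file itself is always fetched -- it is small and its
    # content decides whether the (possibly large) value is stale.
    with urllib.request.urlopen(checksum_uri) as resp:
        checksum = resp.read()

    cached = None
    if os.path.exists(stamp_path):
        with open(stamp_path, "rb") as f:
            cached = f.read()

    if checksum != cached or not os.path.exists(value_path):
        fetch(value_uri, value_path)  # cache miss or stale entry
        with open(stamp_path, "wb") as f:
            f.write(checksum)

    return value_path
```

The later steps mentioned in the ticket (corner cases, extra control flow) would layer on top of this compare-and-refresh core.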