[jira] [Commented] (MESOS-4479) Implement reservation labels
[ https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135457#comment-15135457 ] Michael Park commented on MESOS-4479: - {noformat} commit 4dbebcfaf2e8c399b2343b932d19677790db020e Author: Joseph Wu Date: Fri Feb 5 17:56:00 2016 -0800 Fixed compilation on Ubuntu 15. A few signed-unsigned comparisons introduced by https://reviews.apache.org/r/42751/ Review: https://reviews.apache.org/r/43276/ {noformat} > Implement reservation labels > > > Key: MESOS-4479 > URL: https://issues.apache.org/jira/browse/MESOS-4479 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: labels, mesosphere, reservations > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4615) ContainerLoggerTest.DefaultToSandbox is flaky
Greg Mann created MESOS-4615: Summary: ContainerLoggerTest.DefaultToSandbox is flaky Key: MESOS-4615 URL: https://issues.apache.org/jira/browse/MESOS-4615 Project: Mesos Issue Type: Bug Components: tests Affects Versions: 0.27.0 Environment: CentOS 7, gcc, libevent & SSL enabled Reporter: Greg Mann Just saw this failure on the ASF CI: {code} [ RUN ] ContainerLoggerTest.DefaultToSandbox I0206 01:25:03.766458 2824 leveldb.cpp:174] Opened db in 72.979786ms I0206 01:25:03.811712 2824 leveldb.cpp:181] Compacted db in 45.162067ms I0206 01:25:03.811810 2824 leveldb.cpp:196] Created db iterator in 26090ns I0206 01:25:03.811828 2824 leveldb.cpp:202] Seeked to beginning of db in 3173ns I0206 01:25:03.811839 2824 leveldb.cpp:271] Iterated through 0 keys in the db in 497ns I0206 01:25:03.811900 2824 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0206 01:25:03.812785 2849 recover.cpp:447] Starting replica recovery I0206 01:25:03.813043 2849 recover.cpp:473] Replica is in EMPTY status I0206 01:25:03.814668 2854 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (371)@172.17.0.8:37843 I0206 01:25:03.815210 2849 recover.cpp:193] Received a recover response from a replica in EMPTY status I0206 01:25:03.815732 2854 recover.cpp:564] Updating replica status to STARTING I0206 01:25:03.819664 2857 master.cpp:376] Master 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de (74ef606c4063) started on 172.17.0.8:37843 I0206 01:25:03.819703 2857 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/h5vu5I/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" --work_dir="/tmp/h5vu5I/master" --zk_session_timeout="10secs" I0206 01:25:03.820241 2857 master.cpp:423] Master only allowing authenticated frameworks to register I0206 01:25:03.820257 2857 master.cpp:428] Master only allowing authenticated slaves to register I0206 01:25:03.820269 2857 credentials.hpp:35] Loading credentials for authentication from '/tmp/h5vu5I/credentials' I0206 01:25:03.821110 2857 master.cpp:468] Using default 'crammd5' authenticator I0206 01:25:03.821311 2857 master.cpp:537] Using default 'basic' HTTP authenticator I0206 01:25:03.821636 2857 master.cpp:571] Authorization enabled I0206 01:25:03.821979 2846 hierarchical.cpp:144] Initialized hierarchical allocator process I0206 01:25:03.822057 2846 whitelist_watcher.cpp:77] No whitelist given I0206 01:25:03.825460 2847 master.cpp:1712] The newly elected leader is master@172.17.0.8:37843 with id 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de I0206 01:25:03.825512 2847 master.cpp:1725] Elected as the leading master! 
I0206 01:25:03.825533 2847 master.cpp:1470] Recovering from registrar I0206 01:25:03.825835 2847 registrar.cpp:307] Recovering registrar I0206 01:25:03.848212 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 32.226093ms I0206 01:25:03.848299 2854 replica.cpp:320] Persisted replica status to STARTING I0206 01:25:03.848702 2854 recover.cpp:473] Replica is in STARTING status I0206 01:25:03.850728 2858 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (373)@172.17.0.8:37843 I0206 01:25:03.851230 2854 recover.cpp:193] Received a recover response from a replica in STARTING status I0206 01:25:03.852018 2854 recover.cpp:564] Updating replica status to VOTING I0206 01:25:03.881681 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 29.184163ms I0206 01:25:03.881772 2854 replica.cpp:320] Persisted replica status to VOTING I0206 01:25:03.882058 2854 recover.cpp:578] Successfully joined the Paxos group I0206 01:25:03.882258 2854 recover.cpp:462] Recover process terminated I0206 01:25:03.883076 2854 log.cpp:659] Attempting to start the writer I0206 01:25:03.885040 2854 replica.cpp:493] Replica received implicit promise request from (374)@172.17.0.8:37843 with proposal 1 I0206 01:25:03.915132 2854 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 29.980589ms
[jira] [Commented] (MESOS-4614) SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky
[ https://issues.apache.org/jira/browse/MESOS-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135385#comment-15135385 ] Anand Mazumdar commented on MESOS-4614: --- The executor did not even send the {{Subscribe}} call after it connected with the agent. This is similar to the behavior that we have been observing with another flaky test in {{MESOS-3273}} in which the example test framework does not send the initial {{SUBSCRIBE}} call. > SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky > > > Key: MESOS-4614 > URL: https://issues.apache.org/jira/browse/MESOS-4614 > Project: Mesos > Issue Type: Bug > Components: HTTP API, slave, tests >Affects Versions: 0.27.0 > Environment: CentOS 7, gcc, libevent & SSL enabled >Reporter: Greg Mann > Labels: flaky-test, mesosphere > > Just saw this failure on the ASF CI: > {code} > [ RUN ] SlaveRecoveryTest/0.CleanupHTTPExecutor > I0206 00:22:44.791671 2824 leveldb.cpp:174] Opened db in 2.539372ms > I0206 00:22:44.792459 2824 leveldb.cpp:181] Compacted db in 740473ns > I0206 00:22:44.792510 2824 leveldb.cpp:196] Created db iterator in 24164ns > I0206 00:22:44.792532 2824 leveldb.cpp:202] Seeked to beginning of db in > 1831ns > I0206 00:22:44.792548 2824 leveldb.cpp:271] Iterated through 0 keys in the > db in 342ns > I0206 00:22:44.792605 2824 replica.cpp:779] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0206 00:22:44.793256 2847 recover.cpp:447] Starting replica recovery > I0206 00:22:44.793480 2847 recover.cpp:473] Replica is in EMPTY status > I0206 00:22:44.794538 2847 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from (9472)@172.17.0.2:43484 > I0206 00:22:44.795040 2848 recover.cpp:193] Received a recover response from > a replica in EMPTY status > I0206 00:22:44.795644 2848 recover.cpp:564] Updating replica status to > STARTING > I0206 00:22:44.796519 2850 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb 
took 752810ns > I0206 00:22:44.796545 2850 replica.cpp:320] Persisted replica status to > STARTING > I0206 00:22:44.796725 2848 recover.cpp:473] Replica is in STARTING status > I0206 00:22:44.797828 2857 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from (9473)@172.17.0.2:43484 > I0206 00:22:44.798355 2850 recover.cpp:193] Received a recover response from > a replica in STARTING status > I0206 00:22:44.799193 2850 recover.cpp:564] Updating replica status to VOTING > I0206 00:22:44.799583 2855 master.cpp:376] Master > 0b206a40-a9c3-4d44-a5bd-8032d60a32ca (6632562f1ade) started on > 172.17.0.2:43484 > I0206 00:22:44.799609 2855 master.cpp:378] Flags at startup: --acls="" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/n2FxQV/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="100secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" > --work_dir="/tmp/n2FxQV/master" --zk_session_timeout="10secs" > I0206 00:22:44.71 2855 master.cpp:423] Master only allowing > authenticated frameworks to register > I0206 00:22:44.89 2855 master.cpp:428] Master only allowing > authenticated slaves to register > I0206 00:22:44.800020 2855 credentials.hpp:35] Loading credentials for > authentication from 
'/tmp/n2FxQV/credentials' > I0206 00:22:44.800245 2850 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 679345ns > I0206 00:22:44.800370 2850 replica.cpp:320] Persisted replica status to > VOTING > I0206 00:22:44.800397 2855 master.cpp:468] Using default 'crammd5' > authenticator > I0206 00:22:44.800693 2855 master.cpp:537] Using default 'basic' HTTP > authenticator > I0206 00:22:44.800815 2855 master.cpp:571] Authorization enabled > I0206 00:22:44.801216 2850 recover.cpp:578] Successfully joined the Paxos > group > I0206 00:22:44.801604 2850 recover.cpp:462] Recover process terminated > I0206 00:22:44.801759 2856 whitelist_watcher.cpp:77] No whitelist given > I0206 00:22:44.801725 2847 hierarchical.cpp
[jira] [Created] (MESOS-4614) SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky
Greg Mann created MESOS-4614: Summary: SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky Key: MESOS-4614 URL: https://issues.apache.org/jira/browse/MESOS-4614 Project: Mesos Issue Type: Bug Components: HTTP API, slave, tests Affects Versions: 0.27.0 Environment: CentOS 7, gcc, libevent & SSL enabled Reporter: Greg Mann Just saw this failure on the ASF CI: {code} [ RUN ] SlaveRecoveryTest/0.CleanupHTTPExecutor I0206 00:22:44.791671 2824 leveldb.cpp:174] Opened db in 2.539372ms I0206 00:22:44.792459 2824 leveldb.cpp:181] Compacted db in 740473ns I0206 00:22:44.792510 2824 leveldb.cpp:196] Created db iterator in 24164ns I0206 00:22:44.792532 2824 leveldb.cpp:202] Seeked to beginning of db in 1831ns I0206 00:22:44.792548 2824 leveldb.cpp:271] Iterated through 0 keys in the db in 342ns I0206 00:22:44.792605 2824 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0206 00:22:44.793256 2847 recover.cpp:447] Starting replica recovery I0206 00:22:44.793480 2847 recover.cpp:473] Replica is in EMPTY status I0206 00:22:44.794538 2847 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (9472)@172.17.0.2:43484 I0206 00:22:44.795040 2848 recover.cpp:193] Received a recover response from a replica in EMPTY status I0206 00:22:44.795644 2848 recover.cpp:564] Updating replica status to STARTING I0206 00:22:44.796519 2850 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 752810ns I0206 00:22:44.796545 2850 replica.cpp:320] Persisted replica status to STARTING I0206 00:22:44.796725 2848 recover.cpp:473] Replica is in STARTING status I0206 00:22:44.797828 2857 replica.cpp:673] Replica in STARTING status received a broadcasted recover request from (9473)@172.17.0.2:43484 I0206 00:22:44.798355 2850 recover.cpp:193] Received a recover response from a replica in STARTING status I0206 00:22:44.799193 2850 recover.cpp:564] Updating replica status to VOTING I0206 00:22:44.799583 2855 master.cpp:376] Master 
0b206a40-a9c3-4d44-a5bd-8032d60a32ca (6632562f1ade) started on 172.17.0.2:43484 I0206 00:22:44.799609 2855 master.cpp:378] Flags at startup: --acls="" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/n2FxQV/credentials" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" --registry_strict="true" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" --work_dir="/tmp/n2FxQV/master" --zk_session_timeout="10secs" I0206 00:22:44.71 2855 master.cpp:423] Master only allowing authenticated frameworks to register I0206 00:22:44.89 2855 master.cpp:428] Master only allowing authenticated slaves to register I0206 00:22:44.800020 2855 credentials.hpp:35] Loading credentials for authentication from '/tmp/n2FxQV/credentials' I0206 00:22:44.800245 2850 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 679345ns I0206 00:22:44.800370 2850 replica.cpp:320] Persisted replica status to VOTING I0206 00:22:44.800397 2855 master.cpp:468] Using default 'crammd5' authenticator I0206 00:22:44.800693 2855 master.cpp:537] Using default 'basic' HTTP authenticator I0206 00:22:44.800815 2855 master.cpp:571] Authorization enabled I0206 00:22:44.801216 2850 recover.cpp:578] Successfully joined the Paxos group I0206 00:22:44.801604 2850 recover.cpp:462] Recover process terminated I0206 00:22:44.801759 2856 
whitelist_watcher.cpp:77] No whitelist given I0206 00:22:44.801725 2847 hierarchical.cpp:144] Initialized hierarchical allocator process I0206 00:22:44.803982 2855 master.cpp:1712] The newly elected leader is master@172.17.0.2:43484 with id 0b206a40-a9c3-4d44-a5bd-8032d60a32ca I0206 00:22:44.804026 2855 master.cpp:1725] Elected as the leading master! I0206 00:22:44.804059 2855 master.cpp:1470] Recovering from registrar I0206 00:22:44.804424 2855 registrar.cpp:307] Recovering registrar I0206 00:22:44.805202 2855 log.cpp:659] Attempting to start the writer I0206 00:22:44.806782 2856 replica.cpp:493] Replica received implicit promise request from (9475)@172.17.0.2:43484 with proposal 1 I0206 00:22:44.807368 2856 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb
[jira] [Commented] (MESOS-4479) Implement reservation labels
[ https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135349#comment-15135349 ] Joseph Wu commented on MESOS-4479: -- Fix for Ubuntu 15 compilation: https://reviews.apache.org/r/43276/ > Implement reservation labels > > > Key: MESOS-4479 > URL: https://issues.apache.org/jira/browse/MESOS-4479 > Project: Mesos > Issue Type: Improvement > Components: master > Reporter: Neil Conway > Assignee: Neil Conway > Labels: labels, mesosphere, reservations
[jira] [Updated] (MESOS-4603) GTEST crashes when starting/stopping many times in succession
[ https://issues.apache.org/jira/browse/MESOS-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-4603: --- Labels: mesosphere tests (was: tests) > GTEST crashes when starting/stopping many times in succession > - > > Key: MESOS-4603 > URL: https://issues.apache.org/jira/browse/MESOS-4603 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: clang 3.4, ubuntu 14.04 > Reporter: Kevin Klues > Labels: mesosphere, tests > > After running: > run-one-until-failure 3rdparty/libprocess/libprocess-tests > At least one iteration of running the tests fails in under a minute with the > following stack trace. The stack trace is different sometimes, but it always > seems to error out in ~ProcessManager(). > {noformat} > *** Aborted at 1454643530 (unix time) try "date -d @1454643530" if you are > using GNU date *** > PC: @ 0x7f7812f4d1a0 (unknown) > *** SIGSEGV (@0x0) received by PID 168122 (TID 0x7f780298f700) from PID 0; > stack trace: *** > @ 0x7f7814451340 (unknown) > @ 0x7f7812f4d1a0 (unknown) > @ 0x5f06a0 process::Process<>::self() > @ 0x777220 > _ZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKZZNS_4http8internal7requestERKNS3_7RequestEbENK3$_1clENS3_10ConnectionEEUlvE_PvSA_SD_EENS_6FutureIT_EEPKNS_7ProcessIT0_EEMSI_FSF_T1_T2_ET3_T4_ > @ 0x77714c > _ZN7process13AsyncExecutor7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_1clENS2_10ConnectionEEUlvE_EENS_6FutureI7NothingEERKT_PN5boost9enable_ifINSG_7is_voidINSt9result_ofIFSD_vEE4typeEEEvE4typeE > @ 0x77709e > _ZN7process5asyncIZZNS_4http8internal7requestERKNS1_7RequestEbENK3$_1clENS1_10ConnectionEEUlvE_EENS_6FutureI7NothingEERKT_PN5boost9enable_ifINSF_7is_voidINSt9result_ofIFSC_vEE4typeEEEvE4typeE > @ 0x777046 > _ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_1clENS0_10ConnectionEENKUlvE0_clEv > @ 0x777019 > 
_ZZNK7process6FutureI7NothingE5onAnyIZZNS_4http8internal7requestERKNS4_7RequestEbENK3$_1clENS4_10ConnectionEEUlvE0_vEERKS2_OT_NS2_10LessPreferEENUlSD_E_clESD_ > @ 0x776e02 > _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyIZZNS0_4http8internal7requestERKNS8_7RequestEbENK3$_1clENS8_10ConnectionEEUlvE0_vEES5_OT_NS3_10LessPreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ > @ 0x43f888 std::function<>::operator()() > @ 0x4464ec > _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_ > @ 0x446305 process::Future<>::set() > @ 0x44f90a > _ZNKSt7_Mem_fnIMN7process6FutureI7NothingEEFbRKS2_EEclIJS5_EvEEbRS3_DpOT_ > @ 0x44f7ae > _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureI7NothingEEFbRKS3_EES4_St12_PlaceholderILi16__callIbJS6_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x44f72d > _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureI7NothingEEFbRKS3_EES4_St12_PlaceholderILi1clIJS6_EbEET0_DpOT_ > @ 0x44f6dd > _ZZNK7process6FutureI7NothingE7onReadyISt5_BindIFSt7_Mem_fnIMS2_FbRKS1_EES2_St12_PlaceholderILi1bEERKS2_OT_NS2_6PreferEENUlS7_E_clES7_ > @ 0x44f492 > _ZNSt17_Function_handlerIFvRK7NothingEZNK7process6FutureIS0_E7onReadyISt5_BindIFSt7_Mem_fnIMS6_FbS2_EES6_St12_PlaceholderILi1bEERKS6_OT_NS6_6PreferEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x446d68 std::function<>::operator()() > @ 0x44644c > _ZN7process8internal3runISt8functionIFvRK7NothingEEJRS3_EEEvRKSt6vectorIT_SaISA_EEDpOT0_ > @ 0x4462e7 process::Future<>::set() > @ 0x50d5c7 process::Promise<>::set() > @ 0x77c53b > process::http::internal::ConnectionProcess::disconnect() > @ 0x792710 process::http::internal::ConnectionProcess::_read() > @ 0x794356 > _ZZN7process8dispatchINS_4http8internal17ConnectionProcessERKNS_6FutureISsEES5_EEvRKNS_3PIDIT_EEMS9_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESI_ > @ 0x793fa2 > 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchINS0_4http8internal17ConnectionProcessERKNS0_6FutureISsEES9_EEvRKNS0_3PIDIT_EEMSD_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x810958 std::function<>::operator()() > @ 0x7fb854 process::ProcessBase::visit() > @ 0x8581ce process::DispatchEvent::visit() > @ 0x43d631 process::ProcessBase::serve() > @ 0x7f9604 process::ProcessManager::resume() > @ 0x8017a5 > process::ProcessManager::init_threads()::$_1::operator()() > @ 0x8016e3 > _ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvE3$_1St17reference_wrapperIKSt11atomic_boolE
[jira] [Updated] (MESOS-4604) ROOT_DOCKER_DockerHealthyTask is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4604: - Sprint: Mesosphere Sprint 28 Story Points: 3 Labels: flaky-test mesosphere test (was: flaky-test test) > ROOT_DOCKER_DockerHealthyTask is flaky. > --- > > Key: MESOS-4604 > URL: https://issues.apache.org/jira/browse/MESOS-4604 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 6/7, Ubuntu 15.04 on AWS. >Reporter: Jan Schlicht >Assignee: Joseph Wu > Labels: flaky-test, mesosphere, test > > Log from Teamcity that is running {{sudo ./bin/mesos-tests.sh}} on AWS EC2 > instances: > {noformat} > [18:27:14][Step 8/8] [--] 8 tests from HealthCheckTest > [18:27:14][Step 8/8] [ RUN ] HealthCheckTest.HealthyTask > [18:27:17][Step 8/8] [ OK ] HealthCheckTest.HealthyTask ( ms) > [18:27:17][Step 8/8] [ RUN ] > HealthCheckTest.ROOT_DOCKER_DockerHealthyTask > [18:27:36][Step 8/8] ../../src/tests/health_check_tests.cpp:388: Failure > [18:27:36][Step 8/8] Failed to wait 15secs for termination > [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure > virtual method called > [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail() > [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__() > [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual > [18:27:36][Step 8/8] @ 0xa9423c > mesos::internal::tests::Cluster::Slaves::shutdown() > [18:27:36][Step 8/8] @ 0x1074e45 > mesos::internal::tests::MesosTest::ShutdownSlaves() > [18:27:36][Step 8/8] @ 0x1074de4 > mesos::internal::tests::MesosTest::Shutdown() > [18:27:36][Step 8/8] @ 0x1070ec7 > mesos::internal::tests::MesosTest::TearDown() > [18:27:36][Step 8/8] @ 0x16eb7b2 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16e61a9 > testing::internal::HandleExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16c56aa testing::Test::Run() > [18:27:36][Step 8/8] @ 0x16c5e89 
testing::TestInfo::Run() > [18:27:36][Step 8/8] @ 0x16c650a testing::TestCase::Run() > [18:27:36][Step 8/8] @ 0x16cd1f6 > testing::internal::UnitTestImpl::RunAllTests() > [18:27:36][Step 8/8] @ 0x16ec513 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16e6df1 > testing::internal::HandleExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16cbe26 testing::UnitTest::Run() > [18:27:36][Step 8/8] @ 0xe54c84 RUN_ALL_TESTS() > [18:27:36][Step 8/8] @ 0xe54867 main > [18:27:36][Step 8/8] @ 0x7f7071560a40 (unknown) > [18:27:36][Step 8/8] @ 0x9b52d9 _start > [18:27:36][Step 8/8] Aborted (core dumped) > [18:27:36][Step 8/8] Process exited with code 134 > {noformat} > Happens with Ubuntu 15.04, CentOS 6, CentOS 7 _quite_ often. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4604) ROOT_DOCKER_DockerHealthyTask is flaky.
[ https://issues.apache.org/jira/browse/MESOS-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu reassigned MESOS-4604: Assignee: Joseph Wu > ROOT_DOCKER_DockerHealthyTask is flaky. > --- > > Key: MESOS-4604 > URL: https://issues.apache.org/jira/browse/MESOS-4604 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: CentOS 6/7, Ubuntu 15.04 on AWS. >Reporter: Jan Schlicht >Assignee: Joseph Wu > Labels: flaky-test, test > > Log from Teamcity that is running {{sudo ./bin/mesos-tests.sh}} on AWS EC2 > instances: > {noformat} > [18:27:14][Step 8/8] [--] 8 tests from HealthCheckTest > [18:27:14][Step 8/8] [ RUN ] HealthCheckTest.HealthyTask > [18:27:17][Step 8/8] [ OK ] HealthCheckTest.HealthyTask ( ms) > [18:27:17][Step 8/8] [ RUN ] > HealthCheckTest.ROOT_DOCKER_DockerHealthyTask > [18:27:36][Step 8/8] ../../src/tests/health_check_tests.cpp:388: Failure > [18:27:36][Step 8/8] Failed to wait 15secs for termination > [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure > virtual method called > [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail() > [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__() > [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual > [18:27:36][Step 8/8] @ 0xa9423c > mesos::internal::tests::Cluster::Slaves::shutdown() > [18:27:36][Step 8/8] @ 0x1074e45 > mesos::internal::tests::MesosTest::ShutdownSlaves() > [18:27:36][Step 8/8] @ 0x1074de4 > mesos::internal::tests::MesosTest::Shutdown() > [18:27:36][Step 8/8] @ 0x1070ec7 > mesos::internal::tests::MesosTest::TearDown() > [18:27:36][Step 8/8] @ 0x16eb7b2 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16e61a9 > testing::internal::HandleExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16c56aa testing::Test::Run() > [18:27:36][Step 8/8] @ 0x16c5e89 testing::TestInfo::Run() > [18:27:36][Step 8/8] @ 0x16c650a testing::TestCase::Run() > [18:27:36][Step 8/8] @ 
0x16cd1f6 > testing::internal::UnitTestImpl::RunAllTests() > [18:27:36][Step 8/8] @ 0x16ec513 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16e6df1 > testing::internal::HandleExceptionsInMethodIfSupported<>() > [18:27:36][Step 8/8] @ 0x16cbe26 testing::UnitTest::Run() > [18:27:36][Step 8/8] @ 0xe54c84 RUN_ALL_TESTS() > [18:27:36][Step 8/8] @ 0xe54867 main > [18:27:36][Step 8/8] @ 0x7f7071560a40 (unknown) > [18:27:36][Step 8/8] @ 0x9b52d9 _start > [18:27:36][Step 8/8] Aborted (core dumped) > [18:27:36][Step 8/8] Process exited with code 134 > {noformat} > Happens with Ubuntu 15.04, CentOS 6, CentOS 7 _quite_ often. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4479) Implement reservation labels
[ https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135259#comment-15135259 ] Michael Park commented on MESOS-4479: - {noformat} commit 8b5cb55e6f8f1ed78ee43e7d497d9f01f8f0e5fd Author: Neil Conway Date: Fri Feb 5 14:12:22 2016 -0800 Allowed `createLabel` to take an optional `value`. This better matches the underlying protobuf definition. Review: https://reviews.apache.org/r/42753/ {noformat} {noformat} commit 60015ea893dd0dbd96077035a9155c90012173bc Author: Neil Conway Date: Fri Feb 5 14:12:14 2016 -0800 Fixed some typos in test case comments. Review: https://reviews.apache.org/r/42752/ {noformat} {noformat} commit b5833d4d7a8358326149abd1f8d090be0335a7c6 Author: Neil Conway Date: Fri Feb 5 14:11:40 2016 -0800 Tweaked some resource test cases. We should check that two reservations with the same role but different principals are considered distinct. Review: https://reviews.apache.org/r/42751/ {noformat} {noformat} commit d9d966d9e636fd4bee8b902742eaa9cf6dd1b342 Author: Neil Conway Date: Fri Feb 5 14:11:33 2016 -0800 Added `Resources::size()`. Review: https://reviews.apache.org/r/43239/ {noformat} > Implement reservation labels > > > Key: MESOS-4479 > URL: https://issues.apache.org/jira/browse/MESOS-4479 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: labels, mesosphere, reservations > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
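The tweaked test cases in commit b5833d4 check that two reservations with the same role but different principals are considered distinct. A toy Python model of that equality rule (purely illustrative; the actual Mesos logic is C++ comparing the full protobuf `ReservationInfo`, and the class below is a hypothetical stand-in):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reservation:
    """Hypothetical stand-in for ReservationInfo: equality must
    cover every field, including the reserving principal."""
    role: str
    principal: str

# Same role, different principals: distinct reservations.
assert Reservation("ads", "alice") != Reservation("ads", "bob")
# Identical role and principal: the same reservation.
assert Reservation("ads", "alice") == Reservation("ads", "alice")
```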
[jira] [Commented] (MESOS-4479) Implement reservation labels
[ https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135255#comment-15135255 ] Michael Park commented on MESOS-4479: - {noformat} commit 0226620747e1769434a1a83da547bfc3470a9549 Author: Neil Conway Date: Thu Feb 4 14:47:13 2016 -0800 Used `std::any_of` instead of `std::count_if` when validating IDs. This makes the intent slightly clearer. In principle, it should save a few cycles as well, but nothing significant. Also, clarify the name of a helper function. Review: https://reviews.apache.org/r/42750/ {noformat} > Implement reservation labels > > > Key: MESOS-4479 > URL: https://issues.apache.org/jira/browse/MESOS-4479 > Project: Mesos > Issue Type: Improvement > Components: master > Reporter: Neil Conway > Assignee: Neil Conway > Labels: labels, mesosphere, reservations
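For readers unfamiliar with the distinction in the commit above: `std::any_of` answers "does at least one element match?" and can stop at the first hit, while `std::count_if` always scans the whole range. A Python analogue (illustrative only; the real change is in Mesos's C++ ID-validation code, and these function names are invented):

```python
def contains_id(labels, target):
    """Existence check: any() short-circuits at the first match,
    mirroring std::any_of."""
    return any(label == target for label in labels)

def count_ids(labels, target):
    """Counting check: always walks the whole sequence, mirroring
    std::count_if. Overkill when only existence matters."""
    return sum(1 for label in labels if label == target)
```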
[jira] [Comment Edited] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134994#comment-15134994 ] Joseph Wu edited comment on MESOS-4609 at 2/5/16 11:26 PM: --- || Reviews || Summary || | https://reviews.apache.org/r/43260/ https://reviews.apache.org/r/43261/ | Some refactoring of {{process::initialize}} | | https://reviews.apache.org/r/43271/ | Modifications to {{subprocess}} | | https://reviews.apache.org/r/43272/ | Refactor of containerizer, fetcher, container logger | was (Author: kaysoky): || Reviews || Summary || | https://reviews.apache.org/r/43260/ https://reviews.apache.org/r/43261/ | Some refactoring of {{process::initialize}} | | TODO | | > Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. 
> * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
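The proposed fix above can be sketched as a small policy function. Mesos implements this inside `process::subprocess` in C++; the Python below is only a hypothetical model of the described behavior (the function name and arguments are invented for illustration):

```python
import os
import warnings

def child_environment(environment=None, parent_env=None):
    """Model of the proposal: decide which env vars a subprocess gets.

    - environment=None (inherit): strip LIBPROCESS_PORT so the child
      cannot accidentally bind the parent's libprocess port.
    - An explicit environment that reuses the parent's LIBPROCESS_PORT:
      warn and unset it, since binding would fail anyway.
    - An explicit, different LIBPROCESS_PORT is kept as-is.
    """
    parent_env = dict(os.environ if parent_env is None else parent_env)
    if environment is None:
        env = parent_env
        env.pop("LIBPROCESS_PORT", None)
    else:
        env = dict(environment)
        if ("LIBPROCESS_PORT" in env
                and env["LIBPROCESS_PORT"] == parent_env.get("LIBPROCESS_PORT")):
            warnings.warn("subprocess would reuse the parent's "
                          "LIBPROCESS_PORT; unsetting it")
            env.pop("LIBPROCESS_PORT")
    return env
```

A child that sets a different port on purpose keeps it; only the accidental-collision cases are rewritten, matching the table of outcomes in the ticket.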
[jira] [Commented] (MESOS-4582) state.json serving duplicate "active" fields
[ https://issues.apache.org/jira/browse/MESOS-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135228#comment-15135228 ] Michael Park commented on MESOS-4582: - [~marco-mesos] I've already looked it up, and the presence of duplicate keys is valid JSON. Many JSON libraries (Go, Python, C#, etc.) simply use the last instance of a duplicate key. Those same libraries make it hard (impossible?) to generate JSON with duplicate keys. My proposal here is to take the same approach: be tolerant of input with duplicate keys, but don't generate JSON with duplicate keys in our output. > state.json serving duplicate "active" fields > > > Key: MESOS-4582 > URL: https://issues.apache.org/jira/browse/MESOS-4582 > Project: Mesos > Issue Type: Bug > Affects Versions: 0.27.0 > Reporter: Michael Gummelt > Assignee: Michael Park > Priority: Blocker > Attachments: error.json > > > state.json is serving duplicate "active" fields in frameworks. See the > framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" in the attached file.
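The last-instance-wins behavior described in the comment is easy to demonstrate; a short sketch using Python's `json` module (one of the libraries named above):

```python
import json

doc = '{"active": false, "active": true}'

# Duplicate keys parse without error; the later value wins.
assert json.loads(doc)["active"] is True

# A tolerant reader can still detect duplicates by intercepting
# the raw key/value pairs before they collapse into a dict.
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
assert [k for k, _ in pairs] == ["active", "active"]

# On output, a dict cannot hold duplicate keys, so serialization
# never reproduces them, matching the proposed Mesos behavior.
assert json.dumps(json.loads(doc)) == '{"active": true}'
```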
[jira] [Updated] (MESOS-4613) Mesos when used with --log_dir generates hundreds of thousands of log files per day
[ https://issues.apache.org/jira/browse/MESOS-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-4613: -- Affects Version/s: 0.25.0 > Mesos when used with --log_dir generates hundreds of thousands of log files > per day > --- > > Key: MESOS-4613 > URL: https://issues.apache.org/jira/browse/MESOS-4613 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Lukas Loesche > > We're using mesos with --log_dir=/var/log/mesos > Lately in addition to the mesos-master and mesos-slave log there's also been > mesos-fetcher logs written into this directory. > It seems that every process generates a new log file with a unique file name > containing the date and pid. For mesos-master and mesos-slave this makes > sense. For mesos-fetcher not so much. > On a moderately busy agent it's currently generating 200k log files per day. > On our particular system this would cause logrotate to segfault. And standard > tools like 'rm mesos-fetcher*' won't work because there's too many files to > expand the command. > I also noted that a lot of the created files are zero bytes. So for now we're > running a cron every minute > {noformat} > find /var/log/mesos -size 0 -name 'mesos-fetcher*' -delete > {noformat} > as a workaround. > Anyways it would be nice if there was an option to make the mesos-fetcher > write into a single log file instead of creating thousands of individual > files. > Or if that's easier to implement an option to only write the master and slave > log but not the fetcher logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4613) Mesos when used with --log_dir generates hundreds of thousands of log files per day
Lukas Loesche created MESOS-4613: Summary: Mesos when used with --log_dir generates hundreds of thousands of log files per day Key: MESOS-4613 URL: https://issues.apache.org/jira/browse/MESOS-4613 Project: Mesos Issue Type: Bug Reporter: Lukas Loesche We're using mesos with --log_dir=/var/log/mesos Lately in addition to the mesos-master and mesos-slave log there's also been mesos-fetcher logs written into this directory. It seems that every process generates a new log file with a unique file name containing the date and pid. For mesos-master and mesos-slave this makes sense. For mesos-fetcher not so much. On a moderately busy agent it's currently generating 200k log files per day. On our particular system this would cause logrotate to segfault. And standard tools like 'rm mesos-fetcher*' won't work because there's too many files to expand the command. I also noted that a lot of the created files are zero bytes. So for now we're running a cron every minute {noformat} find /var/log/mesos -size 0 -name 'mesos-fetcher*' -delete {noformat} as a workaround. Anyways it would be nice if there was an option to make the mesos-fetcher write into a single log file instead of creating thousands of individual files. Or if that's easier to implement an option to only write the master and slave log but not the fetcher logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
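The reason 'rm mesos-fetcher*' fails is that the shell expands the glob onto a single command line, which hits the kernel's ARG_MAX limit; {{find ... -delete}} streams the matches instead. A sketch of the workaround against a scratch directory (the path is a stand-in, not /var/log/mesos):

```shell
# Scratch directory standing in for /var/log/mesos (hypothetical path).
dir=$(mktemp -d)

# Simulate a busy agent: many zero-byte mesos-fetcher logs.
for i in $(seq 1 200); do
  : > "$dir/mesos-fetcher-$i.log"
done

# A glob like 'rm mesos-fetcher*' expands every filename onto one command
# line and fails with "Argument list too long" once ARG_MAX is exceeded;
# find -delete never builds that command line, so it scales to 200k files.
find "$dir" -size 0 -name 'mesos-fetcher*' -delete

remaining=$(find "$dir" -name 'mesos-fetcher*' | wc -l)
echo "remaining: $remaining"
rmdir "$dir"
```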
[jira] [Created] (MESOS-4612) Update to Zookeeper 3.4.7
Cody Maloney created MESOS-4612: --- Summary: Update to Zookeeper 3.4.7 Key: MESOS-4612 URL: https://issues.apache.org/jira/browse/MESOS-4612 Project: Mesos Issue Type: Improvement Reporter: Cody Maloney See: http://zookeeper.apache.org/doc/r3.4.7/releasenotes.html for improvements / bug fixes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135189#comment-15135189 ] Brandon Philips commented on MESOS-1806: I added an overview section to the etcd v3 API docs with video overviews to the changes: https://github.com/coreos/etcd/blob/master/Documentation/rfc/v3api.md#overview > Substituting etcd for Zookeeper > --- > > Key: MESOS-1806 > URL: https://issues.apache.org/jira/browse/MESOS-1806 > Project: Mesos > Issue Type: Task > Components: leader election >Reporter: Ed Ropple >Assignee: Shuai Lin >Priority: Minor > >eropple: Could you also file a new JIRA for Mesos to drop ZK > in favor of etcd or ReplicatedLog? Would love to get some momentum going on > that one. > -- > Consider it filed. =) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules
[ https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135190#comment-15135190 ] Mark Cavage commented on MESOS-4610: Review posted here: https://reviews.apache.org/r/43269 > MasterContender/MasterDetector should be loadable as modules > > > Key: MESOS-4610 > URL: https://issues.apache.org/jira/browse/MESOS-4610 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Mark Cavage > > Currently mesos depends on Zookeeper for leader election and notification to > slaves, although there is a C++ hierarchy in the code to support alternatives > (e.g., unit tests use an in-memory implementation). From an operational > perspective, many organizations/users do not want to take a dependency on > Zookeeper, and use an alternative solution to implementing leader election. > Our organization in particular, very much wants this, and as a reference > there have been several requests from the community (see referenced tickets) > to replace with etcd/consul/etc. > This ticket will serve as the work effort to modularize the > MasterContender/MasterDetector APIs such that integrators can build a > pluggable solution of their choice; this ticket will not fold in any > implementations such as etcd et al., but simply move this hierarchy to be > fully pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4066) Agent should not return partial state when a request is made to /state endpoint during recovery.
[ https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4066: -- Summary: Agent should not return partial state when a request is made to /state endpoint during recovery. (was: Expose when agent is recovering in the agent's /state endpoint.) > Agent should not return partial state when a request is made to /state > endpoint during recovery. > > > Key: MESOS-4066 > URL: https://issues.apache.org/jira/browse/MESOS-4066 > Project: Mesos > Issue Type: Task > Components: slave >Reporter: Benjamin Mahler >Assignee: Vinod Kone > Labels: mesosphere > > Currently when a user is hitting /state.json on the agent, it may return > partial state if the agent has failed over and is recovering. There is > currently no clear way to tell if this is the case when looking at a > response, so the user may incorrectly interpret the agent as being empty of > tasks. > We could consider exposing the 'state' enum of the agent in the endpoint: > {code} > enum State > { > RECOVERING, // Slave is doing recovery. > DISCONNECTED, // Slave is not connected to the master. > RUNNING, // Slave has (re-)registered. > TERMINATING, // Slave is shutting down. > } state; > {code} > This may be a bit tricky to maintain as far as backwards-compatibility of the > endpoint, if we were to alter this enum. > Exposing this would allow users to be more informed about the state of the > agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1356) Uncaught exceptions
[ https://issues.apache.org/jira/browse/MESOS-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Browning reassigned MESOS-1356: --- Assignee: Michael Browning > Uncaught exceptions > --- > > Key: MESOS-1356 > URL: https://issues.apache.org/jira/browse/MESOS-1356 > Project: Mesos > Issue Type: Bug >Reporter: Niklas Quarfot Nielsen >Assignee: Michael Browning > Labels: coverity, newbie > > We usually do _not_ use exceptions in Mesos, but some libraries may and we > should handle them and perhaps convert them into Try<>/Error. > > *** CID 1213893: Uncaught exception (UNCAUGHT_EXCEPT) > /src/slave/containerizer/linux_launcher.cpp: 148 in > mesos::internal::slave::_childMain(const std::tr1::function<int()>&, int > *)() > 142 return (*func)(); > 143 } > 144 > 145 > 146 // Helper that creates a new session then blocks on reading the pipe > before > 147 // calling the supplied function. > >>> CID 1213893: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "_childMain" an exception of type > >>> "std::tr1::bad_function_call" is thrown and never caught. > 148 static int _childMain( > 149 const lambda::function<int()>& childFunction, > 150 int pipes[2]) > 151 { > 152 // In child. > 153 os::close(pipes[1]); > > *** CID 1213894: Uncaught exception (UNCAUGHT_EXCEPT) > /src/slave/containerizer/linux_launcher.cpp: 137 in > mesos::internal::slave::childMain(void *)() > 131 > 132 return Nothing(); > 133 } > 134 > 135 > 136 // Helper for clone() which expects an int(void*). > >>> CID 1213894: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "childMain" an exception of type > >>> "std::tr1::bad_function_call" is thrown and never caught. 
> 137 static int childMain(void* child) > 138 { > 139 const lambda::function<int()>* func = > 140 static_cast<const lambda::function<int()>*>(child); > 141 > 142 return (*func)(); > > *** CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > /src/usage/main.cpp: 72 in main() > 66<< endl > 67<< "Supported options:" << endl > 68<< flags.usage(); > 69 } > 70 > 71 > >>> CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type > >>> "google::protobuf::FatalException" is thrown and never caught. > 72 int main(int argc, char** argv) > 73 { > 74 GOOGLE_PROTOBUF_VERIFY_VERSION; > 75 > 76 Flags flags; > 77 > /src/usage/main.cpp: 72 in main() > 66<< endl > 67<< "Supported options:" << endl > 68<< flags.usage(); > 69 } > 70 > 71 > >>> CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type > >>> "google::protobuf::FatalException" is thrown and never caught. > 72 int main(int argc, char** argv) > 73 { > 74 GOOGLE_PROTOBUF_VERIFY_VERSION; > 75 > 76 Flags flags; > 77 > /src/usage/main.cpp: 72 in main() > 66<< endl > 67<< "Supported options:" << endl > 68<< flags.usage(); > 69 } > 70 > 71 > >>> CID 1213895: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type > >>> "google::protobuf::FatalException" is thrown and never caught. > 72 int main(int argc, char** argv) > 73 { > 74 GOOGLE_PROTOBUF_VERIFY_VERSION; > 75 > 76 Flags flags; > 77 > > *** CID 1213896: Uncaught exception (UNCAUGHT_EXCEPT) > /src/launcher/executor.cpp: 423 in main() > 417 }; > 418 > 419 } // namespace internal { > 420 } // namespace mesos { > 421 > 422 > >>> CID 1213896: Uncaught exception (UNCAUGHT_EXCEPT) > >>> In function "main" an exception of type "std::tr1::bad_function_call" > >>> is thrown and never caught. > 423 int main(int argc, char** argv) > 424 { > 425 mesos::internal::CommandExecutor executor; > 426 mesos::MesosExecutorDriver driver(&executor); > 427 return driver.run() == mesos::DRIVER_STOPPED ? 
0 : 1; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4582) state.json serving duplicate "active" fields
[ https://issues.apache.org/jira/browse/MESOS-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135137#comment-15135137 ] Marco Massenzio edited comment on MESOS-4582 at 2/5/16 10:17 PM: - I'm almost sure that duplicate keys are not legal JSON - worth checking the standard, but I'd be in favor of keeping the checks and throwing back a 406 (Bad Request). Incidentally, as almost *all* JSON libraries in most languages (I know of Java, Python, C++, Scala) model JSON documents with the {{map}} structure, it is virtually impossible (or, at best, extremely difficult) to generate a JSON document with duplicate keys (even assuming that such a thing is syntactically correct). If you want, I can look it up later this weekend and find out what the JSON standard says? Thanks for fixing it! was (Author: marco-mesos): I'm almost sure that duplicate keys are not legal JSON - worth checking the standard, but I'd be in favor of keepin the checks and throwing back a 406 (Bad Request). Incidentally, as almost *all* JSON libraries in most languages (I know of Java, Python, C++, Scala) model JSON documents with the {{map}} structure, it is virtually impossible (or, at best, extremely difficult) to generate a JSON document with duplicate keys (even assuming that such a thing is syntactically correct). If you want, I can look it up later this weekend and find out what the JSON standard says? Thanks for fixing it! > state.json serving duplicate "active" fields > > > Key: MESOS-4582 > URL: https://issues.apache.org/jira/browse/MESOS-4582 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Michael Gummelt >Assignee: Michael Park >Priority: Blocker > Attachments: error.json > > > state.json is serving duplicate "active" fields in frameworks. See the > framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" In the attached file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4582) state.json serving duplicate "active" fields
[ https://issues.apache.org/jira/browse/MESOS-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135137#comment-15135137 ] Marco Massenzio edited comment on MESOS-4582 at 2/5/16 10:17 PM: - I'm almost sure that duplicate keys are not legal JSON - worth checking the standard, but I'd be in favor of keepin the checks and throwing back a 406 (Bad Request). Incidentally, as almost *all* JSON libraries in most languages (I know of Java, Python, C++, Scala) model JSON documents with the {{map}} structure, it is virtually impossible (or, at best, extremely difficult) to generate a JSON document with duplicate keys (even assuming that such a thing is syntactically correct). If you want, I can look it up later this weekend and find out what the JSON standard says? Thanks for fixing it! was (Author: marco-mesos): I'm almost sure that duplicate keys are not legal JSON - worth checking the standard, but I'd be in favor of keepin the checks and throwing back a 406 (Bad Request). If you want, I can look it up later this weekend and find out what the JSON standard says? Thanks for fixing it! > state.json serving duplicate "active" fields > > > Key: MESOS-4582 > URL: https://issues.apache.org/jira/browse/MESOS-4582 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Michael Gummelt >Assignee: Michael Park >Priority: Blocker > Attachments: error.json > > > state.json is serving duplicate "active" fields in frameworks. See the > framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" In the attached file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4582) state.json serving duplicate "active" fields
[ https://issues.apache.org/jira/browse/MESOS-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135137#comment-15135137 ] Marco Massenzio commented on MESOS-4582: I'm almost sure that duplicate keys are not legal JSON - worth checking the standard, but I'd be in favor of keepin the checks and throwing back a 406 (Bad Request). If you want, I can look it up later this weekend and find out what the JSON standard says? Thanks for fixing it! > state.json serving duplicate "active" fields > > > Key: MESOS-4582 > URL: https://issues.apache.org/jira/browse/MESOS-4582 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Michael Gummelt >Assignee: Michael Park >Priority: Blocker > Attachments: error.json > > > state.json is serving duplicate "active" fields in frameworks. See the > framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" In the attached file -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4578) docker run -c is deprecated
[ https://issues.apache.org/jira/browse/MESOS-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4578: Target Version/s: 0.27.1 > docker run -c is deprecated > --- > > Key: MESOS-4578 > URL: https://issues.apache.org/jira/browse/MESOS-4578 > Project: Mesos > Issue Type: Improvement > Components: containerization, docker >Affects Versions: 0.26.0 > Environment: CoreOS 7 >Reporter: Cody Maloney > Labels: mesosphere, newbie > Fix For: 0.27.1 > > > When running mesos slave with the docker containerizer enabled on CoreOS > 766.4.0, launching docker containers results in the following in stderr: > {noformat} > Warning: '-c' is deprecated, it will be replaced by '--cpu-shares' soon. See > usage. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4005) Support workdir runtime configuration from image
[ https://issues.apache.org/jira/browse/MESOS-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113225#comment-15113225 ] Gilbert Song edited comment on MESOS-4005 at 2/5/16 9:37 PM: - https://reviews.apache.org/r/43167/ https://reviews.apache.org/r/43168/ https://reviews.apache.org/r/43083/ was (Author: gilbert): https://reviews.apache.org/r/42540/ > Support workdir runtime configuration from image > - > > Key: MESOS-4005 > URL: https://issues.apache.org/jira/browse/MESOS-4005 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need to support workdir runtime configuration returned from image such as > Dockerfile. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4004) Support default entrypoint and command runtime config in Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113228#comment-15113228 ] Gilbert Song edited comment on MESOS-4004 at 2/5/16 9:36 PM: - https://reviews.apache.org/r/43081/ https://reviews.apache.org/r/43082/ was (Author: gilbert): https://reviews.apache.org/r/42539/ > Support default entrypoint and command runtime config in Mesos containerizer > > > Key: MESOS-4004 > URL: https://issues.apache.org/jira/browse/MESOS-4004 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need to use the entrypoint and command runtime configuration returned from > image to be used in Mesos containerizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4383) Support docker runtime configuration env var from image.
[ https://issues.apache.org/jira/browse/MESOS-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113230#comment-15113230 ] Gilbert Song edited comment on MESOS-4383 at 2/5/16 9:35 PM: - https://reviews.apache.org/r/43037/ was (Author: gilbert): https://reviews.apache.org/r/42538/ > Support docker runtime configuration env var from image. > > > Key: MESOS-4383 > URL: https://issues.apache.org/jira/browse/MESOS-4383 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need to support env var configuration returned from docker image in mesos > containerizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4517) Introduce docker runtime isolator.
[ https://issues.apache.org/jira/browse/MESOS-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135048#comment-15135048 ] Gilbert Song commented on MESOS-4517: - https://reviews.apache.org/r/43021/ https://reviews.apache.org/r/43022/ https://reviews.apache.org/r/43036/ > Introduce docker runtime isolator. > -- > > Key: MESOS-4517 > URL: https://issues.apache.org/jira/browse/MESOS-4517 > Project: Mesos > Issue Type: Bug > Components: isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > Currently the docker image's default configuration is included in > `ProvisionInfo`. We should move the necessary config from `ProvisionInfo` > into `ContainerInfo`, and handle all of this runtime information inside the > docker runtime isolator, returning a `ContainerLaunchInfo` containing > `working_dir`, `env`, merged `commandInfo`, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4601) Don't dump stack trace on failure to bind()
[ https://issues.apache.org/jira/browse/MESOS-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135005#comment-15135005 ] Joseph Wu commented on MESOS-4601: -- Note, I effectively made this change in the refactor here: https://reviews.apache.org/r/43261/ > Don't dump stack trace on failure to bind() > --- > > Key: MESOS-4601 > URL: https://issues.apache.org/jira/browse/MESOS-4601 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Assignee: Yong Tang > Labels: errorhandling, libprocess, mesosphere, newbie > > We should do {{EXIT(EXIT_FAILURE)}} rather than {{LOG(FATAL)}}, both for this > code path and a few other expected error conditions in libprocess network > initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134994#comment-15134994 ] Joseph Wu commented on MESOS-4609: -- || Reviews || Summary || | https://reviews.apache.org/r/43260/ https://reviews.apache.org/r/43261/ | Some refactoring of {{process::initialize}} | | TODO | | > Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. > * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4611) Passing a lambda to dispatch() always matches the template returning void
[ https://issues.apache.org/jira/browse/MESOS-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-4611: --- Description: The following idiom does not currently compile:
{code}
Future<Nothing> initialized = dispatch(pid, [] () -> Nothing {
  return Nothing();
});
{code}
This seems non-intuitive because the following template exists for dispatch:
{code}
template <typename T>
Future<T> dispatch(const UPID& pid, const std::function<T()>& f)
{
  std::shared_ptr<Promise<T>> promise(new Promise<T>());

  std::shared_ptr<std::function<void(ProcessBase*)>> f_(
      new std::function<void(ProcessBase*)>(
          [=](ProcessBase*) {
            promise->set(f());
          }));

  internal::dispatch(pid, f_);

  return promise->future();
}
{code}
However, lambdas cannot be implicitly cast to a corresponding std::function<T()> type. To make this work, you have to explicitly type the lambda before passing it to dispatch.
{code}
std::function<Nothing()> f = []() { return Nothing(); };
Future<Nothing> initialized = dispatch(pid, f);
{code}
We should add template support to allow lambdas to be passed to dispatch() without explicit typing.

was: The following idiom does not currently compile:
{code}
Future<Nothing> initialized = dispatch(pid, [] () -> Nothing {
  return Nothing();
})
{code}
This seems non-intuitive because the following template exists for dispatch:
{code}
template <typename T>
Future<T> dispatch(const UPID& pid, const std::function<T()>& f)
{
  std::shared_ptr<Promise<T>> promise(new Promise<T>());

  std::shared_ptr<std::function<void(ProcessBase*)>> f_(
      new std::function<void(ProcessBase*)>(
          [=](ProcessBase*) {
            promise->set(f());
          }));

  internal::dispatch(pid, f_);

  return promise->future();
}
{code}
To make this work, you have to explicitly type the lambda before passing it to dispatch.
{code}
std::function<Nothing()> f = []() { return Nothing(); };
Future<Nothing> initialized = dispatch(pid, f);
{code}
We should add template support to allow lambdas to be passed to dispatch() without explicit typing. 
> Passing a lambda to dispatch() always matches the template returning void
> -
>
> Key: MESOS-4611
> URL: https://issues.apache.org/jira/browse/MESOS-4611
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Kevin Klues
> Labels: dispatch, libprocess, mesosphere
>
> The following idiom does not currently compile:
> {code}
> Future<Nothing> initialized = dispatch(pid, [] () -> Nothing {
>   return Nothing();
> });
> {code}
> This seems non-intuitive because the following template exists for dispatch:
> {code}
> template <typename T>
> Future<T> dispatch(const UPID& pid, const std::function<T()>& f)
> {
>   std::shared_ptr<Promise<T>> promise(new Promise<T>());
>
>   std::shared_ptr<std::function<void(ProcessBase*)>> f_(
>       new std::function<void(ProcessBase*)>(
>           [=](ProcessBase*) {
>             promise->set(f());
>           }));
>
>   internal::dispatch(pid, f_);
>
>   return promise->future();
> }
> {code}
> However, lambdas cannot be implicitly cast to a corresponding
> std::function<T()> type.
> To make this work, you have to explicitly type the lambda before passing it
> to dispatch.
> {code}
> std::function<Nothing()> f = []() { return Nothing(); };
> Future<Nothing> initialized = dispatch(pid, f);
> {code}
> We should add template support to allow lambdas to be passed to dispatch()
> without explicit typing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4611) Passing a lambda to dispatch() always matches the template returning void
Kevin Klues created MESOS-4611: -- Summary: Passing a lambda to dispatch() always matches the template returning void Key: MESOS-4611 URL: https://issues.apache.org/jira/browse/MESOS-4611 Project: Mesos Issue Type: Bug Components: libprocess Reporter: Kevin Klues The following idiom does not currently compile:
{code}
Future<Nothing> initialized = dispatch(pid, [] () -> Nothing {
  return Nothing();
})
{code}
This seems non-intuitive because the following template exists for dispatch:
{code}
template <typename T>
Future<T> dispatch(const UPID& pid, const std::function<T()>& f)
{
  std::shared_ptr<Promise<T>> promise(new Promise<T>());

  std::shared_ptr<std::function<void(ProcessBase*)>> f_(
      new std::function<void(ProcessBase*)>(
          [=](ProcessBase*) {
            promise->set(f());
          }));

  internal::dispatch(pid, f_);

  return promise->future();
}
{code}
To make this work, you have to explicitly type the lambda before passing it to dispatch.
{code}
std::function<Nothing()> f = []() { return Nothing(); };
Future<Nothing> initialized = dispatch(pid, f);
{code}
We should add template support to allow lambdas to be passed to dispatch() without explicit typing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules
[ https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Cavage updated MESOS-4610: --- External issue ID: (was: MESOS-1806) > MasterContender/MasterDetector should be loadable as modules > > > Key: MESOS-4610 > URL: https://issues.apache.org/jira/browse/MESOS-4610 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Mark Cavage > > Currently mesos depends on Zookeeper for leader election and notification to > slaves, although there is a C++ hierarchy in the code to support alternatives > (e.g., unit tests use an in-memory implementation). From an operational > perspective, many organizations/users do not want to take a dependency on > Zookeeper, and use an alternative solution to implementing leader election. > Our organization in particular, very much wants this, and as a reference > there have been several requests from the community (see referenced tickets) > to replace with etcd/consul/etc. > This ticket will serve as the work effort to modularize the > MasterContender/MasterDetector APIs such that integrators can build a > pluggable solution of their choice; this ticket will not fold in any > implementations such as etcd et al., but simply move this hierarchy to be > fully pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules
[ https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Cavage updated MESOS-4610: --- External issue ID: MESOS-1806 > MasterContender/MasterDetector should be loadable as modules > > > Key: MESOS-4610 > URL: https://issues.apache.org/jira/browse/MESOS-4610 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Mark Cavage > > Currently mesos depends on Zookeeper for leader election and notification to > slaves, although there is a C++ hierarchy in the code to support alternatives > (e.g., unit tests use an in-memory implementation). From an operational > perspective, many organizations/users do not want to take a dependency on > Zookeeper, and use an alternative solution to implementing leader election. > Our organization in particular, very much wants this, and as a reference > there have been several requests from the community (see referenced tickets) > to replace with etcd/consul/etc. > This ticket will serve as the work effort to modularize the > MasterContender/MasterDetector APIs such that integrators can build a > pluggable solution of their choice; this ticket will not fold in any > implementations such as etcd et al., but simply move this hierarchy to be > fully pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules
Mark Cavage created MESOS-4610: -- Summary: MasterContender/MasterDetector should be loadable as modules Key: MESOS-4610 URL: https://issues.apache.org/jira/browse/MESOS-4610 Project: Mesos Issue Type: Improvement Components: master Reporter: Mark Cavage Currently mesos depends on Zookeeper for leader election and notification to slaves, although there is a C++ hierarchy in the code to support alternatives (e.g., unit tests use an in-memory implementation). From an operational perspective, many organizations/users do not want to take a dependency on Zookeeper, and use an alternative solution to implementing leader election. Our organization in particular, very much wants this, and as a reference there have been several requests from the community (see referenced tickets) to replace with etcd/consul/etc. This ticket will serve as the work effort to modularize the MasterContender/MasterDetector APIs such that integrators can build a pluggable solution of their choice; this ticket will not fold in any implementations such as etcd et al., but simply move this hierarchy to be fully pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4609: - Description: Mostly copied from [this comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run into some accidental fatalities: | || Subprocess uses libprocess || Subprocess is something else || || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> exit | Nothing happens (?) | || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | Nothing happens (?) | (?) = means this is usually the case, but not 100%. A complete fix would look something like: * If the {{subprocess}} call gets {{environment = None()}}, we should automatically remove {{LIBPROCESS_PORT}} from the inherited environment. * The parts of [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] dealing with libprocess & libmesos should be refactored into libprocess as a helper. We would use this helper for the Containerizer, Fetcher, and ContainerLogger module. * If the {{subprocess}} call is given {{LIBPROCESS_PORT == os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var locally. was: Mostly copied from [this comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run into some accidental fatalities: | || Subprocess uses libprocess || Subprocess is something else || || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> exit Option #1 above prevents accidental inheritance | Nothing happens (?) 
| || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | Nothing happens (?) | (?) = means this is usually the case, but not 100%. A complete fix would look something like: * If the {{subprocess}} call gets {{environment = None()}}, we should automatically remove {{LIBPROCESS_PORT}} from the inherited environment. * The parts of [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] dealing with libprocess & libmesos should be refactored into libprocess as a helper. We would use this helper for the Containerizer, Fetcher, and ContainerLogger module. * If the {{subprocess}} call is given {{LIBPROCESS_PORT == os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var locally. > Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. 
> * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
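The first bullet of the proposed fix (automatically removing {{LIBPROCESS_PORT}} when {{environment = None()}}) could look roughly like this; the helper name and signature are illustrative sketches, not libprocess APIs:

```cpp
#include <map>
#include <string>

// Illustrative sketch of the proposed default: when the caller passes no
// explicit environment, copy the parent's environment but drop
// LIBPROCESS_PORT, so a libprocess-based child cannot accidentally try to
// bind to the parent's port. inheritedEnvironment() is a hypothetical helper.
std::map<std::string, std::string> inheritedEnvironment(
    const std::map<std::string, std::string>& parent)
{
  std::map<std::string, std::string> env = parent;
  env.erase("LIBPROCESS_PORT");
  return env;
}
```

A child that uses libprocess then picks its own port, while a child that sets {{LIBPROCESS_PORT}} on purpose is unaffected because an explicit environment bypasses the helper.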
[jira] [Commented] (MESOS-4603) GTEST crashes when starting/stopping many times in succession
[ https://issues.apache.org/jira/browse/MESOS-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134673#comment-15134673 ] Joseph Wu commented on MESOS-4603: -- Possibly related to some races in {{process::finalize}}, which only gets called at the end of the libprocess tests currently. > GTEST crashes when starting/stopping many times in succession > - > > Key: MESOS-4603 > URL: https://issues.apache.org/jira/browse/MESOS-4603 > Project: Mesos > Issue Type: Bug > Components: tests > Environment: clang 3.4, ubuntu 14.04 >Reporter: Kevin Klues > Labels: tests > > After running: > run-one-until-failure 3rdparty/libprocess/libprocess-tests > At least one iteration of running the tests fails in under a minute with the > following stack trace. The stack trace is differnt sometimes, but it always > seems to error out in ~ProcessManager(). > {noformat} > *** Aborted at 1454643530 (unix time) try "date -d @1454643530" if you are > using GNU date *** > PC: @ 0x7f7812f4d1a0 (unknown) > *** SIGSEGV (@0x0) received by PID 168122 (TID 0x7f780298f700) from PID 0; > stack trace: *** > @ 0x7f7814451340 (unknown) > @ 0x7f7812f4d1a0 (unknown) > @ 0x5f06a0 process::Process<>::self() > @ 0x777220 > _ZN7process8dispatchI7NothingNS_20AsyncExecutorProcessERKZZNS_4http8internal7requestERKNS3_7RequestEbENK3$_1clENS3_10ConnectionEEUlvE_PvSA_SD_EENS_6FutureIT_EEPKNS_7ProcessIT0_EEMSI_FSF_T1_T2_ET3_T4_ > @ 0x77714c > _ZN7process13AsyncExecutor7executeIZZNS_4http8internal7requestERKNS2_7RequestEbENK3$_1clENS2_10ConnectionEEUlvE_EENS_6FutureI7NothingEERKT_PN5boost9enable_ifINSG_7is_voidINSt9result_ofIFSD_vEE4typeEEEvE4typeE > @ 0x77709e > _ZN7process5asyncIZZNS_4http8internal7requestERKNS1_7RequestEbENK3$_1clENS1_10ConnectionEEUlvE_EENS_6FutureI7NothingEERKT_PN5boost9enable_ifINSF_7is_voidINSt9result_ofIFSC_vEE4typeEEEvE4typeE > @ 0x777046 > _ZZZN7process4http8internal7requestERKNS0_7RequestEbENK3$_1clENS0_10ConnectionEENKUlvE0_clEv > @ 0x777019 > 
_ZZNK7process6FutureI7NothingE5onAnyIZZNS_4http8internal7requestERKNS4_7RequestEbENK3$_1clENS4_10ConnectionEEUlvE0_vEERKS2_OT_NS2_10LessPreferEENUlSD_E_clESD_ > @ 0x776e02 > _ZNSt17_Function_handlerIFvRKN7process6FutureI7NothingEEEZNKS3_5onAnyIZZNS0_4http8internal7requestERKNS8_7RequestEbENK3$_1clENS8_10ConnectionEEUlvE0_vEES5_OT_NS3_10LessPreferEEUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ > @ 0x43f888 std::function<>::operator()() > @ 0x4464ec > _ZN7process8internal3runISt8functionIFvRKNS_6FutureI7NothingJRS5_EEEvRKSt6vectorIT_SaISC_EEDpOT0_ > @ 0x446305 process::Future<>::set() > @ 0x44f90a > _ZNKSt7_Mem_fnIMN7process6FutureI7NothingEEFbRKS2_EEclIJS5_EvEEbRS3_DpOT_ > @ 0x44f7ae > _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureI7NothingEEFbRKS3_EES4_St12_PlaceholderILi16__callIbJS6_EJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x44f72d > _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureI7NothingEEFbRKS3_EES4_St12_PlaceholderILi1clIJS6_EbEET0_DpOT_ > @ 0x44f6dd > _ZZNK7process6FutureI7NothingE7onReadyISt5_BindIFSt7_Mem_fnIMS2_FbRKS1_EES2_St12_PlaceholderILi1bEERKS2_OT_NS2_6PreferEENUlS7_E_clES7_ > @ 0x44f492 > _ZNSt17_Function_handlerIFvRK7NothingEZNK7process6FutureIS0_E7onReadyISt5_BindIFSt7_Mem_fnIMS6_FbS2_EES6_St12_PlaceholderILi1bEERKS6_OT_NS6_6PreferEEUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x446d68 std::function<>::operator()() > @ 0x44644c > _ZN7process8internal3runISt8functionIFvRK7NothingEEJRS3_EEEvRKSt6vectorIT_SaISA_EEDpOT0_ > @ 0x4462e7 process::Future<>::set() > @ 0x50d5c7 process::Promise<>::set() > @ 0x77c53b > process::http::internal::ConnectionProcess::disconnect() > @ 0x792710 process::http::internal::ConnectionProcess::_read() > @ 0x794356 > _ZZN7process8dispatchINS_4http8internal17ConnectionProcessERKNS_6FutureISsEES5_EEvRKNS_3PIDIT_EEMS9_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESI_ > @ 0x793fa2 > 
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchINS0_4http8internal17ConnectionProcessERKNS0_6FutureISsEES9_EEvRKNS0_3PIDIT_EEMSD_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_ > @ 0x810958 std::function<>::operator()() > @ 0x7fb854 process::ProcessBase::visit() > @ 0x8581ce process::DispatchEvent::visit() > @ 0x43d631 process::ProcessBase::serve() > @ 0x7f9604 process::ProcessManager::resume() > @ 0x8017a5 > process::ProcessManager::init_threads()::$_1::operator()() >
[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4609: - Target Version/s: 0.28.0 (was: 0.27.1) > Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit > Option #1 above prevents accidental inheritance | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. > * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4609: - Story Points: 2 (was: 1) > Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit > Option #1 above prevents accidental inheritance | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. > * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4609: - Description: Mostly copied from [this comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run into some accidental fatalities: | || Subprocess uses libprocess || Subprocess is something else || || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> exit Option #1 above prevents accidental inheritance | Nothing happens (?) | || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | Nothing happens (?) | (?) = means this is usually the case, but not 100%. A complete fix would look something like: * If the {{subprocess}} call gets {{environment = None()}}, we should automatically remove {{LIBPROCESS_PORT}} from the inherited environment. * The parts of [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] dealing with libprocess & libmesos should be refactored into libprocess as a helper. We would use this helper for the Containerizer, Fetcher, and ContainerLogger module. * If the {{subprocess}} call is given {{LIBPROCESS_PORT == os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var locally. was: The {{LogrotateContainerLogger}} starts libprocess-using subprocesses. Libprocess initialization will attempt to resolve the IP from the hostname. If a DNS service is not available, this step will fail, which terminates the logger subprocess prematurely. Since the logger subprocesses live on the agent, they should use the same {{LIBPROCESS_IP}} supplied to the agent. 
> Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit > Option #1 above prevents accidental inheritance | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. > * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
[ https://issues.apache.org/jira/browse/MESOS-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Wu updated MESOS-4609: - Fix Version/s: (was: 0.27.1) > Subprocess should be more intelligent about setting/inheriting libprocess > environment variables > > > Key: MESOS-4609 > URL: https://issues.apache.org/jira/browse/MESOS-4609 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.27.0 >Reporter: Joseph Wu >Assignee: Joseph Wu > Labels: mesosphere > > Mostly copied from [this > comment|https://issues.apache.org/jira/browse/MESOS-4598?focusedCommentId=15133497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15133497] > A subprocess inheriting the environment variables {{LIBPROCESS_*}} may run > into some accidental fatalities: > | || Subprocess uses libprocess || Subprocess is something else || > || Subprocess sets/inherits the same {{PORT}} by accident | Bind failure -> > exit > Option #1 above prevents accidental inheritance | Nothing happens (?) | > || Subprocess sets a different {{PORT}} on purpose | Bind success (?) | > Nothing happens (?) | > (?) = means this is usually the case, but not 100%. > A complete fix would look something like: > * If the {{subprocess}} call gets {{environment = None()}}, we should > automatically remove {{LIBPROCESS_PORT}} from the inherited environment. > * The parts of > [{{executorEnvironment}}|https://github.com/apache/mesos/blame/master/src/slave/containerizer/containerizer.cpp#L265] > dealing with libprocess & libmesos should be refactored into libprocess as a > helper. We would use this helper for the Containerizer, Fetcher, and > ContainerLogger module. > * If the {{subprocess}} call is given {{LIBPROCESS_PORT == > os::getenv("LIBPROCESS_PORT")}}, we can LOG(WARN) and unset the env var > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4609) Subprocess should be more intelligent about setting/inheriting libprocess environment variables
Joseph Wu created MESOS-4609: Summary: Subprocess should be more intelligent about setting/inheriting libprocess environment variables Key: MESOS-4609 URL: https://issues.apache.org/jira/browse/MESOS-4609 Project: Mesos Issue Type: Bug Affects Versions: 0.27.0 Reporter: Joseph Wu Assignee: Joseph Wu Fix For: 0.27.1 The {{LogrotateContainerLogger}} starts libprocess-using subprocesses. Libprocess initialization will attempt to resolve the IP from the hostname. If a DNS service is not available, this step will fail, which terminates the logger subprocess prematurely. Since the logger subprocesses live on the agent, they should use the same {{LIBPROCESS_IP}} supplied to the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
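The behavior described above, forwarding the agent's {{LIBPROCESS_IP}} to the logger subprocess, might be sketched as follows; {{loggerEnvironment()}} is a hypothetical helper, not a Mesos API:

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Illustrative sketch: the agent forwards its own LIBPROCESS_IP to the
// logger subprocess, so libprocess in the child binds to that address
// instead of resolving the hostname (which fails when no DNS service is
// available).
std::map<std::string, std::string> loggerEnvironment()
{
  std::map<std::string, std::string> env;

  const char* ip = std::getenv("LIBPROCESS_IP");
  if (ip != nullptr) {
    env["LIBPROCESS_IP"] = ip;  // Reuse the agent's address verbatim.
  }

  return env;
}
```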
[jira] [Created] (MESOS-4607) Docker image create should not return any error with env var
Gilbert Song created MESOS-4607: --- Summary: Docker image create should not return any error with env var Key: MESOS-4607 URL: https://issues.apache.org/jira/browse/MESOS-4607 Project: Mesos Issue Type: Bug Components: docker Reporter: Gilbert Song Priority: Minor In the docker image create path, the entrypoint and environment variables are read from docker inspect. Encountering a badly formatted env var should not return an error, since that may block the docker containerizer. Specifically, we may want to just `LOG(WARNING)` for those unexpected env vars (please see https://github.com/apache/mesos/blob/master/src/docker/docker.cpp#L388~#L395). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4608) Consider deprecating `slave(1)` delegate in favor of `slave` on Agent
Anand Mazumdar created MESOS-4608: - Summary: Consider deprecating `slave(1)` delegate in favor of `slave` on Agent Key: MESOS-4608 URL: https://issues.apache.org/jira/browse/MESOS-4608 Project: Mesos Issue Type: Improvement Components: HTTP API Reporter: Anand Mazumdar Historically, we were using a {{slave(1)}} delegate on the agent while initializing {{libprocess}}. This meant that all root HTTP requests to agent {{ip:port}} were forwarded to {{slave(1)}} route. With MESOS-4255, we added the ability to pass in the process ID to the agent constructor. Hence, we should now be able to use {{slave}} as the delegate instead of {{slave(1)}}. This would however need to go through a deprecation cycle as there might be existing users relying on the {{slave(1)}} endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4587) Docker environment variables must be able to contain the equal sign
[ https://issues.apache.org/jira/browse/MESOS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-4587: -- Shepherd: Jie Yu > Docker environment variables must be able to contain the equal sign > --- > > Key: MESOS-4587 > URL: https://issues.apache.org/jira/browse/MESOS-4587 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0, 0.26.0, 0.27.0 >Reporter: Martin Tapp >Assignee: Shuai Lin > Labels: containerizer > Fix For: 0.27.1 > > > Note: Affects 0.26 and 0.27. > The Jupyter Docker all-spark-notebook uses equal sign ('=') in Docker ENV > declarations (for instance, > https://github.com/jupyter/docker-stacks/blob/master/all-spark-notebook/Dockerfile#L51). > This causes a mesos Unexpected Env format for 'ContainerConfig.Env' error. > The problem is the tokenization code at > https://github.com/apache/mesos/blob/21e080c5ae6ef03556c7a2b588e034a916c7a05a/src/docker/docker.cpp#L386 > which needs to only look at the first equal sign. Docker ENV declarations > can also be empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
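The fix described above (splitting only on the first equal sign) can be sketched as follows; this is illustrative code, not the actual docker.cpp patch:

```cpp
#include <string>
#include <utility>

// Illustrative fix for the tokenization problem: split a Docker ENV entry
// only on the FIRST '=', so values may themselves contain '=' and may be
// empty. parseEnvEntry() is a hypothetical helper.
std::pair<std::string, std::string> parseEnvEntry(const std::string& entry)
{
  const std::string::size_type pos = entry.find('=');

  if (pos == std::string::npos) {
    return {entry, ""};  // No '=': treat the whole entry as a key.
  }

  return {entry.substr(0, pos), entry.substr(pos + 1)};
}
```

For example, an entry like `SPARK_OPTS=--driver-java-options=-Xms1024M` (in the style of the all-spark-notebook Dockerfile) would yield the key `SPARK_OPTS` with the full remainder, embedded equal signs included, as its value.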
[jira] [Updated] (MESOS-4590) Add test case for reservations with same role, different principals
[ https://issues.apache.org/jira/browse/MESOS-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4590: --- Shepherd: Michael Park > Add test case for reservations with same role, different principals > --- > > Key: MESOS-4590 > URL: https://issues.apache.org/jira/browse/MESOS-4590 > Project: Mesos > Issue Type: Task > Components: master, test >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere, reservations, test > > We don't have a test case that covers $SUBJECT; we probably should. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history
[ https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134538#comment-15134538 ] Kevin Klues commented on MESOS-3307: I'm all for query parameters to filter this stuff, but others seem to disagree. (See the thread above). > Configurable size of completed task / framework history > --- > > Key: MESOS-3307 > URL: https://issues.apache.org/jira/browse/MESOS-3307 > Project: Mesos > Issue Type: Bug >Reporter: Ian Babrou >Assignee: Kevin Klues > Labels: mesosphere > Fix For: 0.27.0 > > > We try to make Mesos work with multiple frameworks and mesos-dns at the same > time. The goal is to have set of frameworks per team / project on a single > Mesos cluster. > At this point our mesos state.json is at 4mb and it takes a while to > assembly. 5 mesos-dns instances hit state.json every 5 seconds, effectively > pushing mesos-master CPU usage through the roof. It's at 100%+ all the time. > Here's the problem: > {noformat} > mesos λ curl -s http://mesos-master:5050/master/state.json | jq > .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n >1 "20150606-001827-252388362-5050-5982-0003" > 16 "20150606-001827-252388362-5050-5982-0005" > 18 "20150606-001827-252388362-5050-5982-0029" > 73 "20150606-001827-252388362-5050-5982-0007" > 141 "20150606-001827-252388362-5050-5982-0009" > 154 "20150820-154817-302720010-5050-15320-" > 289 "20150606-001827-252388362-5050-5982-0004" > 510 "20150606-001827-252388362-5050-5982-0012" > 666 "20150606-001827-252388362-5050-5982-0028" > 923 "20150116-002612-269165578-5050-32204-0003" > 1000 "20150606-001827-252388362-5050-5982-0001" > 1000 "20150606-001827-252388362-5050-5982-0006" > 1000 "20150606-001827-252388362-5050-5982-0010" > 1000 "20150606-001827-252388362-5050-5982-0011" > 1000 "20150606-001827-252388362-5050-5982-0027" > mesos λ fgrep 1000 -r src/master > src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10; > 
src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = > 1000; > {noformat} > Active tasks are just 6% of state.json response: > {noformat} > mesos λ cat ~/temp/mesos-state.json | jq -c . | wc >1 14796 4138942 > mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc > 16 37 252774 > {noformat} > I see four options that can improve the situation: > 1. Add query string param to exclude completed tasks from state.json and use > it in mesos-dns and similar tools. There is no need for mesos-dns to know > about completed tasks, it's just extra load on master and mesos-dns. > 2. Make history size configurable. > 3. Make JSON serialization faster. With 1s of tasks even without history > it would take a lot of time to serialize tasks for mesos-dns. Doing it every > 60 seconds instead of every 5 seconds isn't really an option. > 4. Create event bus for mesos master. Marathon has it and it'd be nice to > have it in Mesos. This way mesos-dns could avoid polling master state and > switch to listening for events. > All can be done independently. > Note to mesosphere folks: please start distributing debug symbols with your > distribution. I was asking for it for a while and it is really helpful: > https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501 > Perf report for leading master: > !http://i.imgur.com/iz7C3o0.png! > I'm on 0.23.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
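Option 2 above, a configurable history size, essentially replaces the hard-coded {{MAX_COMPLETED_TASKS_PER_FRAMEWORK}} with a flag-driven bound on a buffer of completed entries. A minimal sketch, assuming the capacity would come from a master flag:

```cpp
#include <cstddef>
#include <deque>

// Illustrative bounded history: once the configurable capacity is reached,
// the oldest completed entry is evicted. In Mesos the capacity would come
// from a master flag rather than the hard-coded constant quoted above.
template <typename T>
class BoundedHistory
{
public:
  explicit BoundedHistory(std::size_t capacity) : capacity(capacity) {}

  void push(const T& item)
  {
    if (capacity == 0) {
      return;  // Degenerate case: keep no history at all.
    }
    if (entries.size() == capacity) {
      entries.pop_front();  // Evict the oldest entry.
    }
    entries.push_back(item);
  }

  std::size_t size() const { return entries.size(); }
  const T& oldest() const { return entries.front(); }

private:
  const std::size_t capacity;
  std::deque<T> entries;
};
```

Operators who poll state.json frequently could then trade history depth for response size.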
[jira] [Updated] (MESOS-4587) Docker environment variables must be able to contain the equal sign
[ https://issues.apache.org/jira/browse/MESOS-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4587: -- Target Version/s: 0.27.1 > Docker environment variables must be able to contain the equal sign > --- > > Key: MESOS-4587 > URL: https://issues.apache.org/jira/browse/MESOS-4587 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0, 0.26.0, 0.27.0 >Reporter: Martin Tapp >Assignee: Shuai Lin > Labels: containerizer > Fix For: 0.27.1 > > > Note: Affects 0.26 and 0.27. > The Jupyter Docker all-spark-notebook uses equal sign ('=') in Docker ENV > declarations (for instance, > https://github.com/jupyter/docker-stacks/blob/master/all-spark-notebook/Dockerfile#L51). > This causes a mesos Unexpected Env format for 'ContainerConfig.Env' error. > The problem is the tokenization code at > https://github.com/apache/mesos/blob/21e080c5ae6ef03556c7a2b588e034a916c7a05a/src/docker/docker.cpp#L386 > which needs to only look at the first equal sign. Docker ENV declarations > can also be empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134372#comment-15134372 ] Ivan Vučica commented on MESOS-1806: FWIW, being able to not run Zookeeper would mean one less JVM-based service running on my low-memory VPS nodes. > Substituting etcd for Zookeeper > --- > > Key: MESOS-1806 > URL: https://issues.apache.org/jira/browse/MESOS-1806 > Project: Mesos > Issue Type: Task > Components: leader election >Reporter: Ed Ropple >Assignee: Shuai Lin >Priority: Minor > >eropple: Could you also file a new JIRA for Mesos to drop ZK > in favor of etcd or ReplicatedLog? Would love to get some momentum going on > that one. > -- > Consider it filed. =) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4071) Master crash during framework teardown ( Check failed: total.resources.contains(slaveId))
[ https://issues.apache.org/jira/browse/MESOS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134309#comment-15134309 ] alexius ludeman commented on MESOS-4071: The tester runs in a continuous serial loop over 8 tests. All tasks are using cpu allocation set to 0.1. There are between 1 to 8 tasks launched per test. At the end of each test, all tasks are killed. No dynamic reservations are used for any test. If further information is needed to reproduce please follow up with me. Thanks > Master crash during framework teardown ( Check failed: > total.resources.contains(slaveId)) > - > > Key: MESOS-4071 > URL: https://issues.apache.org/jira/browse/MESOS-4071 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 0.25.0 >Reporter: Mandeep Chadha >Assignee: Neil Conway > Labels: mesosphere > > Stack Trace : > NOTE : Replaced IP address with XX.XX.XX.XX > {code} > I1204 10:31:03.391127 2588810 master.cpp:5564] Processing TEARDOWN call for > framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST) at > scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237 > I1204 10:31:03.391177 2588810 master.cpp:5576] Removing framework > 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST)) at > schedulerc8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237 > I1204 10:31:03.391337 2588805 hierarchical.hpp:605] Deactivated framework > 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > F1204 10:31:03.395500 2588810 sorter.cpp:233] Check failed: > total.resources.contains(slaveId) > *** Check failure stack trace: *** > @ 0x7f2b3dda53d8 google::LogMessage::Fail() > @ 0x7f2b3dda5327 google::LogMessage::SendToLog() > @ 0x7f2b3dda4d38 google::LogMessage::Flush() > @ 0x7f2b3dda7a6c google::LogMessageFatal::~LogMessageFatal() > @ 0x7f2b3d3351a1 > mesos::internal::master::allocator::DRFSorter::remove() > @ 
0x7f2b3d0b8c29 > mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework() > @ 0x7f2b3d0ca823 > _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES6_EEvRKNS_3PIDIT_EEMSA_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESJ_ > @ 0x7f2b3d0dc8dc > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_11FrameworkIDESA_EEvRKNS0_3PIDIT_EEMSE_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2 > _ > @ 0x7f2b3dd2cc35 std::function<>::operator()() > @ 0x7f2b3dd15ae5 process::ProcessBase::visit() > @ 0x7f2b3dd188e2 process::DispatchEvent::visit() > @ 0x472366 process::ProcessBase::serve() > @ 0x7f2b3dd1203f process::ProcessManager::resume() > @ 0x7f2b3dd061b2 process::internal::schedule() > @ 0x7f2b3dd63efd > _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Inde > x_tupleIJXspT_EEE > @ 0x7f2b3dd63e4d std::_Bind_simple<>::operator()() > @ 0x7f2b3dd63de6 std::thread::_Impl<>::_M_run() > @ 0x318c2b6470 (unknown) > @ 0x318b2079d1 (unknown) > @ 0x318aae8b5d (unknown) > @ (nil) (unknown) > Aborted (core dumped) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2162) Consider a C++ implementation of CoreOS AppContainer spec
[ https://issues.apache.org/jira/browse/MESOS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134203#comment-15134203 ] Craig W commented on MESOS-2162: It would be nice to have the option to run with rkt, especially since it's hit 1.0 (https://coreos.com/blog/rkt-hits-1.0.html). > Consider a C++ implementation of CoreOS AppContainer spec > - > > Key: MESOS-2162 > URL: https://issues.apache.org/jira/browse/MESOS-2162 > Project: Mesos > Issue Type: Story > Components: containerization >Reporter: Dominic Hamon > Labels: gsoc2015, mesosphere, twitter > > CoreOS have released a > [specification|https://github.com/coreos/rocket/blob/master/app-container/SPEC.md] > for a container abstraction as an alternative to Docker. They have also > released a reference implementation, [rocket|https://coreos.com/blog/rocket/]. > We should consider a C++ implementation of the specification to have parity > with the community and then use this implementation for our containerizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4601) Don't dump stack trace on failure to bind()
[ https://issues.apache.org/jira/browse/MESOS-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134107#comment-15134107 ] Yong Tang commented on MESOS-4601: -- Will take a look at this issue as I have some free time in the next couple of weeks. > Don't dump stack trace on failure to bind() > --- > > Key: MESOS-4601 > URL: https://issues.apache.org/jira/browse/MESOS-4601 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Assignee: Yong Tang > Labels: errorhandling, libprocess, mesosphere, newbie > > We should do {{EXIT(EXIT_FAILURE)}} rather than {{LOG(FATAL)}}, both for this > code path and a few other expected error conditions in libprocess network > initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4601) Don't dump stack trace on failure to bind()
[ https://issues.apache.org/jira/browse/MESOS-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Tang reassigned MESOS-4601: Assignee: Yong Tang > Don't dump stack trace on failure to bind() > --- > > Key: MESOS-4601 > URL: https://issues.apache.org/jira/browse/MESOS-4601 > Project: Mesos > Issue Type: Bug > Components: libprocess >Reporter: Neil Conway >Assignee: Yong Tang > Labels: errorhandling, libprocess, mesosphere, newbie > > We should do {{EXIT(EXIT_FAILURE)}} rather than {{LOG(FATAL)}}, both for this > code path and a few other expected error conditions in libprocess network > initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4595) Add support for newest pre-defined Perf events to PerfEventIsolator
[ https://issues.apache.org/jira/browse/MESOS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bartek Plotka updated MESOS-4595: - Description: Currently, Perf Event Isolator is able to monitor all (specified in {{--perf_events=...}}) Perf Events, but it can map only part of them in {{ResourceUsage.proto}} (to be more exact in [PerfStatistics.proto | https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L862]) Since the last time {{PerfStatistics.proto}} was updated, list of supported events expanded much and is growing constantly. I have created some comparison table: || Events type || Num of matched events in PerfStatistics vs perf 4.3.3 || perf 4.3.3 events || | HW events | 8 | 8 | | SW events | 9 | 10 | | HW cache event | 20 | 20 | | *Kernel PMU events* | *0* | *37* | | Tracepoint events | 0 | billion (: | For advance analysis (e.g during Oversubscription in QoS Controller) having support for additional events is crucial. For instance in [Serenity|https://github.com/mesosphere/serenity] we based some of our revocation algorithms on the new [CMT| https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology-code-and-data] feature which gives additional, useful event called {{llc_occupancy}}. I think we all agree that it would be great to support more (or even all) perf events in {{Mesos PerfEventIsolator}} (: Let's start a discussion over the approach. Within this task we have three issues: # What events do we want to support in Mesos? ## all? ## only add Kernel PMU Events? --- I don't have a strong opinion on that, since i have never used {{Tracepoint events}}. We currently need PMU events. # How to add new (or modify existing) events in {{mesos.proto}}? We can distinguish here 3 approaches: *# Add new events statically in {{PerfStatistics.proto}} as separate optional fields. 
(like it is currently) *# Instead of optional fields in {{PerfStatistics.proto}} message we could have a {{key-value}} map (something like {{labels}} in other messages) and feed it dynamically in {{PerfEventIsolator}} *# We could mix the above approaches and just add the mentioned map to the existing {{PerfStatistics.proto}} for additional events (: --- IMO: Approach 1 is somewhat explicit - users can see what events to expect (although they are parsed in a different manner, e.g. {{"-"}} to {{"_"}}), but we would end up with a looong message and a lot of copy-paste work. And we have to maintain that! Approaches 2 & 3 are more elastic, and we don't have the problem mentioned in the issue below (: And we *always* support *all* perf events in all kernel versions (: IMO approaches 2 & 3 are the best. # How to support different naming formats? For instance {{intel_cqm/llc_occupancy/}} with {{"/"}} in the name or {{migrate:mm_migrate_pages}} with {{":"}}. I don't think it is possible to have these as field names in {{.proto}} syntax was: Currently, Perf Event Isolator is able to monitor all (specified in {{--perf_events=...}}) Perf Events, but it can map only part of them in {{ResourceUsage.proto}} (to be more exact in [PerfStatistics.proto | https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L862]) Since the last time {{PerfStatistics.proto}} was updated, list of supported events expanded much and is growing constantly. I have created some comparison table: || Events type || Num of matched events in PerfStatistics vs perf 4.3.3 || perf 4.3.3 events || | HW events | 8 | 8 | | SW events | 9 | 10 | | HW cache event | 20 | 20 | | *Kernel PMU events* | *0* | *37* | | Tracepoint events | 0 | billion (: | For advance analysis (e.g during Oversubscription in QoS Controller) having support for additional events is crucial. 
For instance in [Serenity|https://github.com/mesosphere/serenity] we based some of our revocation algorithms on the new [CMT| https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology-code-and-data] feature which gives additional, useful event called {{llc_occupancy}}. I think we all agree that it would be great to support more (or even all) perf events in {{Mesos PerfEventIsolator}} (: Let's start a discussion over the approach. Within this task we have three issues: # What events do we want to support in Mesos? ## all? ## only add Kernel PMU Events? --- I don't have a strong opinion on that, since i have never used {{Tracepoint events}}. We currently need PMU events. # How to add new (or modify existing) events in {{mesos.proto}}? We can distinguish here 3 approaches: *# Add new events statically in {{PerfStatistics.proto}} as separate optional fields. (like it is currently) *# Instead of optional fields in {{PerfStatistics.proto}} message we could have a {{key-value}} map (something like {{labels}} in other messages) and feed it dynamically in {{PerfEventIso
[jira] [Commented] (MESOS-4595) Add support for newest pre-defined Perf events to PerfEventIsolator
[ https://issues.apache.org/jira/browse/MESOS-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134027#comment-15134027 ] Niklas Quarfot Nielsen commented on MESOS-4595: --- The structure of `PerfStatistics` is nice, but as you mention, doesn't scale well with the massive number of available counters. I like the idea of a Labels field with an encoding like you mention: "/hw_counters/XYZ", "/kernel_pmu/ZYX", etc. Populating that field should probably be guarded with a flag to the perf isolator, so the resource statistics doesn't explode in size if folks don't need all the information. > Add support for newest pre-defined Perf events to PerfEventIsolator > --- > > Key: MESOS-4595 > URL: https://issues.apache.org/jira/browse/MESOS-4595 > Project: Mesos > Issue Type: Task > Components: isolation >Reporter: Bartek Plotka >Assignee: Bartek Plotka > > Currently, Perf Event Isolator is able to monitor all (specified in > {{--perf_events=...}}) Perf Events, but it can map only part of them in > {{ResourceUsage.proto}} (to be more exact in [PerfStatistics.proto | > https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L862]) > Since the last time {{PerfStatistics.proto}} was updated, list of supported > events expanded much and is growing constantly. I have created some > comparison table: > || Events type || Num of matched events in PerfStatistics vs perf 4.3.3 || > perf 4.3.3 events || > | HW events | 8 | 8 | > | SW events | 9 | 10 | > | HW cache event | 20 | 20 | > | *Kernel PMU events* | *0* | *37* | > | Tracepoint events | 0 | billion (: | > For advance analysis (e.g during Oversubscription in QoS Controller) having > support for additional events is crucial. 
For instance in > [Serenity|https://github.com/mesosphere/serenity] we based some of our > revocation algorithms on the new [CMT| > https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology-code-and-data] > feature which gives additional, useful event called {{llc_occupancy}}. > I think we all agree that it would be great to support more (or even all) > perf events in {{Mesos PerfEventIsolator}} (: > > Let's start a discussion over the approach. Within this task we have three > issues: > # What events do we want to support in Mesos? > ## all? > ## only add Kernel PMU Events? > --- > I don't have a strong opinion on that, since i have never used {{Tracepoint > events}}. We currently need PMU events. > # How to add new (or modify existing) events in {{mesos.proto}}? > We can distinguish here 3 approaches: > *# Add new events statically in {{PerfStatistics.proto}} as a separate > optional fields. (like it is currently) > *# Instead of optional fields in {{PerfStatistics.proto}} message we could > have a {{key-value}} map (something like {{labels}} in other messages) and > feed it dynamically in {{PerfEventIsolator}} > *# We could mix above approaches and just add mentioned map to existing > {{PerfStatistics.proto}} for additional events (: > --- > IMO: Approach 1) is somehow explicit - users can view what events to expect > (although they are parsed in a different manner e.g {{"-"}} to {{"_"}}), but > we would end with a looong message and a lot of copy-paste work. And we have > to maintain that! > Approach 2 & 3 are more elastic, and we don't have problem mentioned in the > issue below (: And we *always* support *all* perf events in all kernel > versions (: > IMO approaches 2 & 3 are the best. > # How to support different naming format? For instance > {{intel_cqm/llc_occupancy/}} with {{"/"}} in name or > {{migrate:mm_migrate_pages}} with {{":"}}. 
I don't think it is possible to > have these as the field names in {{.proto}} syntax -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4606) Add IPv6 support to net::IP and net::IPNetwork
Benno Evers created MESOS-4606: -- Summary: Add IPv6 support to net::IP and net::IPNetwork Key: MESOS-4606 URL: https://issues.apache.org/jira/browse/MESOS-4606 Project: Mesos Issue Type: Improvement Components: stout Reporter: Benno Evers Assignee: Benno Evers Priority: Minor The classes net::IP and net::IPNetwork should be able to store IPv6 addresses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4605) Upgrading mesos should not (re)enable mesos master or slave
Grégoire Bellon-Gervais created MESOS-4605: -- Summary: Upgrading mesos should not (re)enable mesos master or slave Key: MESOS-4605 URL: https://issues.apache.org/jira/browse/MESOS-4605 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.27.0 Environment: debian8 Reporter: Grégoire Bellon-Gervais Priority: Minor Hello, I'm on Debian 8 and I use the official repository to install mesos (and the deb files): deb http://repos.mesosphere.io/debian jessie main I have 3 mesos masters and 3 mesos slaves. On the masters, the mesos slaves are not started (and must not be started); likewise, on the slaves, the mesos masters must not be started. After each upgrade, I have to manually disable the component that must not run. For example, here is the log on a mesos slave: Setting up mesos (0.27.0-0.2.190.debian81) ... Created symlink from /etc/systemd/system/multi-user.target.wants/mesos-master.service to /lib/systemd/system/mesos-master.service. Processing triggers for libc-bin (2.19-18+deb8u2) ... ... So, once the upgrade is done, I have to issue the following command: systemctl disable mesos-master.service This should not be necessary, I think. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history
[ https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133913#comment-15133913 ] Tymofii commented on MESOS-3307: Yes, it generates JSON much faster now, but we still have lots and lots of completed tasks and frameworks there, which we don't care about for service discovery but want to keep for history. Wouldn't it be great to have some basic filtering for the /state endpoint to get only active tasks/frameworks, only the tasks of a particular framework, only slave information, etc.? The /state-summary endpoint introduced recently doesn't fit service discovery requirements. > Configurable size of completed task / framework history > --- > > Key: MESOS-3307 > URL: https://issues.apache.org/jira/browse/MESOS-3307 > Project: Mesos > Issue Type: Bug >Reporter: Ian Babrou >Assignee: Kevin Klues > Labels: mesosphere > Fix For: 0.27.0 > > > We try to make Mesos work with multiple frameworks and mesos-dns at the same > time. The goal is to have a set of frameworks per team / project on a single > Mesos cluster. > At this point our mesos state.json is at 4mb and it takes a while to > assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively > pushing mesos-master CPU usage through the roof. It's at 100%+ all the time. 
> Here's the problem: > {noformat} > mesos λ curl -s http://mesos-master:5050/master/state.json | jq > .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n >1 "20150606-001827-252388362-5050-5982-0003" > 16 "20150606-001827-252388362-5050-5982-0005" > 18 "20150606-001827-252388362-5050-5982-0029" > 73 "20150606-001827-252388362-5050-5982-0007" > 141 "20150606-001827-252388362-5050-5982-0009" > 154 "20150820-154817-302720010-5050-15320-" > 289 "20150606-001827-252388362-5050-5982-0004" > 510 "20150606-001827-252388362-5050-5982-0012" > 666 "20150606-001827-252388362-5050-5982-0028" > 923 "20150116-002612-269165578-5050-32204-0003" > 1000 "20150606-001827-252388362-5050-5982-0001" > 1000 "20150606-001827-252388362-5050-5982-0006" > 1000 "20150606-001827-252388362-5050-5982-0010" > 1000 "20150606-001827-252388362-5050-5982-0011" > 1000 "20150606-001827-252388362-5050-5982-0027" > mesos λ fgrep 1000 -r src/master > src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10; > src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = > 1000; > {noformat} > Active tasks are just 6% of state.json response: > {noformat} > mesos λ cat ~/temp/mesos-state.json | jq -c . | wc >1 14796 4138942 > mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc > 16 37 252774 > {noformat} > I see four options that can improve the situation: > 1. Add query string param to exclude completed tasks from state.json and use > it in mesos-dns and similar tools. There is no need for mesos-dns to know > about completed tasks, it's just extra load on master and mesos-dns. > 2. Make history size configurable. > 3. Make JSON serialization faster. With 1s of tasks even without history > it would take a lot of time to serialize tasks for mesos-dns. Doing it every > 60 seconds instead of every 5 seconds isn't really an option. > 4. Create event bus for mesos master. Marathon has it and it'd be nice to > have it in Mesos. 
This way mesos-dns could avoid polling master state and > switch to listening for events. > All can be done independently. > Note to mesosphere folks: please start distributing debug symbols with your > distribution. I was asking for it for a while and it is really helpful: > https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501 > Perf report for leading master: > !http://i.imgur.com/iz7C3o0.png! > I'm on 0.23.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-3307) Configurable size of completed task / framework history
[ https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tymofii updated MESOS-3307: --- Comment: was deleted (was: Yes, it generates JSON much faster now, but we still having lots and lots completed tasks and frameworks there, which we don't care about for service discovery, but want to keep them for history. Wouldn't it be great to have some basic filtering for /state endpoint to get only active tasks/frameworks, only tasks or particular framework, only slaves information etc.? /state-summary endpoint introduced recently doesn't fit service discovery requirements.) > Configurable size of completed task / framework history > --- > > Key: MESOS-3307 > URL: https://issues.apache.org/jira/browse/MESOS-3307 > Project: Mesos > Issue Type: Bug >Reporter: Ian Babrou >Assignee: Kevin Klues > Labels: mesosphere > Fix For: 0.27.0 > > > We try to make Mesos work with multiple frameworks and mesos-dns at the same > time. The goal is to have set of frameworks per team / project on a single > Mesos cluster. > At this point our mesos state.json is at 4mb and it takes a while to > assembly. 5 mesos-dns instances hit state.json every 5 seconds, effectively > pushing mesos-master CPU usage through the roof. It's at 100%+ all the time. 
> Here's the problem: > {noformat} > mesos λ curl -s http://mesos-master:5050/master/state.json | jq > .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n >1 "20150606-001827-252388362-5050-5982-0003" > 16 "20150606-001827-252388362-5050-5982-0005" > 18 "20150606-001827-252388362-5050-5982-0029" > 73 "20150606-001827-252388362-5050-5982-0007" > 141 "20150606-001827-252388362-5050-5982-0009" > 154 "20150820-154817-302720010-5050-15320-" > 289 "20150606-001827-252388362-5050-5982-0004" > 510 "20150606-001827-252388362-5050-5982-0012" > 666 "20150606-001827-252388362-5050-5982-0028" > 923 "20150116-002612-269165578-5050-32204-0003" > 1000 "20150606-001827-252388362-5050-5982-0001" > 1000 "20150606-001827-252388362-5050-5982-0006" > 1000 "20150606-001827-252388362-5050-5982-0010" > 1000 "20150606-001827-252388362-5050-5982-0011" > 1000 "20150606-001827-252388362-5050-5982-0027" > mesos λ fgrep 1000 -r src/master > src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10; > src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = > 1000; > {noformat} > Active tasks are just 6% of state.json response: > {noformat} > mesos λ cat ~/temp/mesos-state.json | jq -c . | wc >1 14796 4138942 > mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc > 16 37 252774 > {noformat} > I see four options that can improve the situation: > 1. Add query string param to exclude completed tasks from state.json and use > it in mesos-dns and similar tools. There is no need for mesos-dns to know > about completed tasks, it's just extra load on master and mesos-dns. > 2. Make history size configurable. > 3. Make JSON serialization faster. With 1s of tasks even without history > it would take a lot of time to serialize tasks for mesos-dns. Doing it every > 60 seconds instead of every 5 seconds isn't really an option. > 4. Create event bus for mesos master. Marathon has it and it'd be nice to > have it in Mesos. 
This way mesos-dns could avoid polling master state and > switch to listening for events. > All can be done independently. > Note to mesosphere folks: please start distributing debug symbols with your > distribution. I was asking for it for a while and it is really helpful: > https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501 > Perf report for leading master: > !http://i.imgur.com/iz7C3o0.png! > I'm on 0.23.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4604) ROOT_DOCKER_DockerHealthyTask is flaky.
Jan Schlicht created MESOS-4604: --- Summary: ROOT_DOCKER_DockerHealthyTask is flaky. Key: MESOS-4604 URL: https://issues.apache.org/jira/browse/MESOS-4604 Project: Mesos Issue Type: Bug Components: tests Environment: CentOS 6/7, Ubuntu 15.04 on AWS. Reporter: Jan Schlicht Log from Teamcity that is running {{sudo ./bin/mesos-tests.sh}} on AWS EC2 instances: {noformat} [18:27:14][Step 8/8] [--] 8 tests from HealthCheckTest [18:27:14][Step 8/8] [ RUN ] HealthCheckTest.HealthyTask [18:27:17][Step 8/8] [ OK ] HealthCheckTest.HealthyTask ( ms) [18:27:17][Step 8/8] [ RUN ] HealthCheckTest.ROOT_DOCKER_DockerHealthyTask [18:27:36][Step 8/8] ../../src/tests/health_check_tests.cpp:388: Failure [18:27:36][Step 8/8] Failed to wait 15secs for termination [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called [18:27:36][Step 8/8] @ 0x7f7077055e1c google::LogMessage::Fail() [18:27:36][Step 8/8] @ 0x7f707705ba6f google::RawLog__() [18:27:36][Step 8/8] @ 0x7f70760f76c9 __cxa_pure_virtual [18:27:36][Step 8/8] @ 0xa9423c mesos::internal::tests::Cluster::Slaves::shutdown() [18:27:36][Step 8/8] @ 0x1074e45 mesos::internal::tests::MesosTest::ShutdownSlaves() [18:27:36][Step 8/8] @ 0x1074de4 mesos::internal::tests::MesosTest::Shutdown() [18:27:36][Step 8/8] @ 0x1070ec7 mesos::internal::tests::MesosTest::TearDown() [18:27:36][Step 8/8] @ 0x16eb7b2 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16e61a9 testing::internal::HandleExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16c56aa testing::Test::Run() [18:27:36][Step 8/8] @ 0x16c5e89 testing::TestInfo::Run() [18:27:36][Step 8/8] @ 0x16c650a testing::TestCase::Run() [18:27:36][Step 8/8] @ 0x16cd1f6 testing::internal::UnitTestImpl::RunAllTests() [18:27:36][Step 8/8] @ 0x16ec513 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] @ 0x16e6df1 testing::internal::HandleExceptionsInMethodIfSupported<>() [18:27:36][Step 8/8] 
@ 0x16cbe26 testing::UnitTest::Run() [18:27:36][Step 8/8] @ 0xe54c84 RUN_ALL_TESTS() [18:27:36][Step 8/8] @ 0xe54867 main [18:27:36][Step 8/8] @ 0x7f7071560a40 (unknown) [18:27:36][Step 8/8] @ 0x9b52d9 _start [18:27:36][Step 8/8] Aborted (core dumped) [18:27:36][Step 8/8] Process exited with code 134 {noformat} Happens with Ubuntu 15.04, CentOS 6, CentOS 7 _quite_ often. -- This message was sent by Atlassian JIRA (v6.3.4#6332)