[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky
[ https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968843#comment-14968843 ]

Guangya Liu commented on MESOS-3733:

RR: https://reviews.apache.org/r/39548/

> ContentType/SchedulerTest.Suppress/0 is flaky
>
> Key: MESOS-3733
> URL: https://issues.apache.org/jira/browse/MESOS-3733
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
> Reporter: Anand Mazumdar
> Assignee: Guangya Liu
> Labels: flaky-test
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/931/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN      ] ContentType/SchedulerTest.Suppress/0
> Using temporary directory '/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi'
> I1014 17:34:11.225731 27650 leveldb.cpp:176] Opened db in 2.974504ms
> I1014 17:34:11.226856 27650 leveldb.cpp:183] Compacted db in 980779ns
> I1014 17:34:11.227028 27650 leveldb.cpp:198] Created db iterator in 37641ns
> I1014 17:34:11.227159 27650 leveldb.cpp:204] Seeked to beginning of db in
> 14959ns
> I1014 17:34:11.227283 27650 leveldb.cpp:273] Iterated through 0 keys in the
> db in 14672ns
> I1014 17:34:11.227449 27650 replica.cpp:746] Replica recovered with log
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1014 17:34:11.228469 27680 recover.cpp:449] Starting replica recovery
> I1014 17:34:11.229202 27673 recover.cpp:475] Replica is in EMPTY status
> I1014 17:34:11.231384 27673 replica.cpp:642] Replica in EMPTY status received
> a broadcasted recover request from (10262)@172.17.2.194:37545
> I1014 17:34:11.231745 27673 recover.cpp:195] Received a recover response from
> a replica in EMPTY status
> I1014 17:34:11.234242 27680 master.cpp:376] Master
> 0cc41e7f-8d87-4c2f-9543-3f7198f9fdaf (23af00e0dbe0) started on
> 172.17.2.194:37545
> I1014 17:34:11.234283 27680 master.cpp:378] Flags at startup: --acls=""
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate="false" --authenticate_slaves="true"
> --authenticators="crammd5" --authorizers="local"
> --credentials="/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5"
> --quiet="false" --recovery_slave_removal_limit="100%"
> --registry="replicated_log" --registry_fetch_timeout="1mins"
> --registry_store_timeout="25secs" --registry_strict="true"
> --root_submissions="true" --slave_ping_timeout="15secs"
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false"
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui"
> --work_dir="/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/master"
> --zk_session_timeout="10secs"
> I1014 17:34:11.234679 27680 master.cpp:425] Master allowing unauthenticated
> frameworks to register
> I1014 17:34:11.234694 27680 master.cpp:428] Master only allowing
> authenticated slaves to register
> I1014 17:34:11.234705 27680 credentials.hpp:37] Loading credentials for
> authentication from
> '/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/credentials'
> I1014 17:34:11.235251 27673 recover.cpp:566] Updating replica status to
> STARTING
> I1014 17:34:11.235857 27680 master.cpp:467] Using default 'crammd5'
> authenticator
> I1014 17:34:11.236006 27680 master.cpp:504] Authorization enabled
> I1014 17:34:11.236187 27673 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 729504ns
> I1014 17:34:11.236224 27673 replica.cpp:323] Persisted replica status to
> STARTING
> I1014 17:34:11.236227 27678 whitelist_watcher.cpp:79] No whitelist given
> I1014 17:34:11.236366 27676 hierarchical.cpp:140] Initialized hierarchical
> allocator process
> I1014 17:34:11.236495 27677 recover.cpp:475] Replica is in STARTING status
> I1014 17:34:11.237670 27678 replica.cpp:642] Replica in STARTING status
> received a broadcasted recover request from (10263)@172.17.2.194:37545
> I1014 17:34:11.238782 27673 recover.cpp:195] Received a recover response from
> a replica in STARTING status
> I1014 17:34:11.238916 27672 master.cpp:1609] The newly elected leader is
> master@172.17.2.194:37545 with id 0cc41e7f-8d87-4c2f-9543-3f7198f9fdaf
> I1014 17:34:11.238993 27672 master.cpp:1622] Elected as the leading master!
> I1014 17:34:11.239013 27672 master.cpp:1382] Recovering from registrar
> I1014 17:34:11.239480 27672 recover.cpp:566] Updating replica status to VOTING
> I1014 17:34:11.239630 27675 registrar.cpp:309] Recovering registrar
> I1014 17:34:11.240074 27673 leveldb.cpp:306] Persisting metadata (8 bytes) to
> leveldb took 452562ns
> I1014 17:34:11.240137 27673 replica.cpp:323] Persisted replica status
[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky
[ https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968185#comment-14968185 ]

Guangya Liu commented on MESOS-3733:

[~vi...@twitter.com] This is very similar to MESOS-3789; the difference is that MESOS-3789 fails at ContentType/SchedulerTest.Suppress/1 rather than ContentType/SchedulerTest.Suppress/0. Could you explain in more detail what the difference is between ContentType/SchedulerTest.Suppress/0 and ContentType/SchedulerTest.Suppress/1? I also tried to reproduce the failure in my local environment but could not; I will keep investigating.

{code}
I1021 19:17:43.270341 30954 slave.cpp:2284] Updated checkpointed resources from to
../../src/tests/scheduler_tests.cpp:1028: Failure
Value of: event.isPending()
  Actual: false
Expected: true
I1021 19:17:43.276475 30920 master.cpp:925] Master terminating
I1021 19:17:43.276880 30949 hierarchical.cpp:364] Removed slave 242dc5ed-402d-4873-be6d-9bad1f3296f9-S0
I1021 19:17:43.277751 30945 hierarchical.cpp:220] Removed framework 242dc5ed-402d-4873-be6d-9bad1f3296f9-
I1021 19:17:43.277863 30941 slave.cpp:3258] master@172.17.3.153:57838 exited
W1021 19:17:43.277899 30941 slave.cpp:3261] Master disconnected! Waiting for a new master to be elected
I1021 19:17:43.303658 30920 slave.cpp:606] Slave terminating
[  FAILED  ] ContentType/SchedulerTest.Suppress/1, where GetParam() = application/json (172 ms)
{code}
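On the question of how ContentType/SchedulerTest.Suppress/0 and ContentType/SchedulerTest.Suppress/1 differ: googletest names value-parameterized tests as <InstantiationPrefix>/<TestCase>.<TestName>/<ParameterIndex>, so the two names most likely refer to the same test body run with different parameters. The sketch below only illustrates that naming scheme and does not use googletest or Mesos code. The helper names contentTypes() and parameterizedTestName() are hypothetical. The value at index 1 comes from the "GetParam() = application/json" line in the failure log; the value at index 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parameter list of the test instantiation.
// Index 1 matches the failure log; index 0 is an assumption, not taken
// from the thread.
inline const std::vector<std::string>& contentTypes()
{
  static const std::vector<std::string> types = {
    "application/x-protobuf",  // assumed parameter behind ".../Suppress/0"
    "application/json"         // reported by GetParam() for ".../Suppress/1"
  };
  return types;
}

// Builds a name in the form googletest prints for value-parameterized tests:
//   <InstantiationPrefix>/<TestCase>.<TestName>/<ParameterIndex>
inline std::string parameterizedTestName(
    const std::string& prefix,
    const std::string& testCase,
    const std::string& testName,
    std::size_t index)
{
  // The index must refer to one of the instantiated parameters.
  assert(index < contentTypes().size());
  return prefix + "/" + testCase + "." + testName + "/" +
         std::to_string(index);
}
```

Under these assumptions, parameterizedTestName("ContentType", "SchedulerTest", "Suppress", 1) yields the name seen in the MESOS-3789 failure, and contentTypes()[1] is the content type that run used. A failure that shows up under both indices would suggest the flakiness lies in the shared test body rather than in one content type.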
[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky
[ https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957397#comment-14957397 ]

Anand Mazumdar commented on MESOS-3733:

[~gyliu] Can you take a look since it looks related to https://reviews.apache.org/r/38124 ?