[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky

2015-10-22 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968843#comment-14968843
 ] 

Guangya Liu commented on MESOS-3733:


Review request: https://reviews.apache.org/r/39548/

> ContentType/SchedulerTest.Suppress/0 is flaky
> -
>
> Key: MESOS-3733
> URL: https://issues.apache.org/jira/browse/MESOS-3733
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>  Labels: flaky-test
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/931/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] ContentType/SchedulerTest.Suppress/0
> Using temporary directory '/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi'
> I1014 17:34:11.225731 27650 leveldb.cpp:176] Opened db in 2.974504ms
> I1014 17:34:11.226856 27650 leveldb.cpp:183] Compacted db in 980779ns
> I1014 17:34:11.227028 27650 leveldb.cpp:198] Created db iterator in 37641ns
> I1014 17:34:11.227159 27650 leveldb.cpp:204] Seeked to beginning of db in 
> 14959ns
> I1014 17:34:11.227283 27650 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 14672ns
> I1014 17:34:11.227449 27650 replica.cpp:746] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1014 17:34:11.228469 27680 recover.cpp:449] Starting replica recovery
> I1014 17:34:11.229202 27673 recover.cpp:475] Replica is in EMPTY status
> I1014 17:34:11.231384 27673 replica.cpp:642] Replica in EMPTY status received 
> a broadcasted recover request from (10262)@172.17.2.194:37545
> I1014 17:34:11.231745 27673 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1014 17:34:11.234242 27680 master.cpp:376] Master 
> 0cc41e7f-8d87-4c2f-9543-3f7198f9fdaf (23af00e0dbe0) started on 
> 172.17.2.194:37545
> I1014 17:34:11.234283 27680 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/master" 
> --zk_session_timeout="10secs"
> I1014 17:34:11.234679 27680 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1014 17:34:11.234694 27680 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I1014 17:34:11.234705 27680 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/credentials'
> I1014 17:34:11.235251 27673 recover.cpp:566] Updating replica status to 
> STARTING
> I1014 17:34:11.235857 27680 master.cpp:467] Using default 'crammd5' 
> authenticator
> I1014 17:34:11.236006 27680 master.cpp:504] Authorization enabled
> I1014 17:34:11.236187 27673 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 729504ns
> I1014 17:34:11.236224 27673 replica.cpp:323] Persisted replica status to 
> STARTING
> I1014 17:34:11.236227 27678 whitelist_watcher.cpp:79] No whitelist given
> I1014 17:34:11.236366 27676 hierarchical.cpp:140] Initialized hierarchical 
> allocator process
> I1014 17:34:11.236495 27677 recover.cpp:475] Replica is in STARTING status
> I1014 17:34:11.237670 27678 replica.cpp:642] Replica in STARTING status 
> received a broadcasted recover request from (10263)@172.17.2.194:37545
> I1014 17:34:11.238782 27673 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1014 17:34:11.238916 27672 master.cpp:1609] The newly elected leader is 
> master@172.17.2.194:37545 with id 0cc41e7f-8d87-4c2f-9543-3f7198f9fdaf
> I1014 17:34:11.238993 27672 master.cpp:1622] Elected as the leading master!
> I1014 17:34:11.239013 27672 master.cpp:1382] Recovering from registrar
> I1014 17:34:11.239480 27672 recover.cpp:566] Updating replica status to VOTING
> I1014 17:34:11.239630 27675 registrar.cpp:309] Recovering registrar
> I1014 17:34:11.240074 27673 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 452562ns
> I1014 17:34:11.240137 27673 replica.cpp:323] Persisted replica status 

[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky

2015-10-21 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968185#comment-14968185
 ] 

Guangya Liu commented on MESOS-3733:


[~vi...@twitter.com] This is very similar to MESOS-3789; the difference is 
that MESOS-3789 fails at ContentType/SchedulerTest.Suppress/1 rather than 
ContentType/SchedulerTest.Suppress/0. Can you please explain in more detail 
what the difference is between ContentType/SchedulerTest.Suppress/0 and 
ContentType/SchedulerTest.Suppress/1?

I also tried to reproduce this in my local environment but could not; I will 
investigate further.

{code}
I1021 19:17:43.270341 30954 slave.cpp:2284] Updated checkpointed resources from 
 to 
../../src/tests/scheduler_tests.cpp:1028: Failure
Value of: event.isPending()
  Actual: false
Expected: true
I1021 19:17:43.276475 30920 master.cpp:925] Master terminating
I1021 19:17:43.276880 30949 hierarchical.cpp:364] Removed slave 
242dc5ed-402d-4873-be6d-9bad1f3296f9-S0
I1021 19:17:43.277751 30945 hierarchical.cpp:220] Removed framework 
242dc5ed-402d-4873-be6d-9bad1f3296f9-
I1021 19:17:43.277863 30941 slave.cpp:3258] master@172.17.3.153:57838 exited
W1021 19:17:43.277899 30941 slave.cpp:3261] Master disconnected! Waiting for a 
new master to be elected
I1021 19:17:43.303658 30920 slave.cpp:606] Slave terminating
[  FAILED  ] ContentType/SchedulerTest.Suppress/1, where GetParam() = 
application/json (172 ms)
{code}
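
On the /0 versus /1 question: googletest names each instance of a value-parameterized test as InstantiationName/TestSuite.TestName/Index, where Index is the position of the parameter in the instantiation list. The following is only a minimal sketch of that naming rule, not Mesos or googletest code. The helper `instantiatedNames` is hypothetical, and the parameter at index 0 (`application/x-protobuf`) is an assumption; the log above confirms only that index 1 is `application/json`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reproduces the names googletest reports for a
// value-parameterized test, "<Instantiation>/<Suite>.<Test>/<index>".
// The index is simply the position of the parameter in the list.
std::vector<std::string> instantiatedNames(
    const std::string& instantiation,
    const std::string& suite,
    const std::string& test,
    const std::vector<std::string>& params)
{
  std::vector<std::string> names;
  for (std::size_t i = 0; i < params.size(); ++i) {
    names.push_back(
        instantiation + "/" + suite + "." + test + "/" + std::to_string(i));
  }
  return names;
}

// Assumed parameter order. Only index 1 (application/json) is confirmed
// by the failure log; index 0 is a guess.
std::vector<std::string> assumedContentTypes()
{
  return {"application/x-protobuf", "application/json"};
}
```

Calling `instantiatedNames("ContentType", "SchedulerTest", "Suppress", assumedContentTypes())` yields the two names discussed in this thread, so under this assumption both failures would come from the same test body run with a different content type.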


[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky

2015-10-14 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957397#comment-14957397
 ] 

Anand Mazumdar commented on MESOS-3733:
---

[~gyliu] Can you take a look since it looks related to 
https://reviews.apache.org/r/38124 ?
