[ https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065720#comment-14065720 ]
Timothy Chen commented on MESOS-1613: ------------------------------------- >From the log it's actually failing because it couldn't find the healthy_state >set when the task is killed, which is added in 169ec89. Both Apache Jenkins and on my Ubuntu they both passes, so I'm not really sure how this is failing as it should get picked up. How does the Twitter CI run? Does it do a make clean and re clone before running it again? > HealthCheckTest.ConsecutiveFailures is flaky > -------------------------------------------- > > Key: MESOS-1613 > URL: https://issues.apache.org/jira/browse/MESOS-1613 > Project: Mesos > Issue Type: Bug > Components: test > Affects Versions: 0.20.0 > Environment: Ubuntu 10.04 GCC > Reporter: Vinod Kone > Assignee: Timothy Chen > > {code} > [ RUN ] HealthCheckTest.ConsecutiveFailures > Using temporary directory '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV' > I0717 04:39:59.288471 5009 leveldb.cpp:176] Opened db in 21.575631ms > I0717 04:39:59.295274 5009 leveldb.cpp:183] Compacted db in 6.471982ms > I0717 04:39:59.295552 5009 leveldb.cpp:198] Created db iterator in 16783ns > I0717 04:39:59.296026 5009 leveldb.cpp:204] Seeked to beginning of db in > 2125ns > I0717 04:39:59.296257 5009 leveldb.cpp:273] Iterated through 0 keys in the > db in 10747ns > I0717 04:39:59.296584 5009 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0717 04:39:59.297322 5033 recover.cpp:425] Starting replica recovery > I0717 04:39:59.297413 5033 recover.cpp:451] Replica is in EMPTY status > I0717 04:39:59.297824 5033 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I0717 04:39:59.297899 5033 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0717 04:39:59.297997 5033 recover.cpp:542] Updating replica status to > STARTING > I0717 04:39:59.301985 5031 master.cpp:288] Master > 20140717-043959-16842879-40280-5009 (lucid) started on 127.0.1.1:40280 > I0717 04:39:59.302026 5031 master.cpp:325] Master only allowing > authenticated frameworks to register > I0717 04:39:59.302032 5031 master.cpp:330] Master only allowing > authenticated slaves to register > I0717 04:39:59.302039 5031 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV/credentials' > I0717 04:39:59.302283 5031 master.cpp:359] Authorization enabled > I0717 04:39:59.302971 5031 hierarchical_allocator_process.hpp:301] > Initializing hierarchical allocator process with master : > master@127.0.1.1:40280 > I0717 04:39:59.303022 5031 master.cpp:122] No whitelist given. Advertising > offers for all slaves > I0717 04:39:59.303390 5033 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 5.325097ms > I0717 04:39:59.303419 5033 replica.cpp:320] Persisted replica status to > STARTING > I0717 04:39:59.304076 5030 master.cpp:1128] The newly elected leader is > master@127.0.1.1:40280 with id 20140717-043959-16842879-40280-5009 > I0717 04:39:59.304095 5030 master.cpp:1141] Elected as the leading master! > I0717 04:39:59.304102 5030 master.cpp:959] Recovering from registrar > I0717 04:39:59.304182 5030 registrar.cpp:313] Recovering registrar > I0717 04:39:59.304635 5033 recover.cpp:451] Replica is in STARTING status > I0717 04:39:59.304962 5033 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I0717 04:39:59.305026 5033 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0717 04:39:59.305130 5033 recover.cpp:542] Updating replica status to VOTING > I0717 04:39:59.310416 5033 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 5.204157ms > I0717 04:39:59.310459 5033 replica.cpp:320] Persisted replica status to > VOTING > I0717 04:39:59.310534 5033 recover.cpp:556] Successfully joined the Paxos > group > I0717 04:39:59.310607 5033 recover.cpp:440] Recover process terminated > I0717 04:39:59.310773 5033 log.cpp:656] Attempting to start the writer > I0717 04:39:59.311157 5033 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0717 04:39:59.313451 5033 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 2.271822ms > I0717 04:39:59.313627 5033 replica.cpp:342] Persisted promised to 1 > I0717 04:39:59.318038 5031 coordinator.cpp:230] Coordinator attemping to > fill missing position > I0717 04:39:59.318430 5031 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0717 04:39:59.323459 5031 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 5.004323ms > I0717 04:39:59.323493 5031 replica.cpp:676] Persisted action at 0 > I0717 04:39:59.323799 5031 replica.cpp:508] Replica received write request > for position 0 > I0717 04:39:59.323837 5031 leveldb.cpp:438] Reading position from leveldb > took 21901ns > I0717 04:39:59.329038 5031 leveldb.cpp:343] Persisting action (14 bytes) to > leveldb took 5.175998ms > I0717 04:39:59.329244 5031 replica.cpp:676] Persisted action at 0 > I0717 04:39:59.332335 5037 replica.cpp:655] Replica received learned notice > for position 0 > I0717 04:39:59.337894 5037 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 5.532845ms > I0717 04:39:59.337926 5037 replica.cpp:676] Persisted action at 0 > I0717 04:39:59.337934 5037 replica.cpp:661] Replica learned NOP action at > position 0 > I0717 04:39:59.338152 5037 log.cpp:672] Writer started with ending position 0 > I0717 04:39:59.338438 5037 leveldb.cpp:438] Reading position from leveldb > took 15499ns > I0717 04:39:59.339648 5037 registrar.cpp:346] Successfully fetched the > registry (0B) > I0717 04:39:59.339676 5037 registrar.cpp:422] Attempting to update the > 'registry' > I0717 04:39:59.340885 5037 log.cpp:680] Attempting to append 117 bytes to > the log > I0717 04:39:59.340929 5037 coordinator.cpp:340] Coordinator attempting to > write APPEND action at position 1 > I0717 04:39:59.341130 5037 replica.cpp:508] Replica received write request > for position 1 > I0717 04:39:59.343456 5037 leveldb.cpp:343] Persisting action (134 bytes) to > leveldb took 2.30603ms > I0717 04:39:59.343638 5037 replica.cpp:676] Persisted action at 1 > I0717 04:39:59.352447 5034 replica.cpp:655] Replica received learned notice > for position 1 > I0717 04:39:59.365022 5034 leveldb.cpp:343] Persisting action (136 bytes) to > leveldb took 12.534333ms > I0717 04:39:59.365072 5034 replica.cpp:676] Persisted action at 1 > I0717 04:39:59.365082 5034 replica.cpp:661] Replica learned APPEND action at > position 1 > I0717 04:39:59.365470 5034 registrar.cpp:479] Successfully updated 'registry' > I0717 04:39:59.365520 5034 registrar.cpp:372] Successfully recovered > registrar > I0717 04:39:59.365578 5034 log.cpp:699] Attempting to truncate the log to 1 > I0717 04:39:59.365656 5034 master.cpp:986] Recovered 0 slaves from the > Registry (81B) ; allowing 10mins for slaves to re-register > I0717 04:39:59.365696 5034 coordinator.cpp:340] Coordinator attempting to > write TRUNCATE action at position 2 > I0717 04:39:59.366221 5030 replica.cpp:508] Replica received write request > for position 2 > I0717 04:39:59.374423 5030 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 8.172997ms > I0717 04:39:59.374459 5030 replica.cpp:676] Persisted action at 2 > I0717 04:39:59.374671 5030 replica.cpp:655] Replica received learned notice > for position 2 > I0717 04:39:59.383867 5030 leveldb.cpp:343] Persisting action (18 bytes) to > leveldb took 9.168201ms > I0717 04:39:59.383939 5030 leveldb.cpp:401] Deleting ~1 keys from leveldb > took 33989ns > I0717 04:39:59.383952 5030 replica.cpp:676] Persisted action at 2 > I0717 04:39:59.383960 5030 replica.cpp:661] Replica learned TRUNCATE action > at position 2 > I0717 04:39:59.387097 5009 containerizer.cpp:124] Using isolation: > posix/cpu,posix/mem > I0717 04:39:59.391551 5009 sched.cpp:139] Version: 0.20.0 > I0717 04:39:59.392025 5036 slave.cpp:168] Slave started on > 116)@127.0.1.1:40280 > I0717 04:39:59.392042 5036 credentials.hpp:84] Loading credential for > authentication from > '/tmp/HealthCheckTest_ConsecutiveFailures_sOluTu/credential' > I0717 04:39:59.392148 5036 slave.cpp:266] Slave using credential for: > test-principal > I0717 04:39:59.392262 5036 slave.cpp:279] Slave resources: cpus(*):2; > mem(*):1024; disk(*):1024; ports(*):[31000-32000] > I0717 04:39:59.392325 5036 slave.cpp:324] Slave hostname: lucid > I0717 04:39:59.392336 5036 slave.cpp:325] Slave checkpoint: false > I0717 04:39:59.392853 5036 state.cpp:33] Recovering state from > '/tmp/HealthCheckTest_ConsecutiveFailures_sOluTu/meta' > I0717 04:39:59.393005 5036 sched.cpp:235] New master detected at > master@127.0.1.1:40280 > I0717 04:39:59.393023 5036 sched.cpp:285] Authenticating with master > master@127.0.1.1:40280 > I0717 04:39:59.393102 5036 status_update_manager.cpp:193] Recovering status > update manager > I0717 04:39:59.393179 5036 authenticatee.hpp:128] Creating new client SASL > connection > I0717 04:39:59.393326 5036 master.cpp:3512] Authenticating > scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280 > I0717 04:39:59.393432 5036 containerizer.cpp:287] Recovering containerizer > I0717 04:39:59.393569 5036 authenticator.hpp:156] Creating new server SASL > connection > I0717 04:39:59.393748 5036 authenticatee.hpp:219] Received SASL > authentication mechanisms: CRAM-MD5 > I0717 04:39:59.393764 5036 authenticatee.hpp:245] Attempting to authenticate > with mechanism 'CRAM-MD5' > I0717 04:39:59.393842 5036 authenticator.hpp:262] Received SASL > authentication start > I0717 04:39:59.393888 5036 authenticator.hpp:384] Authentication requires > more steps > I0717 04:39:59.393934 5036 slave.cpp:3127] Finished recovery > I0717 04:39:59.394132 5036 authenticatee.hpp:265] Received SASL > authentication step > I0717 04:39:59.394208 5036 authenticator.hpp:290] Received SASL > authentication step > I0717 04:39:59.394234 5036 auxprop.cpp:81] Request to lookup properties for > user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false > I0717 04:39:59.394243 5036 auxprop.cpp:153] Looking up auxiliary property > '*userPassword' > I0717 04:39:59.394255 5036 auxprop.cpp:153] Looking up auxiliary property > '*cmusaslsecretCRAM-MD5' > I0717 04:39:59.394268 5036 auxprop.cpp:81] Request to lookup properties for > user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true > I0717 04:39:59.394274 5036 auxprop.cpp:103] Skipping auxiliary property > '*userPassword' since SASL_AUXPROP_AUTHZID == true > I0717 04:39:59.394279 5036 auxprop.cpp:103] Skipping auxiliary property > '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true > I0717 04:39:59.394290 5036 authenticator.hpp:376] Authentication success > I0717 04:39:59.394330 5036 slave.cpp:600] New master detected at > master@127.0.1.1:40280 > I0717 04:39:59.394348 5036 slave.cpp:676] Authenticating with master > master@127.0.1.1:40280 > I0717 04:39:59.394392 5036 slave.cpp:649] Detecting new master > I0717 04:39:59.394434 5036 authenticatee.hpp:305] Authentication success > I0717 04:39:59.394461 5036 master.cpp:3552] Successfully authenticated > principal 'test-principal' at > scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280 > I0717 04:39:59.394516 5036 status_update_manager.cpp:167] New master > detected at master@127.0.1.1:40280 > I0717 04:39:59.394542 5036 authenticatee.hpp:128] Creating new client SASL > connection > I0717 04:39:59.394605 5036 sched.cpp:359] Successfully authenticated with > master master@127.0.1.1:40280 > I0717 04:39:59.394618 5036 sched.cpp:478] Sending registration request to > master@127.0.1.1:40280 > I0717 04:39:59.394666 5036 master.cpp:3512] Authenticating > slave(116)@127.0.1.1:40280 > I0717 04:39:59.394728 5036 master.cpp:1247] Received registration request > from scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280 > I0717 04:39:59.394757 5036 master.cpp:1207] Authorizing framework principal > 'test-principal' to receive offers for role '*' > I0717 04:39:59.394816 5036 authenticator.hpp:156] Creating new server SASL > connection > I0717 04:39:59.394917 5036 authenticatee.hpp:219] Received SASL > authentication mechanisms: CRAM-MD5 > I0717 04:39:59.394932 5036 authenticatee.hpp:245] Attempting to authenticate > with mechanism 'CRAM-MD5' > I0717 04:39:59.394989 5036 master.cpp:1306] Registering framework > 20140717-043959-16842879-40280-5009-0000 at > scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280 > I0717 04:39:59.395094 5036 authenticator.hpp:262] Received SASL > authentication start > I0717 04:39:59.395126 5036 authenticator.hpp:384] Authentication requires > more steps > I0717 04:39:59.395166 5036 sched.cpp:409] Framework registered with > 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.395187 5036 sched.cpp:423] Scheduler::registered took 9209ns > I0717 04:39:59.395221 5036 hierarchical_allocator_process.hpp:331] Added > framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.395228 5036 hierarchical_allocator_process.hpp:724] No > resources available to allocate! > I0717 04:39:59.395234 5036 hierarchical_allocator_process.hpp:686] Performed > allocation for 0 slaves in 5941ns > I0717 04:39:59.395278 5036 authenticatee.hpp:265] Received SASL > authentication step > I0717 04:39:59.395310 5036 authenticator.hpp:290] Received SASL > authentication step > I0717 04:39:59.395323 5036 auxprop.cpp:81] Request to lookup properties for > user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false > I0717 04:39:59.395329 5036 auxprop.cpp:153] Looking up auxiliary property > '*userPassword' > I0717 04:39:59.395336 5036 auxprop.cpp:153] Looking up auxiliary property > '*cmusaslsecretCRAM-MD5' > I0717 04:39:59.395344 5036 auxprop.cpp:81] Request to lookup properties for > user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' > SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true > I0717 04:39:59.395349 5036 auxprop.cpp:103] Skipping auxiliary property > '*userPassword' since SASL_AUXPROP_AUTHZID == true > I0717 04:39:59.395354 5036 auxprop.cpp:103] Skipping auxiliary property > '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true > I0717 04:39:59.395362 5036 authenticator.hpp:376] Authentication success > I0717 04:39:59.395385 5036 authenticatee.hpp:305] Authentication success > I0717 04:39:59.395402 5036 master.cpp:3552] Successfully authenticated > principal 'test-principal' at slave(116)@127.0.1.1:40280 > I0717 04:39:59.395457 5036 slave.cpp:733] Successfully authenticated with > master master@127.0.1.1:40280 > I0717 04:39:59.395503 5036 slave.cpp:971] Will retry registration in > 15.135662ms if necessary > I0717 04:39:59.395601 5037 master.cpp:2789] Registering slave at > slave(116)@127.0.1.1:40280 (lucid) with id > 20140717-043959-16842879-40280-5009-0 > I0717 04:39:59.395699 5037 registrar.cpp:422] Attempting to update the > 'registry' > I0717 04:39:59.396891 5031 log.cpp:680] Attempting to append 287 bytes to > the log > I0717 04:39:59.396939 5031 coordinator.cpp:340] Coordinator attempting to > write APPEND action at position 3 > I0717 04:39:59.397140 5031 replica.cpp:508] Replica received write request > for position 3 > I0717 04:39:59.405050 5031 leveldb.cpp:343] Persisting action (306 bytes) to > leveldb took 7.878137ms > I0717 04:39:59.405097 5031 replica.cpp:676] Persisted action at 3 > I0717 04:39:59.405342 5031 replica.cpp:655] Replica received learned notice > for position 3 > I0717 04:39:59.412286 5037 slave.cpp:971] Will retry registration in > 9.096666ms if necessary > I0717 04:39:59.412365 5037 master.cpp:2777] Ignoring register slave message > from slave(116)@127.0.1.1:40280 (lucid) as admission is already in progress > I0717 04:39:59.414619 5031 leveldb.cpp:343] Persisting action (308 bytes) to > leveldb took 9.256058ms > I0717 04:39:59.414646 5031 replica.cpp:676] Persisted action at 3 > I0717 04:39:59.414654 5031 replica.cpp:661] Replica learned APPEND action at > position 3 > I0717 04:39:59.415129 5031 registrar.cpp:479] Successfully updated 'registry' > I0717 04:39:59.415212 5031 log.cpp:699] Attempting to truncate the log to 3 > I0717 04:39:59.415269 5031 master.cpp:2829] Registered slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:39:59.415293 5031 master.cpp:3980] Adding slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] > I0717 04:39:59.415415 5031 coordinator.cpp:340] Coordinator attempting to > write TRUNCATE action at position 4 > I0717 04:39:59.415485 5031 slave.cpp:767] Registered with master > master@127.0.1.1:40280; given slave ID 20140717-043959-16842879-40280-5009-0 > I0717 04:39:59.415550 5031 hierarchical_allocator_process.hpp:444] Added > slave 20140717-043959-16842879-40280-5009-0 (lucid) with cpus(*):2; > mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; > mem(*):1024; disk(*):1024; ports(*):[31000-32000] available) > I0717 04:39:59.415602 5031 hierarchical_allocator_process.hpp:750] Offering > cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave > 20140717-043959-16842879-40280-5009-0 to framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.415699 5031 hierarchical_allocator_process.hpp:706] Performed > allocation for slave 20140717-043959-16842879-40280-5009-0 in 119984ns > I0717 04:39:59.415757 5031 slave.cpp:2324] Received ping from > slave-observer(104)@127.0.1.1:40280 > I0717 04:39:59.415824 5031 master.hpp:800] Adding offer > 20140717-043959-16842879-40280-5009-0 with resources cpus(*):2; mem(*):1024; > disk(*):1024; ports(*):[31000-32000] on slave > 20140717-043959-16842879-40280-5009-0 (lucid) > I0717 04:39:59.415864 5031 master.cpp:3459] Sending 1 offers to framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.416179 5033 replica.cpp:508] Replica received write request > for position 4 > I0717 04:39:59.416697 5031 sched.cpp:546] Scheduler::resourceOffers took > 34644ns > I0717 04:39:59.417361 5037 master.hpp:810] Removing offer > 20140717-043959-16842879-40280-5009-0 with resources cpus(*):2; mem(*):1024; > disk(*):1024; ports(*):[31000-32000] on slave > 20140717-043959-16842879-40280-5009-0 (lucid) > I0717 04:39:59.417414 5037 master.cpp:2133] Processing reply for offers: [ > 20140717-043959-16842879-40280-5009-0 ] on slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > for framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.417453 5037 master.cpp:2219] Authorizing framework principal > 'test-principal' to launch task 1 as user 'jenkins' > I0717 04:39:59.417733 5037 master.hpp:772] Adding task 1 with resources > cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave > 20140717-043959-16842879-40280-5009-0 (lucid) > I0717 04:39:59.417768 5037 master.cpp:2285] Launching task 1 of framework > 20140717-043959-16842879-40280-5009-0000 with resources cpus(*):2; > mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:39:59.417913 5037 slave.cpp:1002] Got assigned task 1 for framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.418223 5037 slave.cpp:1112] Launching task 1 for framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.419910 5037 slave.cpp:1222] Queuing task '1' for executor 1 > of framework '20140717-043959-16842879-40280-5009-0000 > I0717 04:39:59.420047 5037 containerizer.cpp:427] Starting container > '31868c4e-911b-4198-88b2-2ec5a7022adb' for executor '1' of framework > '20140717-043959-16842879-40280-5009-0000' > I0717 04:39:59.420281 5037 slave.cpp:561] Successfully attached file > '/tmp/HealthCheckTest_ConsecutiveFailures_sOluTu/slaves/20140717-043959-16842879-40280-5009-0/frameworks/20140717-043959-16842879-40280-5009-0000/executors/1/runs/31868c4e-911b-4198-88b2-2ec5a7022adb' > I0717 04:39:59.424384 5033 leveldb.cpp:343] Persisting action (16 bytes) to > leveldb took 8.154005ms > I0717 04:39:59.424438 5033 replica.cpp:676] Persisted action at 4 > I0717 04:39:59.424990 5035 replica.cpp:655] Replica received learned notice > for position 4 > I0717 04:39:59.425606 5037 launcher.cpp:137] Forked child with pid '5746' > for container '31868c4e-911b-4198-88b2-2ec5a7022adb' > I0717 04:39:59.428802 5037 containerizer.cpp:537] Fetching URIs for > container '31868c4e-911b-4198-88b2-2ec5a7022adb' using command > '/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src/mesos-fetcher' > I0717 04:39:59.434170 5035 leveldb.cpp:343] Persisting action (18 bytes) to > leveldb took 9.084582ms > I0717 04:39:59.434267 5035 leveldb.cpp:401] Deleting ~2 keys from leveldb > took 45795ns > I0717 04:39:59.434283 5035 replica.cpp:676] Persisted action at 4 > I0717 04:39:59.434299 5035 replica.cpp:661] Replica learned TRUNCATE action > at position 4 > 2014-07-17 > 04:40:00,248:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > I0717 04:40:00.276789 5035 slave.cpp:2469] Monitoring executor '1' of > framework '20140717-043959-16842879-40280-5009-0000' in container > '31868c4e-911b-4198-88b2-2ec5a7022adb' > I0717 04:40:00.312453 5035 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 43185ns > I0717 04:40:00.318514 5035 slave.cpp:1733] Got registration for executor '1' > of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.319555 5035 slave.cpp:1852] Flushing queued task 1 for > executor '1' of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.320169 5038 process.cpp:1098] Socket closed while receiving > I0717 04:40:00.325037 5035 slave.cpp:2087] Handling status update > TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of > framework 20140717-043959-16842879-40280-5009-0000 from > executor(1)@127.0.1.1:59033 > I0717 04:40:00.325140 5035 status_update_manager.cpp:320] Received status > update TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 > of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.325155 5035 status_update_manager.cpp:499] Creating > StatusUpdate stream for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.325295 5035 status_update_manager.cpp:373] Forwarding status > update TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 > of framework 20140717-043959-16842879-40280-5009-0000 to > master@127.0.1.1:40280 > I0717 04:40:00.325455 5035 master.cpp:3120] Status update TASK_RUNNING > (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 from slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:00.325495 5035 slave.cpp:2245] Status update manager > successfully handled status update TASK_RUNNING (UUID: > a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.325506 5035 slave.cpp:2251] Sending acknowledgement for > status update TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for > task 1 of framework 20140717-043959-16842879-40280-5009-0000 to > executor(1)@127.0.1.1:59033 > I0717 04:40:00.325717 5035 sched.cpp:637] Scheduler::statusUpdate took > 31980ns > I0717 04:40:00.325794 5035 master.cpp:2639] Forwarding status update > acknowledgement a769a032-5aa7-41fb-9830-d6745df0e18e for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 to slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:00.325899 5035 status_update_manager.cpp:398] Received status > update acknowledgement (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task > 1 of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.325950 5035 slave.cpp:1673] Status update manager > successfully handled status update acknowledgement (UUID: > a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:00.326431 5038 process.cpp:1098] Socket closed while receiving > I0717 04:40:01.322517 5031 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 42022ns > I0717 04:40:01.376111 5031 slave.cpp:2087] Handling status update > TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 in > health state unhealthy of framework 20140717-043959-16842879-40280-5009-0000 > from executor(1)@127.0.1.1:59033 > I0717 04:40:01.376238 5031 status_update_manager.cpp:320] Received status > update TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 > in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:01.376266 5031 status_update_manager.cpp:373] Forwarding status > update TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 > in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280 > I0717 04:40:01.376397 5031 master.cpp:3120] Status update TASK_RUNNING > (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 in health state > unhealthy of framework 20140717-043959-16842879-40280-5009-0000 from slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:01.376454 5031 slave.cpp:2245] Status update manager > successfully handled status update TASK_RUNNING (UUID: > 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 in health state unhealthy of > framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:01.376466 5031 slave.cpp:2251] Sending acknowledgement for > status update TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for > task 1 in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 to executor(1)@127.0.1.1:59033 > I0717 04:40:01.376648 5031 sched.cpp:637] Scheduler::statusUpdate took > 18560ns > I0717 04:40:01.376713 5031 master.cpp:2639] Forwarding status update > acknowledgement 8e06aa77-eb92-486e-a20f-0d67752a7754 for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 to slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:01.376857 5035 status_update_manager.cpp:398] Received status > update acknowledgement (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task > 1 of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:01.376911 5035 slave.cpp:1673] Status update manager > successfully handled status update acknowledgement (UUID: > 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:01.377177 5038 process.cpp:1098] Socket closed while receiving > 2014-07-17 > 04:40:04,062:5009(0x2ad17a97f700):ZOO_WARN@zookeeper_interest@1557: Exceeded > deadline by 483ms > 2014-07-17 > 04:40:04,062:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > I0717 04:40:04.063637 5030 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 51405ns > I0717 04:40:04.134610 5034 slave.cpp:2087] Handling status update > TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 in > health state unhealthy of framework 20140717-043959-16842879-40280-5009-0000 > from executor(1)@127.0.1.1:59033 > I0717 04:40:04.134814 5034 status_update_manager.cpp:320] Received status > update TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 > in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:04.134857 5034 status_update_manager.cpp:373] Forwarding status > update TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 > in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280 > I0717 04:40:04.135036 5034 master.cpp:3120] Status update TASK_RUNNING > (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 in health state > unhealthy of framework 20140717-043959-16842879-40280-5009-0000 from slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:04.135073 5034 slave.cpp:2245] Status update manager > successfully handled status update TASK_RUNNING (UUID: > cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 in health state unhealthy of > framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:04.135083 5034 slave.cpp:2251] Sending acknowledgement for > status update TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for > task 1 in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 to executor(1)@127.0.1.1:59033 > I0717 04:40:04.135267 5034 sched.cpp:637] Scheduler::statusUpdate took > 27539ns > I0717 04:40:04.135339 5034 master.cpp:2639] Forwarding status update > acknowledgement cc04d02c-753e-4fc8-b35a-61cc33b4d796 for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 to slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:04.135452 5034 status_update_manager.cpp:398] Received status > update acknowledgement (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task > 1 of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:04.135503 5034 slave.cpp:1673] Status update manager > successfully handled status update acknowledgement (UUID: > cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:04.135925 5038 process.cpp:1098] Socket closed while receiving > I0717 04:40:04.312355 5036 master.cpp:122] No whitelist given. Advertising > offers for all slaves > I0717 04:40:05.072487 5036 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 49651ns > I0717 04:40:05.084954 5036 slave.cpp:2087] Handling status update > TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 in > health state unhealthy of framework 20140717-043959-16842879-40280-5009-0000 > from executor(1)@127.0.1.1:59033 > I0717 04:40:05.085073 5036 status_update_manager.cpp:320] Received status > update TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 > in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:05.085100 5036 status_update_manager.cpp:373] Forwarding status > update TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 > in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280 > I0717 04:40:05.085227 5036 master.cpp:3120] Status update TASK_RUNNING > (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 in health state > unhealthy of framework 20140717-043959-16842879-40280-5009-0000 from slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:05.085263 5036 slave.cpp:2245] Status update manager > successfully handled status update TASK_RUNNING (UUID: > 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 in health state unhealthy of > framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:05.085273 5036 slave.cpp:2251] Sending acknowledgement for > status update TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for > task 1 in health state unhealthy of framework > 20140717-043959-16842879-40280-5009-0000 to executor(1)@127.0.1.1:59033 > I0717 04:40:05.087087 5037 sched.cpp:637] Scheduler::statusUpdate took > 19205ns > I0717 04:40:05.087158 5030 master.cpp:2639] Forwarding status update > acknowledgement 5ec41670-9320-4ecb-9767-6c020d771cd9 for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 to slave > 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) > I0717 04:40:05.087779 5038 process.cpp:1098] Socket closed while receiving > I0717 04:40:05.088122 5032 status_update_manager.cpp:398] Received status > update acknowledgement (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task > 1 of framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:05.088302 5036 slave.cpp:1673] Status update manager > successfully handled status update acknowledgement (UUID: > 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:06.082578 5031 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 71517ns > I0717 04:40:07.092687 5037 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 34535ns > 2014-07-17 > 04:40:07,399:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > I0717 04:40:08.102656 5035 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 70999ns > I0717 04:40:09.112658 5031 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 35887ns > I0717 04:40:09.322443 5037 master.cpp:122] No whitelist given. Advertising > offers for all slaves > 2014-07-17 > 04:40:11,494:5009(0x2ad17a97f700):ZOO_WARN@zookeeper_interest@1557: Exceeded > deadline by 764ms > 2014-07-17 > 04:40:11,495:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > I0717 04:40:11.499094 5037 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 46842ns > I0717 04:40:12.502545 5034 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 34124ns > I0717 04:40:13.512511 5032 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 33974ns > I0717 04:40:14.332535 5031 master.cpp:122] No whitelist given. Advertising > offers for all slaves > I0717 04:40:14.422514 5037 slave.cpp:2324] Received ping from > slave-observer(104)@127.0.1.1:40280 > I0717 04:40:14.522536 5037 hierarchical_allocator_process.hpp:686] Performed > allocation for 1 slaves in 46113ns > 2014-07-17 > 04:40:14,832:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > tests/health_check_tests.cpp:247: Failure > Failed to wait 10secs for status4 > tests/health_check_tests.cpp:222: Failure > Actual function call count doesn't match EXPECT_CALL(sched, > statusUpdate(&driver, _))... > Expected: to be called 6 times > Actual: called 4 times - unsatisfied and active > I0717 04:40:15.093816 5009 master.cpp:625] Master terminating > I0717 04:40:15.093973 5009 master.hpp:790] Removing task 1 with resources > cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave > 20140717-043959-16842879-40280-5009-0 (lucid) > W0717 04:40:15.094081 5009 master.cpp:4266] Removing task 1 of framework > 20140717-043959-16842879-40280-5009-0000 and slave > 20140717-043959-16842879-40280-5009-0 in non-terminal state TASK_RUNNING > I0717 04:40:15.097363 5036 slave.cpp:2331] master@127.0.1.1:40280 exited > W0717 04:40:15.097385 5036 slave.cpp:2334] Master disconnected! Waiting for > a new master to be elected > I0717 04:40:15.097913 5032 containerizer.cpp:903] Destroying container > '31868c4e-911b-4198-88b2-2ec5a7022adb' > I0717 04:40:15.146953 5038 process.cpp:1037] Socket closed while receiving > I0717 04:40:15.532558 5035 containerizer.cpp:1019] Executor for container > '31868c4e-911b-4198-88b2-2ec5a7022adb' has exited > I0717 04:40:15.532965 5035 slave.cpp:2527] Executor '1' of framework > 20140717-043959-16842879-40280-5009-0000 terminated with signal Killed > I0717 04:40:15.534054 5035 slave.cpp:2087] Handling status update > TASK_FAILED (UUID: 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of > framework 20140717-043959-16842879-40280-5009-0000 from @0.0.0.0:0 > I0717 04:40:15.534097 5035 slave.cpp:3769] Terminating task 1 > W0717 04:40:15.534314 5035 containerizer.cpp:809] Ignoring update for > unknown container: 31868c4e-911b-4198-88b2-2ec5a7022adb > I0717 04:40:15.534819 5030 status_update_manager.cpp:320] Received status > update TASK_FAILED (UUID: 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of > framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:15.534883 5030 status_update_manager.cpp:373] Forwarding status > update TASK_FAILED (UUID: 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of > framework 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280 > I0717 04:40:15.535047 5030 slave.cpp:2245] Status update manager > successfully handled status update TASK_FAILED (UUID: > 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:15.535524 5009 slave.cpp:484] Slave terminating > I0717 04:40:15.535552 5009 slave.cpp:1406] Asked to shut down framework > 20140717-043959-16842879-40280-5009-0000 by @0.0.0.0:0 > I0717 04:40:15.535562 5009 slave.cpp:1431] Shutting down framework > 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:15.535588 5009 slave.cpp:2661] Cleaning up executor '1' of > framework 20140717-043959-16842879-40280-5009-0000 > I0717 04:40:15.535840 5009 slave.cpp:2736] Cleaning up framework > 20140717-043959-16842879-40280-5009-0000 > [ FAILED ] HealthCheckTest.ConsecutiveFailures (16270 ms) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)