[ 
https://issues.apache.org/jira/browse/MESOS-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088425#comment-14088425
 ] 

Benjamin Mahler commented on MESOS-1613:
----------------------------------------

So far only Twitter CI is exposing this flakiness. I've pasted the full logs in
the comment above; are you looking for logging from the command executor? We
may want to investigate wiring the tests up to surface that executor logging in
the test output, to make this easier to debug.

> HealthCheckTest.ConsecutiveFailures is flaky
> --------------------------------------------
>
>                 Key: MESOS-1613
>                 URL: https://issues.apache.org/jira/browse/MESOS-1613
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.20.0
>         Environment: Ubuntu 10.04 GCC
>            Reporter: Vinod Kone
>            Assignee: Timothy Chen
>
> {code}
> [ RUN      ] HealthCheckTest.ConsecutiveFailures
> Using temporary directory '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV'
> I0717 04:39:59.288471  5009 leveldb.cpp:176] Opened db in 21.575631ms
> I0717 04:39:59.295274  5009 leveldb.cpp:183] Compacted db in 6.471982ms
> I0717 04:39:59.295552  5009 leveldb.cpp:198] Created db iterator in 16783ns
> I0717 04:39:59.296026  5009 leveldb.cpp:204] Seeked to beginning of db in 
> 2125ns
> I0717 04:39:59.296257  5009 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 10747ns
> I0717 04:39:59.296584  5009 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0717 04:39:59.297322  5033 recover.cpp:425] Starting replica recovery
> I0717 04:39:59.297413  5033 recover.cpp:451] Replica is in EMPTY status
> I0717 04:39:59.297824  5033 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0717 04:39:59.297899  5033 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0717 04:39:59.297997  5033 recover.cpp:542] Updating replica status to 
> STARTING
> I0717 04:39:59.301985  5031 master.cpp:288] Master 
> 20140717-043959-16842879-40280-5009 (lucid) started on 127.0.1.1:40280
> I0717 04:39:59.302026  5031 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0717 04:39:59.302032  5031 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0717 04:39:59.302039  5031 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_ConsecutiveFailures_AzK0OV/credentials'
> I0717 04:39:59.302283  5031 master.cpp:359] Authorization enabled
> I0717 04:39:59.302971  5031 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@127.0.1.1:40280
> I0717 04:39:59.303022  5031 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0717 04:39:59.303390  5033 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 5.325097ms
> I0717 04:39:59.303419  5033 replica.cpp:320] Persisted replica status to 
> STARTING
> I0717 04:39:59.304076  5030 master.cpp:1128] The newly elected leader is 
> master@127.0.1.1:40280 with id 20140717-043959-16842879-40280-5009
> I0717 04:39:59.304095  5030 master.cpp:1141] Elected as the leading master!
> I0717 04:39:59.304102  5030 master.cpp:959] Recovering from registrar
> I0717 04:39:59.304182  5030 registrar.cpp:313] Recovering registrar
> I0717 04:39:59.304635  5033 recover.cpp:451] Replica is in STARTING status
> I0717 04:39:59.304962  5033 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0717 04:39:59.305026  5033 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0717 04:39:59.305130  5033 recover.cpp:542] Updating replica status to VOTING
> I0717 04:39:59.310416  5033 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 5.204157ms
> I0717 04:39:59.310459  5033 replica.cpp:320] Persisted replica status to 
> VOTING
> I0717 04:39:59.310534  5033 recover.cpp:556] Successfully joined the Paxos 
> group
> I0717 04:39:59.310607  5033 recover.cpp:440] Recover process terminated
> I0717 04:39:59.310773  5033 log.cpp:656] Attempting to start the writer
> I0717 04:39:59.311157  5033 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0717 04:39:59.313451  5033 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 2.271822ms
> I0717 04:39:59.313627  5033 replica.cpp:342] Persisted promised to 1
> I0717 04:39:59.318038  5031 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0717 04:39:59.318430  5031 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0717 04:39:59.323459  5031 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 5.004323ms
> I0717 04:39:59.323493  5031 replica.cpp:676] Persisted action at 0
> I0717 04:39:59.323799  5031 replica.cpp:508] Replica received write request 
> for position 0
> I0717 04:39:59.323837  5031 leveldb.cpp:438] Reading position from leveldb 
> took 21901ns
> I0717 04:39:59.329038  5031 leveldb.cpp:343] Persisting action (14 bytes) to 
> leveldb took 5.175998ms
> I0717 04:39:59.329244  5031 replica.cpp:676] Persisted action at 0
> I0717 04:39:59.332335  5037 replica.cpp:655] Replica received learned notice 
> for position 0
> I0717 04:39:59.337894  5037 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 5.532845ms
> I0717 04:39:59.337926  5037 replica.cpp:676] Persisted action at 0
> I0717 04:39:59.337934  5037 replica.cpp:661] Replica learned NOP action at 
> position 0
> I0717 04:39:59.338152  5037 log.cpp:672] Writer started with ending position 0
> I0717 04:39:59.338438  5037 leveldb.cpp:438] Reading position from leveldb 
> took 15499ns
> I0717 04:39:59.339648  5037 registrar.cpp:346] Successfully fetched the 
> registry (0B)
> I0717 04:39:59.339676  5037 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0717 04:39:59.340885  5037 log.cpp:680] Attempting to append 117 bytes to 
> the log
> I0717 04:39:59.340929  5037 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 1
> I0717 04:39:59.341130  5037 replica.cpp:508] Replica received write request 
> for position 1
> I0717 04:39:59.343456  5037 leveldb.cpp:343] Persisting action (134 bytes) to 
> leveldb took 2.30603ms
> I0717 04:39:59.343638  5037 replica.cpp:676] Persisted action at 1
> I0717 04:39:59.352447  5034 replica.cpp:655] Replica received learned notice 
> for position 1
> I0717 04:39:59.365022  5034 leveldb.cpp:343] Persisting action (136 bytes) to 
> leveldb took 12.534333ms
> I0717 04:39:59.365072  5034 replica.cpp:676] Persisted action at 1
> I0717 04:39:59.365082  5034 replica.cpp:661] Replica learned APPEND action at 
> position 1
> I0717 04:39:59.365470  5034 registrar.cpp:479] Successfully updated 'registry'
> I0717 04:39:59.365520  5034 registrar.cpp:372] Successfully recovered 
> registrar
> I0717 04:39:59.365578  5034 log.cpp:699] Attempting to truncate the log to 1
> I0717 04:39:59.365656  5034 master.cpp:986] Recovered 0 slaves from the 
> Registry (81B) ; allowing 10mins for slaves to re-register
> I0717 04:39:59.365696  5034 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 2
> I0717 04:39:59.366221  5030 replica.cpp:508] Replica received write request 
> for position 2
> I0717 04:39:59.374423  5030 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 8.172997ms
> I0717 04:39:59.374459  5030 replica.cpp:676] Persisted action at 2
> I0717 04:39:59.374671  5030 replica.cpp:655] Replica received learned notice 
> for position 2
> I0717 04:39:59.383867  5030 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 9.168201ms
> I0717 04:39:59.383939  5030 leveldb.cpp:401] Deleting ~1 keys from leveldb 
> took 33989ns
> I0717 04:39:59.383952  5030 replica.cpp:676] Persisted action at 2
> I0717 04:39:59.383960  5030 replica.cpp:661] Replica learned TRUNCATE action 
> at position 2
> I0717 04:39:59.387097  5009 containerizer.cpp:124] Using isolation: 
> posix/cpu,posix/mem
> I0717 04:39:59.391551  5009 sched.cpp:139] Version: 0.20.0
> I0717 04:39:59.392025  5036 slave.cpp:168] Slave started on 
> 116)@127.0.1.1:40280
> I0717 04:39:59.392042  5036 credentials.hpp:84] Loading credential for 
> authentication from 
> '/tmp/HealthCheckTest_ConsecutiveFailures_sOluTu/credential'
> I0717 04:39:59.392148  5036 slave.cpp:266] Slave using credential for: 
> test-principal
> I0717 04:39:59.392262  5036 slave.cpp:279] Slave resources: cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0717 04:39:59.392325  5036 slave.cpp:324] Slave hostname: lucid
> I0717 04:39:59.392336  5036 slave.cpp:325] Slave checkpoint: false
> I0717 04:39:59.392853  5036 state.cpp:33] Recovering state from 
> '/tmp/HealthCheckTest_ConsecutiveFailures_sOluTu/meta'
> I0717 04:39:59.393005  5036 sched.cpp:235] New master detected at 
> master@127.0.1.1:40280
> I0717 04:39:59.393023  5036 sched.cpp:285] Authenticating with master 
> master@127.0.1.1:40280
> I0717 04:39:59.393102  5036 status_update_manager.cpp:193] Recovering status 
> update manager
> I0717 04:39:59.393179  5036 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0717 04:39:59.393326  5036 master.cpp:3512] Authenticating 
> scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280
> I0717 04:39:59.393432  5036 containerizer.cpp:287] Recovering containerizer
> I0717 04:39:59.393569  5036 authenticator.hpp:156] Creating new server SASL 
> connection
> I0717 04:39:59.393748  5036 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0717 04:39:59.393764  5036 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0717 04:39:59.393842  5036 authenticator.hpp:262] Received SASL 
> authentication start
> I0717 04:39:59.393888  5036 authenticator.hpp:384] Authentication requires 
> more steps
> I0717 04:39:59.393934  5036 slave.cpp:3127] Finished recovery
> I0717 04:39:59.394132  5036 authenticatee.hpp:265] Received SASL 
> authentication step
> I0717 04:39:59.394208  5036 authenticator.hpp:290] Received SASL 
> authentication step
> I0717 04:39:59.394234  5036 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0717 04:39:59.394243  5036 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0717 04:39:59.394255  5036 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0717 04:39:59.394268  5036 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0717 04:39:59.394274  5036 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0717 04:39:59.394279  5036 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0717 04:39:59.394290  5036 authenticator.hpp:376] Authentication success
> I0717 04:39:59.394330  5036 slave.cpp:600] New master detected at 
> master@127.0.1.1:40280
> I0717 04:39:59.394348  5036 slave.cpp:676] Authenticating with master 
> master@127.0.1.1:40280
> I0717 04:39:59.394392  5036 slave.cpp:649] Detecting new master
> I0717 04:39:59.394434  5036 authenticatee.hpp:305] Authentication success
> I0717 04:39:59.394461  5036 master.cpp:3552] Successfully authenticated 
> principal 'test-principal' at 
> scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280
> I0717 04:39:59.394516  5036 status_update_manager.cpp:167] New master 
> detected at master@127.0.1.1:40280
> I0717 04:39:59.394542  5036 authenticatee.hpp:128] Creating new client SASL 
> connection
> I0717 04:39:59.394605  5036 sched.cpp:359] Successfully authenticated with 
> master master@127.0.1.1:40280
> I0717 04:39:59.394618  5036 sched.cpp:478] Sending registration request to 
> master@127.0.1.1:40280
> I0717 04:39:59.394666  5036 master.cpp:3512] Authenticating 
> slave(116)@127.0.1.1:40280
> I0717 04:39:59.394728  5036 master.cpp:1247] Received registration request 
> from scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280
> I0717 04:39:59.394757  5036 master.cpp:1207] Authorizing framework principal 
> 'test-principal' to receive offers for role '*'
> I0717 04:39:59.394816  5036 authenticator.hpp:156] Creating new server SASL 
> connection
> I0717 04:39:59.394917  5036 authenticatee.hpp:219] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0717 04:39:59.394932  5036 authenticatee.hpp:245] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0717 04:39:59.394989  5036 master.cpp:1306] Registering framework 
> 20140717-043959-16842879-40280-5009-0000 at 
> scheduler-37c02322-fe0b-4cd3-b1b5-b5aab929c19c@127.0.1.1:40280
> I0717 04:39:59.395094  5036 authenticator.hpp:262] Received SASL 
> authentication start
> I0717 04:39:59.395126  5036 authenticator.hpp:384] Authentication requires 
> more steps
> I0717 04:39:59.395166  5036 sched.cpp:409] Framework registered with 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.395187  5036 sched.cpp:423] Scheduler::registered took 9209ns
> I0717 04:39:59.395221  5036 hierarchical_allocator_process.hpp:331] Added 
> framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.395228  5036 hierarchical_allocator_process.hpp:724] No 
> resources available to allocate!
> I0717 04:39:59.395234  5036 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 0 slaves in 5941ns
> I0717 04:39:59.395278  5036 authenticatee.hpp:265] Received SASL 
> authentication step
> I0717 04:39:59.395310  5036 authenticator.hpp:290] Received SASL 
> authentication step
> I0717 04:39:59.395323  5036 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0717 04:39:59.395329  5036 auxprop.cpp:153] Looking up auxiliary property 
> '*userPassword'
> I0717 04:39:59.395336  5036 auxprop.cpp:153] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> I0717 04:39:59.395344  5036 auxprop.cpp:81] Request to lookup properties for 
> user: 'test-principal' realm: 'lucid' server FQDN: 'lucid' 
> SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0717 04:39:59.395349  5036 auxprop.cpp:103] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0717 04:39:59.395354  5036 auxprop.cpp:103] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0717 04:39:59.395362  5036 authenticator.hpp:376] Authentication success
> I0717 04:39:59.395385  5036 authenticatee.hpp:305] Authentication success
> I0717 04:39:59.395402  5036 master.cpp:3552] Successfully authenticated 
> principal 'test-principal' at slave(116)@127.0.1.1:40280
> I0717 04:39:59.395457  5036 slave.cpp:733] Successfully authenticated with 
> master master@127.0.1.1:40280
> I0717 04:39:59.395503  5036 slave.cpp:971] Will retry registration in 
> 15.135662ms if necessary
> I0717 04:39:59.395601  5037 master.cpp:2789] Registering slave at 
> slave(116)@127.0.1.1:40280 (lucid) with id 
> 20140717-043959-16842879-40280-5009-0
> I0717 04:39:59.395699  5037 registrar.cpp:422] Attempting to update the 
> 'registry'
> I0717 04:39:59.396891  5031 log.cpp:680] Attempting to append 287 bytes to 
> the log
> I0717 04:39:59.396939  5031 coordinator.cpp:340] Coordinator attempting to 
> write APPEND action at position 3
> I0717 04:39:59.397140  5031 replica.cpp:508] Replica received write request 
> for position 3
> I0717 04:39:59.405050  5031 leveldb.cpp:343] Persisting action (306 bytes) to 
> leveldb took 7.878137ms
> I0717 04:39:59.405097  5031 replica.cpp:676] Persisted action at 3
> I0717 04:39:59.405342  5031 replica.cpp:655] Replica received learned notice 
> for position 3
> I0717 04:39:59.412286  5037 slave.cpp:971] Will retry registration in 
> 9.096666ms if necessary
> I0717 04:39:59.412365  5037 master.cpp:2777] Ignoring register slave message 
> from slave(116)@127.0.1.1:40280 (lucid) as admission is already in progress
> I0717 04:39:59.414619  5031 leveldb.cpp:343] Persisting action (308 bytes) to 
> leveldb took 9.256058ms
> I0717 04:39:59.414646  5031 replica.cpp:676] Persisted action at 3
> I0717 04:39:59.414654  5031 replica.cpp:661] Replica learned APPEND action at 
> position 3
> I0717 04:39:59.415129  5031 registrar.cpp:479] Successfully updated 'registry'
> I0717 04:39:59.415212  5031 log.cpp:699] Attempting to truncate the log to 3
> I0717 04:39:59.415269  5031 master.cpp:2829] Registered slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:39:59.415293  5031 master.cpp:3980] Adding slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) 
> with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0717 04:39:59.415415  5031 coordinator.cpp:340] Coordinator attempting to 
> write TRUNCATE action at position 4
> I0717 04:39:59.415485  5031 slave.cpp:767] Registered with master 
> master@127.0.1.1:40280; given slave ID 20140717-043959-16842879-40280-5009-0
> I0717 04:39:59.415550  5031 hierarchical_allocator_process.hpp:444] Added 
> slave 20140717-043959-16842879-40280-5009-0 (lucid) with cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0717 04:39:59.415602  5031 hierarchical_allocator_process.hpp:750] Offering 
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
> 20140717-043959-16842879-40280-5009-0 to framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.415699  5031 hierarchical_allocator_process.hpp:706] Performed 
> allocation for slave 20140717-043959-16842879-40280-5009-0 in 119984ns
> I0717 04:39:59.415757  5031 slave.cpp:2324] Received ping from 
> slave-observer(104)@127.0.1.1:40280
> I0717 04:39:59.415824  5031 master.hpp:800] Adding offer 
> 20140717-043959-16842879-40280-5009-0 with resources cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] on slave 
> 20140717-043959-16842879-40280-5009-0 (lucid)
> I0717 04:39:59.415864  5031 master.cpp:3459] Sending 1 offers to framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.416179  5033 replica.cpp:508] Replica received write request 
> for position 4
> I0717 04:39:59.416697  5031 sched.cpp:546] Scheduler::resourceOffers took 
> 34644ns
> I0717 04:39:59.417361  5037 master.hpp:810] Removing offer 
> 20140717-043959-16842879-40280-5009-0 with resources cpus(*):2; mem(*):1024; 
> disk(*):1024; ports(*):[31000-32000] on slave 
> 20140717-043959-16842879-40280-5009-0 (lucid)
> I0717 04:39:59.417414  5037 master.cpp:2133] Processing reply for offers: [ 
> 20140717-043959-16842879-40280-5009-0 ] on slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid) 
> for framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.417453  5037 master.cpp:2219] Authorizing framework principal 
> 'test-principal' to launch task 1 as user 'jenkins'
> I0717 04:39:59.417733  5037 master.hpp:772] Adding task 1 with resources 
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
> 20140717-043959-16842879-40280-5009-0 (lucid)
> I0717 04:39:59.417768  5037 master.cpp:2285] Launching task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 with resources cpus(*):2; 
> mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:39:59.417913  5037 slave.cpp:1002] Got assigned task 1 for framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.418223  5037 slave.cpp:1112] Launching task 1 for framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.419910  5037 slave.cpp:1222] Queuing task '1' for executor 1 
> of framework '20140717-043959-16842879-40280-5009-0000
> I0717 04:39:59.420047  5037 containerizer.cpp:427] Starting container 
> '31868c4e-911b-4198-88b2-2ec5a7022adb' for executor '1' of framework 
> '20140717-043959-16842879-40280-5009-0000'
> I0717 04:39:59.420281  5037 slave.cpp:561] Successfully attached file 
> '/tmp/HealthCheckTest_ConsecutiveFailures_sOluTu/slaves/20140717-043959-16842879-40280-5009-0/frameworks/20140717-043959-16842879-40280-5009-0000/executors/1/runs/31868c4e-911b-4198-88b2-2ec5a7022adb'
> I0717 04:39:59.424384  5033 leveldb.cpp:343] Persisting action (16 bytes) to 
> leveldb took 8.154005ms
> I0717 04:39:59.424438  5033 replica.cpp:676] Persisted action at 4
> I0717 04:39:59.424990  5035 replica.cpp:655] Replica received learned notice 
> for position 4
> I0717 04:39:59.425606  5037 launcher.cpp:137] Forked child with pid '5746' 
> for container '31868c4e-911b-4198-88b2-2ec5a7022adb'
> I0717 04:39:59.428802  5037 containerizer.cpp:537] Fetching URIs for 
> container '31868c4e-911b-4198-88b2-2ec5a7022adb' using command 
> '/var/jenkins/workspace/mesos-ubuntu-10.04-gcc/src/mesos-fetcher'
> I0717 04:39:59.434170  5035 leveldb.cpp:343] Persisting action (18 bytes) to 
> leveldb took 9.084582ms
> I0717 04:39:59.434267  5035 leveldb.cpp:401] Deleting ~2 keys from leveldb 
> took 45795ns
> I0717 04:39:59.434283  5035 replica.cpp:676] Persisted action at 4
> I0717 04:39:59.434299  5035 replica.cpp:661] Replica learned TRUNCATE action 
> at position 4
> 2014-07-17 
> 04:40:00,248:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> I0717 04:40:00.276789  5035 slave.cpp:2469] Monitoring executor '1' of 
> framework '20140717-043959-16842879-40280-5009-0000' in container 
> '31868c4e-911b-4198-88b2-2ec5a7022adb'
> I0717 04:40:00.312453  5035 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 43185ns
> I0717 04:40:00.318514  5035 slave.cpp:1733] Got registration for executor '1' 
> of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.319555  5035 slave.cpp:1852] Flushing queued task 1 for 
> executor '1' of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.320169  5038 process.cpp:1098] Socket closed while receiving
> I0717 04:40:00.325037  5035 slave.cpp:2087] Handling status update 
> TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of 
> framework 20140717-043959-16842879-40280-5009-0000 from 
> executor(1)@127.0.1.1:59033
> I0717 04:40:00.325140  5035 status_update_manager.cpp:320] Received status 
> update TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 
> of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.325155  5035 status_update_manager.cpp:499] Creating 
> StatusUpdate stream for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.325295  5035 status_update_manager.cpp:373] Forwarding status 
> update TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 
> of framework 20140717-043959-16842879-40280-5009-0000 to 
> master@127.0.1.1:40280
> I0717 04:40:00.325455  5035 master.cpp:3120] Status update TASK_RUNNING 
> (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 from slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:00.325495  5035 slave.cpp:2245] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.325506  5035 slave.cpp:2251] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for 
> task 1 of framework 20140717-043959-16842879-40280-5009-0000 to 
> executor(1)@127.0.1.1:59033
> I0717 04:40:00.325717  5035 sched.cpp:637] Scheduler::statusUpdate took 
> 31980ns
> I0717 04:40:00.325794  5035 master.cpp:2639] Forwarding status update 
> acknowledgement a769a032-5aa7-41fb-9830-d6745df0e18e for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 to slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:00.325899  5035 status_update_manager.cpp:398] Received status 
> update acknowledgement (UUID: a769a032-5aa7-41fb-9830-d6745df0e18e) for task 
> 1 of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.325950  5035 slave.cpp:1673] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> a769a032-5aa7-41fb-9830-d6745df0e18e) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:00.326431  5038 process.cpp:1098] Socket closed while receiving
> I0717 04:40:01.322517  5031 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 42022ns
> I0717 04:40:01.376111  5031 slave.cpp:2087] Handling status update 
> TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 in 
> health state unhealthy of framework 20140717-043959-16842879-40280-5009-0000 
> from executor(1)@127.0.1.1:59033
> I0717 04:40:01.376238  5031 status_update_manager.cpp:320] Received status 
> update TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 
> in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:01.376266  5031 status_update_manager.cpp:373] Forwarding status 
> update TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 
> in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280
> I0717 04:40:01.376397  5031 master.cpp:3120] Status update TASK_RUNNING 
> (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 in health state 
> unhealthy of framework 20140717-043959-16842879-40280-5009-0000 from slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:01.376454  5031 slave.cpp:2245] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 in health state unhealthy of 
> framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:01.376466  5031 slave.cpp:2251] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for 
> task 1 in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000 to executor(1)@127.0.1.1:59033
> I0717 04:40:01.376648  5031 sched.cpp:637] Scheduler::statusUpdate took 
> 18560ns
> I0717 04:40:01.376713  5031 master.cpp:2639] Forwarding status update 
> acknowledgement 8e06aa77-eb92-486e-a20f-0d67752a7754 for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 to slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:01.376857  5035 status_update_manager.cpp:398] Received status 
> update acknowledgement (UUID: 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 
> 1 of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:01.376911  5035 slave.cpp:1673] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 8e06aa77-eb92-486e-a20f-0d67752a7754) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:01.377177  5038 process.cpp:1098] Socket closed while receiving
> 2014-07-17 
> 04:40:04,062:5009(0x2ad17a97f700):ZOO_WARN@zookeeper_interest@1557: Exceeded 
> deadline by 483ms
> 2014-07-17 
> 04:40:04,062:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> I0717 04:40:04.063637  5030 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 51405ns
> I0717 04:40:04.134610  5034 slave.cpp:2087] Handling status update 
> TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 in 
> health state unhealthy of framework 20140717-043959-16842879-40280-5009-0000 
> from executor(1)@127.0.1.1:59033
> I0717 04:40:04.134814  5034 status_update_manager.cpp:320] Received status 
> update TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 
> in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:04.134857  5034 status_update_manager.cpp:373] Forwarding status 
> update TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 
> in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280
> I0717 04:40:04.135036  5034 master.cpp:3120] Status update TASK_RUNNING 
> (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 in health state 
> unhealthy of framework 20140717-043959-16842879-40280-5009-0000 from slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:04.135073  5034 slave.cpp:2245] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 in health state unhealthy of 
> framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:04.135083  5034 slave.cpp:2251] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for 
> task 1 in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000 to executor(1)@127.0.1.1:59033
> I0717 04:40:04.135267  5034 sched.cpp:637] Scheduler::statusUpdate took 
> 27539ns
> I0717 04:40:04.135339  5034 master.cpp:2639] Forwarding status update 
> acknowledgement cc04d02c-753e-4fc8-b35a-61cc33b4d796 for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 to slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:04.135452  5034 status_update_manager.cpp:398] Received status 
> update acknowledgement (UUID: cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 
> 1 of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:04.135503  5034 slave.cpp:1673] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> cc04d02c-753e-4fc8-b35a-61cc33b4d796) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:04.135925  5038 process.cpp:1098] Socket closed while receiving
> I0717 04:40:04.312355  5036 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0717 04:40:05.072487  5036 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 49651ns
> I0717 04:40:05.084954  5036 slave.cpp:2087] Handling status update 
> TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 in 
> health state unhealthy of framework 20140717-043959-16842879-40280-5009-0000 
> from executor(1)@127.0.1.1:59033
> I0717 04:40:05.085073  5036 status_update_manager.cpp:320] Received status 
> update TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 
> in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:05.085100  5036 status_update_manager.cpp:373] Forwarding status 
> update TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 
> in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280
> I0717 04:40:05.085227  5036 master.cpp:3120] Status update TASK_RUNNING 
> (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 in health state 
> unhealthy of framework 20140717-043959-16842879-40280-5009-0000 from slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:05.085263  5036 slave.cpp:2245] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 in health state unhealthy of 
> framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:05.085273  5036 slave.cpp:2251] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for 
> task 1 in health state unhealthy of framework 
> 20140717-043959-16842879-40280-5009-0000 to executor(1)@127.0.1.1:59033
> I0717 04:40:05.087087  5037 sched.cpp:637] Scheduler::statusUpdate took 
> 19205ns
> I0717 04:40:05.087158  5030 master.cpp:2639] Forwarding status update 
> acknowledgement 5ec41670-9320-4ecb-9767-6c020d771cd9 for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 to slave 
> 20140717-043959-16842879-40280-5009-0 at slave(116)@127.0.1.1:40280 (lucid)
> I0717 04:40:05.087779  5038 process.cpp:1098] Socket closed while receiving
> I0717 04:40:05.088122  5032 status_update_manager.cpp:398] Received status 
> update acknowledgement (UUID: 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 
> 1 of framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:05.088302  5036 slave.cpp:1673] Status update manager 
> successfully handled status update acknowledgement (UUID: 
> 5ec41670-9320-4ecb-9767-6c020d771cd9) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:06.082578  5031 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 71517ns
> I0717 04:40:07.092687  5037 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 34535ns
> 2014-07-17 
> 04:40:07,399:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> I0717 04:40:08.102656  5035 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 70999ns
> I0717 04:40:09.112658  5031 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 35887ns
> I0717 04:40:09.322443  5037 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> 2014-07-17 
> 04:40:11,494:5009(0x2ad17a97f700):ZOO_WARN@zookeeper_interest@1557: Exceeded 
> deadline by 764ms
> 2014-07-17 
> 04:40:11,495:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> I0717 04:40:11.499094  5037 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 46842ns
> I0717 04:40:12.502545  5034 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 34124ns
> I0717 04:40:13.512511  5032 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 33974ns
> I0717 04:40:14.332535  5031 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0717 04:40:14.422514  5037 slave.cpp:2324] Received ping from 
> slave-observer(104)@127.0.1.1:40280
> I0717 04:40:14.522536  5037 hierarchical_allocator_process.hpp:686] Performed 
> allocation for 1 slaves in 46113ns
> 2014-07-17 
> 04:40:14,832:5009(0x2ad17a97f700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:46988] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> tests/health_check_tests.cpp:247: Failure
> Failed to wait 10secs for status4
> tests/health_check_tests.cpp:222: Failure
> Actual function call count doesn't match EXPECT_CALL(sched, 
> statusUpdate(&driver, _))...
>          Expected: to be called 6 times
>            Actual: called 4 times - unsatisfied and active
> I0717 04:40:15.093816  5009 master.cpp:625] Master terminating
> I0717 04:40:15.093973  5009 master.hpp:790] Removing task 1 with resources 
> cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
> 20140717-043959-16842879-40280-5009-0 (lucid)
> W0717 04:40:15.094081  5009 master.cpp:4266] Removing task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000 and slave 
> 20140717-043959-16842879-40280-5009-0 in non-terminal state TASK_RUNNING
> I0717 04:40:15.097363  5036 slave.cpp:2331] master@127.0.1.1:40280 exited
> W0717 04:40:15.097385  5036 slave.cpp:2334] Master disconnected! Waiting for 
> a new master to be elected
> I0717 04:40:15.097913  5032 containerizer.cpp:903] Destroying container 
> '31868c4e-911b-4198-88b2-2ec5a7022adb'
> I0717 04:40:15.146953  5038 process.cpp:1037] Socket closed while receiving
> I0717 04:40:15.532558  5035 containerizer.cpp:1019] Executor for container 
> '31868c4e-911b-4198-88b2-2ec5a7022adb' has exited
> I0717 04:40:15.532965  5035 slave.cpp:2527] Executor '1' of framework 
> 20140717-043959-16842879-40280-5009-0000 terminated with signal Killed
> I0717 04:40:15.534054  5035 slave.cpp:2087] Handling status update 
> TASK_FAILED (UUID: 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of 
> framework 20140717-043959-16842879-40280-5009-0000 from @0.0.0.0:0
> I0717 04:40:15.534097  5035 slave.cpp:3769] Terminating task 1
> W0717 04:40:15.534314  5035 containerizer.cpp:809] Ignoring update for 
> unknown container: 31868c4e-911b-4198-88b2-2ec5a7022adb
> I0717 04:40:15.534819  5030 status_update_manager.cpp:320] Received status 
> update TASK_FAILED (UUID: 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of 
> framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:15.534883  5030 status_update_manager.cpp:373] Forwarding status 
> update TASK_FAILED (UUID: 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of 
> framework 20140717-043959-16842879-40280-5009-0000 to master@127.0.1.1:40280
> I0717 04:40:15.535047  5030 slave.cpp:2245] Status update manager 
> successfully handled status update TASK_FAILED (UUID: 
> 360ed575-2fac-4a2b-9a5f-794f703546bf) for task 1 of framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:15.535524  5009 slave.cpp:484] Slave terminating
> I0717 04:40:15.535552  5009 slave.cpp:1406] Asked to shut down framework 
> 20140717-043959-16842879-40280-5009-0000 by @0.0.0.0:0
> I0717 04:40:15.535562  5009 slave.cpp:1431] Shutting down framework 
> 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:15.535588  5009 slave.cpp:2661] Cleaning up executor '1' of 
> framework 20140717-043959-16842879-40280-5009-0000
> I0717 04:40:15.535840  5009 slave.cpp:2736] Cleaning up framework 
> 20140717-043959-16842879-40280-5009-0000
> [  FAILED  ] HealthCheckTest.ConsecutiveFailures (16270 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)