[ 
https://issues.apache.org/jira/browse/MESOS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983676#comment-13983676
 ] 

Jie Yu commented on MESOS-1257:
-------------------------------

{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ExternalContainerizerTest
[ RUN      ] ExternalContainerizerTest.Launch
I0428 22:23:00.340616 38392 leveldb.cpp:174] Opened db in 104.029492ms
I0428 22:23:00.366174 38392 leveldb.cpp:181] Compacted db in 25.506759ms
I0428 22:23:00.366230 38392 leveldb.cpp:196] Created db iterator in 10125ns
I0428 22:23:00.366256 38392 leveldb.cpp:202] Seeked to beginning of db in 957ns
I0428 22:23:00.366276 38392 leveldb.cpp:271] Iterated through 0 keys in the db 
in 2632ns
I0428 22:23:00.366341 38392 replica.cpp:729] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0428 22:23:00.367028 38406 recover.cpp:425] Starting replica recovery
I0428 22:23:00.368113 38416 recover.cpp:451] Replica is in EMPTY status
I0428 22:23:00.368957 38415 master.cpp:266] Master 
20140428-222300-2081170186-56899-38392 (smfd-aki-27-sr1.devel.twitter.com) 
started on 10.35.12.124:56899
I0428 22:23:00.369123 38415 master.cpp:303] Master only allowing authenticated 
frameworks to register
I0428 22:23:00.369155 38415 master.cpp:308] Master only allowing authenticated 
slaves to register
I0428 22:23:00.369169 38415 credentials.hpp:35] Loading credentials for 
authentication
W0428 22:23:00.369273 38415 credentials.hpp:48] Failed to stat credentials file 
'file:///tmp/ExternalContainerizerTest_Launch_Y3fWIS/credentials': No such file 
or directory
I0428 22:23:00.369925 38417 replica.cpp:626] Replica in EMPTY status received a 
broadcasted recover request
I0428 22:23:00.370126 38411 recover.cpp:188] Received a recover response from a 
replica in EMPTY status
I0428 22:23:00.370458 38411 recover.cpp:542] Updating replica status to STARTING
I0428 22:23:00.371877 38415 master.cpp:922] The newly elected leader is 
[email protected]:56899 with id 
20140428-222300-2081170186-56899-38392
I0428 22:23:00.371901 38415 master.cpp:932] Elected as the leading master!
I0428 22:23:00.371908 38415 master.cpp:753] Recovering from registrar
I0428 22:23:00.371978 38408 registrar.cpp:275] Recovering registrarI0428 
22:23:00.407814 38406 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb 
took 37.170669ms
I0428 22:23:00.407845 38406 replica.cpp:320] Persisted replica status to 
STARTINGI0428 22:23:00.407955 38421 recover.cpp:451] Replica is in STARTING 
status
I0428 22:23:00.408587 38413 replica.cpp:626] Replica in STARTING status 
received a broadcasted recover req
uest
I0428 22:23:00.408690 38420 recover.cpp:188] Received a recover response from a 
replica in STARTING status
I0428 22:23:00.408895 38409 recover.cpp:542] Updating replica status to 
VOTINGI0428 22:23:00.449610 38420 leveldb.cpp:304] Persisting metadata (8 
bytes) to leveldb took 40.610437ms
I0428 22:23:00.449666 38420 replica.cpp:320] Persisted replica status to VOTING
I0428 22:23:00.449789 38420 recover.cpp:556] Successfully joined the Paxos 
groupI0428 22:23:00.450228 38418 log.cpp:656] Attempting to start the writer
I0428 22:23:00.450232 38420 recover.cpp:440] Recover process terminated
I0428 22:23:00.451287 38408 replica.cpp:474] Replica received implicit promise 
request with proposal 1
I0428 22:23:00.499662 38408 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 48.347021msI0428 22:23:00.499693 38408 replica.cpp:342] Persisted 
promised to 1
I0428 22:23:00.500437 38411 coordinator.cpp:229] Coordinator attemping to fill 
missing position
I0428 22:23:00.501473 38418 replica.cpp:375] Replica received explicit promise 
request for position 0 with
 proposal 2
I0428 22:23:00.533135 38418 leveldb.cpp:341] Persisting action (8 bytes) to 
leveldb took 31.637459msI0428 22:23:00.533172 38418 replica.cpp:664] Persisted 
action at 0
I0428 22:23:00.534690 38413 replica.cpp:508] Replica received write request for 
position 0
I0428 22:23:00.534750 38413 leveldb.cpp:436] Reading position from leveldb took 
32447ns
I0428 22:23:00.566571 38413 leveldb.cpp:341] Persisting action (14 bytes) to 
leveldb took 31.790409ms
I0428 22:23:00.566607 38413 replica.cpp:664] Persisted action at 0
I0428 22:23:00.566897 38408 replica.cpp:643] Replica received learned notice 
for position 0
I0428 22:23:00.599982 38408 leveldb.cpp:341] Persisting action (16 bytes) to 
leveldb took 33.063264ms
I0428 22:23:00.600013 38408 replica.cpp:664] Persisted action at 0
I0428 22:23:00.600023 38408 replica.cpp:649] Replica learned NOP action at 
position 0
I0428 22:23:00.600342 38408 log.cpp:672] Writer started with ending position 0
I0428 22:23:00.600965 38417 leveldb.cpp:436] Reading position from leveldb took 
14091ns
I0428 22:23:00.602330 38408 registrar.cpp:308] Successfully recovered registrar
I0428 22:23:00.602370 38408 registrar.cpp:379] Attempting to update the 
'registry'
I0428 22:23:00.603744 38409 log.cpp:680] Attempting to append 154 bytes to the 
log
I0428 22:23:00.603799 38409 coordinator.cpp:339] Coordinator attempting to 
write APPEND action at position
 1
I0428 22:23:00.604338 38421 replica.cpp:508] Replica received write request for 
position 1
I0428 22:23:00.633527 38421 leveldb.cpp:341] Persisting action (173 bytes) to 
leveldb took 29.15323ms
I0428 22:23:00.633633 38421 replica.cpp:664] Persisted action at 1
I0428 22:23:00.667116 38421 replica.cpp:649] Replica learned APPEND action at 
position 1
I0428 22:23:00.667664 38419 registrar.cpp:427] Successfully updated 'registry'
I0428 22:23:00.667712 38414 log.cpp:699] Attempting to truncate the log to 1
I0428 22:23:00.667996 38412 coordinator.cpp:339] Coordinator attempting to 
write TRUNCATE action at position 2
I0428 22:23:00.668097 38406 master.cpp:780] Recovered 0 slaves from the 
Registry (116B) ; allowing 10mins for slaves to re-register
I0428 22:23:00.668689 38407 replica.cpp:508] Replica received write request for 
position 2
I0428 22:23:00.670699 38406 slave.cpp:140] Slave started on 
1)@10.35.12.124:56899
I0428 22:23:00.670717 38406 credentials.hpp:35] Loading credentials for 
authentication
W0428 22:23:00.670778 38406 credentials.hpp:48] Failed to stat credentials file 
'file:///tmp/ExternalContainerizerTest_Launch_owsfFk/credential': No such file 
or directory
I0428 22:23:00.670802 38406 slave.cpp:231] Slave using credential for: 
test-principal
I0428 22:23:00.670929 38406 slave.cpp:244] Slave resources: cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0428 22:23:00.671417 38406 slave.cpp:272] Slave hostname: 
smfd-aki-27-sr1.devel.twitter.com
I0428 22:23:00.671435 38406 slave.cpp:273] Slave checkpoint: false
I0428 22:23:00.671880 38420 state.cpp:33] Recovering state from 
'/tmp/ExternalContainerizerTest_Launch_owsfFk/meta'
I0428 22:23:00.672060 38418 status_update_manager.cpp:193] Recovering status 
update manager
I0428 22:23:00.672376 38419 slave.cpp:2943] Finished recovery
I0428 22:23:00.672938 38419 slave.cpp:525] New master detected at 
[email protected]:56899
I0428 22:23:00.673044 38409 status_update_manager.cpp:167] New master detected 
at [email protected]:56899
I0428 22:23:00.673053 38419 slave.cpp:585] Authenticating with master 
[email protected]:56899
I0428 22:23:00.673149 38419 slave.cpp:558] Detecting new master
I0428 22:23:00.673156 38392 sched.cpp:121] Version: 0.19.0
I0428 22:23:00.673193 38406 authenticatee.hpp:104] Initializing client SASL
I0428 22:23:00.673578 38421 sched.cpp:217] New master detected at 
[email protected]:56899
I0428 22:23:00.673614 38421 sched.cpp:268] Authenticating with master 
[email protected]:56899
I0428 22:23:00.675853 38406 authenticatee.hpp:128] Creating new client SASL 
connection
I0428 22:23:00.675935 38410 authenticatee.hpp:128] Creating new client SASL 
connection
I0428 22:23:00.676058 38416 master.cpp:2795] Authenticating 
slave(1)@10.35.12.124:56899
I0428 22:23:00.676229 38416 master.cpp:2795] Authenticating 
scheduler(1)@10.35.12.124:56899
I0428 22:23:00.676236 38415 authenticator.hpp:91] Initializing server SASL
I0428 22:23:00.676518 38415 authenticator.hpp:148] Creating new server SASL 
connection
I0428 22:23:00.676591 38413 authenticator.hpp:148] Creating new server SASL 
connection
I0428 22:23:00.676673 38419 authenticatee.hpp:219] Received SASL authentication 
mechanisms: CRAM-MD5
I0428 22:23:00.676681 38415 authenticatee.hpp:219] Received SASL authentication 
mechanisms: CRAM-MD5
I0428 22:23:00.676712 38419 authenticatee.hpp:245] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0428 22:23:00.676724 38415 authenticatee.hpp:245] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0428 22:23:00.676769 38419 authenticator.hpp:254] Received SASL authentication 
start
I0428 22:23:00.676789 38408 authenticator.hpp:254] Received SASL authentication 
startI0428 22:23:00.676833 38419 authenticator.hpp:342] Authentication requires 
more steps
I0428 22:23:00.676842 38408 authenticator.hpp:342] Authentication requires more 
stepsI0428 22:23:00.676924 38415 authenticatee.hpp:265] Received SASL 
authentication step
I0428 22:23:00.676980 38408 authenticatee.hpp:265] Received SASL authentication 
stepI0428 22:23:00.677053 38415 authenticator.hpp:282] Received SASL 
authentication step
I0428 22:23:00.677078 38408 authenticator.hpp:282] Received SASL authentication 
stepI0428 22:23:00.677119 38408 authenticator.hpp:334] Authentication 
successI0428 22:23:00.677150 38415 authenticator.hpp:334] Authentication success
I0428 22:23:00.677166 38419 authenticatee.hpp:305] Authentication successI0428 
22:23:00.677191 38406 master.cpp:2835] Successfully authenticated 
scheduler(1)@10.35.12.124:56899
I0428 22:23:00.677217 38413 authenticatee.hpp:305] Authentication success
I0428 22:23:00.677294 38418 sched.cpp:342] Successfully authenticated with 
master [email protected]:56899
I0428 22:23:00.677392 38406 master.cpp:2835] Successfully authenticated 
slave(1)@10.35.12.124:56899I0428 22:23:00.677415 38421 slave.cpp:642] 
Successfully authenticated with master [email protected]:56899
I0428 22:23:00.677521 38406 master.cpp:981] Received registration request from 
scheduler(1)@10.35.12.124:56899I0428 22:23:00.677588 38406 master.cpp:999] 
Registering framework 20140428-222300-2081170186-56899-38392-0
000 at scheduler(1)@10.35.12.124:56899I0428 22:23:00.677680 38408 
sched.cpp:392] Framework registered with 
20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.677763 38415 hierarchical_allocator_process.hpp:332] Added 
framework 20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.677886 38419 registrar.cpp:379] Attempting to update the 
'registry'I0428 22:23:00.700630 38407 leveldb.cpp:341] Persisting action (16 
bytes) to leveldb took 31.911511ms
I0428 22:23:00.700665 38407 replica.cpp:664] Persisted action at 2I0428 
22:23:00.701063 38417 replica.cpp:643] Replica received learned notice for 
position 2
I0428 22:23:00.734374 38417 leveldb.cpp:341] Persisting action (18 bytes) to 
leveldb took 33.184478msI0428 22:23:00.734499 38417 leveldb.cpp:399] Deleting 
~1 keys from leveldb took 28792ns
I0428 22:23:00.734519 38417 replica.cpp:664] Persisted action at 2
I0428 22:23:00.734545 38417 replica.cpp:649] Replica learned TRUNCATE action at 
position 2
I0428 22:23:00.734843 38406 log.cpp:680] Attempting to append 381 bytes to the 
log
I0428 22:23:00.734926 38412 coordinator.cpp:339] Coordinator attempting to 
write APPEND action at position
 3
I0428 22:23:00.735265 38418 replica.cpp:508] Replica received write request for 
position 3
I0428 22:23:00.767824 38418 leveldb.cpp:341] Persisting action (400 bytes) to 
leveldb took 32.533169ms
I0428 22:23:00.767859 38418 replica.cpp:664] Persisted action at 3I0428 
22:23:00.768105 38416 replica.cpp:643] Replica received learned notice for 
position 3I0428 22:23:00.801277 38416 leveldb.cpp:341] Persisting action (402 
bytes) to leveldb took 33.142168ms
I0428 22:23:00.801463 38416 replica.cpp:664] Persisted action at 3
I0428 22:23:00.801492 38416 replica.cpp:649] Replica learned APPEND action at 
position 3
I0428 22:23:00.801730 38415 registrar.cpp:427] Successfully updated 'registry'
I0428 22:23:00.801798 38413 log.cpp:699] Attempting to truncate the log to 3
I0428 22:23:00.801950 38412 coordinator.cpp:339] Coordinator attempting to 
write TRUNCATE action at position 4
I0428 22:23:00.801964 38415 master.cpp:2169] Admitted slave on 
smfd-aki-27-sr1.devel.twitter.com at slave(1)@10.35.12.124:56899
I0428 22:23:00.802059 38415 master.cpp:3283] Adding slave 
20140428-222300-2081170186-56899-38392-0 at smfd-aki-27-sr1.devel.twitter.com 
with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0428 22:23:00.802479 38406 hierarchical_allocator_process.hpp:445] Added slave 
20140428-222300-2081170186-56899-38392-0 (smfd-aki-27-sr1.devel.twitter.com) 
with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and 
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
I0428 22:23:00.802542 38414 slave.cpp:675] Registered with master 
[email protected]:56899; given slave ID 
20140428-222300-2081170186-56899-38392-0
I0428 22:23:00.802589 38408 replica.cpp:508] Replica received write request for 
position 4
I0428 22:23:00.802711 38406 master.cpp:2744] Sending 1 offers to framework 
20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.803745 38419 master.cpp:1806] Processing reply for offers: [ 
20140428-222300-2081170186-56899-38392-0 ] on slave 
20140428-222300-2081170186-56899-38392-0 (smfd-aki-27-sr1.devel.twitter.com) 
for framework 20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.803886 38419 master.hpp:558] Adding task 1 with resources 
cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
20140428-222300-2081170186-56899-38392-0 (smfd-aki-27-sr1.devel.twitter.com)
I0428 22:23:00.803973 38419 master.cpp:2919] Launching task 1 of framework 
20140428-222300-2081170186-56899-38392-0000 with resources cpus(*):2; 
mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 
20140428-222300-2081170186-56899-38392-0 (smfd-aki-27-sr1.devel.twitter.com)
I0428 22:23:00.804112 38419 slave.cpp:905] Got assigned task 1 for framework 
20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.804406 38419 slave.cpp:1015] Launching task 1 for framework 
20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.806025 38409 external_containerizer.cpp:266] Launching container 
'8e2ce0ec-74ad-4b1a-bb86-e6f21212c533'
I0428 22:23:00.806040 38419 slave.cpp:1125] Queuing task '1' for executor 1 of 
framework '20140428-222300-2081170186-56899-38392-0000
I0428 22:23:00.834636 38408 leveldb.cpp:341] Persisting action (16 bytes) to 
leveldb took 32.016229ms
I0428 22:23:00.834686 38408 replica.cpp:664] Persisted action at 4
I0428 22:23:00.835474 38408 replica.cpp:643] Replica received learned notice 
for position 4
I0428 22:23:00.869412 38408 leveldb.cpp:341] Persisting action (18 bytes) to 
leveldb took 33.904971ms
I0428 22:23:00.869474 38408 leveldb.cpp:399] Deleting ~2 keys from leveldb took 
23805ns
I0428 22:23:00.869503 38408 replica.cpp:664] Persisted action at 4
I0428 22:23:00.869516 38408 replica.cpp:649] Replica learned TRUNCATE action at 
position 4
E0428 22:23:01.813274 38414 slave.cpp:2237] Container 
'8e2ce0ec-74ad-4b1a-bb86-e6f21212c533' for executor '1' of framework 
'20140428-222300-2081170186-56899-38392-0000' failed to start: Could not launch 
container '8e2ce0ec-74ad-4b1a-bb86-e6f21212c533': External containerizer failed 
(status: 1)
../../../mesos/src/tests/external_containerizer_test.cpp:202: Failure
Failed to wait 10secs for status
{noformat}

> ExternalContainerizerTest.Launch is flaky
> -----------------------------------------
>
>                 Key: MESOS-1257
>                 URL: https://issues.apache.org/jira/browse/MESOS-1257
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>            Assignee: Till Toenshoff
>
> {noformat}
> [----------] 1 test from ExternalContainerizerTest
> [ RUN      ] ExternalContainerizerTest.Launch
> 2014-04-28 
> 17:04:14,345:44634(0x7f94deffd700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:52535] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2014-04-28 
> 17:04:17,681:44634(0x7f94deffd700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:52535] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2014-04-28 
> 17:04:21,018:44634(0x7f94deffd700):ZOO_ERROR@handle_socket_error_msg@1697: 
> Socket [127.0.0.1:52535] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> ../../../mesos/src/tests/external_containerizer_test.cpp:202: Failure
> Failed to wait 10secs for status
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to