[ https://issues.apache.org/jira/browse/MESOS-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985667#comment-13985667 ]
Benjamin Mahler commented on MESOS-1250: ---------------------------------------- [~xujyan] Is this assigned to you or in progress? I'm seeing fail locally as well. > ReservationAllocatorTest.ResourcesReturned flaky - repeated_field.h:824 CHECK > failed: (index) < (size()) > -------------------------------------------------------------------------------------------------------- > > Key: MESOS-1250 > URL: https://issues.apache.org/jira/browse/MESOS-1250 > Project: Mesos > Issue Type: Bug > Reporter: Yan Xu > > {noformat:title=The CHECK failure} > [libprotobuf FATAL > ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824] > CHECK failed: (index) < (size()): > {noformat} > {noformat:title=Full log} > [ RUN ] ReservationAllocatorTest.ResourcesReturned > I0425 18:24:35.375020 23940 leveldb.cpp:174] Opened db in 80.11386ms > I0425 18:24:35.385205 23940 leveldb.cpp:181] Compacted db in 10.16346ms > I0425 18:24:35.385226 23940 leveldb.cpp:196] Created db iterator in 3534ns > I0425 18:24:35.385233 23940 leveldb.cpp:202] Seeked to beginning of db in > 513ns > I0425 18:24:35.385239 23940 leveldb.cpp:271] Iterated through 0 keys in the > db in 200ns > I0425 18:24:35.385251 23940 replica.cpp:729] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0425 18:24:35.385484 23963 recover.cpp:425] Starting replica recovery > I0425 18:24:35.385722 23961 recover.cpp:451] Replica is in EMPTY status > I0425 18:24:35.385973 23963 master.cpp:266] Master > 20140425-182435-1032504131-35984-23940 (juno.apache.org) started on > 67.195.138.61:35984 > I0425 18:24:35.385993 23963 master.cpp:304] Master only allowing > authenticated frameworks to register > I0425 18:24:35.386001 23963 credentials.hpp:35] Loading credentials for > authentication > W0425 18:24:35.386051 23963 credentials.hpp:48] Failed to stat credentials > file > 'file:///tmp/ReservationAllocatorTest_ResourcesReturned_K72a9s/credentials': > No such file or directory > I0425 18:24:35.386209 23966 replica.cpp:626] Replica in EMPTY status received > a broadcasted recover request > I0425 18:24:35.386451 23964 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I0425 18:24:35.386603 23966 hierarchical_allocator_process.hpp:302] > Initializing hierarchical allocator process with master : > master@67.195.138.61:35984 > I0425 18:24:35.386672 23966 master.cpp:104] No whitelist given. Advertising > offers for all slaves > I0425 18:24:35.386672 23964 recover.cpp:542] Updating replica status to > STARTING > I0425 18:24:35.387045 23968 master.cpp:918] The newly elected leader is > master@67.195.138.61:35984 with id 20140425-182435-1032504131-35984-23940 > I0425 18:24:35.387058 23968 master.cpp:928] Elected as the leading master! > I0425 18:24:35.387064 23968 master.cpp:749] Recovering from registrar > I0425 18:24:35.387130 23968 registrar.cpp:275] Recovering registrar > I0425 18:24:35.416841 23964 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 30.043801ms > I0425 18:24:35.416859 23964 replica.cpp:320] Persisted replica status to > STARTING > I0425 18:24:35.416947 23964 recover.cpp:451] Replica is in STARTING status > I0425 18:24:35.417343 23968 replica.cpp:626] Replica in STARTING status > received a broadcasted recover request > I0425 18:24:35.417513 23968 recover.cpp:188] Received a recover response from > a replica in STARTING status > I0425 18:24:35.417755 23967 recover.cpp:542] Updating replica status to VOTING > I0425 18:24:35.428218 23968 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 10.386348ms > I0425 18:24:35.428267 23968 replica.cpp:320] Persisted replica status to > VOTING > I0425 18:24:35.428336 23967 recover.cpp:556] Successfully joined the Paxos > group > I0425 18:24:35.428447 23967 recover.cpp:440] Recover process terminated > I0425 18:24:35.428591 23967 log.cpp:656] Attempting to start the writer > I0425 18:24:35.429041 23966 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I0425 18:24:35.436518 23966 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 7.457582ms > I0425 18:24:35.436539 23966 replica.cpp:342] Persisted promised to 1 > I0425 18:24:35.436861 23964 coordinator.cpp:229] Coordinator attemping to > fill missing position > I0425 18:24:35.436869 23967 hierarchical_allocator_process.hpp:726] No > resources available to allocate! > I0425 18:24:35.436878 23967 hierarchical_allocator_process.hpp:688] Performed > allocation for 0 slaves in 15305ns > I0425 18:24:35.437417 23968 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I0425 18:24:35.444845 23968 leveldb.cpp:341] Persisting action (8 bytes) to > leveldb took 7.408147ms > I0425 18:24:35.444866 23968 replica.cpp:664] Persisted action at 0 > I0425 18:24:35.445366 23961 replica.cpp:508] Replica received write request > for position 0 > I0425 18:24:35.445392 23961 leveldb.cpp:436] Reading position from leveldb > took 12659ns > I0425 18:24:35.453176 23961 leveldb.cpp:341] Persisting action (14 bytes) to > leveldb took 7.767811ms > I0425 18:24:35.453198 23961 replica.cpp:664] Persisted action at 0 > I0425 18:24:35.453548 23965 replica.cpp:643] Replica received learned notice > for position 0 > I0425 18:24:35.461506 23965 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 7.919252ms > I0425 18:24:35.461527 23965 replica.cpp:664] Persisted action at 0 > I0425 18:24:35.461536 23965 replica.cpp:649] Replica learned NOP action at > position 0 > I0425 18:24:35.461809 23966 log.cpp:672] Writer started with ending position 0 > I0425 18:24:35.462334 23961 leveldb.cpp:436] Reading position from leveldb > took 15347ns > I0425 18:24:35.464083 23967 registrar.cpp:308] Successfully recovered > registrar > I0425 18:24:35.464109 23967 registrar.cpp:379] Attempting to update the > 'registry' > I0425 18:24:35.465710 23962 log.cpp:680] Attempting to append 137 bytes to > the log > I0425 18:24:35.465852 23963 coordinator.cpp:339] Coordinator attempting to > write APPEND action at position 1 > I0425 18:24:35.466189 23968 replica.cpp:508] Replica received write request > for position 1 > I0425 18:24:35.475623 23968 leveldb.cpp:341] Persisting action (156 bytes) to > leveldb took 9.417683ms > I0425 18:24:35.475642 23968 replica.cpp:664] Persisted action at 1 > I0425 18:24:35.475893 23968 replica.cpp:643] Replica received learned notice > for position 1 > I0425 18:24:35.483952 23968 leveldb.cpp:341] Persisting action (158 bytes) to > leveldb took 8.039894ms > I0425 18:24:35.483976 23968 replica.cpp:664] Persisted action at 1 > I0425 18:24:35.483985 23968 replica.cpp:649] Replica learned APPEND action at > position 1 > I0425 18:24:35.484416 23965 registrar.cpp:427] Successfully updated 'registry' > I0425 18:24:35.484467 23968 log.cpp:699] Attempting to truncate the log to 1 > I0425 18:24:35.484530 23962 coordinator.cpp:339] Coordinator attempting to > write TRUNCATE action at position 2 > I0425 18:24:35.484622 23961 master.cpp:776] Recovered 0 slaves from the > Registry (99B) ; allowing 10mins for slaves to re-register > I0425 18:24:35.485131 23968 replica.cpp:508] Replica received write request > for position 2 > I0425 18:24:35.487083 23965 hierarchical_allocator_process.hpp:726] No > resources available to allocate! > I0425 18:24:35.487120 23965 hierarchical_allocator_process.hpp:688] Performed > allocation for 0 slaves in 51286ns > I0425 18:24:35.487566 23963 slave.cpp:130] Slave started on > 6)@67.195.138.61:35984 > I0425 18:24:35.487763 23963 slave.cpp:218] Slave resources: cpus(role1):1; > mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; > disk(*):0; ports(*):[31000-32000] > I0425 18:24:35.487838 23963 slave.cpp:246] Slave hostname: juno.apache.org > I0425 18:24:35.487845 23963 slave.cpp:247] Slave checkpoint: false > I0425 18:24:35.488287 23965 state.cpp:33] Recovering state from > '/tmp/ReservationAllocatorTest_ResourcesReturned_cRsGAx/meta' > I0425 18:24:35.488456 23963 status_update_manager.cpp:193] Recovering status > update manager > I0425 18:24:35.488628 23963 slave.cpp:2804] Finished recovery > I0425 18:24:35.488865 23963 slave.cpp:499] New master detected at > master@67.195.138.61:35984 > I0425 18:24:35.488914 23963 slave.cpp:733] Will retry registration in > 7.800343128secs if necessary > I0425 18:24:35.488931 23963 slave.cpp:524] Detecting new master > I0425 18:24:35.488968 23963 status_update_manager.cpp:167] New master > detected at master@67.195.138.61:35984 > I0425 18:24:35.489169 23963 registrar.cpp:379] Attempting to update the > 'registry' > I0425 18:24:35.494820 23968 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 9.670136ms > I0425 18:24:35.494839 23968 replica.cpp:664] Persisted action at 2 > I0425 18:24:35.495146 23962 replica.cpp:643] Replica received learned notice > for position 2 > I0425 18:24:35.503154 23962 leveldb.cpp:341] Persisting action (18 bytes) to > leveldb took 7.987444ms > I0425 18:24:35.503190 23962 leveldb.cpp:399] Deleting ~1 keys from leveldb > took 16160ns > I0425 18:24:35.503201 23962 replica.cpp:664] Persisted action at 2 > I0425 18:24:35.503209 23962 replica.cpp:649] Replica learned TRUNCATE action > at position 2 > I0425 18:24:35.503527 23961 log.cpp:680] Attempting to append 438 bytes to > the log > I0425 18:24:35.503597 23968 coordinator.cpp:339] Coordinator attempting to > write APPEND action at position 3 > I0425 18:24:35.503970 23967 replica.cpp:508] Replica received write request > for position 3 > I0425 18:24:35.511482 23967 leveldb.cpp:341] Persisting action (457 bytes) to > leveldb took 7.473886ms > I0425 18:24:35.511502 23967 replica.cpp:664] Persisted action at 3 > I0425 18:24:35.511749 23968 replica.cpp:643] Replica received learned notice > for position 3 > I0425 18:24:35.519814 23968 leveldb.cpp:341] Persisting action (459 bytes) to > leveldb took 8.043537ms > I0425 18:24:35.519836 23968 replica.cpp:664] Persisted action at 3 > I0425 18:24:35.519845 23968 replica.cpp:649] Replica learned APPEND action at > position 3 > I0425 18:24:35.520158 23965 registrar.cpp:427] Successfully updated 'registry' > I0425 18:24:35.520236 23964 log.cpp:699] Attempting to truncate the log to 3 > I0425 18:24:35.520279 23965 master.cpp:2139] Admitted slave on > juno.apache.org at slave(6)@67.195.138.61:35984 > I0425 18:24:35.520285 23964 coordinator.cpp:339] Coordinator attempting to > write TRUNCATE action at position 4 > I0425 18:24:35.520306 23965 master.cpp:3213] Adding slave > 20140425-182435-1032504131-35984-23940-0 at juno.apache.org with > cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; > mem(*):200; disk(*):0; ports(*):[31000-32000] > I0425 18:24:35.520434 23967 slave.cpp:542] Registered with master > master@67.195.138.61:35984; given slave ID > 20140425-182435-1032504131-35984-23940-0 > [libprotobuf FATAL > ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824] > CHECK failed: (index) < (size()): > I0425 18:24:35.520853 23964 replica.cpp:508] Replica received write request > for position 4 > I0425 18:24:35.520884 23963 hierarchical_allocator_process.hpp:445] Added > slave 20140425-182435-1032504131-35984-23940-0 (juno.apache.org) with > cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; > mem(*):200; disk(*):0; ports(*):[31000-32000] (and cpus(role1):1; > mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; > disk(*):0; ports(*):[31000-32000] available) > I0425 18:24:35.520987 23963 hierarchical_allocator_process.hpp:708] Performed > allocation for slave 20140425-182435-1032504131-35984-23940-0 in 17123ns > ../../src/tests/allocator_tests.cpp:523: Failure > Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, > _, _))... > Expected: to be called twice > Actual: called once - unsatisfied and active > unknown file: Failure > C++ exception with description "CHECK failed: (index) < (size()): " thrown in > the test body. > I0425 18:24:35.521409 23940 master.cpp:547] Master terminating > I0425 18:24:35.521536 23965 slave.cpp:2012] master@67.195.138.61:35984 exited > W0425 18:24:35.521553 23965 slave.cpp:2015] Master disconnected! Waiting for > a new master to be elected > I0425 18:24:35.528180 23964 leveldb.cpp:341] Persisting action (16 bytes) to > leveldb took 7.286467ms > I0425 18:24:35.528199 23964 replica.cpp:664] Persisted action at 4 > I0425 18:24:35.528659 23940 slave.cpp:388] Slave terminating > [ FAILED ] ReservationAllocatorTest.ResourcesReturned (235 ms) > {noformat} > {noformat:title=This is only the result of the CHECK failure and irrelevant} > Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, > _, _))... > Expected: to be called twice > Actual: called once - unsatisfied and active > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)