[ 
https://issues.apache.org/jira/browse/MESOS-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985667#comment-13985667
 ] 

Benjamin Mahler commented on MESOS-1250:
----------------------------------------

[~xujyan] Is this assigned to you or in progress? I'm seeing fail locally as 
well.

> ReservationAllocatorTest.ResourcesReturned flaky - repeated_field.h:824 CHECK 
> failed: (index) < (size())
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1250
>                 URL: https://issues.apache.org/jira/browse/MESOS-1250
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>
> {noformat:title=The CHECK failure}
> [libprotobuf FATAL 
> ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
>  CHECK failed: (index) < (size()): 
> {noformat}
> {noformat:title=Full log}
> [ RUN      ] ReservationAllocatorTest.ResourcesReturned
> I0425 18:24:35.375020 23940 leveldb.cpp:174] Opened db in 80.11386ms
> I0425 18:24:35.385205 23940 leveldb.cpp:181] Compacted db in 10.16346ms
> I0425 18:24:35.385226 23940 leveldb.cpp:196] Created db iterator in 3534ns
> I0425 18:24:35.385233 23940 leveldb.cpp:202] Seeked to beginning of db in 
> 513ns
> I0425 18:24:35.385239 23940 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 200ns
> I0425 18:24:35.385251 23940 replica.cpp:729] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0425 18:24:35.385484 23963 recover.cpp:425] Starting replica recovery
> I0425 18:24:35.385722 23961 recover.cpp:451] Replica is in EMPTY status
> I0425 18:24:35.385973 23963 master.cpp:266] Master 
> 20140425-182435-1032504131-35984-23940 (juno.apache.org) started on 
> 67.195.138.61:35984
> I0425 18:24:35.385993 23963 master.cpp:304] Master only allowing 
> authenticated frameworks to register
> I0425 18:24:35.386001 23963 credentials.hpp:35] Loading credentials for 
> authentication
> W0425 18:24:35.386051 23963 credentials.hpp:48] Failed to stat credentials 
> file 
> 'file:///tmp/ReservationAllocatorTest_ResourcesReturned_K72a9s/credentials': 
> No such file or directory
> I0425 18:24:35.386209 23966 replica.cpp:626] Replica in EMPTY status received 
> a broadcasted recover request
> I0425 18:24:35.386451 23964 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0425 18:24:35.386603 23966 hierarchical_allocator_process.hpp:302] 
> Initializing hierarchical allocator process with master : 
> master@67.195.138.61:35984
> I0425 18:24:35.386672 23966 master.cpp:104] No whitelist given. Advertising 
> offers for all slaves
> I0425 18:24:35.386672 23964 recover.cpp:542] Updating replica status to 
> STARTING
> I0425 18:24:35.387045 23968 master.cpp:918] The newly elected leader is 
> master@67.195.138.61:35984 with id 20140425-182435-1032504131-35984-23940
> I0425 18:24:35.387058 23968 master.cpp:928] Elected as the leading master!
> I0425 18:24:35.387064 23968 master.cpp:749] Recovering from registrar
> I0425 18:24:35.387130 23968 registrar.cpp:275] Recovering registrar
> I0425 18:24:35.416841 23964 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 30.043801ms
> I0425 18:24:35.416859 23964 replica.cpp:320] Persisted replica status to 
> STARTING
> I0425 18:24:35.416947 23964 recover.cpp:451] Replica is in STARTING status
> I0425 18:24:35.417343 23968 replica.cpp:626] Replica in STARTING status 
> received a broadcasted recover request
> I0425 18:24:35.417513 23968 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0425 18:24:35.417755 23967 recover.cpp:542] Updating replica status to VOTING
> I0425 18:24:35.428218 23968 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 10.386348ms
> I0425 18:24:35.428267 23968 replica.cpp:320] Persisted replica status to 
> VOTING
> I0425 18:24:35.428336 23967 recover.cpp:556] Successfully joined the Paxos 
> group
> I0425 18:24:35.428447 23967 recover.cpp:440] Recover process terminated
> I0425 18:24:35.428591 23967 log.cpp:656] Attempting to start the writer
> I0425 18:24:35.429041 23966 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0425 18:24:35.436518 23966 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 7.457582ms
> I0425 18:24:35.436539 23966 replica.cpp:342] Persisted promised to 1
> I0425 18:24:35.436861 23964 coordinator.cpp:229] Coordinator attemping to 
> fill missing position
> I0425 18:24:35.436869 23967 hierarchical_allocator_process.hpp:726] No 
> resources available to allocate!
> I0425 18:24:35.436878 23967 hierarchical_allocator_process.hpp:688] Performed 
> allocation for 0 slaves in 15305ns
> I0425 18:24:35.437417 23968 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0425 18:24:35.444845 23968 leveldb.cpp:341] Persisting action (8 bytes) to 
> leveldb took 7.408147ms
> I0425 18:24:35.444866 23968 replica.cpp:664] Persisted action at 0
> I0425 18:24:35.445366 23961 replica.cpp:508] Replica received write request 
> for position 0
> I0425 18:24:35.445392 23961 leveldb.cpp:436] Reading position from leveldb 
> took 12659ns
> I0425 18:24:35.453176 23961 leveldb.cpp:341] Persisting action (14 bytes) to 
> leveldb took 7.767811ms
> I0425 18:24:35.453198 23961 replica.cpp:664] Persisted action at 0
> I0425 18:24:35.453548 23965 replica.cpp:643] Replica received learned notice 
> for position 0
> I0425 18:24:35.461506 23965 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 7.919252ms
> I0425 18:24:35.461527 23965 replica.cpp:664] Persisted action at 0
> I0425 18:24:35.461536 23965 replica.cpp:649] Replica learned NOP action at 
> position 0
> I0425 18:24:35.461809 23966 log.cpp:672] Writer started with ending position 0
> I0425 18:24:35.462334 23961 leveldb.cpp:436] Reading position from leveldb 
> took 15347ns
> I0425 18:24:35.464083 23967 registrar.cpp:308] Successfully recovered 
> registrar
> I0425 18:24:35.464109 23967 registrar.cpp:379] Attempting to update the 
> 'registry'
> I0425 18:24:35.465710 23962 log.cpp:680] Attempting to append 137 bytes to 
> the log
> I0425 18:24:35.465852 23963 coordinator.cpp:339] Coordinator attempting to 
> write APPEND action at position 1
> I0425 18:24:35.466189 23968 replica.cpp:508] Replica received write request 
> for position 1
> I0425 18:24:35.475623 23968 leveldb.cpp:341] Persisting action (156 bytes) to 
> leveldb took 9.417683ms
> I0425 18:24:35.475642 23968 replica.cpp:664] Persisted action at 1
> I0425 18:24:35.475893 23968 replica.cpp:643] Replica received learned notice 
> for position 1
> I0425 18:24:35.483952 23968 leveldb.cpp:341] Persisting action (158 bytes) to 
> leveldb took 8.039894ms
> I0425 18:24:35.483976 23968 replica.cpp:664] Persisted action at 1
> I0425 18:24:35.483985 23968 replica.cpp:649] Replica learned APPEND action at 
> position 1
> I0425 18:24:35.484416 23965 registrar.cpp:427] Successfully updated 'registry'
> I0425 18:24:35.484467 23968 log.cpp:699] Attempting to truncate the log to 1
> I0425 18:24:35.484530 23962 coordinator.cpp:339] Coordinator attempting to 
> write TRUNCATE action at position 2
> I0425 18:24:35.484622 23961 master.cpp:776] Recovered 0 slaves from the 
> Registry (99B) ; allowing 10mins for slaves to re-register
> I0425 18:24:35.485131 23968 replica.cpp:508] Replica received write request 
> for position 2
> I0425 18:24:35.487083 23965 hierarchical_allocator_process.hpp:726] No 
> resources available to allocate!
> I0425 18:24:35.487120 23965 hierarchical_allocator_process.hpp:688] Performed 
> allocation for 0 slaves in 51286ns
> I0425 18:24:35.487566 23963 slave.cpp:130] Slave started on 
> 6)@67.195.138.61:35984
> I0425 18:24:35.487763 23963 slave.cpp:218] Slave resources: cpus(role1):1; 
> mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; 
> disk(*):0; ports(*):[31000-32000]
> I0425 18:24:35.487838 23963 slave.cpp:246] Slave hostname: juno.apache.org
> I0425 18:24:35.487845 23963 slave.cpp:247] Slave checkpoint: false
> I0425 18:24:35.488287 23965 state.cpp:33] Recovering state from 
> '/tmp/ReservationAllocatorTest_ResourcesReturned_cRsGAx/meta'
> I0425 18:24:35.488456 23963 status_update_manager.cpp:193] Recovering status 
> update manager
> I0425 18:24:35.488628 23963 slave.cpp:2804] Finished recovery
> I0425 18:24:35.488865 23963 slave.cpp:499] New master detected at 
> master@67.195.138.61:35984
> I0425 18:24:35.488914 23963 slave.cpp:733] Will retry registration in 
> 7.800343128secs if necessary
> I0425 18:24:35.488931 23963 slave.cpp:524] Detecting new master
> I0425 18:24:35.488968 23963 status_update_manager.cpp:167] New master 
> detected at master@67.195.138.61:35984
> I0425 18:24:35.489169 23963 registrar.cpp:379] Attempting to update the 
> 'registry'
> I0425 18:24:35.494820 23968 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 9.670136ms
> I0425 18:24:35.494839 23968 replica.cpp:664] Persisted action at 2
> I0425 18:24:35.495146 23962 replica.cpp:643] Replica received learned notice 
> for position 2
> I0425 18:24:35.503154 23962 leveldb.cpp:341] Persisting action (18 bytes) to 
> leveldb took 7.987444ms
> I0425 18:24:35.503190 23962 leveldb.cpp:399] Deleting ~1 keys from leveldb 
> took 16160ns
> I0425 18:24:35.503201 23962 replica.cpp:664] Persisted action at 2
> I0425 18:24:35.503209 23962 replica.cpp:649] Replica learned TRUNCATE action 
> at position 2
> I0425 18:24:35.503527 23961 log.cpp:680] Attempting to append 438 bytes to 
> the log
> I0425 18:24:35.503597 23968 coordinator.cpp:339] Coordinator attempting to 
> write APPEND action at position 3
> I0425 18:24:35.503970 23967 replica.cpp:508] Replica received write request 
> for position 3
> I0425 18:24:35.511482 23967 leveldb.cpp:341] Persisting action (457 bytes) to 
> leveldb took 7.473886ms
> I0425 18:24:35.511502 23967 replica.cpp:664] Persisted action at 3
> I0425 18:24:35.511749 23968 replica.cpp:643] Replica received learned notice 
> for position 3
> I0425 18:24:35.519814 23968 leveldb.cpp:341] Persisting action (459 bytes) to 
> leveldb took 8.043537ms
> I0425 18:24:35.519836 23968 replica.cpp:664] Persisted action at 3
> I0425 18:24:35.519845 23968 replica.cpp:649] Replica learned APPEND action at 
> position 3
> I0425 18:24:35.520158 23965 registrar.cpp:427] Successfully updated 'registry'
> I0425 18:24:35.520236 23964 log.cpp:699] Attempting to truncate the log to 3
> I0425 18:24:35.520279 23965 master.cpp:2139] Admitted slave on 
> juno.apache.org at slave(6)@67.195.138.61:35984
> I0425 18:24:35.520285 23964 coordinator.cpp:339] Coordinator attempting to 
> write TRUNCATE action at position 4
> I0425 18:24:35.520306 23965 master.cpp:3213] Adding slave 
> 20140425-182435-1032504131-35984-23940-0 at juno.apache.org with 
> cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; 
> mem(*):200; disk(*):0; ports(*):[31000-32000]
> I0425 18:24:35.520434 23967 slave.cpp:542] Registered with master 
> master@67.195.138.61:35984; given slave ID 
> 20140425-182435-1032504131-35984-23940-0
> [libprotobuf FATAL 
> ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/google/protobuf/repeated_field.h:824]
>  CHECK failed: (index) < (size()): 
> I0425 18:24:35.520853 23964 replica.cpp:508] Replica received write request 
> for position 4
> I0425 18:24:35.520884 23963 hierarchical_allocator_process.hpp:445] Added 
> slave 20140425-182435-1032504131-35984-23940-0 (juno.apache.org) with 
> cpus(role1):1; mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; 
> mem(*):200; disk(*):0; ports(*):[31000-32000] (and cpus(role1):1; 
> mem(role1):200; cpus(role2):2; mem(role2):600; cpus(*):1; mem(*):200; 
> disk(*):0; ports(*):[31000-32000] available)
> I0425 18:24:35.520987 23963 hierarchical_allocator_process.hpp:708] Performed 
> allocation for slave 20140425-182435-1032504131-35984-23940-0 in 17123ns
> ../../src/tests/allocator_tests.cpp:523: Failure
> Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, 
> _, _))...
>          Expected: to be called twice
>            Actual: called once - unsatisfied and active
> unknown file: Failure
> C++ exception with description "CHECK failed: (index) < (size()): " thrown in 
> the test body.
> I0425 18:24:35.521409 23940 master.cpp:547] Master terminating
> I0425 18:24:35.521536 23965 slave.cpp:2012] master@67.195.138.61:35984 exited
> W0425 18:24:35.521553 23965 slave.cpp:2015] Master disconnected! Waiting for 
> a new master to be elected
> I0425 18:24:35.528180 23964 leveldb.cpp:341] Persisting action (16 bytes) to 
> leveldb took 7.286467ms
> I0425 18:24:35.528199 23964 replica.cpp:664] Persisted action at 4
> I0425 18:24:35.528659 23940 slave.cpp:388] Slave terminating
> [  FAILED  ] ReservationAllocatorTest.ResourcesReturned (235 ms)
> {noformat}
> {noformat:title=This is only the result of the CHECK failure and irrelevant}
> Actual function call count doesn't match EXPECT_CALL(allocator, slaveAdded(_, 
> _, _))...
>          Expected: to be called twice
>            Actual: called once - unsatisfied and active
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to