Here is my test state machine definition. Thanks for reviewing! https://github.com/neutronsharc/tools/blob/master/HcdDiskStateModelFactory.java
On Mon, Apr 25, 2016 at 9:18 PM, kishore g wrote:
> Can you paste the state machine definition as well?
> On Apr 25, 2016 8:41 PM, "Neutron sharc" wrote:
>
>> In my small test I have only 1 resource with 4 partitions. Every
>> partition has its unique name.
>>
>> Looking at the log output, it seems that every state transition
>> message is sent twice to the same target. Please see the complete
>> log at the end of this post. I experimented with both SEMI_AUTO and
>> USER_DEFINED.
>>
>> Here is the ideal state of the only resource (Pool0).
>> {
>>   "id" : "Pool0",
>>   "listFields" : {
>>     "Pool0_0" : [ "host1_disk1", "host1_disk2", "host1_disk3" ],
>>     "Pool0_1" : [ "host1_disk1", "host1_disk2", "host1_disk3" ],
>>     "Pool0_2" : [ "host1_disk2", "host1_disk1", "host1_disk3" ],
>>     "Pool0_3" : [ "host1_disk3", "host1_disk1", "host1_disk2" ]
>>   },
>>   "mapFields" : {
>>     "Pool0_0" : {
>>       "host1_disk1" : "MASTER",
>>       "host1_disk2" : "SLAVE",
>>       "host1_disk3" : "SLAVE"
>>     },
>>     "Pool0_1" : {
>>       "host1_disk1" : "MASTER",
>>       "host1_disk2" : "SLAVE",
>>       "host1_disk3" : "SLAVE"
>>     },
>>     "Pool0_2" : {
>>       "host1_disk1" : "SLAVE",
>>       "host1_disk2" : "MASTER",
>>       "host1_disk3" : "SLAVE"
>>     },
>>     "Pool0_3" : {
>>       "host1_disk1" : "SLAVE",
>>       "host1_disk2" : "SLAVE",
>>       "host1_disk3" : "MASTER"
>>     }
>>   },
>>   "simpleFields" : {
>>     "IDEAL_STATE_MODE" : "AUTO",
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>     "NUM_PARTITIONS" : "4",
>>     "REBALANCE_MODE" : "SEMI_AUTO",
>>     "REPLICAS" : "3",
>>     "STATE_MODEL_DEF_REF" : "HcdDiskStateModel",
>>     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
>>   }
>> }
>>
>> ======
>> log output:
>>
>> [INFO 2016-04-25 20:13:46,062 com.hcd.hcdadmin.HcdAdmin:502] will create pool Pool0
>> [INFO 2016-04-25 20:13:46,120 com.hcd.hcdadmin.HcdAdmin:242] found available live disks:
>> [INFO 2016-04-25 20:13:46,120 com.hcd.hcdadmin.HcdAdmin:244] host1_disk1
>> [INFO 2016-04-25 20:13:46,120 com.hcd.hcdadmin.HcdAdmin:244] host1_disk2
>> [INFO 2016-04-25 20:13:46,120 com.hcd.hcdadmin.HcdAdmin:244] host1_disk3
>> [INFO 2016-04-25 20:13:46,272 com.hcd.hcdadmin.HcdPool:227] assign pool partition Pool0_0 to disks: [host1_disk3, host1_disk2, host1_disk1]
>> [INFO 2016-04-25 20:13:46,456 com.hcd.hcdadmin.HcdPool:227] assign pool partition Pool0_1 to disks: [host1_disk3, host1_disk2, host1_disk1]
>> [INFO 2016-04-25 20:13:46,610 com.hcd.hcdadmin.HcdPool:227] assign pool partition Pool0_2 to disks: [host1_disk3, host1_disk2, host1_disk1]
>> [INFO 2016-04-25 20:13:46,769 com.hcd.hcdadmin.HcdPool:227] assign pool partition Pool0_3 to disks: [host1_disk3, host1_disk2, host1_disk1]
>> [INFO 2016-04-25 20:13:46,905 com.hcd.hcdadmin.HcdPool:265] assigned ideal state to LUN Pool0
>> [INFO 2016-04-25 20:13:47,428 com.hcd.hcdadmin.HcdAdmin:261] has created pool Pool0 in cluster TryHelixCluster1
>> [INFO 2016-04-25 20:13:47,429 com.hcd.hcdadmin.HcdAdmin:505] rebalance for pool Pool0
>> [INFO 2016-04-25 20:13:47,881 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_1: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,881 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_2: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,887 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_2: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,912 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_3: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,913 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_2: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,920 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_0: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,954 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_1: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,954 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_0: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:47,972 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_3: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,009 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_0: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,010 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_1: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,015 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_3: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,265 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_1: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,265 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_2: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,269 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_2: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,278 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_2: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,284 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_0: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,286 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_3: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,302 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_1: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,316 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_0: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,319 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_3: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,340 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk3 : Pool0/Pool0_1: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,348 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_3: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,349 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_0: transit from OFFLINE to SLAVE
>> [INFO 2016-04-25 20:13:48,461 com.hcd.hcdadmin.HcdDiskStateModelFactory:125] host1_disk2 : Pool0/Pool0_2: transit from SLAVE to MASTER
>> [INFO 2016-04-25 20:13:48,464 com.hcd.hcdadmin.HcdDiskStateModelFactory:125] host1_disk3 : Pool0/Pool0_3: transit from SLAVE to MASTER
>> [INFO 2016-04-25 20:13:48,465 com.hcd.hcdadmin.HcdDiskStateModelFactory:125] host1_disk1 : Pool0/Pool0_0: transit from SLAVE to MASTER
>> [INFO 2016-04-25 20:13:48,466 com.hcd.hcdadmin.HcdDiskStateModelFactory:125] host1_disk1 : Pool0/Pool0_1: transit from SLAVE to MASTER
>> [ERROR 2016-04-25 20:13:48,544 org.apache.helix.messaging.handling.HelixStateTransitionHandler:118] Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_2, from: host1_admin, to: host1_disk2
>> [ERROR 2016-04-25 20:13:48,549 org.apache.helix.messaging.handling.HelixStateTransitionHandler:118] Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_3, from: host1_admin, to: host1_disk3
>> [ERROR 2016-04-25 20:13:48,549 org.apache.helix.messaging.handling.HelixStateTransitionHandler:118] Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_0, from: host1_admin, to: host1_disk1
>> [ERROR 2016-04-25 20:13:48,555 org.apache.helix.messaging.handling.HelixStateTransitionHandler:118] Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_1, from: host1_admin, to: host1_disk1
>> [ERROR 2016-04-25 20:13:48,556 org.apache.helix.messaging.handling.HelixTask:143] Message execution failed. msgId: 4e18677e-a4d1-41ee-bb71-fea6f5e2c5c3, errorMsg:
>> org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_2, from: host1_admin, to: host1_disk2
>> [ERROR 2016-04-25 20:13:48,559 org.apache.helix.messaging.handling.HelixTask:143] Message execution failed. msgId: 1199845a-1e49-4ea0-9a53-b52bf5afd816, errorMsg:
>> org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_3, from: host1_admin, to: host1_disk3
>> [ERROR 2016-04-25 20:13:48,560 org.apache.helix.messaging.handling.HelixTask:143] Message execution failed. msgId: 35f504ce-b332-4fd4-a893-1a400047621e, errorMsg:
>> org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_0, from: host1_admin, to: host1_disk1
>> [ERROR 2016-04-25 20:13:48,573 org.apache.helix.messaging.handling.HelixStateTransitionHandler:385] Skip internal error. errCode: ERROR, errMsg: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_2, from: host1_admin, to: host1_disk2
>> [ERROR 2016-04-25 20:13:48,576 org.apache.helix.messaging.handling.HelixStateTransitionHandler:385] Skip internal error. errCode: ERROR, errMsg: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_0, from: host1_admin, to: host1_disk1
>> [ERROR 2016-04-25 20:13:48,577 org.apache.helix.messaging.handling.HelixStateTransitionHandler:385] Skip internal error. errCode: ERROR, errMsg: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_3, from: host1_admin, to: host1_disk3
>> [ERROR 2016-04-25 20:13:48,580 org.apache.helix.messaging.handling.HelixTask:143] Message execution failed. msgId: 7573c64d-e055-4eb1-b845-e22c82542437, errorMsg:
>> org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_1, from: host1_admin, to: host1_disk1
>> [ERROR 2016-04-25 20:13:48,594 org.apache.helix.messaging.handling.HelixStateTransitionHandler:385] Skip internal error. errCode: ERROR, errMsg: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_1, from: host1_admin, to: host1_disk1
>>
>> [INFO 2016-04-25 20:13:51,517 com.hcd.hcdadmin.HcdExternalView:68]
>> Cluster TryHelixCluster1, show external view of LUNs
>>
>>          host1_disk1  host1_disk2  host1_disk3
>> Pool0_0  M            S            S
>> Pool0_1  M            S            S
>> Pool0_2  S            M            S
>> Pool0_3  S            S            M
>>
>> On Sat, Apr 23, 2016 at 11:35 PM, kishore g wrote:
>> > How many resources do you have? Partition names must be unique across the
>> > entire cluster. Can you also paste the ideal state for the resources?
>> >
>> > On Sat, Apr 23, 2016 at 10:39 PM, Neutron sharc wrote:
>> >
>> >> Hi Helix team,
>> >>
>> >> I keep seeing this error from HelixStateTransitionHandler when the
>> >> state machine is running. It seems a partition's actual state doesn't
>> >> match the state marked in the controller message. What are the usual
>> >> causes? I'm using Helix 0.7.1. Here is my maven pom.xml:
>> >>
>> >> <dependency>
>> >>   <groupId>org.apache.helix</groupId>
>> >>   <artifactId>helix-core</artifactId>
>> >>   <version>0.7.1</version>
>> >> </dependency>
>> >>
>> >> [ERROR 2016-04-21 19:51:09,943 org.apache.helix.messaging.handling.HelixStateTransitionHandler:118] Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: host1_Pool0_0, from: host1_admin, to: host1_disk1
>> >> [ERROR 2016-04-21 19:51:09,959 org.apache.helix.messaging.handling.HelixTask:143] Message execution failed. msgId: 26c891b8-dd81-4e0c-8b99-6c62b856db5f, errorMsg:
>> >> org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: host1_Pool0_0, from: host1_admin, to: host1_disk1
>> >> [ERROR 2016-04-21 19:51:09,975 org.apache.helix.messaging.handling.HelixStateTransitionHandler:385] Skip internal error. errCode: ERROR, errMsg: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: host1_Pool0_0, from: host1_admin, to: host1_disk1
>> >>
>> >> Another problem I see is that my ideal state defines 3 replicas per
>> >> partition, but the resource's external view sometimes shows a
>> >> partition with 4 replicas.
>> >>
>> >> Any hints? Thanks!
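To check the "every state transition message is sent twice" observation against a raw log, one option is to count identical transition lines per instance and partition. The sketch below is a hypothetical diagnostic helper, not part of Helix: the class name `DuplicateTransitionScanner` and its methods are made up for illustration. It assumes the raw log file has one entry per line (not the email-wrapped copy in this thread) and that the callbacks log in the format shown above, e.g. `host1_disk2 : Pool0/Pool0_1: transit from OFFLINE to SLAVE`. It assumes Java 9+ for `List.of`. It only reports duplicates; it says nothing about why they were delivered.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic helper (not part of Helix): counts the
 * "transit from X to Y" lines logged by the state model callbacks and
 * reports transitions that were logged more than once.
 */
final class DuplicateTransitionScanner {

    // Matches e.g. "host1_disk2 : Pool0/Pool0_1: transit from OFFLINE to SLAVE".
    // Groups: 1 = instance, 2 = resource, 3 = partition, 4 = fromState, 5 = toState.
    private static final Pattern TRANSITION = Pattern.compile(
            "(\\S+) : (\\S+)/(\\S+): transit from (\\S+) to (\\S+)");

    private DuplicateTransitionScanner() {}

    /** Counts each distinct (instance, resource/partition, from->to) key, in first-seen order. */
    static Map<String, Integer> countTransitions(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = TRANSITION.matcher(line);
            while (m.find()) {
                String key = m.group(1) + " " + m.group(2) + "/" + m.group(3)
                        + " " + m.group(4) + "->" + m.group(5);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only the keys that were logged more than once. */
    static Map<String, Integer> duplicates(List<String> logLines) {
        Map<String, Integer> dups = new LinkedHashMap<>();
        countTransitions(logLines).forEach((key, n) -> {
            if (n > 1) {
                dups.put(key, n);
            }
        });
        return dups;
    }

    public static void main(String[] args) {
        // A few entries copied from the Apr 25 log, one entry per line.
        List<String> sample = List.of(
                "[INFO 2016-04-25 20:13:47,881 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_1: transit from OFFLINE to SLAVE",
                "[INFO 2016-04-25 20:13:47,881 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk1 : Pool0/Pool0_2: transit from OFFLINE to SLAVE",
                "[INFO 2016-04-25 20:13:48,265 com.hcd.hcdadmin.HcdDiskStateModelFactory:111] host1_disk2 : Pool0/Pool0_1: transit from OFFLINE to SLAVE",
                "[INFO 2016-04-25 20:13:48,461 com.hcd.hcdadmin.HcdDiskStateModelFactory:125] host1_disk2 : Pool0/Pool0_2: transit from SLAVE to MASTER");
        duplicates(sample).forEach((key, n) -> System.out.println(n + "x " + key));
        // prints: 2x host1_disk2 Pool0/Pool0_1 OFFLINE->SLAVE
    }
}
```

In the Apr 25 log quoted above, each of the 12 OFFLINE->SLAVE callbacks appears twice, while each SLAVE->MASTER callback appears only once. The four HelixStateMismatchException errors for the same partitions are consistent with a second SLAVE->MASTER message arriving after the partition was already MASTER, but this scanner cannot confirm where the duplicates came from.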
