[jira] [Commented] (IGNITE-1123) Instability and broken topology when multiple server and client nodes are restarted
[ https://issues.apache.org/jira/browse/IGNITE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284450#comment-15284450 ]

Lev commented on IGNITE-1123:
-----------------------------

Thanks, I will try that.

> Instability and broken topology when multiple server and client nodes are restarted
> -----------------------------------------------------------------------------------
>
>                 Key: IGNITE-1123
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1123
>             Project: Ignite
>          Issue Type: Sub-task
>          Components: clients, general
>    Affects Versions: sprint-7
>            Reporter: Denis Magda
>            Assignee: Denis Magda
>            Priority: Critical
>              Labels: Muted_test
>
> The bug is always reproduced with
> TcpDiscoveryMultiThreadedTest.testMultiThreadedClientsServersRestart.
> The test starts multiple servers and clients and then restarts them from
> multiple threads. At some point this leads to one or both of the following:
> 1) Broken topology on the client side:
> {noformat}
> java.lang.AssertionError: TcpDiscoveryNodeAddFinishedMessage
> [nodeId=70576075-b528-43f4-b490-33d079dc7007,
> super=TcpDiscoveryAbstractMessage [sndNodeId=null,
> id=8f7e19c8e41-10b88275-1868-4faf-9ae0-d61d627b1001,
> verifierNodeId=10b88275-1868-4faf-9ae0-d61d627b1001, topVer=89, pendingIdx=0,
> isClient=false]]
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:589)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.access$2500(ClientImpl.java:48)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeAddFinishedMessage(ClientImpl.java:1370)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1227)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processClientReconnectMessage(ClientImpl.java:1552)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1235)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1197)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> 2) Client segmentation that is not properly processed by
> GridCachePartionExchangeManager and that causes the test to hang:
> {noformat}
> Still waiting for initial partition map exchange
> [fut=GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false,
> reassign=false, discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=60b736c0-a2ad-4465-a3c1-7656e1fa9006, addrs=[127.0.0.1],
> sockAddrs=[/127.0.0.1:0], discPort=0, order=255, intOrder=0, loc=true,
> ver=1.4.1#19700101-sha1:, isClient=true], topVer=255,
> nodeId8=60b736c0, msg=null, type=NODE_JOINED, tstamp=1436877471568],
> rcvdIds=GridConcurrentHashSet
> [elements=[7098ffd9-f81b-40bf-9b9e-b0935d394007,
> 30b30924-a9e7-45fb-9aeb-361bbb482003, 00d7e953-ce8b-45e0-a1f3-be7a6dea1000,
> 301559de-a129-4b85-852f-f8325649f003, 7078cd7a-e6b9-4bed-b829-a2e792a0c007,
> 20de002e-7e98-4e55-b25f-d873e25db002, 00c444ad-b221-4695-9a6d-5ea529779000,
> 40dc3691-7b20-41d7-a436-65ad27f74004, 308f0e4a-507a-4da6-b086-bdecc08e1003,
> 20b1d488-7aa1-41d7-ac0b-e8730002, 00da0ab2-9441-4cc2-b787-c34dcf6a2000,
> 4096e7dd-e3fe-4704-9d11-3b267430e004, 1059ba84-4ca7-4d8f-9563-b90334d48001]],
> rmtIds=[30b30924-a9e7-45fb-9aeb-361bbb482003,
> 20de002e-7e98-4e55-b25f-d873e25db002, 4096e7dd-e3fe-4704-9d11-3b267430e004,
> 00c444ad-b221-4695-9a6d-5ea529779000], exchId=GridDhtPartitionExchangeId
> [topVer=AffinityTopologyVersion [topVer=255, minorTopVer=0], nodeId=60b736c0,
> evt=NODE_JOINED], init=true, ready=false, replied=false, added=true,
> initFut=GridFutureAdapter [resFlag=2, res=true, startTime=1436877471578,
> endTime=1436877471578, ignoreInterrupts=false, lsnr=null, state=DONE],
> topSnapshot=null, lastVer=null, partReleaseFut=null, skipPreload=true,
> clientOnlyExchange=true, oldest=20de002e-7e98-4e55-b25f-d873e25db002,
> oldestOrder=254, evtLatch=0, remaining=[], super=GridFutureAdapter
> [resFlag=0, res=null, startTime=1436877471578, endTime=0,
> ignoreInterrupts=false, lsnr=null, state=INIT]]]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (IGNITE-1123) Instability and broken topology when multiple server and client nodes are restarted
[ https://issues.apache.org/jira/browse/IGNITE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284429#comment-15284429 ]

Denis Magda commented on IGNITE-1123:
-------------------------------------

A lot of fixes and optimizations have already been made in this area. I would recommend switching to one of the latest nightly builds and checking whether the issue still occurs on your side:
https://builds.apache.org/view/H-L/view/Ignite/job/Ignite-nightly/lastSuccessfulBuild/
[jira] [Commented] (IGNITE-1123) Instability and broken topology when multiple server and client nodes are restarted
[ https://issues.apache.org/jira/browse/IGNITE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284421#comment-15284421 ]

Lev commented on IGNITE-1123:
-----------------------------

Is there any update on this issue? Will it be fixed anytime soon?
[jira] [Commented] (IGNITE-1123) Instability and broken topology when multiple server and client nodes are restarted
[ https://issues.apache.org/jira/browse/IGNITE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734370#comment-14734370 ]

Denis Magda commented on IGNITE-1123:
-------------------------------------

Very similar topology-related issues arise when executing {{testMultiThreadedClientsRestart}}, which restarts only client nodes from multiple threads.
{noformat}
[09:31:15]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: lastVer=961, newVer=963, locNode=TcpDiscoveryNode [id=601f5093-d79b-413a-989f-17c319cce006, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:0], discPort=0, order=963, intOrder=0, lastExchangeTime=1441693875034, loc=true, ver=1.4.0#19700101-sha1:, isClient=true], msg=TcpDiscoveryNodeAddFinishedMessage [nodeId=601f5093-d79b-413a-989f-17c319cce006, super=TcpDiscoveryAbstractMessage [sndNodeId=40033004-b638-488e-9883-4fde13a9f004, id=eacf7bbaf41-1044072a-f3d6-4d26-94cd-31c6325e3001, verifierNodeId=1044072a-f3d6-4d26-94cd-31c6325e3001, topVer=963, pendingIdx=0, isClient=false]]
[09:31:15]W: [org.apache.ignite:ignite-core]     at org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:720)
[09:31:15]W: [org.apache.ignite:ignite-core]     at org.apache.ignite.spi.discovery.tcp.ClientImpl.access$2700(ClientImpl.java:118)
[09:31:15]W: [org.apache.ignite:ignite-core]     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeAddFinishedMessage(ClientImpl.java:1656)
[09:31:15]W: [org.apache.ignite:ignite-core]     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1537)
[09:31:15]W: [org.apache.ignite:ignite-core]     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1465)
[09:31:15]W: [org.apache.ignite:ignite-core]     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{noformat}
We have to revisit the client discovery SPI implementation.
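For readers trying to interpret the assertion message (lastVer=961, newVer=963), the following is a minimal, self-contained sketch of the kind of invariant a topology history might enforce. It is not Ignite code: the class TopologyHistorySketch and its methods are hypothetical, and the rule that each new topology version must be exactly one greater than the last recorded one is an assumption made for illustration. The trace itself does not show the actual condition checked in ClientImpl.updateTopologyHistory.

```java
import java.util.TreeMap;

/**
 * Hypothetical illustration only; not taken from Apache Ignite.
 * Assumes a topology history that accepts only consecutive versions.
 */
public class TopologyHistorySketch {
    /** Recorded topology versions, ordered by version number. */
    private final TreeMap<Long, String> history = new TreeMap<>();

    /**
     * Records an event for the given topology version.
     *
     * @return true if recorded; false if the version does not directly
     *         follow the last recorded one (for example 961 followed by 963).
     */
    public boolean record(long topVer, String event) {
        if (!history.isEmpty() && history.lastKey() != topVer - 1)
            return false; // Gap or reordering: a discovery message was missed.

        history.put(topVer, event);
        return true;
    }

    /** Returns the last recorded version, or -1 if nothing was recorded. */
    public long lastVersion() {
        return history.isEmpty() ? -1L : history.lastKey();
    }

    public static void main(String[] args) {
        TopologyHistorySketch hist = new TopologyHistorySketch();

        System.out.println("961 accepted: " + hist.record(961, "NODE_JOINED"));
        // Version 962 never arrives on this client, so 963 is rejected.
        System.out.println("963 accepted: " + hist.record(963, "NODE_JOINED"));
        System.out.println("last version: " + hist.lastVersion());
    }
}
```

Under that assumption, a client that misses the message for version 962 during concurrent restarts would observe exactly the lastVer/newVer pair shown in the log.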
> Instability and broken topology when multiple server and client nodes are restarted
> -----------------------------------------------------------------------------------
>
>                 Key: IGNITE-1123
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1123
>             Project: Ignite
>          Issue Type: Bug
>          Components: clients, general
>    Affects Versions: sprint-7
>            Reporter: Denis Magda
>            Assignee: Denis Magda
>            Priority: Critical
>             Fix For: ignite-1.5
>
>
> The bug is always reproduced with
> TcpDiscoveryMultiThreadedTest.testMultiThreadedClientsServersRestart.
> The test starts multiple servers and clients and then restarts them from
> multiple threads. At some point this leads to one or both of the following:
> 1) Broken topology on the client side:
> {noformat}
> java.lang.AssertionError: TcpDiscoveryNodeAddFinishedMessage
> [nodeId=70576075-b528-43f4-b490-33d079dc7007,
> super=TcpDiscoveryAbstractMessage [sndNodeId=null,
> id=8f7e19c8e41-10b88275-1868-4faf-9ae0-d61d627b1001,
> verifierNodeId=10b88275-1868-4faf-9ae0-d61d627b1001, topVer=89, pendingIdx=0,
> isClient=false]]
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:589)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl.access$2500(ClientImpl.java:48)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeAddFinishedMessage(ClientImpl.java:1370)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1227)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processClientReconnectMessage(ClientImpl.java:1552)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1235)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1197)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> 2) Client segmentation that is not properly processed by
> GridCachePartionExchangeManager and that causes the test to hang:
> {noformat}
> Still waiting for initial partition map exchange
> [fut=GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false,
> reassign=false, discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=60b736c0-a2ad-4465-a3c1-7656e1fa9006, addrs=[127.0.0.1],
> sockAddrs=[/127.0.0.1:0], discPort=0, order=255, intOrder=0, loc=true,
> ver=1.4.1#19700101-sha1:, isClient=true], topVer=255,
> nodeId8=60b736c0, msg=null, type=NODE_JOINED, tstamp=1436877471568],
> rcvdIds=GridConcurrentHashSet
> [elements=[7098ffd9-f81b-40bf-9b9e-b0935d394007,
> 30b30924-a9e7-45fb-9aeb-361bbb482003, 00d7e953-ce8b-45e0-a1f3-be7a6dea1000,
> 301559de-a129-4b85-852f-f8325649f003, 70