Hi Asitha,

I want to know, at a high level, what happens to slots when a node is killed.

1. A new slot coordinator should be elected. Until then, other nodes might
still try to connect to the old coordinator node:


[2015-07-14 17:59:18,207] ERROR {org.wso2.andes.thrift.MBThriftClient} -  Could not connect to the Thrift Server 10.100.1.146:7611java.net.ConnectException: Connection refused
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
    at org.wso2.andes.thrift.MBThriftClient.reConnectToServer(MBThriftClient.java:281)
    at org.wso2.andes.thrift.MBThriftClient.getSlot(MBThriftClient.java:78)
    at org.wso2.andes.kernel.slot.SlotCoordinatorCluster.getSlot(SlotCoordinatorCluster.java:44)
    at org.wso2.andes.kernel.slot.SlotDeliveryWorker.requestSlot(SlotDeliveryWorker.java:301)
    at org.wso2.andes.kernel.slot.SlotDeliveryWorker.run(SlotDeliveryWorker.java:109)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    ... 8 more
[2015-07-14 17:59:18,210] ERROR {org.wso2.andes.kernel.slot.SlotDeliveryWorker} -  Error occurred while connecting to the thrift coordinator Coordinator has changed
org.wso2.andes.kernel.slot.ConnectionException: Coordinator has changed
    at org.wso2.andes.thrift.MBThriftClient.getSlot(MBThriftClient.java:83)
    at org.wso2.andes.kernel.slot.SlotCoordinatorCluster.getSlot(SlotCoordinatorCluster.java:44)
    at org.wso2.andes.kernel.slot.SlotDeliveryWorker.requestSlot(SlotDeliveryWorker.java:301)
    at org.wso2.andes.kernel.slot.SlotDeliveryWorker.run(SlotDeliveryWorker.java:109)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.wso2.andes.thrift.slot.gen.SlotManagementService$Client.recv_getSlotInfo(SlotManagementService.java:101)
    at org.wso2.andes.thrift.slot.gen.SlotManagementService$Client.getSlotInfo(SlotManagementService.java:87)
    at org.wso2.andes.thrift.MBThriftClient.getSlot(MBThriftClient.java:73)
    ... 6 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 14 more
[2015-07-14 17:59:18,211] ERROR {org.wso2.andes.thrift.MBThriftClient} -  Could not initialize the Thrift client. java.net.ConnectException: Connection refused
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
    at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
    at org.wso2.andes.thrift.MBThriftClient.getServiceClient(MBThriftClient.java:225)
    at org.wso2.andes.thrift.MBThriftClient.updateMessageId(MBThriftClient.java:117)
    at org.wso2.andes.kernel.slot.SlotCoordinatorCluster.updateMessageId(SlotCoordinatorCluster.java:53)
    at org.wso2.andes.kernel.slot.SlotMessageCounter.submitSlot(SlotMessageCounter.java:203)
    at org.wso2.andes.kernel.slot.SlotMessageCounter$1.run(SlotMessageCounter.java:103)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    ... 7 more
[2015-07-14 17:59:23,212]  INFO {org.wso2.andes.thrift.MBThriftClient} -  Reconnecting to Slot Coordinator 10.100.1.146:7612
[2015-07-14 17:59:23,218]  INFO {org.wso2.andes.thrift.MBThriftClient} -  Reconnecting to Slot Coordinator 10.100.1.146:7612
[2015-07-14 17:59:23,237]  INFO {org.wso2.andes.kernel.AndesChannel} -  Flow control disabled for channel 2.


2. Failover can be fast. By the time a client has connected to the new node,
the new slot coordinator might not have been elected yet.

3. What happens to the slots assigned to the killed node?

4. What if an overlapping slot arrives at the instant the node is killed?
Is there a risk of message duplication? Overlapping slots must be assigned
to the same node, as some of their messages might already have been delivered.

5. Subscriber maps should be updated for the failed-over subscriber (removed
from the previous node and added to the new node).

6. Messages addressed to topics (duplicated for the node) should be purged.

7. Flow control + node kill: there are some edge cases around that.

8. Messages inside the disruptor. If we do not use publisher transactions,
some messages might be lost (by design).


Basically, what we need to do is deliver every message that gets into MB and
is written to the DB. Thus the distributed slot logic should tolerate a node
kill.
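
For illustration only, the behaviour asked about in point 3 could look like
the sketch below: when a node dies, the coordinator puts that node's slots
back at the head of the unassigned pool so that a surviving node picks them
up. All class and method names here (SlotReassignmentSketch, releaseSlotsOf,
and so on) are made up for this example and are not taken from the Andes
code; the real slot manager may work quite differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Andes code base.
class SlotReassignmentSketch {

    // A slot is a range of message IDs that are already written to the DB.
    static final class Slot {
        final long startMessageId;
        final long endMessageId;

        Slot(long startMessageId, long endMessageId) {
            this.startMessageId = startMessageId;
            this.endMessageId = endMessageId;
        }
    }

    // Slots currently handed out, keyed by the node that owns them.
    private final Map<String, List<Slot>> assignedSlots = new HashMap<>();
    // Slots waiting to be handed out, oldest first.
    private final Deque<Slot> unassignedSlots = new ArrayDeque<>();

    void addFreshSlot(Slot slot) {
        unassignedSlots.addLast(slot);
    }

    // Hands the oldest unassigned slot to the node; returns null if none is left.
    Slot assignSlot(String nodeId) {
        Slot slot = unassignedSlots.pollFirst();
        if (slot != null) {
            assignedSlots.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(slot);
        }
        return slot;
    }

    // Called when a node leaves the cluster. Its slots go back to the front
    // of the pool, in their original order, so the oldest messages are
    // retried first. Returns the number of slots released.
    // Assumption: this sketch does not track which messages of a slot were
    // already delivered, so it does nothing to prevent redelivery.
    int releaseSlotsOf(String killedNodeId) {
        List<Slot> orphaned = assignedSlots.remove(killedNodeId);
        if (orphaned == null) {
            return 0;
        }
        for (int i = orphaned.size() - 1; i >= 0; i--) {
            unassignedSlots.addFirst(orphaned.get(i));
        }
        return orphaned.size();
    }

    int unassignedCount() {
        return unassignedSlots.size();
    }
}
```

In this sketch a cluster membership listener would call releaseSlotsOf(nodeId)
once it learns that a node is gone, and the next assignSlot call from any
surviving node would then receive the oldest orphaned slot.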

Are the above things addressed now? Can you describe at a high level what
happens in the above cases?

Thanks


-- 
*Hasitha Abeykoon*
Senior Software Engineer; WSO2, Inc.; http://wso2.com
*cell:* *+94 719363063*
*blog: **abeykoon.blogspot.com* <http://abeykoon.blogspot.com>
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
