[ https://issues.apache.org/jira/browse/IGNITE-14085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Semyon Danilov updated IGNITE-14085:
------------------------------------
        Parent:     (was: IGNITE-14081)
    Issue Type: Bug  (was: Sub-task)

> Implement message recovery protocol over handshake
> --------------------------------------------------
>
>                 Key: IGNITE-14085
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14085
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Semyon Danilov
>            Priority: Major
>              Labels: iep-66, ignite-3
>
> First of all, we should introduce a Communication Recovery Descriptor (CRD),
> a data structure that holds information about a specific connection between
> two nodes. It should hold the following data:
> * Connection id (because we may have multiple connections between two nodes)
> * Count of sent messages
> * Count of received messages
> * Count of acknowledgments received for sent messages
> * Count of acknowledgments sent for received messages
> * Queue of sent but not acknowledged messages
> Every connection must have a bound recovery descriptor so that, in case of a
> connectivity failure, we can resend unacknowledged messages.
>
> The process of handshake should be as follows:
> # Server receives incoming connection and sends its identity information
> (launch id, consistent id)
> # Client receives server information and sends its identity and recovery
> information (connection id, number of received messages)
> # Server receives client's recovery information and sends its own recovery
> information
> # Server sends all unacknowledged messages if any exist
> # Client sends all unacknowledged messages if any exist
> Connection should be considered ready for work after all the unacknowledged
> messages are sent and acknowledged.
>
> The process of sending and receiving a message should also change as follows:
> * Every message we are going to send must first be added to the communication
> recovery descriptor's message queue and update the sent message counter.
> * After receiving a message we should send an acknowledgement (we could also
> send a batch acknowledgement, for example for every 5 received messages send
> 1 ack) and update the received messages counter and the sent acknowledgements
> counter.
> * After receiving an acknowledgement message we must remove the sent message
> from the CRD's queue and update the appropriate counter.
> Extra attention should be paid to counter management, as messages are not
> idempotent and handling the same message twice can lead to undefined
> behaviour.
> Some messages should not be counted at all (and thus should not be
> acknowledged), for example acknowledgement messages, handshakes, and probably
> some others.
> It should also be noted that the current messaging API has a public method
> for sending a message without requiring an acknowledgement; this should be
> handled appropriately.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
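
The recovery descriptor and the counter updates described in the issue could be sketched roughly as follows. This is a minimal illustration and not the Ignite implementation: the class name `RecoveryDescriptorSketch`, its method names, the use of `String` as the message type, and the `ackBatchSize` parameter are all hypothetical. It also assumes acknowledgements are cumulative (an ack carries the total number of messages the peer has received), which the issue does not state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of a per-connection recovery descriptor. */
final class RecoveryDescriptorSketch {
    private final int connectionId;
    private final int ackBatchSize; // assumption: one ack per this many received messages

    private long sent;         // messages registered for sending
    private long received;     // messages received from the peer
    private long acknowledged; // our sent messages the peer has confirmed
    private long ackSent;      // received-count value covered by the last ack we sent

    /** Sent but not yet acknowledged messages, oldest first. */
    private final Queue<String> unacked = new ArrayDeque<>();

    RecoveryDescriptorSketch(int connectionId, int ackBatchSize) {
        this.connectionId = connectionId;
        this.ackBatchSize = ackBatchSize;
    }

    /** Registers a message before it is written to the connection. */
    void add(String message) {
        unacked.add(message);
        sent++;
    }

    /** Records an incoming message; returns true when a batched ack is due. */
    boolean onReceive() {
        received++;
        if (received - ackSent >= ackBatchSize) {
            ackSent = received;
            return true;
        }
        return false;
    }

    /** Applies a cumulative ack: the peer has received peerReceived messages in total. */
    void acknowledge(long peerReceived) {
        if (peerReceived > sent) {
            throw new IllegalArgumentException("Peer acknowledged more than was sent");
        }
        // A stale ack (peerReceived <= acknowledged) is a no-op, so nothing is removed twice.
        while (acknowledged < peerReceived) {
            unacked.remove();
            acknowledged++;
        }
    }

    /** Handshake step: the peer reports its received count; returns messages to resend. */
    List<String> onHandshake(long peerReceived) {
        acknowledge(peerReceived);
        return new ArrayList<>(unacked);
    }

    int connectionId() { return connectionId; }
    long sent() { return sent; }
    long received() { return received; }
    long acknowledged() { return acknowledged; }
    long ackSent() { return ackSent; }
}
```

In this sketch, a reconnecting node would call `onHandshake` with the received-message count reported by its peer and resend whatever the method returns. Messages that should not be counted (acks, handshakes) would simply bypass `add` and `onReceive`.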