[ 
https://issues.apache.org/jira/browse/IGNITE-14085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Semyon Danilov updated IGNITE-14085:
------------------------------------
        Parent:     (was: IGNITE-14081)
    Issue Type: Bug  (was: Sub-task)

> Implement message recovery protocol over handshake
> --------------------------------------------------
>
>                 Key: IGNITE-14085
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14085
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Semyon Danilov
>            Priority: Major
>              Labels: iep-66, ignite-3
>
> First of all, we should introduce a Communication Recovery Descriptor (CRD), 
> a data structure that holds information about a specific connection between 
> two nodes. It should hold the following data:
> * Connection id (because we may have multiple connections between two nodes)
> * Count of sent messages
> * Count of received messages
> * Count of acknowledgments received for sent messages
> * Count of acknowledgments sent for received messages
> * Queue of sent but not acknowledged messages 
> Every connection must have a bound recovery descriptor so that, in case of a 
> connectivity failure, we can resend unacknowledged messages.
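
The descriptor described above could be sketched roughly as follows. This is only an illustration: the class name, the method names, the use of an int for the connection id, and the assumption that acknowledgements are cumulative (an ack carrying N confirms the first N sent messages) are all hypothetical and are not taken from the Ignite code base.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Illustrative sketch only; all names here are assumptions, not Ignite API. */
final class RecoveryDescriptorSketch {
    /** Distinguishes multiple connections between the same two nodes. */
    private final int connectionId;
    /** Count of sent messages. */
    private long sentCount;
    /** Count of received messages. */
    private long receivedCount;
    /** Count of acknowledgements received for sent messages. */
    private long acknowledgedCount;
    /** Count of received messages that we have acknowledged to the peer. */
    private long sentAckCount;
    /** Sent but not yet acknowledged messages, oldest first. */
    private final Queue<Object> unacknowledged = new ArrayDeque<>();

    RecoveryDescriptorSketch(int connectionId) {
        this.connectionId = connectionId;
    }

    /** Registers an outgoing message before it is written to the connection. */
    synchronized long add(Object msg) {
        unacknowledged.add(msg);
        return ++sentCount;
    }

    /** Assumed cumulative ack: drops queued messages until ackedCount is reached. */
    synchronized void onAckReceived(long ackedCount) {
        while (acknowledgedCount < ackedCount && !unacknowledged.isEmpty()) {
            unacknowledged.poll();
            acknowledgedCount++;
        }
    }

    /** Counts an incoming message and returns the new received count. */
    synchronized long onMessageReceived() {
        return ++receivedCount;
    }

    /** Records that everything received so far has been acknowledged. */
    synchronized void onAckSent() {
        sentAckCount = receivedCount;
    }

    /** Snapshot of the messages to resend after a reconnect. */
    synchronized List<Object> unacknowledgedMessages() {
        return new ArrayList<>(unacknowledged);
    }

    int connectionId() { return connectionId; }
    synchronized long sentCount() { return sentCount; }
    synchronized long receivedCount() { return receivedCount; }
    synchronized long acknowledgedCount() { return acknowledgedCount; }
    synchronized long sentAckCount() { return sentAckCount; }
}
```

With three messages added and an ack for two received, the queue would hold only the third message, which is what a reconnect would need to resend.
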
> The process of handshake should be as follows:
> # Server receives incoming connection and sends its identity information 
> (launch id, consistent id)
> # Client receives server information and sends its identity and recovery 
> information (connection id, number of received messages)
> # Server receives client's recovery information and sends its own recovery 
> information
> # Server sends all unacknowledged messages if any exist
> # Client sends all unacknowledged messages if any exist
> Connection should be considered ready for work after all the unacknowledged 
> messages are sent and acknowledged.
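
One way to read the resend steps of the handshake above: the received-message count that each side sends can act as an implicit acknowledgement, so only messages the peer has not yet seen are resent. The sketch below assumes exactly that, which is an interpretation and not something the description states. The class name, method name and parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and the implicit-ack rule are assumptions. */
final class HandshakeResendSketch {
    private HandshakeResendSketch() {
    }

    /**
     * Chooses which queued messages to resend after the handshake.
     *
     * @param unacknowledged    sent but unacknowledged messages, oldest first
     * @param acknowledgedCount acks received so far for sent messages
     * @param peerReceivedCount received count reported by the peer in the handshake
     */
    static <T> List<T> messagesToResend(List<T> unacknowledged,
                                        long acknowledgedCount,
                                        long peerReceivedCount) {
        // Messages the peer already received although their acks were lost.
        long alreadyDelivered = peerReceivedCount - acknowledgedCount;

        if (alreadyDelivered < 0 || alreadyDelivered > unacknowledged.size())
            throw new IllegalStateException("Counters are inconsistent with the queue");

        // Skipping delivered messages avoids handling the same message twice.
        return new ArrayList<>(
            unacknowledged.subList((int) alreadyDelivered, unacknowledged.size()));
    }
}
```

For example, with two acks received, a queue of m3, m4, m5 and a peer that reports four received messages, only m5 would be resent.
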
> The process of sending and receiving a message should also change to this:
> * Every message we are going to send must first be added to the communication 
> recovery descriptor's message queue, and the sent message counter must be 
> updated.
> * After receiving a message we should send an acknowledgement (we could also 
> send a batch acknowledgement, for example for every 5 received messages send 
> 1 ack) and update the received messages counter and the sent acknowledgements 
> counter.
> * After receiving an acknowledgement message we must remove the sent message 
> from the CRD's queue and update the appropriate counter.
> Extra attention should be paid to counter management, as messages are not 
> idempotent and handling the same message twice can lead to undefined behaviour.
> Some messages should not be counted at all (and thus should not be 
> acknowledged), for example acknowledgement messages, handshakes, and probably 
> others.
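
The receiving side with batched acknowledgements might look roughly like the sketch below. It assumes cumulative acks that carry the total received count, and a boolean flag marking messages that must not be counted (acks, handshakes). The class, the method and that flag are hypothetical names used only for illustration.

```java
import java.util.OptionalLong;

/** Illustrative sketch only; names and the cumulative-ack format are assumptions. */
final class BatchAckReceiverSketch {
    /** Number of counted messages after which one ack is emitted. */
    private final int ackBatchSize;
    /** Count of received (counted) messages. */
    private long receivedCount;
    /** Received count covered by the last ack we sent. */
    private long sentAckCount;

    BatchAckReceiverSketch(int ackBatchSize) {
        if (ackBatchSize <= 0)
            throw new IllegalArgumentException("ackBatchSize must be positive");

        this.ackBatchSize = ackBatchSize;
    }

    /**
     * Handles one incoming message.
     *
     * @param countable false for acks, handshakes and other uncounted messages
     * @return the cumulative count to acknowledge, or empty if no ack is due yet
     */
    synchronized OptionalLong onMessage(boolean countable) {
        if (!countable)
            return OptionalLong.empty(); // uncounted messages never trigger an ack

        receivedCount++;

        if (receivedCount - sentAckCount >= ackBatchSize) {
            sentAckCount = receivedCount;

            return OptionalLong.of(receivedCount);
        }

        return OptionalLong.empty();
    }
}
```

With a batch size of 5, the first four counted messages produce no ack and the fifth produces an ack carrying 5. Uncounted messages in between do not change that.
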
> It should also be noted that the current messaging API has a public method for 
> sending a message without requiring an acknowledgement; this should be handled 
> appropriately.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
