Hello Ignite devs, I came up with the following design proposal for IGNITE-4302 <https://issues.apache.org/jira/browse/IGNITE-4302>, which is very similar to described here <http://apache-ignite-developers.2346864.n4.nabble.com/IGNITE-4157-design-proposal-td11996.html> proposal for IGNITE-4157 <https://issues.apache.org/jira/browse/IGNITE-4157>, "*Use discovery custom messages instead of marshaller cache*".
Protocol is very similar to IGNITE-4157 and based on exchanging two messages: *MetadataProposed*/*MetadataAccepted*. 1. When a node wants to add or update existing metadata, it sends *MetadataProposed* message with the update. 2. Coordinator node checks for any conflicts with existing metadata for this typeId. In case of no conflicts coordinator assigns a version number to new metadata and sends Proposed message further. In case of conflict it simply marks this message as IN_CONFLICT and sends it to the original node. 3. All nodes upon receiving Proposed message update local metadata including version number and put it to "pending_acceptance" status. [1] 4. Coordinator acknowledges proposed message with *MetadataAccepted* message, marking it with the version number from proposed message. 5. Each node receiving accepted message checks local version number and the one from the message. If they are equal, metadata is considered as accepted by the cluster. [2] New nodes joining the cluster receive metadata on discovery phase. As discovery messages are delivered to clients asynchronously, there is a possibility that cluster may consider and agree upon update to metadata before client receives initial proposed message. In that case I suggest to attach version number of metadata BinaryObject was serialized with to that BinaryObject so client may check whether it can deserialize this object safely or need to request some updated version of metadata. [1] When a metadata is marked as "pending_acceptance" it should be prohibited to use it to serialize binary objects but it still can be used for deserialization. At the same time from code perspective I didn't find a clear distinction like "this method is used only for serialization and that method is called only during deserialization". So for now the only way I see is to block all operations for a given typeId until MetadataAccepted message for this typeId is arrived. [2] If MetadataAccepted message has a version number less than currently pending, it is ignored and all threads waiting for this metadata to be accepted remain blocked. It may be possible for new joining node to receive a MetadataAccepted message with version number greater than local copy, in that case it is fine to apply metadata update and declare metadata accepted. Please share with me your thoughts about suggested design. Any improvements, missed corner cases or other drawbacks are really appreciated. Thanks, Sergey.