I would also like to mention that marshaller mappings are written to disk even 
if persistence is disabled.
So this issue affects purely in-memory clusters as well.

Denis

> On 13 Aug 2019, at 17:06, Denis Mekhanikov <dmekhani...@gmail.com> wrote:
> 
> Hi!
> 
> When persistence is enabled, binary metadata is written to disk upon 
> registration. Currently it happens in the discovery thread, which makes 
> processing of related messages very slow.
> In clusters with many nodes and slow disks, registering a single binary 
> type can take several minutes. It also blocks processing of other 
> messages.
> 
> I propose starting a separate thread that will be responsible for writing 
> binary metadata to disk. Binary type registration will then be considered 
> finished before information about it is written to disk on all nodes.
> 
> The main concern here is data consistency in cases when a node acknowledges 
> type registration and then fails before writing the metadata to disk.
> I see two parts of this issue:
> 1. Nodes will have different metadata after restarting.
> 2. If we write some data into a persisted cache and shut down nodes faster 
>    than a new binary type is written to disk, then after a restart we won’t 
>    have a binary type to work with.
> 
> The first case is similar to a situation where one node fails and after that 
> a new type is registered in the cluster. This issue is resolved by the 
> discovery data exchange. All nodes receive information about all binary types 
> in the initial discovery messages sent by other nodes. So, once you restart a 
> node, it will receive the information that it failed to finish writing to 
> disk from other nodes.
> If all nodes shut down before finishing writing the metadata to disk, then 
> after a restart the type will be considered unregistered, so another 
> registration will be required.
> 
> The second case is a bit more complicated. But it can be resolved by making 
> the discovery threads on every node create a future that will be completed 
> when writing to disk is finished. So every node will have such a future, 
> which will reflect the current state of persisting the metadata to disk.
> After that, if some operation needs this binary type, it will need to wait on 
> that future until flushing to disk is finished.
> This way discovery threads won’t be blocked, but other threads that actually 
> need this type will be.
> 
> Please let me know what you think about that.
> 
> Denis
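
To make the proposal quoted above a bit more concrete, here is a minimal Java 
sketch of the idea: a dedicated writer thread plus a per-type future that other 
threads can wait on. All names here (AsyncMetadataWriter, onTypeRegistered, 
awaitWritten, the "<typeId>.bin" file layout) are hypothetical and are not 
existing Ignite APIs. It only illustrates the threading scheme, not the actual 
metadata format or failure handling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: none of these names are Ignite APIs. */
class AsyncMetadataWriter implements AutoCloseable {
    /** Directory where metadata files are stored (assumed layout: one file per type). */
    private final Path dir;

    /** The separate thread responsible for writing metadata to disk. */
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** One future per type ID, reflecting the state of its write. */
    private final ConcurrentMap<Integer, CompletableFuture<Void>> writeFuts =
        new ConcurrentHashMap<>();

    AsyncMetadataWriter(Path dir) {
        this.dir = dir;
    }

    /** Called from the discovery thread: schedules the write and returns immediately. */
    CompletableFuture<Void> onTypeRegistered(int typeId, byte[] metadata) {
        CompletableFuture<Void> fut = CompletableFuture.runAsync(() -> {
            try {
                // Note: Files.write does not force an fsync; a real implementation might.
                Files.write(dir.resolve(typeId + ".bin"), metadata);
            }
            catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, writer);

        writeFuts.put(typeId, fut);

        return fut;
    }

    /** Called by threads that actually need the type: blocks until the write completes. */
    void awaitWritten(int typeId) {
        CompletableFuture<Void> fut = writeFuts.get(typeId);

        if (fut != null)
            fut.join(); // Rethrows a write failure as CompletionException.
    }

    @Override public void close() {
        writer.shutdown();
    }
}
```

In this sketch the discovery thread would call onTypeRegistered() and move on 
to the next message, while a cache operation that depends on the type would 
call awaitWritten() first. A type with no pending write is treated as already 
persisted, which is an assumption of the sketch.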
