I would also like to mention that marshaller mappings are written to disk even if persistence is disabled. So this issue affects purely in-memory clusters as well.
Denis

> On 13 Aug 2019, at 17:06, Denis Mekhanikov <dmekhani...@gmail.com> wrote:
>
> Hi!
>
> When persistence is enabled, binary metadata is written to disk upon
> registration. Currently this happens in the discovery thread, which makes
> processing of related messages very slow.
> With many nodes and slow disks, registering a single binary type can take
> several minutes. It also blocks processing of other messages.
>
> I propose starting a separate thread that will be responsible for writing
> binary metadata to disk. Binary type registration will then be considered
> finished before information about it is written to disk on all nodes.
>
> The main concern here is data consistency in cases when a node acknowledges
> type registration and then fails before writing the metadata to disk.
> I see two parts of this issue:
> 1. Nodes will have different metadata after restarting.
> 2. If we write some data into a persisted cache and shut down nodes faster than
>    a new binary type is written to disk, then after a restart we won’t have a
>    binary type to work with.
>
> The first case is similar to a situation when one node fails and after that
> a new type is registered in the cluster. This issue is resolved by the
> discovery data exchange. All nodes receive information about all binary types
> in the initial discovery messages sent by other nodes. So, once you restart a
> node, it will receive the information that it failed to finish writing to disk
> from other nodes.
> If all nodes shut down before finishing writing the metadata to disk, then
> after a restart the type will be considered unregistered, so another
> registration will be required.
>
> The second case is a bit more complicated. But it can be resolved by making
> the discovery threads on every node create a future that will be completed
> when writing to disk is finished.
> So, every node will have such a future, which
> will reflect the current state of persisting the metadata to disk.
> After that, if some operation needs this binary type, it will need to wait on
> that future until flushing to disk is finished.
> This way discovery threads won’t be blocked, but other threads that actually
> need this type will be.
>
> Please let me know what you think about that.
>
> Denis
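
The scheme in the quoted proposal (a dedicated writer thread plus a per-type future) could be sketched roughly as below. This is only an illustration under assumptions, not Ignite code: `AsyncMetadataWriter`, `onTypeRegistered`, `awaitWritten` and the `diskWrite` callback are hypothetical names, and the real disk write is replaced by a pluggable callback.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/** Hypothetical sketch: none of these names are part of the Ignite API. */
class AsyncMetadataWriter implements AutoCloseable {
    /** One dedicated daemon thread: writes stay ordered and off the discovery thread. */
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "binary-metadata-writer");
        t.setDaemon(true);
        return t;
    });

    /** typeId -> future that completes once that type's metadata is on disk. */
    private final Map<Integer, CompletableFuture<Void>> writeFuts = new ConcurrentHashMap<>();

    /** Stand-in for the real disk write (assumption made for this sketch). */
    private final BiConsumer<Integer, byte[]> diskWrite;

    AsyncMetadataWriter(BiConsumer<Integer, byte[]> diskWrite) {
        this.diskWrite = diskWrite;
    }

    /** Called from the discovery thread: schedules the write and returns immediately. */
    CompletableFuture<Void> onTypeRegistered(int typeId, byte[] meta) {
        CompletableFuture<Void> fut =
            CompletableFuture.runAsync(() -> diskWrite.accept(typeId, meta), writer);
        writeFuts.put(typeId, fut);
        return fut;
    }

    /** Called by a thread that needs the type: blocks until the write has finished. */
    void awaitWritten(int typeId) {
        CompletableFuture<Void> fut = writeFuts.get(typeId);
        if (fut != null)
            fut.join();
    }

    @Override public void close() {
        writer.shutdown();
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> disk = new ConcurrentHashMap<>(); // pretend this is the disk
        try (AsyncMetadataWriter w = new AsyncMetadataWriter(disk::put)) {
            w.onTypeRegistered(42, new byte[] {1, 2, 3}); // discovery thread is not blocked
            w.awaitWritten(42);                           // a cache operation waits here
            System.out.println("persisted: " + disk.containsKey(42));
        }
    }
}
```

Using a single writer thread means a later write for the same type cannot finish before an earlier one, so replacing the stored future on re-registration is safe in this sketch. Failure handling (a write that throws, or a node that stops before the queue drains) is deliberately left out.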