Alexey, but in this case the customer needs to be informed that a whole-cluster crash
(for example, a power-off of a one-node cluster) could lead to partial data
unavailability, and possibly to further index corruption.
1. Why does your metadata take up substantial space? Maybe context is leaking?
2. Could the metadata be compressed?
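Regarding question 2: compressing the serialized metadata bytes before they are
written (and fsync'ed) to disk is straightforward with the standard `java.util.zip`
package. This is only an illustrative sketch of the idea, not the actual Ignite
implementation; the class and method names here are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MetaCompression {
    /** Compresses serialized binary metadata bytes before they go to disk. */
    static byte[] compress(byte[] meta) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(meta);
        }
        return bos.toByteArray();
    }

    /** Restores the original metadata bytes when reading them back. */
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) > 0)
                bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}
```

Whether this helps in practice depends on how repetitive the metadata actually is,
which ties back to question 1.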
> Wednesday, 14 August 2019, 11:22 +03:00 from Alexei Scherbakov
> <alexey.scherbak...@gmail.com>:
>
> Denis Mekhanikov,
>
> Currently metadata are fsync'ed on write. This might be the cause of
> slow-downs in case of metadata burst writes.
> I think removing fsync could help to mitigate performance issues with
> the current implementation until a proper solution is implemented: moving
> metadata to the metastore.
>
>
> Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov <dmekhani...@gmail.com>:
>
>> I would also like to mention that marshaller mappings are written to disk
>> even if persistence is disabled.
>> So, this issue affects purely in-memory clusters as well.
>>
>> Denis
>>
>> > On 13 Aug 2019, at 17:06, Denis Mekhanikov <dmekhani...@gmail.com>
>> > wrote:
>> >
>> > Hi!
>> >
>> > When persistence is enabled, binary metadata is written to disk upon
>> > registration. Currently this happens in the discovery thread, which
>> > makes processing of related messages very slow.
>> > There are cases when many nodes and slow disks can make every binary
>> > type take several minutes to register. It also blocks processing of
>> > other messages.
>> >
>> > I propose starting a separate thread that will be responsible for
>> > writing binary metadata to disk. Binary type registration will then be
>> > considered finished before information about it is written to disk on
>> > all nodes.
>> >
>> > The main concern here is data consistency in cases when a node
>> > acknowledges type registration and then fails before writing the
>> > metadata to disk.
>> > I see two parts of this issue:
>> > 1. Nodes will have different metadata after restarting.
>> > 2. If we write some data into a persisted cache and shut down nodes
>> > faster than a new binary type is written to disk, then after a restart
>> > we won’t have a binary type to work with.
>> >
>> > The first case is similar to a situation when one node fails, and after
>> > that a new type is registered in the cluster.
>> > This issue is resolved by the discovery data exchange. All nodes
>> > receive information about all binary types in the initial discovery
>> > messages sent by other nodes. So, once you restart a node, it will
>> > learn from the other nodes about the type it failed to finish writing
>> > to disk.
>> > If all nodes shut down before finishing writing the metadata to disk,
>> > then after a restart the type will be considered unregistered, so
>> > another registration will be required.
>> >
>> > The second case is a bit more complicated, but it can be resolved by
>> > making the discovery thread on every node create a future that is
>> > completed when writing to disk is finished. So, every node will have a
>> > future reflecting the current state of persisting the metadata to disk.
>> > After that, if some operation needs this binary type, it will have to
>> > wait on that future until flushing to disk is finished.
>> > This way discovery threads won’t be blocked, but other threads that
>> > actually need this type will be.
>> >
>> > Please let me know what you think about that.
>> >
>> > Denis
>>
>
> --
>
> Best regards,
> Alexei Scherbakov

--
Zhenya Stanilovsky
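For reference, Denis's proposal (a dedicated writer thread plus a per-type flush
future that consumers wait on) can be sketched roughly as follows. All names here
are hypothetical and this is not the actual Ignite internals, just an illustration
of the scheme under discussion:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MetadataWriter {
    // Single dedicated thread, so the discovery thread never blocks on disk I/O.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Per-type futures reflecting the state of persisting metadata to disk.
    private final Map<Integer, CompletableFuture<Void>> flushFuts = new ConcurrentHashMap<>();

    // Stand-in for the on-disk metadata store in this sketch.
    private final Map<Integer, byte[]> disk = new ConcurrentHashMap<>();

    /** Called from the discovery thread: schedule the write and return immediately. */
    public void onTypeRegistered(int typeId, byte[] meta) {
        flushFuts.put(typeId, CompletableFuture.runAsync(() -> writeToDisk(typeId, meta), writer));
    }

    /** Called by any thread that actually needs the type: block until it is flushed. */
    public void awaitFlushed(int typeId) {
        CompletableFuture<Void> fut = flushFuts.get(typeId);
        if (fut != null)
            fut.join();
    }

    public boolean isFlushed(int typeId) {
        return disk.containsKey(typeId);
    }

    private void writeToDisk(int typeId, byte[] meta) {
        // A real implementation would write the metadata file and fsync it here.
        disk.put(typeId, meta);
    }
}
```

The key property is the one Denis describes: registration completes without waiting
for the disk, while any operation that needs the type pays the wait instead.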