Alexey, but in this case the customer needs to be informed that a whole-cluster crash
(for example, a power-off of a one-node cluster) could lead to partial data
unavailability, and possibly to further index corruption.
1. Why does your metadata take up substantial space? Maybe context is leaking?
2. Could the metadata be compressed?
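Regarding question 2: compressing the serialized metadata bytes before they are
written (and fsync'ed) to disk is straightforward with the standard `java.util.zip`
package. This is only an illustrative sketch of the idea, not the actual Ignite
implementation; the class and method names here are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MetaCompression {
    /** Compresses serialized binary metadata bytes before they go to disk. */
    static byte[] compress(byte[] meta) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(meta);
        }
        return bos.toByteArray();
    }

    /** Restores the original metadata bytes when reading them back. */
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) > 0)
                bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }
}
```

Whether this helps in practice depends on how repetitive the metadata actually is,
which ties back to question 1.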
> Wednesday, 14 August 2019, 11:22 +03:00 from Alexei Scherbakov
> <alexey.scherbak...@gmail.com>:
>
> Denis Mekhanikov,
>
> Currently metadata are fsync'ed on write. This might be the cause of
> slow-downs in case of metadata burst writes.
> I think removing fsync could help to mitigate performance issues with
> the current implementation until a proper solution is implemented: moving
> metadata to the metastore.
>
>
> Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov <dmekhani...@gmail.com>:
>
>> I would also like to mention that marshaller mappings are written to disk
>> even if persistence is disabled.
>> So, this issue affects purely in-memory clusters as well.
>>
>> Denis
>>
>> > On 13 Aug 2019, at 17:06, Denis Mekhanikov <dmekhani...@gmail.com>
>> > wrote:
>> >
>> > Hi!
>> >
>> > When persistence is enabled, binary metadata is written to disk upon
>> > registration. Currently this happens in the discovery thread, which
>> > makes processing of related messages very slow.
>> > There are cases when many nodes and slow disks can make every binary
>> > type take several minutes to register. It also blocks processing of
>> > other messages.
>> >
>> > I propose starting a separate thread that will be responsible for
>> > writing binary metadata to disk. Binary type registration will then be
>> > considered finished before information about it is written to disk on
>> > all nodes.
>> >
>> > The main concern here is data consistency in cases when a node
>> > acknowledges type registration and then fails before writing the
>> > metadata to disk.
>> > I see two parts of this issue:
>> > 1. Nodes will have different metadata after restarting.
>> > 2. If we write some data into a persisted cache and shut down nodes
>> > faster than a new binary type is written to disk, then after a restart
>> > we won’t have a binary type to work with.
>> >
>> > The first case is similar to a situation when one node fails, and after
>> > that a new type is registered in the cluster.
>> > This issue is resolved by the discovery data exchange. All nodes
>> > receive information about all binary types in the initial discovery
>> > messages sent by other nodes. So, once you restart a node, it will
>> > learn from the other nodes about the type it failed to finish writing
>> > to disk.
>> > If all nodes shut down before finishing writing the metadata to disk,
>> > then after a restart the type will be considered unregistered, so
>> > another registration will be required.
>> >
>> > The second case is a bit more complicated, but it can be resolved by
>> > making the discovery thread on every node create a future that is
>> > completed when writing to disk is finished. So, every node will have a
>> > future reflecting the current state of persisting the metadata to disk.
>> > After that, if some operation needs this binary type, it will have to
>> > wait on that future until flushing to disk is finished.
>> > This way discovery threads won’t be blocked, but other threads that
>> > actually need this type will be.
>> >
>> > Please let me know what you think about that.
>> >
>> > Denis
>>
>
> --
>
> Best regards,
> Alexei Scherbakov

--
Zhenya Stanilovsky
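For reference, Denis's proposal (a dedicated writer thread plus a per-type flush
future that consumers wait on) can be sketched roughly as follows. All names here
are hypothetical and this is not the actual Ignite internals, just an illustration
of the scheme under discussion:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MetadataWriter {
    // Single dedicated thread, so the discovery thread never blocks on disk I/O.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Per-type futures reflecting the state of persisting metadata to disk.
    private final Map<Integer, CompletableFuture<Void>> flushFuts = new ConcurrentHashMap<>();

    // Stand-in for the on-disk metadata store in this sketch.
    private final Map<Integer, byte[]> disk = new ConcurrentHashMap<>();

    /** Called from the discovery thread: schedule the write and return immediately. */
    public void onTypeRegistered(int typeId, byte[] meta) {
        flushFuts.put(typeId, CompletableFuture.runAsync(() -> writeToDisk(typeId, meta), writer));
    }

    /** Called by any thread that actually needs the type: block until it is flushed. */
    public void awaitFlushed(int typeId) {
        CompletableFuture<Void> fut = flushFuts.get(typeId);
        if (fut != null)
            fut.join();
    }

    public boolean isFlushed(int typeId) {
        return disk.containsKey(typeId);
    }

    private void writeToDisk(int typeId, byte[] meta) {
        // A real implementation would write the metadata file and fsync it here.
        disk.put(typeId, meta);
    }
}
```

The key property is the one Denis describes: registration completes without waiting
for the disk, while any operation that needs the type pays the wait instead.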