[ https://issues.apache.org/jira/browse/IGNITE-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Puchkovskiy reassigned IGNITE-19227:
------------------------------------------

    Assignee: Roman Puchkovskiy

> Wait for schema availability out of JRaft threads
> -------------------------------------------------
>
>                 Key: IGNITE-19227
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19227
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: iep-98, ignite-3
>
> According to [https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast], we might need to wait for schema availability when fetching a schema. If such waits happen inside a PartitionListener, JRaft threads might be blocked for a noticeable amount of time (possibly even seconds). We should avoid this.
>
> h3. In RW transactions
> When a primary node is about to process a request, it waits until it has all the schema versions for the corresponding timestamp Top (beginTs or commitTs), i.e. until MS SafeTime >= Top. {*}The wait happens outside of JRaft threads{*}. It then obtains the global schema revision SR of the latest schema update that is not later than Top, builds a command with that SR inside, and submits it to RAFT.
> When an AppendEntriesRequest is built, the Replicator inspects all the entries included in it, extracts the SR from each of them, takes their maximum (MSR, for ‘max schema revision’) and puts it in the AppendEntriesRequest.
> When a follower/learner processes the request, it compares the MSR from the request with its locally known MSR (in the Catalog). If the request’s MSR > local MSR, the request is rejected with reason EBUSY, and the leader retries it after some time. As an optimization, we might wait for some time in the hope that the local MSR catches up with the request’s MSR.
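A minimal Java sketch of the RW flow described in the issue, for illustration only. Every name in it (`SchemaSyncSketch`, `SafeTimeTracker`, `Command`, `buildCommand`, `maxSchemaRevision`, `checkAppendEntries`) is hypothetical and does not correspond to actual Ignite or JRaft classes; `AppendResult.EBUSY` merely mirrors the JRaft rejection reason. It assumes SafeTime only moves forward and SRs are non-negative. The primary waits for MS SafeTime through a `CompletableFuture`, so no thread is parked. The leader folds the SRs of a batch into an MSR, and a follower rejects the batch instead of waiting.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.LongUnaryOperator;

public class SchemaSyncSketch {
    /** Hypothetical RAFT command stamped with the schema revision (SR) chosen by the primary. */
    public record Command(long schemaRevision, String payload) {}

    /** Hypothetical follower verdict; EBUSY mirrors the JRaft rejection reason. */
    public enum AppendResult { ACCEPTED, EBUSY }

    /** Hypothetical MS SafeTime tracker; assumes advance() is called with non-decreasing values. */
    public static final class SafeTimeTracker {
        private final ConcurrentSkipListMap<Long, CompletableFuture<Void>> waiters =
                new ConcurrentSkipListMap<>();
        private volatile long safeTime;

        /** Returns a future completed once SafeTime >= ts. */
        public CompletableFuture<Void> waitFor(long ts) {
            if (safeTime >= ts) {
                return CompletableFuture.completedFuture(null);
            }
            CompletableFuture<Void> f = waiters.computeIfAbsent(ts, k -> new CompletableFuture<>());
            if (safeTime >= ts) { // advance() may have raced with the registration above
                waiters.remove(ts, f);
                f.complete(null);
            }
            return f;
        }

        /** Publishes a new SafeTime and completes every waiter whose timestamp is now reached. */
        public void advance(long newSafeTime) {
            safeTime = newSafeTime;
            for (Map.Entry<Long, CompletableFuture<Void>> e : waiters.headMap(newSafeTime, true).entrySet()) {
                if (waiters.remove(e.getKey(), e.getValue())) {
                    e.getValue().complete(null);
                }
            }
        }
    }

    /** Primary: wait asynchronously for SafeTime >= top, then stamp the command with the SR at top. */
    public static CompletableFuture<Command> buildCommand(
            SafeTimeTracker tracker, long top, LongUnaryOperator schemaRevisionAt, String payload) {
        // No thread is parked here; the continuation runs on whichever thread completes the wait.
        return tracker.waitFor(top)
                .thenApply(ignored -> new Command(schemaRevisionAt.applyAsLong(top), payload));
    }

    /** Leader (Replicator): MSR = max SR over the entries batched into one AppendEntriesRequest. */
    public static long maxSchemaRevision(List<Command> entries) {
        // An empty batch (e.g. a heartbeat) imposes no schema requirement; assumes SRs are >= 0.
        return entries.stream().mapToLong(Command::schemaRevision).max().orElse(0L);
    }

    /** Follower/learner: reject instead of waiting, so the JRaft thread returns immediately. */
    public static AppendResult checkAppendEntries(long requestMsr, long localMsr) {
        return requestMsr > localMsr ? AppendResult.EBUSY : AppendResult.ACCEPTED;
    }
}
```

The design choice shown is to reject rather than block: a rejected batch costs the leader a retry round trip, but the follower never holds a JRaft thread while its Catalog catches up.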
> As we need an additional field in AppendEntriesRequest that will only be used by partition groups, we could add a generic property container to this interface, like Map<String, Object> extras().
> To extract the SR from a command, we could deserialize the whole command, but that is a lot of unnecessary work. Instead, we might serialize commands that carry an SR in a special way (putting the SR in the very first bytes of the message) to make its retrieval efficient.
> As the primary has already made sure that it has the schema versions needed to execute the command, no waits are needed on the primary node while executing the RAFT command.
> As secondaries/learners refuse AppendEntries that they cannot execute without waiting, they never have to wait in JRaft threads.
> The RAFT leader may also not be collocated with the primary. For this case, we can add the same validation for ActionRequests: pass the required SR inside an ActionRequest, validate it in ActionRequestProcessor, and reject requests whose SR is above the local MSR.
>
> h3. In RO transactions
> When processing an RO transaction, we just wait for MS SafeTime. This happens outside of RAFT, so no special measures are needed.
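The SR-prefix serialization and the generic extras() container proposed above can be sketched as follows. The class name, the 8-byte big-endian prefix layout, and the `MSR_KEY` property name are assumptions for illustration, not Ignite's actual message format. The point is that the leader reads only the first 8 bytes of each entry and never deserializes the command body.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SrPrefixSketch {
    /** Hypothetical key for the MSR inside a generic extras() map. */
    public static final String MSR_KEY = "maxSchemaRevision";

    /** Encodes a command as an 8-byte SR prefix (big-endian) followed by the opaque body. */
    public static byte[] encode(long schemaRevision, byte[] body) {
        return ByteBuffer.allocate(Long.BYTES + body.length)
                .putLong(schemaRevision)
                .put(body)
                .array();
    }

    /** Reads only the SR prefix; the command body is never deserialized. */
    public static long peekSchemaRevision(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getLong();
    }

    /** Leader side: compute the MSR of a batch and store it in a generic property container. */
    public static Map<String, Object> buildExtras(List<byte[]> entries) {
        long msr = 0L; // assumes SRs are non-negative
        for (byte[] entry : entries) {
            msr = Math.max(msr, peekSchemaRevision(entry));
        }
        Map<String, Object> extras = new HashMap<>();
        extras.put(MSR_KEY, msr);
        return extras;
    }

    /**
     * Receiver-side check for AppendEntries. An ActionRequest variant would compare the SR
     * carried in the request against the local MSR in the same way.
     */
    public static boolean accept(Map<String, Object> extras, long localMsr) {
        Object msr = extras.get(MSR_KEY);
        // Non-partition groups never set the key, so its absence means "no schema requirement".
        return msr == null || (Long) msr <= localMsr;
    }
}
```

Putting the SR first makes the Replicator's per-entry cost a fixed 8-byte read regardless of command size, and the untyped extras map keeps the partition-specific field out of the interface used by other RAFT groups.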