[ 
https://issues.apache.org/jira/browse/IGNITE-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy reassigned IGNITE-19227:
------------------------------------------

    Assignee: Roman Puchkovskiy

> Wait for schema availability out of JRaft threads
> -------------------------------------------------
>
>                 Key: IGNITE-19227
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19227
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: iep-98, ignite-3
>
> According to 
> [https://cwiki.apache.org/confluence/display/IGNITE/IEP-98%3A+Schema+Synchronization#IEP98:SchemaSynchronization-Waitingforsafetimeinthepast]
>  , we might need to wait for schema availability when fetching a schema. If 
> such waits happen inside a PartitionListener, JRaft threads might be blocked 
> for a noticeable amount of time (maybe even seconds). We should avoid this.
> h3. In RW transactions
> When a primary node is going to process a request, it waits until it has all 
> the schema versions for the corresponding timestamp Top (beginTs or 
> commitTs), i.e. until MS SafeTime >= Top. {*}The wait happens outside of JRaft 
> threads{*}. Then it obtains the global schema revision SR of the latest 
> schema update that is not later than Top. It then builds a command (putting 
> that SR inside) and submits it to RAFT.
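
The primary-side flow above could be sketched as follows. This is only an illustration under stated assumptions, not Ignite code: PrimarySubmit, Command, safeTimeReached and revisionLookup are hypothetical names. The future is assumed to complete once MS SafeTime >= Top, so no thread blocks while waiting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongSupplier;

/** Hypothetical sketch: build the RAFT command only after the schema wait has finished. */
final class PrimarySubmit {
    /** Hypothetical command that carries the schema revision (SR) it was built against. */
    static final class Command {
        final long schemaRevision;
        final String payload;

        Command(long schemaRevision, String payload) {
            this.schemaRevision = schemaRevision;
            this.payload = payload;
        }
    }

    /**
     * Chains command creation after the wait. safeTimeReached is assumed to complete
     * when MS SafeTime >= Top; revisionLookup is assumed to return the SR of the latest
     * schema update that is not later than Top.
     */
    static CompletableFuture<Command> buildWhenSchemaReady(
            CompletableFuture<Void> safeTimeReached,
            LongSupplier revisionLookup,
            String payload) {
        return safeTimeReached.thenApply(
                ignored -> new Command(revisionLookup.getAsLong(), payload));
    }
}
```

The resulting future would then be chained with the actual RAFT submission, so the wait never happens inside a PartitionListener.
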
> When an AppendEntriesRequest is built, Replicator inspects all the entries it 
> includes in it, extracts SRs from each of them, takes max of them (as MSR, 
> for ‘max schema revision’) and puts it in the AppendEntriesRequest.
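
A minimal sketch of the MSR computation described above, assuming the SRs have already been extracted from the entries. MaxSchemaRevision and the NO_REVISION sentinel are hypothetical and not part of JRaft.

```java
import java.util.List;

/** Hypothetical helper: computes the max schema revision (MSR) for a batch of entries. */
final class MaxSchemaRevision {
    /** Sentinel for "no entry carries a schema revision" (an assumption of this sketch). */
    static final long NO_REVISION = -1L;

    /** Returns the largest SR among the given entry revisions, or NO_REVISION if empty. */
    static long maxSchemaRevision(List<Long> entryRevisions) {
        long max = NO_REVISION;
        for (long sr : entryRevisions) {
            max = Math.max(max, sr);
        }
        return max;
    }
}
```
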
> When the request is processed by a follower/learner, it compares the MSR from 
> the request with its locally known MSR (in the Catalog). If the request’s MSR 
> is greater than the local MSR, the request is rejected (with reason EBUSY). It 
> will be retried by the leader after some time. As an optimization, we might 
> wait for some time in the hope that the local MSR catches up with the 
> request’s MSR.
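
The follower-side check could look roughly like this. SchemaRevisionGate and Verdict are hypothetical names used only for illustration. Real code would map the rejection to the RAFT EBUSY error, and the optional bounded wait is left out.

```java
/** Hypothetical follower/learner-side check; not actual JRaft code. */
final class SchemaRevisionGate {
    /** Outcome of the check; REJECT_EBUSY stands for a rejection with reason EBUSY. */
    enum Verdict { ACCEPT, REJECT_EBUSY }

    /** Rejects the request if it needs a schema revision that is not yet known locally. */
    static Verdict check(long requestMsr, long localMsr) {
        return requestMsr > localMsr ? Verdict.REJECT_EBUSY : Verdict.ACCEPT;
    }
}
```

Since the check is a plain comparison, it never blocks the JRaft thread that handles the request.
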
> As we need an additional field in AppendEntriesRequest that will only be used 
> by partition groups, we could add a generic container for properties to this 
> interface, like Map<String, Object> extras().
> To extract the SR from a command, we might just deserialize it completely, 
> but this requires a lot of work that is not necessary. We might serialize 
> commands that carry an SR in a special way (putting the SR in the very first 
> bytes of the message) to make its retrieval efficient.
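
One possible shape of such a layout, as a sketch only: the SR is written as an 8-byte big-endian long in front of the serialized command, so it can be read without touching the rest of the message. SchemaRevisionPrefix and the fixed 8-byte prefix are assumptions; the issue does not prescribe a format.

```java
import java.nio.ByteBuffer;

/** Hypothetical wire layout: [8-byte schema revision][serialized command bytes]. */
final class SchemaRevisionPrefix {
    /** Prepends the schema revision to an already serialized command payload. */
    static byte[] withRevision(long schemaRevision, byte[] payload) {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(schemaRevision) // big-endian by default
                .put(payload)
                .array();
    }

    /** Reads only the first 8 bytes; the command itself is not deserialized. */
    static long readRevision(byte[] message) {
        return ByteBuffer.wrap(message).getLong();
    }
}
```

With this layout, the Replicator would only call readRevision on each entry when it computes the MSR.
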
> As the primary has already made sure that it has the schema versions needed 
> to execute the command, no waits will be needed on the primary node while 
> executing the RAFT command.
> As secondaries/learners refuse AppendEntries requests that they cannot 
> execute without waiting, they will not have to wait at all in JRaft threads.
> A case when the RAFT leader is not collocated with the primary is possible. 
> We can add the same validation for ActionRequests: pass the required SR 
> inside an ActionRequest, validate it in ActionRequestProcessor and reject 
> requests having SR above the local MSR.
> h3. In RO transactions
> When processing an RO transaction, we just wait for MS SafeTime. This is done 
> outside of RAFT, so no special measures are needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
