Re: Critical worker threads liveness checking drawbacks

vgrigorev Tue, 11 Sep 2018 05:13:41 -0700

Reliability of Ignite is very important to me, so please consider the
following idea:


- Important threads such as the WAL writer (taken here as an example of any
critical thread) must not perform any blocking action. This can be achieved
as follows:
   - The WAL thread acts as a management thread for all WAL operations.
   - A child worker thread of the WAL writer performs the individual
operations that implement the actual WAL writes.
   - Operations are separate units of work. Each one can be tracked, for
example by its heartbeat, and has its own characteristics and an id.
   - Operations are placed in a queue and have a state.
   - If a particular operation hangs, it can be cancelled (along with all of
its child operations in the cluster) while all other operations continue to
work. The failed operation goes into a recovery state, or the failure is
reported to the user.
   - If the WAL child thread is stuck in an indefinitely blocking operation,
that worker thread needs to be killed and a new one started with the same
queue of WAL operations.
This way we are able to:
- always know which particular operation is hung (rather than only that the
whole main WAL thread is hung), so we can better decide what to do.
- keep the WAL thread responsive at all times: at minimum it reports that
some operation is taking a long time, and it can still put the next
operation in the queue or propose a failure.
- report the size of the queue and other detailed information about what is
happening, which allows a precise decision (fail particular user operations,
clean up resources, spawn a new worker thread, or something else) and lets
work continue without a painful node or cluster restart.
- keep cleanup to a minimum (only some operations are affected).
- balance operations across queues and implement backpressure, to make sure
that an optimal load is maintained and the cluster does not degrade because
of local oversaturation.
- never see a node hang; the node only degrades while remaining in a fully
controlled state.

- The functions that check and manage WAL thread operations can be
encapsulated in a dedicated class and called from the other main threads, as
they are now.
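
As an illustration of the queue size reporting and backpressure mentioned
above, here is a minimal sketch built on a bounded queue from
java.util.concurrent. The class BoundedOpQueue and its method names are
hypothetical and not part of Ignite. A producer that cannot enqueue within
its wait time gets an explicit rejection instead of blocking forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded operation queue that pushes back on producers when full. */
final class BoundedOpQueue {
    private final BlockingQueue<Runnable> queue;

    BoundedOpQueue(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false if the queue stayed full for waitMs; the caller decides what to fail. */
    boolean submit(Runnable op, long waitMs) {
        try {
            return queue.offer(op, waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Next operation for the worker, or null if nothing is queued. */
    Runnable next() { return queue.poll(); }

    /** Current queue depth, suitable for monitoring and reports. */
    int depth() { return queue.size(); }

    /** Free slots left before producers start being rejected. */
    int free() { return queue.remainingCapacity(); }
}
```

A rejected submit is the backpressure signal: the caller can slow down,
fail the particular user operation, or report the overload, while the node
itself stays responsive.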

Sorry for any inconvenience, I'm new to writing here



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
