Hi Ivan,

I have an idea about the suggested maintenance mode.

First of all, I agree with your ideas about discovery restrictions: a node
should not join the topology while performing defragmentation.

At the same time, I haven't heard any requests for this mode from users,
so we don't know much about possible requirements.
So I suggest implementing it pragmatically: instead of inventing user
scenarios (unknown in reality), let's develop minimal yet well-designed
functionality that suits our case. It's OK if we constrain our
implementation with a reasonable set of restrictions.

So my idea is the following: to move a node into maintenance, the user has
to send a special command to the node (e.g. with a new command in the
control.sh utility or via the JMX interface). The node saves the
maintenance request in its local metastorage and waits for a restart. The
user has to restart that node manually to finish moving it into
maintenance mode.

When the node restarts and finds the maintenance request, it creates a
special type of discovery SPI that does not try to join the topology at
all, yet the node is still able to start all necessary components and APIs,
such as the REST processor or the JMX interface.

While in maintenance, we'll be able to do defragmentation safely and remove
the maintenance request from the metastorage only when it has completed
(with all fault-tolerance logic in mind).

As we don't have a mechanism (like a watcher) to perform a "safe restart"
(by safe I mean an Ignite restart without an OS-level process restart), we
cannot exit maintenance mode without another manual restart. I think that
is a reasonable restriction, as maintenance mode shouldn't be an everyday
routine and will be used quite rarely.
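
To make the lifecycle above concrete, here is a minimal sketch in plain
Java. It is only an illustration under assumptions: NodeSketch, StartMode,
MAINTENANCE_KEY and the Map standing in for the local metastorage are
hypothetical names, not existing Ignite APIs, and a "restart" is modelled
simply as constructing a new object over the same store.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the proposed flow; none of these names are Ignite APIs. */
final class NodeSketch {
    /** How the node behaves after a (re)start. */
    enum StartMode { NORMAL, MAINTENANCE }

    /** Hypothetical key under which the maintenance request is stored. */
    static final String MAINTENANCE_KEY = "maintenance.request";

    /** Stand-in for the node-local metastorage; it outlives a single "start". */
    private final Map<String, String> metastorage;

    /** Decided once at start and never changed until the next restart. */
    private final StartMode mode;

    /** Constructing a node models a (re)start: the stored request picks the mode. */
    NodeSketch(Map<String, String> metastorage) {
        this.metastorage = metastorage;
        this.mode = metastorage.containsKey(MAINTENANCE_KEY)
            ? StartMode.MAINTENANCE
            : StartMode.NORMAL;
    }

    /** Models the control command: it only records the request for the next start. */
    void requestMaintenance(String reason) {
        metastorage.put(MAINTENANCE_KEY, reason);
    }

    StartMode mode() {
        return mode;
    }

    /** In maintenance the node stays out of the topology. */
    boolean joinsTopology() {
        return mode == StartMode.NORMAL;
    }

    /**
     * Runs the maintenance task. The request is removed only after the task
     * returns normally, so a failure leaves it in place and the next restart
     * enters maintenance again.
     */
    void runMaintenance(Runnable task) {
        if (mode != StartMode.MAINTENANCE)
            throw new IllegalStateException("Node is not in maintenance mode");

        task.run();
        metastorage.remove(MAINTENANCE_KEY);
    }

    /** Convenience factory for an empty stand-in metastorage. */
    static Map<String, String> emptyStore() {
        return new HashMap<>();
    }
}
```

In this sketch a node that receives the request keeps working normally
until it is restarted; the restarted node stays out of the topology, and
after the task completes one more restart brings it back in normal mode.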

What do you think about it?

On Tue, May 26, 2020 at 5:58 PM Ivan Bessonov <bessonov...@gmail.com> wrote:

> Hello Igniters,
>
> I'd like to discuss this new IEP with you: [1]. The main idea is to have a
> procedure that relocates pages to the top of the file as compact as
> possible which allows us to trim the file and increase its fill-factor.
> It will be configured manually and executed after the restart, but before
> node joins topology (it means any load would be impossible during
> defragmentation). It is described in detail in the IEP itself, please
> read it. This topic was also previously discussed here on dev-list in [2].
>
> Here I would like to list a few moments that are not as clear and require
> your opinion.
>
>  - what are your overall thoughts on the IEP? Any concerns?
>
>  - maintenance mode - how do we communicate with the node that's not in
>    topology? What are the options? As far as I know, we have no current
>    tools like this.
>
>  - checkpointer refactoring - these changes will involve intensive writing
>    of pages to the storage. If we're going to reuse the offheap page model
>    to perform defragmentation then the checkpointing mechanism will have
>    to be adapted in some form. Are you fine with this? Or we need a
>    separate discussion?
>
> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-47%3A+Native+persistence+defragmentation
> [2] http://apache-ignite-developers.2346864.n4.nabble.com/How-to-free-up-space-on-disc-after-removing-entries-from-IgniteCache-with-enabled-PDS-td39839.html
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>
