[jira] [Updated] (IGNITE-14794) Add JMX command and metrics for automatic snapshot restore operation.

Pavel Pereslegin (Jira) Tue, 05 Oct 2021 06:00:04 -0700


     [ https://issues.apache.org/jira/browse/IGNITE-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Pavel Pereslegin updated IGNITE-14794:
--------------------------------------
    Description: 
Add JMX commands to restore cache groups from a snapshot and to cancel a running restore operation.
Suggested methods:
{code:java}
    @MXBeanDescription("Restore cluster-wide snapshot.")
    public void restoreSnapshot(
        @MXBeanParameter(name = "snpName", description = "Snapshot name.") String name,
        @MXBeanParameter(name = "cacheGroupNames", description = "Optional comma-separated list of cache group names.") String cacheGroupNames);

    @MXBeanDescription("Cancel previously started snapshot restore operation.")
    public void cancelSnapshotRestore(@MXBeanParameter(name = "snpName", description = "Snapshot name.") String name);
{code}
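For illustration only: the proposed operations are ordinary MXBean operations, so any JMX client could call them by name through the standard {{javax.management}} API. The sketch below is hypothetical. It registers a stub {{SnapshotRestoreMXBean}} (Ignite's {{@MXBeanDescription}}/{{@MXBeanParameter}} annotations are left out so it compiles without Ignite) under a made-up {{ObjectName}}, then invokes {{restoreSnapshot}} and {{cancelSnapshotRestore}} with {{MBeanServer.invoke}}. Treating a null or blank {{cacheGroupNames}} as "all cache groups in the snapshot" is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch: calling the proposed restore/cancel operations through the standard JMX API. */
public class SnapshotRestoreJmxSketch {
    /** Hypothetical stand-in for the proposed bean; Ignite's annotations are omitted. */
    public interface SnapshotRestoreMXBean {
        void restoreSnapshot(String name, String cacheGroupNames);

        void cancelSnapshotRestore(String name);
    }

    /** Stub that only records the calls, so the JMX wiring can run without a cluster. */
    public static class StubSnapshotRestore implements SnapshotRestoreMXBean {
        final List<String> calls = new ArrayList<>();

        @Override public void restoreSnapshot(String name, String cacheGroupNames) {
            calls.add("restore " + name + " " + parseGroups(cacheGroupNames));
        }

        @Override public void cancelSnapshotRestore(String name) {
            calls.add("cancel " + name);
        }
    }

    /** Splits the optional list; null or blank is assumed to mean "all groups in the snapshot". */
    static List<String> parseGroups(String cacheGroupNames) {
        if (cacheGroupNames == null || cacheGroupNames.trim().isEmpty())
            return Collections.emptyList();

        List<String> groups = new ArrayList<>();
        for (String s : cacheGroupNames.split(",")) {
            if (!s.trim().isEmpty())
                groups.add(s.trim());
        }
        return groups;
    }

    /** Registers the stub, invokes both operations by name, and returns what the stub received. */
    public static List<String> demo() throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical ObjectName: the issue does not specify the real Ignite bean name.
        ObjectName objName = new ObjectName("org.example:group=Snapshot,name=SnapshotRestore");
        StubSnapshotRestore stub = new StubSnapshotRestore();
        srv.registerMBean(stub, objName);
        try {
            String str = String.class.getName();
            srv.invoke(objName, "restoreSnapshot",
                new Object[] {"snapshot_25052021", "default, cache2"}, new String[] {str, str});
            srv.invoke(objName, "cancelSnapshotRestore",
                new Object[] {"snapshot_25052021"}, new String[] {str});
        }
        finally {
            srv.unregisterMBean(objName);
        }
        return stub.calls;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Against a real node, the same {{invoke()}} calls would go through an {{MBeanServerConnection}} obtained from {{JMXConnectorFactory.connect(...)}}, and the actual bean name would have to be looked up on that node.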
Since an automatic snapshot restore can take a long time, we must be able to track its progress through metrics.
Suggested metrics:
{noformat}
start time
partitions (processed/total)
bytes (processed/total)
end time
{noformat}
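A minimal per-node sketch of the four suggested metrics, assuming plain {{AtomicLong}} counters updated by the restore threads. The class and method names are hypothetical and are not Ignite's metric registry API. Percent completion is computed from bytes here; computing it from partitions would be an equally reasonable choice.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-node holder for the suggested restore metrics: start time, end time,
 * partitions (processed/total) and bytes (processed/total). Not Ignite's metric registry API.
 */
public class RestoreProgressMetrics {
    /** -1 means "not set yet". */
    private final AtomicLong startTime = new AtomicLong(-1);
    private final AtomicLong endTime = new AtomicLong(-1);
    private final AtomicLong totalPartitions = new AtomicLong();
    private final AtomicLong processedPartitions = new AtomicLong();
    private final AtomicLong totalBytes = new AtomicLong();
    private final AtomicLong processedBytes = new AtomicLong();

    /** Called once the node knows how many partitions and bytes it has to restore. */
    public void onStart(long nowMillis, long partitions, long bytes) {
        totalPartitions.set(partitions);
        totalBytes.set(bytes);
        startTime.set(nowMillis);
    }

    /** Called after each partition has been restored; safe to call from several threads. */
    public void onPartitionRestored(long partitionBytes) {
        processedBytes.addAndGet(partitionBytes);
        processedPartitions.incrementAndGet();
    }

    /** Called on success, failure or cancellation alike. */
    public void onFinish(long nowMillis) {
        endTime.set(nowMillis);
    }

    public boolean inProgress() {
        return startTime.get() >= 0 && endTime.get() < 0;
    }

    /** Whole-percent progress measured in bytes; 0 while the total is unknown. */
    public int percent() {
        long total = totalBytes.get();
        return total == 0 ? 0 : (int)(processedBytes.get() * 100 / total);
    }

    public long processedPartitions() {
        return processedPartitions.get();
    }

    public long totalPartitions() {
        return totalPartitions.get();
    }

    /** Elapsed time in milliseconds, or -1 if the operation has not finished. */
    public long durationMillis() {
        long end = endTime.get();
        return end < 0 ? -1 : end - startTime.get();
    }
}
```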
 

Suggested output of the status command:

[in progress] 
{noformat}
Restore operation for snapshot "snapshot_25052021" is still in progress (requestId=0e2d8c06-d44a-4ade-91bf-2b84b367499a).

  Progress: 100% completed (66/66 partitions, 3.8/3.8 MB)
  Started: 2021-10-05 15:47:47.942
  Cache groups: default

  Node 11faec83-a304-48f7-aac7-e67bf8800001: 100% completed (33/33 partitions, 1.9/1.9 MB)
  Node 99066100-890f-41a3-b0cd-4a3d59600000: 100% completed (33/33 partitions, 1.9/1.9 MB)
{noformat}
[error]
{noformat}
Restore operation for snapshot "snapshot_25052021" failed (requestId=b9b312f5-ba34-40e9-bb94-35daacd552c0).

  Error: Operation has been canceled by the user.
  Started: 2021-10-05 15:51:52.255
  Finished: 2021-10-05 15:51:52.782
  Cache groups: default

  Node e3c8d45b-2ccd-43ba-81ab-ea3bb9e00001: 100% completed (33/33 partitions, 1.9/1.9 MB)
  Node 884cd446-38c2-4538-9dcd-81509eb00000: 100% completed (33/33 partitions, 1.9/1.9 MB)
{noformat}
[finished]
{noformat}
Restore operation for snapshot "snapshot_25052021" completed successfully (requestId=6adeea86-1ee2-4664-8d7d-3383a484a00a).

  Progress: 100% completed (66/66 partitions, 3.8/3.8 MB)
  Started: 2021-10-05 15:53:03.352
  Finished: 2021-10-05 15:53:03.443
  Cache groups: default

  Node cc69e33f-de95-42b4-99af-86cf83900001: 100% completed (33/33 partitions, 1.9/1.9 MB)
  Node b4f3bb36-aef3-4813-a3e9-9f7773600000: 100% completed (33/33 partitions, 1.9/1.9 MB)
{noformat}
[missing snapshot name]
{noformat}
No information about restoring snapshot "snapshot_MISSING" is available.{noformat}
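A hypothetical sketch of how the status report above could be assembled from per-node counters: the cluster-wide "Progress" line is the sum of the per-node values (33 + 33 = 66 partitions), the header wording depends on the operation state, and an unknown snapshot yields the "No information" message. The Started/Finished/Cache groups lines are omitted for brevity, and rendering MB as bytes / (1024 * 1024) with one decimal is an assumption about how the sizes are formatted.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical renderer for the suggested status output (Started/Finished/Cache groups omitted). */
public class RestoreStatusFormatter {
    /** Progress counters reported by one node. */
    public static final class NodeProgress {
        final long partsDone;
        final long partsTotal;
        final long bytesDone;
        final long bytesTotal;

        public NodeProgress(long partsDone, long partsTotal, long bytesDone, long bytesTotal) {
            this.partsDone = partsDone;
            this.partsTotal = partsTotal;
            this.bytesDone = bytesDone;
            this.bytesTotal = bytesTotal;
        }
    }

    /** Bytes as MB with one decimal; 1 MB = 1024 * 1024 bytes is an assumption. */
    static String mb(long bytes) {
        return String.format(Locale.ROOT, "%.1f", bytes / (1024.0 * 1024.0));
    }

    /** Percentage is computed from bytes (an assumption). */
    static String progress(long partsDone, long partsTotal, long bytesDone, long bytesTotal) {
        long pct = bytesTotal == 0 ? 0 : bytesDone * 100 / bytesTotal;
        return pct + "% completed (" + partsDone + "/" + partsTotal + " partitions, "
            + mb(bytesDone) + "/" + mb(bytesTotal) + " MB)";
    }

    /**
     * @param error Error message, or null if the operation has not failed.
     * @param finished Whether the operation has ended.
     * @param nodes Per-node progress keyed by node ID, or null if nothing is known about the snapshot.
     */
    public static String render(String snpName, String reqId, String error, boolean finished,
        Map<String, NodeProgress> nodes) {
        if (nodes == null)
            return "No information about restoring snapshot \"" + snpName + "\" is available.";

        String state = error != null ? "failed"
            : finished ? "completed successfully" : "is still in progress";

        StringBuilder sb = new StringBuilder();
        sb.append("Restore operation for snapshot \"").append(snpName).append("\" ").append(state)
            .append(" (requestId=").append(reqId).append(").\n\n");

        // Cluster-wide totals are the sums of the per-node counters.
        long pd = 0, pt = 0, bd = 0, bt = 0;
        for (NodeProgress p : nodes.values()) {
            pd += p.partsDone;
            pt += p.partsTotal;
            bd += p.bytesDone;
            bt += p.bytesTotal;
        }

        if (error != null)
            sb.append("  Error: ").append(error).append('\n');
        else
            sb.append("  Progress: ").append(progress(pd, pt, bd, bt)).append('\n');

        sb.append('\n');

        for (Map.Entry<String, NodeProgress> e : nodes.entrySet()) {
            NodeProgress p = e.getValue();
            sb.append("  Node ").append(e.getKey()).append(": ")
                .append(progress(p.partsDone, p.partsTotal, p.bytesDone, p.bytesTotal)).append('\n');
        }

        return sb.toString();
    }
}
```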
 

 

  was:
Add JMX command to restore a cache group from the snapshot.
 Suggested methods
{code:java}
    @MXBeanDescription("Restore cluster-wide snapshot.")
    public void restoreSnapshot(
        @MXBeanParameter(name = "snpName", description = "Snapshot name.") String name,
        @MXBeanParameter(name = "cacheGroupNames", description = "Optional comma-separated list of cache group names.") String cacheGroupNames);

    @MXBeanDescription("Cancel previously started snapshot restore operation.")
    public void cancelSnapshotRestore(@MXBeanParameter(name = "snpName", description = "Snapshot name.") String name);
{code}
Since the automatic snapshot restore operation can take a long time, we must be able to track its progress using metrics.
 Suggested metrics:
{noformat}
start time
partitions (processed/total)
bytes (processed/total)
end time
{noformat}
 

Suggested status command output.

[in progress]
{noformat}
Restore operation for snapshot "snapshot_25052021"  is still in progress (requestId=0e2d8c06-d44a-4ade-91bf-2b84b367499a).

  Progress: 100% completed (66/66 partitions, 3.8/3.8 MB)
  Started: 2021-10-05 15:47:47.942
  Cache groups: default

  Node 11faec83-a304-48f7-aac7-e67bf8800001: 100% completed (33/33 partitions, 1.9/1.9 MB)
  Node 99066100-890f-41a3-b0cd-4a3d59600000: 100% completed (33/33 partitions, 1.9/1.9 MB)
Command [SNAPSHOT] finished with code: 0
{noformat}
 

[finished]
{noformat}
Restore operation for snapshot "snapshot_25052021" completed successfully (requestId=6adeea86-1ee2-4664-8d7d-3383a484a00a).

  Progress: 100% completed (66/66 partitions, 3.8/3.8 MB)
  Started: 2021-10-05 15:53:03.352
  Finished: 2021-10-05 15:53:03.443
  Cache groups: default

  Node cc69e33f-de95-42b4-99af-86cf83900001: 100% completed (33/33 partitions, 1.9/1.9 MB)
  Node b4f3bb36-aef3-4813-a3e9-9f7773600000: 100% completed (33/33 partitions, 1.9/1.9 MB)
{noformat}
[missing snapshot name]
{noformat}
No information about restoring snapshot "snapshot_MISSING" is available.
{noformat}
 

[error]
{noformat}
Restore operation for snapshot "snapshot_25052021" failed (requestId=b9b312f5-ba34-40e9-bb94-35daacd552c0).

  Error: Operation has been canceled by the user.
  Started: 2021-10-05 15:51:52.255
  Finished: 2021-10-05 15:51:52.782
  Cache groups: default

  Node e3c8d45b-2ccd-43ba-81ab-ea3bb9e00001: 100% completed (33/33 partitions, 1.9/1.9 MB)
  Node 884cd446-38c2-4538-9dcd-81509eb00000: 100% completed (33/33 partitions, 1.9/1.9 MB)
{noformat}
 

 


>  Add JMX command and metrics for automatic snapshot restore operation.
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-14794
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14794
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Pavel Pereslegin
>            Assignee: Pavel Pereslegin
>            Priority: Major
>              Labels: iep-43
>             Fix For: 2.12
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
