[ 
https://issues.apache.org/jira/browse/GEODE-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387638#comment-17387638
 ] 

Anilkumar Gingade edited comment on GEODE-8200 at 7/26/21, 10:26 PM:
---------------------------------------------------------------------

>> Re-opened because this issue has started reproducing again on develop.
[~aaronlindsey] Can you please add more details to this? Reading this ticket 
description, the issue was addressed with "Rebalance"; from your comments it 
seems like it's with the "restore redundancy" command.
Questions:
- Is the issue with the "rebalance" command?
- Is the issue only with "restore redundancy"?
- Is there a test that reproduces the issue?
- What are the steps involved in reproducing the issue? Does the issue 
reproduce every time?







> Rebalance operations stuck in "IN_PROGRESS" state forever
> ---------------------------------------------------------
>
>                 Key: GEODE-8200
>                 URL: https://issues.apache.org/jira/browse/GEODE-8200
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>            Reporter: Aaron Lindsey
>            Assignee: Jianxia Chen
>            Priority: Major
>              Labels: GeodeOperationAPI
>             Fix For: 1.13.1, 1.14.0
>
>         Attachments: GEODE-8200-exportedLogs.zip
>
>
> We use the management REST API to call rebalance immediately before stopping 
> a server to limit the possibility of data loss. In a cluster with 3 locators, 
> 3 servers, and no regions, we noticed that sometimes the rebalance operation 
> never ends if one of the locators is restarting concurrently with the 
> rebalance operation.
> More specifically, the scenario where we see this issue crop up is during an 
> automated "rolling restart" operation in a Kubernetes environment which 
> proceeds as follows:
> * At most one locator and one server are restarting at any point in time
> * Each locator/server waits until the previous locator/server is fully online 
> before restarting
> * Immediately before stopping a server, a rebalance operation is performed 
> and the server is not stopped until the rebalance operation is completed
> The impact of this issue is that the "rolling restart" operation will never 
> complete, because it cannot proceed with stopping a server until the 
> rebalance operation is completed. A human is then required to intervene and 
> manually trigger a rebalance and stop the server. This type of "rolling 
> restart" operation is triggered fairly often in Kubernetes — any time part of 
> the configuration of the locators or servers changes. 
> The following JSON is a sample response from the management REST API that 
> shows the rebalance operation stuck in "IN_PROGRESS".
> {code}
>     {
>       "statusCode": "IN_PROGRESS",
>       "links": {
>         "self": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/a47f23c8-02b3-443c-a367-636fd6921ea7",
>         "list": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>       },
>       "operationStart": "2020-05-27T22:38:30.619Z",
>       "operationId": "a47f23c8-02b3-443c-a367-636fd6921ea7",
>       "operation": {
>         "simulate": false
>       }
>     }
> {code}
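
As a rough illustration of the failure mode described above, a caller that polls the operation status could treat an "IN_PROGRESS" response as suspect once it is older than some deadline, rather than waiting indefinitely. The sketch below only parses the sample response from the description; the helper names (parse_start, looks_stuck) and the age threshold are hypothetical and are not part of the Geode management API.

```python
import json
from datetime import datetime, timedelta, timezone

# Sample response copied from the ticket description.
SAMPLE = """
{
  "statusCode": "IN_PROGRESS",
  "links": {
    "self": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/a47f23c8-02b3-443c-a367-636fd6921ea7",
    "list": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
  },
  "operationStart": "2020-05-27T22:38:30.619Z",
  "operationId": "a47f23c8-02b3-443c-a367-636fd6921ea7",
  "operation": {
    "simulate": false
  }
}
"""


def parse_start(timestamp):
    # The sample uses UTC timestamps with milliseconds and a trailing "Z".
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def looks_stuck(response, now, max_age):
    # Hypothetical heuristic: the operation is still IN_PROGRESS and
    # started more than max_age before `now`.
    if response["statusCode"] != "IN_PROGRESS":
        return False
    return now - parse_start(response["operationStart"]) > max_age


if __name__ == "__main__":
    response = json.loads(SAMPLE)
    # The 10-minute threshold is an arbitrary value chosen for illustration.
    stuck = looks_stuck(response, datetime.now(timezone.utc), timedelta(minutes=10))
    print(response["operationId"], "looks stuck" if stuck else "ok")
```

A check like this would not fix the underlying bug; it would only let the automated "rolling restart" report the condition instead of blocking silently.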



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
