[jira] [Updated] (RATIS-394) Remove the assertion while setting the exception in TransactionContextImpl

2018-11-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-394:
--
Attachment: RATIS-394.000.patch

> Remove the assertion while setting the exception in TransactionContextImpl
> --
>
> Key: RATIS-394
> URL: https://issues.apache.org/jira/browse/RATIS-394
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-394.000.patch
>
>
> In the below code in TransactionContextImpl,
> {code:java}
> @Override
> public TransactionContext setException(Exception ioe) {
>   assert exception != null;
>   this.exception = ioe;
>   return this;
> }
> {code}
> While setting the exception, it asserts that the exception already held by the 
> object is not null. When the exception is set for the first time, that field is 
> always null, so the assertion fails. We should relax the check here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-394) Remove the assertion while setting the exception in TransactionContextImpl

2018-11-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-394:
--
Description: 
In the below code in TransactionContextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception, it asserts that the exception already held by the 
object is not null. When the exception is set for the first time, that field is 
always null, so the assertion fails. We should relax the check here.

  was:
In the below code in TransactionContextImpl,
{code:java}
@Override
public TransactionContext setException(Exception ioe) {
  assert exception != null;
  this.exception = ioe;
  return this;
}
{code}
While setting the exception, it asserts that the exception maintained in the 
object is not null. When the exception is set for the first time, it is always 
null, so the assertion fails. We should relax the check here.


> Remove the assertion while setting the exception in TransactionContextImpl
> --
>
> Key: RATIS-394
> URL: https://issues.apache.org/jira/browse/RATIS-394
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-394.000.patch
>
>
> In the below code in TransactionContextImpl,
> {code:java}
> @Override
> public TransactionContext setException(Exception ioe) {
>   assert exception != null;
>   this.exception = ioe;
>   return this;
> }
> {code}
> While setting the exception, it asserts that the exception already held by the 
> object is not null. When the exception is set for the first time, that field is 
> always null, so the assertion fails. We should relax the check here.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.000.patch

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.000.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686481#comment-16686481
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Thanks [~szetszwo] for the comments. Moving the retryPolicy check here:
{code:java}
private CompletableFuture<RaftClientReply> sendRequestWithRetryAsync(
    RaftClientRequest request, int attemptCount) {
  LOG.debug("{}: send* {}", clientId, request);
  return clientRpc.sendRequestAsync(request).thenApply(reply -> {
    LOG.info("{}: receive* {}", clientId, reply);
    reply = handleNotLeaderException(request, reply);
    if (reply == null) {
      if (!retryPolicy.shouldRetry(attemptCount)) {
        LOG.info(" fail with max attempts failed");
        reply = new RaftClientReply(request, new RaftException("Failed " + request
            + " for " + attemptCount + " attempts with " + retryPolicy), null);
      }
    }
    if (reply != null) {
      getSlidingWindow(request).receiveReply(
          request.getSeqNum(), reply, this::sendRequestWithRetryAsync);
    }
    return reply;
  }).exceptionally(e -> {
    if (LOG.isTraceEnabled()) {
      LOG.trace(clientId + ": Failed " + request, e);
    } else {
      LOG.debug("{}: Failed {} with {}", clientId, request, e);
    }
    e = JavaUtils.unwrapCompletionException(e);
    if (e instanceof GroupMismatchException) {
      throw new CompletionException(e);
    } else if (e instanceof IOException) {
      handleIOException(request, (IOException)e, null);
    } else {
      throw new CompletionException(e);
    }
    return null;
  });
}{code}
If clientRpc.sendRequestAsync(request) times out, the code in the exceptionally 
path is executed. In that case, #sendRequestWithRetryAsync keeps retrying 
#sendRequestAsync indefinitely, because the retry-policy check is only executed 
when clientRpc.sendRequestAsync(request) completes normally.
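To make the timeout case concrete, here is a minimal sketch (not the Ratis client code) in which the retry policy is consulted on the exceptional path as well. RetryPolicy, sendWithRetryAsync and the simulated RPC are hypothetical stand-ins, and a timeout is modeled as a future completed with TimeoutException. Because the attempt count is checked on that path too, a request that keeps timing out fails after a bounded number of attempts instead of being retried forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;
import java.util.function.IntFunction;

public class AsyncRetrySketch {
  /** Hypothetical retry policy: may we try again after this attempt? */
  interface RetryPolicy {
    boolean shouldRetry(int attemptCount);
  }

  static CompletableFuture<String> sendWithRetryAsync(
      IntFunction<CompletableFuture<String>> send, RetryPolicy policy, int attemptCount) {
    return send.apply(attemptCount)
        .handle((reply, e) -> {
          if (e == null) {
            // Normal completion: pass the reply through unchanged.
            return CompletableFuture.completedFuture(reply);
          }
          if (!policy.shouldRetry(attemptCount)) {
            // Policy exhausted on the exceptional path: fail instead of looping.
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException(
                "Failed after " + (attemptCount + 1) + " attempts", e));
            return failed;
          }
          // Policy allows another attempt (e.g. after a timeout): retry.
          return sendWithRetryAsync(send, policy, attemptCount + 1);
        })
        .thenCompose(f -> f); // flatten the nested future
  }

  public static void main(String[] args) {
    // Hypothetical policy: allow a retry after attempts 0, 1 and 2.
    RetryPolicy policy = attempts -> attempts < 3;
    // Simulated RPC that times out on attempts 0 and 1 and succeeds on attempt 2.
    IntFunction<CompletableFuture<String>> flaky = attempt -> {
      CompletableFuture<String> f = new CompletableFuture<>();
      if (attempt < 2) {
        f.completeExceptionally(new TimeoutException("attempt " + attempt));
      } else {
        f.complete("reply@" + attempt);
      }
      return f;
    };
    System.out.println(sendWithRetryAsync(flaky, policy, 0).join()); // prints "reply@2"
  }
}
```

A real client would also sleep between attempts per the policy and distinguish retriable from non-retriable exceptions; both are left out of this sketch.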

Also, when the retry-policy check fails, the sync API simply returns null as the 
RaftClientReply without throwing any exception:
{code:java}
private RaftClientReply sendRequestWithRetry(
    Supplier<RaftClientRequest> supplier)
    throws InterruptedIOException, StateMachineException, GroupMismatchException {
  for(int attemptCount = 0;; attemptCount++) {
    final RaftClientRequest request = supplier.get();
    final RaftClientReply reply = sendRequest(request);
    if (reply != null) {
      return reply;
    }
    if (!retryPolicy.shouldRetry(attemptCount)) {
      return null;
    }
    try {
      retryPolicy.getSleepTime().sleep();
    } catch (InterruptedException e) {
      throw new InterruptedIOException("retry policy=" + retryPolicy);
    }
  }
}

{code}
I think we should probably have the same result for the sync and async APIs here.

Let me know if I am missing something here.
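One way to give the sync API the same result as the async one is sketched below, under the assumption that exhausting the retry policy should surface as an exception rather than a null reply. SyncRetrySketch, RetriesExhaustedException and the RetryPolicy interface are hypothetical names, and the sleep between attempts is omitted.

```java
import java.io.IOException;
import java.util.function.Supplier;

public class SyncRetrySketch {
  /** Hypothetical retry policy: may we try again after this attempt? */
  interface RetryPolicy {
    boolean shouldRetry(int attemptCount);
  }

  /** Hypothetical exception signalling that the retry policy was exhausted. */
  static class RetriesExhaustedException extends IOException {
    RetriesExhaustedException(String msg) {
      super(msg);
    }
  }

  // Calls `send` until it returns a non-null reply; when the policy refuses
  // another attempt, throws instead of returning null.
  static <R> R sendWithRetry(Supplier<R> send, RetryPolicy policy)
      throws RetriesExhaustedException {
    for (int attemptCount = 0;; attemptCount++) {
      R reply = send.get();
      if (reply != null) {
        return reply;
      }
      if (!policy.shouldRetry(attemptCount)) {
        throw new RetriesExhaustedException(
            "Failed after " + (attemptCount + 1) + " attempts");
      }
      // A real client would sleep per the retry policy here.
    }
  }

  public static void main(String[] args) {
    try {
      sendWithRetry(() -> null, attempts -> attempts < 2);
    } catch (RetriesExhaustedException e) {
      System.out.println(e.getMessage()); // prints "Failed after 3 attempts"
    }
  }
}
```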

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.000.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Comment Edited] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686481#comment-16686481
 ] 

Shashikant Banerjee edited comment on RATIS-386 at 11/15/18 1:07 AM:
-

Thanks [~szetszwo] for the comments. Moving the retryPolicy check here:
{code:java}
private CompletableFuture<RaftClientReply> sendRequestAsync(
    RaftClientRequest request, int attemptCount) {
  LOG.debug("{}: send* {}", clientId, request);
  return clientRpc.sendRequestAsync(request).thenApply(reply -> {
    LOG.info("{}: receive* {}", clientId, reply);
    reply = handleNotLeaderException(request, reply);
    if (reply == null) {
      if (!retryPolicy.shouldRetry(attemptCount)) {
        LOG.info(" fail with max attempts failed");
        reply = new RaftClientReply(request, new RaftException("Failed " + request
            + " for " + attemptCount + " attempts with " + retryPolicy), null);
      }
    }
    if (reply != null) {
      getSlidingWindow(request).receiveReply(
          request.getSeqNum(), reply, this::sendRequestWithRetryAsync);
    }
    return reply;
  }).exceptionally(e -> {
    if (LOG.isTraceEnabled()) {
      LOG.trace(clientId + ": Failed " + request, e);
    } else {
      LOG.debug("{}: Failed {} with {}", clientId, request, e);
    }
    e = JavaUtils.unwrapCompletionException(e);
    if (e instanceof GroupMismatchException) {
      throw new CompletionException(e);
    } else if (e instanceof IOException) {
      handleIOException(request, (IOException)e, null);
    } else {
      throw new CompletionException(e);
    }
    return null;
  });
}{code}
If clientRpc.sendRequestAsync(request) times out, the code in the exceptionally 
path is executed. In that case, #sendRequestWithRetryAsync keeps retrying 
#sendRequestAsync indefinitely, because the retry-policy check is only executed 
when clientRpc.sendRequestAsync(request) completes normally.

Also, when the retry-policy check fails, the sync API simply returns null as the 
RaftClientReply without throwing any exception:
{code:java}
private RaftClientReply sendRequestWithRetry(
    Supplier<RaftClientRequest> supplier)
    throws InterruptedIOException, StateMachineException, GroupMismatchException {
  for(int attemptCount = 0;; attemptCount++) {
    final RaftClientRequest request = supplier.get();
    final RaftClientReply reply = sendRequest(request);
    if (reply != null) {
      return reply;
    }
    if (!retryPolicy.shouldRetry(attemptCount)) {
      return null;
    }
    try {
      retryPolicy.getSleepTime().sleep();
    } catch (InterruptedException e) {
      throw new InterruptedIOException("retry policy=" + retryPolicy);
    }
  }
}

{code}
I think we should probably have the same result for the sync and async APIs here.

Let me know if I am missing something here.



[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.001.patch

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.000.patch, RATIS-386.001.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: (was: RATIS-386.001.patch)

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.001.patch

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: (was: RATIS-386.000.patch)

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687401#comment-16687401
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Thanks [~szetszwo], as per our discussion/comments, updated patch v1.

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687585#comment-16687585
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Patch v3 addresses the issues discussed. 

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.002.patch

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-15 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687803#comment-16687803
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Patch v3 fixes a test failure in testRetryMultipleTimesWithFixedSleep. Other 
test failures are intermittent and not related to the patch.

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch, 
> RATIS-386.003.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Created] (RATIS-418) Fix Unit tests in ratis

2018-11-15 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-418:
-

 Summary: Fix Unit tests in ratis
 Key: RATIS-418
 URL: https://issues.apache.org/jira/browse/RATIS-418
 Project: Ratis
  Issue Type: Bug
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.3.0


The following tests seem to be failing intermittently, as shown here: 
https://builds.apache.org/job/PreCommit-RATIS-Build/523/testReport/
 # TestWatchRequestWithGrpc#testWatchRequestAsync
 # TestWatchRequestWithGrpc#testWatchRequestAsync
 # TestRaftAsyncWithGrpc#testWithLoadAsync

These tests need to be fixed.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-15 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687816#comment-16687816
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Opened RATIS-418 to track the test failures.

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch, 
> RATIS-386.003.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-15 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.003.patch

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch, 
> RATIS-386.003.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-418) Fix Unit tests in ratis

2018-11-15 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688397#comment-16688397
 ] 

Shashikant Banerjee commented on RATIS-418:
---

[~szetszwo], sure, I will work on TestRaftAsyncWithGrpc here.

> Fix Unit tests in ratis
> ---
>
> Key: RATIS-418
> URL: https://issues.apache.org/jira/browse/RATIS-418
> Project: Ratis
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> The following tests seem to be failing intermittently, as shown here: 
> https://builds.apache.org/job/PreCommit-RATIS-Build/523/testReport/
>  # TestWatchRequestWithGrpc#testWatchRequestAsync
>  # TestWatchRequestWithGrpc#testWatchRequestAsync
>  # TestRaftAsyncWithGrpc#testWithLoadAsync
> These tests need to be fixed.





[jira] [Updated] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-15 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-386:
--
Attachment: RATIS-386.004.patch

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch, 
> RATIS-386.003.patch, RATIS-386.004.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-15 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689037#comment-16689037
 ] 

Shashikant Banerjee commented on RATIS-386:
---

Thanks [~szetszwo] for the review. Patch v4 addresses your review comments.

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch, 
> RATIS-386.003.patch, RATIS-386.004.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-386) Raft Client Async API's should honor Retry Policy

2018-11-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689431#comment-16689431
 ] 

Shashikant Banerjee commented on RATIS-386:
---

The following tests fail intermittently in my local setup, both with and without 
the patch, so the reported test failures are not related to the patch.
{code:java}
Failures:

[ERROR]   
TestRaftReconfigurationWithGrpc>RaftReconfigurationBaseTest.testKillLeaderDuringReconf:387

[ERROR]   
TestWatchRequestWithGrpc>WatchRequestTests.testWatchRequestAsync:70->WatchRequestTests.runTest:147->WatchRequestTests.runTestWatchRequestAsync:259->WatchRequestTests.lambda$runTestWatchRequestAsync$3:259

[ERROR]   
TestWatchRequestWithGrpc>WatchRequestTests.testWatchRequestAsyncChangeLeader:286->WatchRequestTests.runTest:147->WatchRequestTests.runTestWatchRequestAsyncChangeLeader:338->WatchRequestTests.lambda$runTestWatchRequestAsyncChangeLeader$5:338

[ERROR]   
TestRaftReconfigurationWithNetty>RaftReconfigurationBaseTest.testKillLeaderDuringReconf:387

[ERROR]   
TestRaftReconfigurationWithSimulatedRpc>RaftReconfigurationBaseTest.testKillLeaderDuringReconf:387

[ERROR] Errors:

[ERROR]   TestRaftAsyncWithGrpc.testStaleReadAsync » Completion 
org.apache.ratis.protoco...

[INFO]

[ERROR] Tests run: 198, Failures: 5, Errors: 1, Skipped: 2
{code}

> Raft Client Async API's should honor Retry Policy 
> --
>
> Key: RATIS-386
> URL: https://issues.apache.org/jira/browse/RATIS-386
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-386.001.patch, RATIS-386.002.patch, 
> RATIS-386.003.patch, RATIS-386.004.patch
>
>
> The Raft client sync API supports retry policies. Similarly, the async APIs, 
> including the watch API, need to support retry policies.





[jira] [Commented] (RATIS-365) Implement RaftServer.getGroupIds() using the key set of ImplMap

2018-11-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690372#comment-16690372
 ] 

Shashikant Banerjee commented on RATIS-365:
---

Thanks [~szetszwo] for reporting and working on this. It looks good to me. +1

> Implement RaftServer.getGroupIds() using the key set of ImplMap
> ---
>
> Key: RATIS-365
> URL: https://issues.apache.org/jira/browse/RATIS-365
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r365_20181023.patch
>
>
> {code:java}
> //RaftServerProxy
>   public Iterable<RaftGroupId> getGroupIds() throws IOException {
>     return getImpls().stream().map(RaftServerImpl::getGroupId).collect(Collectors.toList());
>   }
> {code}
> getGroupIds() above unnecessarily calls getImpls() and then maps each 
> RaftServerImpl to its RaftGroupId. We may get the RaftGroupId(s) directly from 
> ImplMap.
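The idea of reading the ids from the map's key set can be sketched as follows. ImplMapSketch is a hypothetical stand-in, not the actual ImplMap: String plays the role of RaftGroupId and Object the role of the per-group server implementation. The point is that the values are never touched, so nothing has to be resolved just to obtain a group id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ImplMapSketch {
  // Hypothetical: keys stand in for RaftGroupId, values for the per-group impl.
  private final Map<String, Object> map = new ConcurrentHashMap<>();

  void put(String groupId, Object impl) {
    map.put(groupId, impl);
  }

  // Returns a snapshot of the ids taken straight from the key set.
  List<String> getGroupIds() {
    return new ArrayList<>(map.keySet());
  }

  public static void main(String[] args) {
    ImplMapSketch implMap = new ImplMapSketch();
    implMap.put("group-1", new Object());
    implMap.put("group-2", new Object());
    List<String> ids = implMap.getGroupIds();
    Collections.sort(ids); // key-set order of a ConcurrentHashMap is unspecified
    System.out.println(ids); // prints [group-1, group-2]
  }
}
```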





[jira] [Assigned] (RATIS-430) RaftLogCache#getCachedSegmentNum hits ConcurrentModificationException

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned RATIS-430:
-

Assignee: (was: Shashikant Banerjee)

> RaftLogCache#getCachedSegmentNum hits ConcurrentModificationException 
> --
>
> Key: RATIS-430
> URL: https://issues.apache.org/jira/browse/RATIS-430
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.3.0
>
>
> During performance runs with Ozone, the server hits a 
> ConcurrentModificationException while loading the cached Raft segment:
> {code:java}
> 2018-11-16 14:55:56,329 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> e3e9a703-55bb-482b-a0a1-ce8000474ac2: Failed appendEntries to 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2018-11-16 14:55:56,588 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> e3e9a703-55bb-482b-a0a1-ce8000474ac2: Failed appendEntries to 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2018-11-16 14:55:56,778 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3962
> 2018-11-16 14:55:56,870 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3963
> 2018-11-16 14:55:56,895 WARN org.apache.ratis.grpc.server.GrpcLogAppender: e3e9a703-55bb-482b-a0a1-ce8000474ac2: Failed appendEntries to 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2018-11-16 14:55:56,896 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3964
> 2018-11-16 14:55:56,898 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3965
> 2018-11-16 14:55:56,899 ERROR org.apache.ratis.server.impl.LogAppender: GrpcLogAppender(e3e9a703-55bb-482b-a0a1-ce8000474ac2 -> 0813f1a9-61be-4cab-aa05-d5640f4a8341) unexpected exception
> java.util.ConcurrentModificationException
>         at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)
>         at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>         at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>         at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>         at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.util.stream.LongPipeline.reduce(LongPipeline.java:438)
>         at java.util.stream.LongPipeline.sum(LongPipeline.java:396)
>         at java.util.stream.ReferencePipeline.count(ReferencePipeline.java:526)
>         at org.apache.ratis.server.storage.RaftLogCache.getCachedSegmentNum(RaftLogCache.java:118)
>         at org.apache.ratis.server.storage.RaftLogCache.shouldEvict(RaftLogCache.java:122)
>         at org.apache.ratis.server.storage.SegmentedRaftLog.checkAndEvictCache(SegmentedRaftLog.java:215)
>         at org.apache.ratis.server.storage.SegmentedRaftLog.get(SegmentedRaftLog.java:193)
>         at org.apache.ratis.server.storage.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:199)
>         at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:207)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-430) RaftLogCache#getCachedSegmentNum hits ConcurrentModificationException

2018-11-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-430:
-

 Summary: RaftLogCache#getCachedSegmentNum hits 
ConcurrentModificationException 
 Key: RATIS-430
 URL: https://issues.apache.org/jira/browse/RATIS-430
 Project: Ratis
  Issue Type: Bug
  Components: server
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.3.0


While running performance tests with Ozone, the server hits a 
ConcurrentModificationException while loading a cached Raft log segment:
{code:java}
2018-11-16 14:55:56,329 WARN org.apache.ratis.grpc.server.GrpcLogAppender: e3e9a703-55bb-482b-a0a1-ce8000474ac2: Failed appendEntries to 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2018-11-16 14:55:56,588 WARN org.apache.ratis.grpc.server.GrpcLogAppender: e3e9a703-55bb-482b-a0a1-ce8000474ac2: Failed appendEntries to 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2018-11-16 14:55:56,778 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3962
2018-11-16 14:55:56,870 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3963
2018-11-16 14:55:56,895 WARN org.apache.ratis.grpc.server.GrpcLogAppender: e3e9a703-55bb-482b-a0a1-ce8000474ac2: Failed appendEntries to 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2018-11-16 14:55:56,896 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3964
2018-11-16 14:55:56,898 INFO org.apache.ratis.server.storage.RaftLogWorker: Rolling segment:e3e9a703-55bb-482b-a0a1-ce8000474ac2-RaftLogWorker index to:3965
2018-11-16 14:55:56,899 ERROR org.apache.ratis.server.impl.LogAppender: GrpcLogAppender(e3e9a703-55bb-482b-a0a1-ce8000474ac2 -> 0813f1a9-61be-4cab-aa05-d5640f4a8341) unexpected exception
java.util.ConcurrentModificationException
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.LongPipeline.reduce(LongPipeline.java:438)
        at java.util.stream.LongPipeline.sum(LongPipeline.java:396)
        at java.util.stream.ReferencePipeline.count(ReferencePipeline.java:526)
        at org.apache.ratis.server.storage.RaftLogCache.getCachedSegmentNum(RaftLogCache.java:118)
        at org.apache.ratis.server.storage.RaftLogCache.shouldEvict(RaftLogCache.java:122)
        at org.apache.ratis.server.storage.SegmentedRaftLog.checkAndEvictCache(SegmentedRaftLog.java:215)
        at org.apache.ratis.server.storage.SegmentedRaftLog.get(SegmentedRaftLog.java:193)
        at org.apache.ratis.server.storage.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:199)
        at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:207)
        at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
        at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
        at org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
        at java.lang.Thread.run(Thread.java:745)
{code}
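The trace shows `getCachedSegmentNum` counting an `ArrayList` through a stream while the same list is structurally modified; the log worker is rolling segments at the same moment. Below is a minimal sketch of one way to avoid this, assuming every read and mutation of the segment list can go through a single lock. The class, its `Segment` type and the "loaded" flag are hypothetical and are not the actual RaftLogCache code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a segment list shared by an appender thread and a log-worker thread. */
final class SegmentListSketch {
  /** Minimal segment model: only records whether its entries are loaded in memory. */
  static final class Segment {
    final boolean loaded;

    Segment(boolean loaded) {
      this.loaded = loaded;
    }
  }

  // Plain ArrayList: not thread-safe on its own, so every access below holds "this".
  private final List<Segment> segments = new ArrayList<>();

  /** Called by the log-worker thread, e.g. when a segment is rolled. */
  synchronized void add(Segment segment) {
    segments.add(segment);
  }

  /**
   * Counts segments whose entries are cached. Because add() holds the same
   * monitor, the stream never sees a concurrent structural modification.
   */
  synchronized long getCachedSegmentNum() {
    return segments.stream().filter(s -> s.loaded).count();
  }
}
```

An alternative is a concurrent collection such as `CopyOnWriteArrayList`, whose iterators work on a snapshot and never throw `ConcurrentModificationException`; that trades a full copy on every add for lock-free reads.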





[jira] [Created] (RATIS-431) RaftServer terminates while hitting FileNotFoundException while truncating a log

2018-11-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-431:
-

 Summary: RaftServer terminates while hitting FileNotFoundException 
while truncating a log
 Key: RATIS-431
 URL: https://issues.apache.org/jira/browse/RATIS-431
 Project: Ratis
  Issue Type: Bug
  Components: server
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.3.0


{code:java}
2018-11-16 09:54:42,949 INFO org.apache.ratis.server.impl.RoleInfo: 0813f1a9-61be-4cab-aa05-d5640f4a8341: start FollowerState
2018-11-16 09:54:43,005 INFO org.apache.ratis.server.impl.RaftServerImpl: 0813f1a9-61be-4cab-aa05-d5640f4a8341: change Leader from null to e3e9a703-55bb-482b-a0a1-ce8000474ac2 at term 4 for appendEntries, leader elected after 61ms
2018-11-16 09:54:43,006 INFO org.apache.ratis.server.impl.RaftServerImpl: 0813f1a9-61be-4cab-aa05-d5640f4a8341: set configuration 3917: [e3e9a703-55bb-482b-a0a1-ce8000474ac2:172.26.32.230:9858, c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d:172.26.32.231:9858, 0813f1a9-61be-4cab-aa05-d5640f4a8341:172.26.32.228:9858], old=null at 3917
2018-11-16 09:54:43,391 WARN org.apache.ratis.util.FileUtils: Failed to Files.delete /data/disk2/scm/ratis/a85fce1a-2aef-49cf-899e-894b9c74e7f6/current/log_3930-3930: java.nio.file.NoSuchFileException: /data/disk2/scm/ratis/a85fce1a-2aef-49cf-899e-894b9c74e7f6/current/log_3930-3930
2018-11-16 09:54:43,393 ERROR org.apache.ratis.server.storage.RaftLogWorker: Terminating with exit status 1: 0813f1a9-61be-4cab-aa05-d5640f4a8341-RaftLogWorker failed.
java.nio.file.NoSuchFileException: /data/disk2/scm/ratis/a85fce1a-2aef-49cf-899e-894b9c74e7f6/current/log_3930-3930
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
        at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
        at java.nio.file.Files.delete(Files.java:1126)
        at org.apache.ratis.util.FileUtils.lambda$delete$8(FileUtils.java:82)
        at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:52)
        at org.apache.ratis.util.FileUtils.delete(FileUtils.java:81)
        at org.apache.ratis.util.FileUtils.deleteFile(FileUtils.java:72)
        at org.apache.ratis.server.storage.RaftLogWorker$TruncateLog.execute(RaftLogWorker.java:477)
        at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
        at java.lang.Thread.run(Thread.java:745)
{code}
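The worker terminates because `Files.delete` throws `NoSuchFileException` for a segment file that is already gone. A minimal sketch of a truncation-side delete that treats a missing file as already deleted, assuming that is acceptable for log truncation; `SegmentFileDeleter` is a hypothetical helper, not Ratis's `FileUtils`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for deleting log segment files during truncation. */
final class SegmentFileDeleter {
  /**
   * Deletes a segment file. Returns true if this call deleted it and false if it
   * did not exist, instead of throwing NoSuchFileException and killing the worker.
   */
  static boolean deleteSegment(Path segment) throws IOException {
    return Files.deleteIfExists(segment);
  }
}
```

Other I/O failures, such as permission errors, still propagate as `IOException`, so only the already-deleted case is tolerated.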





[jira] [Assigned] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned RATIS-410:
-

Assignee: Shashikant Banerjee  (was: Mukul Kumar Singh)

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATS-410.000.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.
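As a rough illustration, assuming binary units (1 GB = 1024 MB) and a segment filled only with writeChunk entries, one segment holds about 64 of them, so a single cached segment can keep close to 1 GB of stateMachine data in memory:

```latex
\frac{1\,\mathrm{GB}}{16\,\mathrm{MB}} = \frac{1024\,\mathrm{MB}}{16\,\mathrm{MB}} = 64 \text{ writeChunk entries per segment}
```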





[jira] [Updated] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Attachment: RATS-410.000.patch

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATS-410.000.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Attachment: RATS-410.002.patch

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATS-410.002.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Attachment: (was: RATS-410.000.patch)

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATS-410.002.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Commented] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693527#comment-16693527
 ] 

Shashikant Banerjee commented on RATIS-410:
---

Patch v2 removes the stateMachineData from a log entry before the entry is added to the RaftLogCache.
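A minimal sketch of that idea, using a hypothetical immutable `Entry` type rather than Ratis's protobuf log entry: the cache keeps a copy of each entry with the bulky payload dropped, so cached segments no longer pin it. It assumes the state machine can supply the data again when an appender needs to send it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log cache that never retains state machine payloads. */
final class CacheWithoutDataSketch {
  /** Hypothetical log entry: index, small metadata, and an optional large payload. */
  static final class Entry {
    final long index;
    final String metadata;
    final byte[] stateMachineData; // may be null

    Entry(long index, String metadata, byte[] stateMachineData) {
      this.index = index;
      this.metadata = metadata;
      this.stateMachineData = stateMachineData;
    }

    /** Returns a copy of this entry without the payload (or this entry if it has none). */
    Entry withoutStateMachineData() {
      return stateMachineData == null ? this : new Entry(index, metadata, null);
    }
  }

  private final List<Entry> cachedEntries = new ArrayList<>();

  /** Caches only the stripped entry, so the cache holds no reference to the payload. */
  void append(Entry entry) {
    cachedEntries.add(entry.withoutStateMachineData());
  }

  Entry get(int position) {
    return cachedEntries.get(position);
  }
}
```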

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATS-410.002.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Attachment: (was: RATS-410.002.patch)

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Attachment: RATIS-410.002.patch

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATIS-410.002.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Commented] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694937#comment-16694937
 ] 

Shashikant Banerjee commented on RATIS-410:
---

Patch v3 adds a config to choose whether the stateMachineData is cached in the 
stateMachine itself or in the Raft server. By default, the config value is set 
to false, so for all other StateMachine implementations the stateMachineData is 
still cached within Ratis itself, ensuring that all custom StateMachine 
implementations work as is.
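A minimal sketch of the decision described in this comment. The property key and class name are hypothetical (the thread does not give the real config name); the point is that the default `false` leaves payload caching inside Ratis, so existing state machines behave as before.

```java
import java.util.Properties;

/** Hypothetical config-driven choice of where stateMachineData is cached. */
final class StateMachineCachingConfigSketch {
  /** Hypothetical property key; not the real Ratis configuration name. */
  static final String CACHING_ENABLED_KEY = "example.statemachine.data.caching.enabled";

  /** True if the state machine, rather than the Raft log cache, keeps the data. */
  static boolean isStateMachineCaching(Properties conf) {
    // Default false: Ratis keeps caching the data itself.
    return Boolean.parseBoolean(conf.getProperty(CACHING_ENABLED_KEY, "false"));
  }

  /** Returns the payload the Raft log cache should retain for an entry (null if none). */
  static byte[] dataToCache(Properties conf, byte[] stateMachineData) {
    return isStateMachineCaching(conf) ? null : stateMachineData;
  }
}
```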

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATIS-410.002.patch, 
> RATIS-410.003.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-410) entry with stateMachineData may cause OOM

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Attachment: RATIS-410.003.patch

> entry with stateMachineData may cause OOM
> -
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATIS-410.002.patch, 
> RATIS-410.003.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-410) Raft log entry with stateMachineData may cause OOM

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-410:
--
Summary: Raft log entry with stateMachineData may cause OOM  (was: entry 
with stateMachineData may cause OOM)

> Raft log entry with stateMachineData may cause OOM
> --
>
> Key: RATIS-410
> URL: https://issues.apache.org/jira/browse/RATIS-410
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-410.001.patch, RATIS-410.002.patch, 
> RATIS-410.003.patch
>
>
> With Ozone, each Raft log segment is 1GB in size, and each writeChunk request 
> carries a 16MB payload, so multiple writeChunk requests can end up in the 
> same log segment.
> Because the caching policy in Ratis works by evicting whole log segments, 
> cached segments hold on to memory that belongs to the stateMachine.





[jira] [Updated] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-11-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-326:
--
Summary: Introduce ClearStateMachineData API in StateMachine interface in 
Ratis  (was: Introduce RemoveStateMachineData API in StateMachine interface in 
Ratis)

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Updated] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-11-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-326:
--
Attachment: RATIS-326.000.patch

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Updated] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-12-06 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-326:
--
Attachment: RATIS-326.001.patch

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch, RATIS-326.001.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Commented] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-12-06 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711293#comment-16711293
 ] 

Shashikant Banerjee commented on RATIS-326:
---

Thanks [~jnp] and [~szetszwo] for the review. Patch v1 addresses the review 
comments.
{code:java}
Let's rename clearStateMachineData method to truncate, because it is to be 
called for truncation of log
{code}
Renamed it to truncateStateMachineData to be consistent with other stateMachine 
calls like writeStateMachineData/readStateMachineData/flushStateMachineData.
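A minimal sketch of what a state machine might do when a follower truncates its log. The interface only mirrors the method names mentioned in this comment; its signatures, the in-memory map, and the assumption that data for entries at and after `index` is discarded are illustrative, not the committed Ratis API.

```java
import java.util.NavigableMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical subset of a state machine API. */
interface TruncatableDataSketch {
  CompletableFuture<Void> writeStateMachineData(long index, byte[] data);

  CompletableFuture<Void> truncateStateMachineData(long index);
}

/** Hypothetical in-memory implementation keyed by log index. */
final class InMemoryStateMachineDataSketch implements TruncatableDataSketch {
  // Data written alongside each log entry, keyed by log index.
  private final NavigableMap<Long, byte[]> dataByIndex = new ConcurrentSkipListMap<>();

  @Override
  public CompletableFuture<Void> writeStateMachineData(long index, byte[] data) {
    dataByIndex.put(index, data);
    return CompletableFuture.completedFuture(null);
  }

  @Override
  public CompletableFuture<Void> truncateStateMachineData(long index) {
    // Clearing the tail view removes the data for index and every later entry.
    dataByIndex.tailMap(index, true).clear();
    return CompletableFuture.completedFuture(null);
  }

  int size() {
    return dataByIndex.size();
  }
}
```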

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch, RATIS-326.001.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Commented] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-12-13 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719933#comment-16719933
 ] 

Shashikant Banerjee commented on RATIS-326:
---

[~szetszwo], can you please have a look at patch v1?

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch, RATIS-326.001.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Commented] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-12-13 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720245#comment-16720245
 ] 

Shashikant Banerjee commented on RATIS-326:
---

Thanks [~szetszwo] for the review. Patch v2 addresses the review comments.

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch, RATIS-326.001.patch, 
> RATIS-326.002.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Updated] (RATIS-326) Introduce ClearStateMachineData API in StateMachine interface in Ratis

2018-12-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-326:
--
Attachment: RATIS-326.002.patch

> Introduce ClearStateMachineData API in StateMachine interface in Ratis
> --
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch, RATIS-326.001.patch, 
> RATIS-326.002.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Updated] (RATIS-326) Introduce truncateStateMachineData API in StateMachine interface in Ratis

2018-12-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-326:
--
Summary: Introduce truncateStateMachineData API in StateMachine interface 
in Ratis  (was: Introduce ClearStateMachineData API in StateMachine interface 
in Ratis)

> Introduce truncateStateMachineData API in StateMachine interface in Ratis
> -
>
> Key: RATIS-326
> URL: https://issues.apache.org/jira/browse/RATIS-326
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: RATIS-326.000.patch, RATIS-326.001.patch, 
> RATIS-326.002.patch
>
>
> When a follower truncates its log entry in case there is a mismatch between 
> the received log entry and its own stored entry, we should also remove the 
> stateMachine data written as a part of appending the stored log entry on the 
> follower.





[jira] [Created] (RATIS-490) Logging Improvements with MultiRaft

2019-02-26 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-490:
-

 Summary: Logging Improvements with MultiRaft
 Key: RATIS-490
 URL: https://issues.apache.org/jira/browse/RATIS-490
 Project: Ratis
  Issue Type: Improvement
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


With the MultiRaft feature in Ratis, the same server can play different roles 
in different Raft groups. It is therefore essential to log the RaftGroupId along 
with other information, so that it is clear for which group a particular step 
is executed.
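A minimal sketch of the kind of prefix this suggests, with a hypothetical class name and format: every message from a server's per-group state names both the server id and the group id, so lines from different groups on the same server can be told apart.

```java
/** Hypothetical per-group identity used as a log prefix on a MultiRaft server. */
final class MemberIdSketch {
  private final String serverId;
  private final String groupId;

  MemberIdSketch(String serverId, String groupId) {
    this.serverId = serverId;
    this.groupId = groupId;
  }

  /** Formats a message so that it names both the server and the Raft group. */
  String format(String message) {
    return serverId + "@" + groupId + ": " + message;
  }
}
```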





[jira] [Created] (RATIS-511) Fail the requests the sliding window when a raft client hits GroupMismatchException

2019-03-28 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-511:
-

 Summary: Fail the requests the sliding window when a raft client 
hits GroupMismatchException
 Key: RATIS-511
 URL: https://issues.apache.org/jira/browse/RATIS-511
 Project: Ratis
  Issue Type: Improvement
  Components: client
Affects Versions: 0.3.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.3.0


When a raft request fails with GroupMismatchException, all the pending requests 
in the sliding window should be marked failed.
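A minimal sketch of that behaviour, assuming outstanding requests are tracked by sequence number with one future each. `SlidingWindowSketch` and the local `GroupMismatchException` class are hypothetical stand-ins, not the Ratis client classes.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical client-side window of outstanding requests, keyed by sequence number. */
final class SlidingWindowSketch {
  /** Local stand-in for the Ratis exception of the same name. */
  static final class GroupMismatchException extends RuntimeException {
    GroupMismatchException(String message) {
      super(message);
    }
  }

  private final Map<Long, CompletableFuture<String>> pending = new ConcurrentSkipListMap<>();

  /** Registers a request and returns the future its reply will complete. */
  CompletableFuture<String> submit(long seqNum) {
    CompletableFuture<String> reply = new CompletableFuture<>();
    pending.put(seqNum, reply);
    return reply;
  }

  /** Handles a failed reply for seqNum. */
  void onFailure(long seqNum, Throwable cause) {
    if (cause instanceof GroupMismatchException) {
      // A group mismatch affects every request sent to this group, so fail the whole window.
      for (Iterator<CompletableFuture<String>> it = pending.values().iterator(); it.hasNext(); ) {
        CompletableFuture<String> reply = it.next();
        it.remove();
        reply.completeExceptionally(cause);
      }
    } else {
      // In this sketch, other failures only affect the request that failed.
      CompletableFuture<String> reply = pending.remove(seqNum);
      if (reply != null) {
        reply.completeExceptionally(cause);
      }
    }
  }
}
```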





[jira] [Commented] (RATIS-511) Fail the requests the sliding window when a raft client hits GroupMismatchException

2019-03-29 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804789#comment-16804789
 ] 

Shashikant Banerjee commented on RATIS-511:
---

Thanks [~szetszwo] for the patch. The patch looks good to me. I have also 
tested it with HDDS-1337. 

Can you please check the unit test failures?

> Fail the requests the sliding window when a raft client hits 
> GroupMismatchException
> ---
>
> Key: RATIS-511
> URL: https://issues.apache.org/jira/browse/RATIS-511
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: r511_20190329.patch, r511_20190329b.patch
>
>
> When a raft request fails with GroupMismatchException, all the pending 
> requests in the sliding window should be marked failed.





[jira] [Commented] (RATIS-511) Fail the requests the sliding window when a raft client hits GroupMismatchException

2019-03-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806397#comment-16806397
 ] 

Shashikant Banerjee commented on RATIS-511:
---

Thanks [~szetszwo]. +1 on latest patch. I will commit this shortly.

> Fail the requests the sliding window when a raft client hits 
> GroupMismatchException
> ---
>
> Key: RATIS-511
> URL: https://issues.apache.org/jira/browse/RATIS-511
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.3.0
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: r511_20190329.patch, r511_20190329b.patch
>
>
> When a raft request fails with GroupMismatchException, all the pending 
> requests in the sliding window should be marked failed.





[jira] [Commented] (RATIS-512) testLeaderStepDown may fail with NullPointerException

2019-03-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806401#comment-16806401
 ] 

Shashikant Banerjee commented on RATIS-512:
---

Thanks [~szetszwo]. The patch looks good to me. +1.

> testLeaderStepDown may fail with NullPointerException
> -
>
> Key: RATIS-512
> URL: https://issues.apache.org/jira/browse/RATIS-512
> Project: Ratis
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r512_20190330.patch
>
>
> {code}
> 2019-03-30 04:33:50,166 ERROR ratis.MiniRaftCluster (MiniRaftCluster.java:runWithNewCluster(122)) - Failed org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:107)
> java.lang.NullPointerException
>   at org.apache.ratis.MiniRaftCluster.toRaftPeer(MiniRaftCluster.java:384)
>   at org.apache.ratis.MiniRaftCluster.removePeers(MiniRaftCluster.java:431)
>   at org.apache.ratis.server.impl.RaftReconfigurationBaseTest.runTestAddRemovePeers(RaftReconfigurationBaseTest.java:128)
>   at org.apache.ratis.server.impl.RaftReconfigurationBaseTest.lambda$testLeaderStepDown$3(RaftReconfigurationBaseTest.java:121)
>   at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:119)
>   at org.apache.ratis.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:107)
>   at org.apache.ratis.server.impl.RaftReconfigurationBaseTest.testLeaderStepDown(RaftReconfigurationBaseTest.java:121)
>   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-518) Add request specific retry policy support in Ratis

2019-04-07 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-518:
-

 Summary: Add request specific retry policy support in Ratis
 Key: RATIS-518
 URL: https://issues.apache.org/jira/browse/RATIS-518
 Project: Ratis
  Issue Type: Bug
  Components: client
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


Currently, the retry policy is enforced on a raft client, which handles 
multiple requests. The idea here is to add support for a request-specific retry 
policy in the Raft client.
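One way to picture a request-specific retry policy is a lookup from request type to policy, with a single client-wide default for unmapped types. The Java sketch below is hypothetical: `RequestKind`, `SimpleRetryPolicy` and `RequestTypeDependentRetry` are illustrative names, not the classes in the attached patch or the Ratis `RetryPolicy` API.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical request types; Ratis has its own request-type model. */
enum RequestKind { WRITE, READ, WATCH, STALE_READ }

/** Hypothetical minimal retry policy: whether to retry, and how long to sleep first. */
interface SimpleRetryPolicy {
  boolean shouldRetry(int attemptCount);
  Duration sleepTime();

  static SimpleRetryPolicy noRetry() {
    return new SimpleRetryPolicy() {
      public boolean shouldRetry(int attemptCount) { return false; }
      public Duration sleepTime() { return Duration.ZERO; }
    };
  }

  static SimpleRetryPolicy retryUpTo(int maxAttempts, Duration sleep) {
    return new SimpleRetryPolicy() {
      public boolean shouldRetry(int attemptCount) { return attemptCount < maxAttempts; }
      public Duration sleepTime() { return sleep; }
    };
  }
}

/** Picks a policy per request type and falls back to one client-wide default. */
final class RequestTypeDependentRetry {
  private final Map<RequestKind, SimpleRetryPolicy> policies = new EnumMap<>(RequestKind.class);
  private final SimpleRetryPolicy defaultPolicy;

  RequestTypeDependentRetry(Map<RequestKind, SimpleRetryPolicy> overrides,
                            SimpleRetryPolicy defaultPolicy) {
    this.policies.putAll(overrides);
    this.defaultPolicy = defaultPolicy;
  }

  SimpleRetryPolicy policyFor(RequestKind kind) {
    return policies.getOrDefault(kind, defaultPolicy);
  }
}
```

With an explicit default like this, a request type without an override behaves exactly like the plain client-wide policy. That is one way to keep the two defaults consistent, which is the question raised in the review comments.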



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-518) Add request specific retry policy support in Ratis

2019-04-12 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816257#comment-16816257
 ] 

Shashikant Banerjee commented on RATIS-518:
---

Thanks [~szetszwo] for the patch. The patch looks really good to me. Some minor 
comments:

1) RaftClientReply.java: unintended change.

2) Can we rename RequestTypeDependent to RequestTypeDependentRetry?

3) By default, the retry policy set in RaftClient is 
retryForeverWithNoSleep, but for requestDependent retry, the default is noRetry(). 
Is there any specific reason for this? Can we have the same default behaviour 
for both?

> Add request specific retry policy support in Ratis
> --
>
> Key: RATIS-518
> URL: https://issues.apache.org/jira/browse/RATIS-518
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r518_20190409.patch
>
>
> Currently, the retry policy is enforced on a raft client, which handles 
> multiple requests. The idea here is to add support for a request-specific retry 
> policy in the Raft client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-518) Add request specific retry policy support in Ratis

2019-04-15 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817804#comment-16817804
 ] 

Shashikant Banerjee commented on RATIS-518:
---

Thanks [~szetszwo] for updating the patch. +1 on the v2 patch.

> Add request specific retry policy support in Ratis
> --
>
> Key: RATIS-518
> URL: https://issues.apache.org/jira/browse/RATIS-518
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r518_20190409.patch, r518_20190413.patch
>
>
> Currently, the retry policy is enforced on a raft client, which handles 
> multiple requests. The idea here is to add support for a request-specific retry 
> policy in the Raft client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-532) UnorderedAsync requests should fail instead of retrying the request in case of GroupMismatchException

2019-04-22 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-532:
-

 Summary: UnorderedAsync requests should fail instead of retrying 
the request in case of GroupMismatchException
 Key: RATIS-532
 URL: https://issues.apache.org/jira/browse/RATIS-532
 Project: Ratis
  Issue Type: Bug
  Components: client
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


UnorderedAsync#sendRequestWithRetry
{code:java}
if (e instanceof IOException) {
  if (e instanceof NotLeaderException) {
    client.handleNotLeaderException(request, (NotLeaderException) e, false);
  } else if (!(e instanceof GroupMismatchException)) {
    client.handleIOException(request, (IOException) e, null, false);
  }
} else {
  if (!client.getClientRpc().handleException(request.getServerId(), e, false)) {
    f.completeExceptionally(e);
    return;
  }
}

LOG.info("schedule retry for attempt #{}, policy={}, request={}",
    attemptCount, retryPolicy, request);
client.getScheduler().onTimeout(retryPolicy.getSleepTime(),
    () -> sendRequestWithRetry(pending, client),
    LOG, () -> clientId + ": Failed~ to retry " + request);
{code}
Currently, a GroupMismatchException is ignored and the request is retried 
according to the retry policy. In such a case, the client should instead just 
complete the reply future exceptionally.
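The proposed change can be sketched in isolation as follows. This is a minimal, self-contained Java illustration, not the attached patch. `UnorderedRetrySketch`, `onFailure`, `Action` and the local `GroupMismatchException` stand-in are hypothetical. A GroupMismatchException completes the reply future exceptionally, and any other failure falls through to the retry scheduler.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

final class UnorderedRetrySketch {
  /** Hypothetical stand-in for Ratis' GroupMismatchException. */
  static final class GroupMismatchException extends IOException {
    GroupMismatchException(String msg) { super(msg); }
  }

  /** Hypothetical outcome of handling one failed attempt. */
  enum Action { FAILED, RETRY_SCHEDULED }

  /**
   * A group mismatch means the target server does not serve the requested group,
   * so retrying the same request is unlikely to help: fail the reply future.
   * Other failures are handed to the retry scheduler.
   */
  static Action onFailure(Throwable e, CompletableFuture<?> replyFuture, Runnable scheduleRetry) {
    if (e instanceof GroupMismatchException) {
      replyFuture.completeExceptionally(e);
      return Action.FAILED;
    }
    scheduleRetry.run();
    return Action.RETRY_SCHEDULED;
  }
}
```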



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (RATIS-532) UnorderedAsync requests should fail instead of retrying the request in case of GroupMismatchException

2019-04-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-532:
--
Attachment: RATIS-532.000.patch

> UnorderedAsync requests should fail instead of retrying the request in case 
> of GroupMismatchException
> -
>
> Key: RATIS-532
> URL: https://issues.apache.org/jira/browse/RATIS-532
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-532.000.patch
>
>
> UnorderedAsync#sendRequestWithRetry
> {code:java}
> if (e instanceof IOException) {
>   if (e instanceof NotLeaderException) {
>     client.handleNotLeaderException(request, (NotLeaderException) e, false);
>   } else if (!(e instanceof GroupMismatchException)) {
>     client.handleIOException(request, (IOException) e, null, false);
>   }
> } else {
>   if (!client.getClientRpc().handleException(request.getServerId(), e, false)) {
>     f.completeExceptionally(e);
>     return;
>   }
> }
> LOG.info("schedule retry for attempt #{}, policy={}, request={}",
>     attemptCount, retryPolicy, request);
> client.getScheduler().onTimeout(retryPolicy.getSleepTime(),
>     () -> sendRequestWithRetry(pending, client),
>     LOG, () -> clientId + ": Failed~ to retry " + request);
> {code}
> Currently, a GroupMismatchException is ignored and the request is retried 
> according to the retry policy. In such a case, the client should instead just 
> complete the reply future exceptionally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-167) Implement a high performance OutputStream using async API

2019-05-30 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851682#comment-16851682
 ] 

Shashikant Banerjee commented on RATIS-167:
---

Thanks [~szetszwo] for working on this. The changes look good to me overall, 
but the patch needs to be rebased onto the latest trunk to fix the compilation errors.

> Implement a high performance OutputStream using async API
> -
>
> Key: RATIS-167
> URL: https://issues.apache.org/jira/browse/RATIS-167
> Project: Ratis
>  Issue Type: New Feature
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>  Labels: ozone
> Attachments: r167_20180921.patch, r167_20181109.patch, 
> r167_20181207.patch, r167_20181210.patch, r167_20190111.patch, 
> r167_20190528.patch
>
>
> With an in-order async API, it is easy to implement a high performance 
> OutputStream.
> Also, since the async API already takes care of leader changes, the OutputStream 
> implementation supports automatic failover for free.
> The code is generic in the sense that it is RPC independent (as long as the 
> RPC supports an in-order async API).
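The idea in the description can be sketched in Java. The stream buffers bytes and ships each full buffer through an async send call without waiting. Because the async API is assumed to preserve submission order, `flush()` only has to wait for the outstanding futures. The `sendAsync` function is a hypothetical stand-in for an in-order async client call, and `AsyncBufferedOutputStream` is not the class in the attached patches.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

/** Buffers writes and sends full buffers through an (assumed) in-order async API. */
final class AsyncBufferedOutputStream extends OutputStream {
  private final Function<byte[], CompletableFuture<Void>> sendAsync; // hypothetical async call
  private final byte[] buffer;
  private int count = 0;
  private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

  AsyncBufferedOutputStream(Function<byte[], CompletableFuture<Void>> sendAsync, int bufferSize) {
    this.sendAsync = sendAsync;
    this.buffer = new byte[bufferSize];
  }

  @Override
  public void write(int b) throws IOException {
    buffer[count++] = (byte) b;
    if (count == buffer.length) {
      sendBuffer();
    }
  }

  /** Sends the buffered bytes without blocking; ordering is left to the async API. */
  private void sendBuffer() {
    if (count > 0) {
      inFlight.add(sendAsync.apply(Arrays.copyOf(buffer, count)));
      count = 0;
    }
  }

  /** Sends any remaining bytes and waits until every outstanding send is acknowledged. */
  @Override
  public void flush() throws IOException {
    sendBuffer();
    try {
      CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
    } catch (CompletionException e) {
      throw new IOException("async write failed", e.getCause());
    }
    inFlight.clear();
  }

  @Override
  public void close() throws IOException {
    flush();
  }
}
```

The stream itself holds no leader or retry logic. If the async API already handles leader changes, failover comes for free, as the description says.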



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-167) Implement a high performance OutputStream using async API

2019-05-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852883#comment-16852883
 ] 

Shashikant Banerjee commented on RATIS-167:
---

Thanks [~szetszwo] for updating the patch. Currently, the TestGrpcOutputStream tests are 
disabled because of issues in GrpcOutputStream.

Do we have any open Jira to track them?

The changes look good to me otherwise. I am +1 on this change.

> Implement a high performance OutputStream using async API
> -
>
> Key: RATIS-167
> URL: https://issues.apache.org/jira/browse/RATIS-167
> Project: Ratis
>  Issue Type: New Feature
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>  Labels: ozone
> Attachments: r167_20180921.patch, r167_20181109.patch, 
> r167_20181207.patch, r167_20181210.patch, r167_20190111.patch, 
> r167_20190528.patch, r167_20190530.patch
>
>
> With an in-order async API, it is easy to implement a high performance 
> OutputStream.
> Also, since the async API already takes care of leader changes, the OutputStream 
> implementation supports automatic failover for free.
> The code is generic in the sense that it is RPC independent (as long as the 
> RPC supports an in-order async API).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-620) TestWatchForCommit tests are flaky

2019-07-10 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-620:
-

 Summary: TestWatchForCommit tests are flaky
 Key: RATIS-620
 URL: https://issues.apache.org/jira/browse/RATIS-620
 Project: Ratis
  Issue Type: Bug
  Components: client
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (RATIS-627) Rpc timeouts and leader election timeout configs are misleading

2019-07-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-627:
-

 Summary: Rpc timeouts and leader election timeout configs are 
misleading
 Key: RATIS-627
 URL: https://issues.apache.org/jira/browse/RATIS-627
 Project: Ratis
  Issue Type: Bug
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
 Fix For: 0.4.0


Currently, the timeouts for completing a leader election are bounded by the min and 
max values given by rpc.timeout.min and rpc.timeout.max. However, there is a 
separate config for the leader timeout, "leader.election.timeout", after 
which the server's state machine is notified that leader election has been 
pending for a long time. The naming of these configs can be confusing.
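The two settings control different things. The following Java sketch makes the distinction concrete. It assumes the usual Raft scheme in which each server draws its election timeout uniformly from [min, max), where min and max play the role of rpc.timeout.min and rpc.timeout.max. The separate threshold is compared against elapsed leaderless time only to decide when to notify the state machine. `ElectionTimeouts` and its methods are hypothetical names, not Ratis code.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class ElectionTimeouts {
  /** Randomized election timeout in [min, max), as in standard Raft (assumed here). */
  static Duration randomElectionTimeout(Duration min, Duration max) {
    long minMs = min.toMillis();
    long maxMs = max.toMillis();
    if (maxMs <= minMs) {
      throw new IllegalArgumentException("max must be greater than min");
    }
    return Duration.ofMillis(ThreadLocalRandom.current().nextLong(minMs, maxMs));
  }

  /** Separate threshold: has the group been without a leader long enough to notify? */
  static boolean shouldNotifyNoLeader(Duration timeWithoutLeader, Duration noLeaderTimeout) {
    return timeWithoutLeader.compareTo(noLeaderTimeout) >= 0;
  }
}
```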



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-596) Rename raft.server.leader.election.timeout

2019-07-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889916#comment-16889916
 ] 

Shashikant Banerjee commented on RATIS-596:
---

The patch looks good. Please address the checkstyle issues while committing. I 
am +1 on the change.

> Rename raft.server.leader.election.timeout 
> ---
>
> Key: RATIS-596
> URL: https://issues.apache.org/jira/browse/RATIS-596
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r596_20190619.patch
>
>
> The conf raft.server.leader.election.timeout is actually for an extended period with no 
> leader, not for leader election itself. It should be renamed to avoid confusion.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-635) Add an API to get the min replicated logIndex for a raftGroup in raftServer

2019-07-23 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-635:
-

 Summary: Add an API to get the min replicated logIndex for a 
raftGroup in raftServer
 Key: RATIS-635
 URL: https://issues.apache.org/jira/browse/RATIS-635
 Project: Ratis
  Issue Type: Bug
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


This feature is required by Ozone (HDDS-1753) to determine the minimum replicated log index 
across all servers of a RaftGroup.
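Conceptually, the minimum replicated index is the smallest log index known to be present on every server of the group. The Java sketch below assumes it is computed from the leader's view: the minimum of the leader's own flushed index and each follower's match index (the highest entry known to be replicated on that follower). `MinReplicatedIndex.compute` and its parameters are hypothetical and are not the API added by the patch.

```java
import java.util.Map;

final class MinReplicatedIndex {
  /**
   * @param leaderFlushedIndex  highest index the leader has persisted locally
   * @param followerMatchIndex  follower id -> highest index known to be replicated there
   * @return the smallest index replicated on every server
   */
  static long compute(long leaderFlushedIndex, Map<String, Long> followerMatchIndex) {
    long min = leaderFlushedIndex;
    for (long idx : followerMatchIndex.values()) {
      min = Math.min(min, idx);
    }
    return min;
  }
}
```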



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (RATIS-643) Allow Ratis to take a configurable snapshot retention policy

2019-07-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896952#comment-16896952
 ] 

Shashikant Banerjee edited comment on RATIS-643 at 7/31/19 9:09 AM:


Thanks [~avijayan] for working on this. The patch looks good to me. I am +1 on 
the change. Test failures don't seem to be related.

I will commit this shortly.


was (Author: shashikant):
Thanks [~avijayan] for working on this. The patch looks good to me . I am+1 on 
the change. Test failures don't seem to be related.

I will commit this shortly

> Allow Ratis to take a configurable snapshot retention policy
> 
>
> Key: RATIS-643
> URL: https://issues.apache.org/jira/browse/RATIS-643
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: RATIS-643-000.patch
>
>
> It would be a useful feature for Ratis to provide a snapshot retention policy 
> that clients can configure. As a starting point, we can make the number of 
> recent snapshot files to retain configurable.
> The motivation is from HDDS-1786. 
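A "keep the N most recent snapshots" policy can be sketched in Java. The sketch lists snapshot files, orders them newest first, and deletes the rest. It assumes snapshot files are named `snapshot.<term>_<index>` and that a higher term, then a higher index, means newer. Both the naming and `SnapshotRetention.retainLatest` are assumptions for illustration, not the implementation in RATIS-643-000.patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class SnapshotRetention {
  // Assumed naming scheme: snapshot.<term>_<index>
  private static final Pattern SNAPSHOT = Pattern.compile("snapshot\\.(\\d+)_(\\d+)");

  private record Snapshot(Path path, long term, long index) {}

  /** Deletes all but the newest {@code keep} snapshot files in {@code dir}; returns deleted paths. */
  static List<Path> retainLatest(Path dir, int keep) throws IOException {
    if (keep < 0) {
      throw new IllegalArgumentException("keep must be >= 0");
    }
    List<Snapshot> snapshots = new ArrayList<>();
    try (Stream<Path> files = Files.list(dir)) {
      files.forEach(p -> {
        Matcher m = SNAPSHOT.matcher(p.getFileName().toString());
        if (m.matches()) {
          snapshots.add(new Snapshot(p, Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
        }
      });
    }
    // Newest first: higher term wins, then higher index.
    snapshots.sort(Comparator.comparingLong(Snapshot::term)
        .thenComparingLong(Snapshot::index)
        .reversed());
    List<Path> deleted = new ArrayList<>();
    for (int i = keep; i < snapshots.size(); i++) {
      Path p = snapshots.get(i).path();
      Files.deleteIfExists(p);
      deleted.add(p);
    }
    return deleted;
  }
}
```

Files that do not match the assumed pattern are left untouched, so other state machine files in the same directory are not affected.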



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-643) Allow Ratis to take a configurable snapshot retention policy

2019-07-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896952#comment-16896952
 ] 

Shashikant Banerjee commented on RATIS-643:
---

Thanks [~avijayan] for working on this. The patch looks good to me. I am +1 on 
the change. Test failures don't seem to be related.

I will commit this shortly.

> Allow Ratis to take a configurable snapshot retention policy
> 
>
> Key: RATIS-643
> URL: https://issues.apache.org/jira/browse/RATIS-643
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: RATIS-643-000.patch
>
>
> It would be a useful feature for Ratis to provide a snapshot retention policy 
> that clients can configure. As a starting point, we can make the number of 
> recent snapshot files to retain configurable.
> The motivation is from HDDS-1786. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-643) Allow Ratis to take a configurable snapshot retention policy

2019-07-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896956#comment-16896956
 ] 

Shashikant Banerjee commented on RATIS-643:
---

Thanks [~avijayan] for the contribution. I have committed this change to master.

> Allow Ratis to take a configurable snapshot retention policy
> 
>
> Key: RATIS-643
> URL: https://issues.apache.org/jira/browse/RATIS-643
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: RATIS-643-000.patch
>
>
> It would be a useful feature for Ratis to provide a snapshot retention policy 
> that clients can configure. As a starting point, we can make the number of 
> recent snapshot files to retain configurable.
> The motivation is from HDDS-1786. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-643) Allow Ratis to take a configurable snapshot retention policy

2019-07-31 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-643:
--
Component/s: server

> Allow Ratis to take a configurable snapshot retention policy
> 
>
> Key: RATIS-643
> URL: https://issues.apache.org/jira/browse/RATIS-643
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: RATIS-643-000.patch
>
>
> It would be a useful feature for Ratis to provide a snapshot retention policy 
> that clients can configure. As a starting point, we can make the number of 
> recent snapshot files to retain configurable.
> The motivation is from HDDS-1786. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-643) Allow Ratis to take a configurable snapshot retention policy

2019-07-31 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-643:
--
Labels: ozone  (was: )

> Allow Ratis to take a configurable snapshot retention policy
> 
>
> Key: RATIS-643
> URL: https://issues.apache.org/jira/browse/RATIS-643
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-643-000.patch
>
>
> It would be a useful feature for Ratis to provide a snapshot retention policy 
> that clients can configure. As a starting point, we can make the number of 
> recent snapshot files to retain configurable.
> The motivation is from HDDS-1786. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-643) Allow Ratis to take a configurable snapshot retention policy

2019-07-31 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-643:
--
Fix Version/s: 0.4.0

> Allow Ratis to take a configurable snapshot retention policy
> 
>
> Key: RATIS-643
> URL: https://issues.apache.org/jira/browse/RATIS-643
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: ozone
> Fix For: 0.4.0
>
> Attachments: RATIS-643-000.patch
>
>
> It would be a useful feature for Ratis to provide a snapshot retention policy 
> that clients can configure. As a starting point, we can make the number of 
> recent snapshot files to retain configurable.
> The motivation is from HDDS-1786. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (RATIS-125) The cause in a StateMachineException is not sent to client

2019-07-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896968#comment-16896968
 ] 

Shashikant Banerjee edited comment on RATIS-125 at 7/31/19 9:19 AM:


Thanks [~nandakumar131] for working on this. The patch does not apply anymore. 
Can you please rebase?


was (Author: shashikant):
Thanks [~nandakumar131] for working on this. The path does not apply anymore. 
Can you please rebase?

> The cause in a StateMachineException is not sent to client
> --
>
> Key: RATIS-125
> URL: https://issues.apache.org/jira/browse/RATIS-125
> Project: Ratis
>  Issue Type: Improvement
>  Components: proto
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Nanda kumar
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-125.000.patch
>
>
> StateMachineExceptionProto only has the class name, message and stack trace, but 
> not the cause.
> On the client side, the real cause of the exception cannot be seen.
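Carrying the cause across the wire essentially means making the serialized form recursive. The Java sketch below illustrates this with a hypothetical `ExceptionInfo` class that stands in for a protobuf message with a nested "cause" field. It is not the actual StateMachineExceptionProto change. The server side walks the cause chain up to a depth limit, which guards against very deep chains. The client side rebuilds a matching chain of exceptions whose messages record the original class names. Stack traces are omitted for brevity.

```java
final class ExceptionInfo {
  private static final int MAX_DEPTH = 16; // bound on how much of the cause chain is carried

  final String className;
  final String message;
  final ExceptionInfo cause; // null when there is no (further) cause

  private ExceptionInfo(String className, String message, ExceptionInfo cause) {
    this.className = className;
    this.message = message;
    this.cause = cause;
  }

  /** Server side: capture the exception together with its cause chain. */
  static ExceptionInfo of(Throwable t) {
    return of(t, 0);
  }

  private static ExceptionInfo of(Throwable t, int depth) {
    if (t == null || depth >= MAX_DEPTH) {
      return null;
    }
    return new ExceptionInfo(t.getClass().getName(), t.getMessage(), of(t.getCause(), depth + 1));
  }

  /** Client side: rebuild a chain of exceptions mirroring the original causes. */
  Exception toException() {
    Exception rebuiltCause = cause == null ? null : cause.toException();
    return new Exception(className + ": " + message, rebuiltCause);
  }
}
```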



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-125) The cause in a StateMachineException is not sent to client

2019-07-31 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896968#comment-16896968
 ] 

Shashikant Banerjee commented on RATIS-125:
---

Thanks [~nandakumar131] for working on this. The patch does not apply anymore. 
Can you please rebase?

> The cause in a StateMachineException is not sent to client
> --
>
> Key: RATIS-125
> URL: https://issues.apache.org/jira/browse/RATIS-125
> Project: Ratis
>  Issue Type: Improvement
>  Components: proto
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Nanda kumar
>Priority: Major
>  Labels: ozone
> Attachments: RATIS-125.000.patch
>
>
> StateMachineExceptionProto only has the class name, message and stack trace, but 
> not the cause.
> On the client side, the real cause of the exception cannot be seen.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (RATIS-635) Add an API to get the min replicated logIndex for a raftGroup in raftServer

2019-07-31 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned RATIS-635:
-

Assignee: Shashikant Banerjee  (was: Lokesh Jain)

> Add an API to get the min replicated logIndex for a raftGroup in raftServer
> ---
>
> Key: RATIS-635
> URL: https://issues.apache.org/jira/browse/RATIS-635
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.4.0
>
>
> This feature is required by Ozone (HDDS-1753) to determine the minimum replicated 
> log index across all servers of a RaftGroup.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-635) Add an API to get the min replicated logIndex for a raftGroup in raftServer

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-635:
--
Attachment: RATIS-635.000.patch

> Add an API to get the min replicated logIndex for a raftGroup in raftServer
> ---
>
> Key: RATIS-635
> URL: https://issues.apache.org/jira/browse/RATIS-635
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.4.0
>
> Attachments: RATIS-635.000.patch
>
>
> This feature is required by Ozone (HDDS-1753) to determine the minimum replicated 
> log index across all servers of a RaftGroup.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (RATIS-627) Rpc timeouts and leader election timeout configs are misleading

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-627.
---
Resolution: Duplicate
  Assignee: Shashikant Banerjee

> Rpc timeouts and leader election timeout configs are misleading
> ---
>
> Key: RATIS-627
> URL: https://issues.apache.org/jira/browse/RATIS-627
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.4.0
>
>
> Currently, the timeouts for completing a leader election are bounded by the min and 
> max values given by rpc.timeout.min and rpc.timeout.max. However, there is a 
> separate config for the leader timeout, "leader.election.timeout", after 
> which the server's state machine is notified that leader election has been 
> pending for a long time. The naming of these configs can be confusing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-646) Add Metrics support for Ratis pipeline

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-646:
-

 Summary: Add Metrics support for Ratis pipeline
 Key: RATIS-646
 URL: https://issues.apache.org/jira/browse/RATIS-646
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


For performance measurement and Ratis pipeline health diagnostics, certain 
metrics need to be added to Ratis. This Jira aims to cover all 
the required metrics.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-646) Add Metrics support for Ratis pipeline

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-646:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: RATIS-618)

> Add Metrics support for Ratis pipeline
> --
>
> Key: RATIS-646
> URL: https://issues.apache.org/jira/browse/RATIS-646
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> For performance measurement and Ratis pipeline health diagnostics, certain 
> metrics need to be added to Ratis. This Jira aims to cover all 
> the required metrics.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-647:
-

 Summary: Create metrics associated with RaftLog for RaftServer
 Key: RATIS-647
 URL: https://issues.apache.org/jira/browse/RATIS-647
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


We need the following metrics related to RaftLog and RaftLogWorker:
|raftLogSyncLatency|Time taken to sync the raft log|
|numRaftLogSyncOps|Number of raft log sync calls over time (equals the number of FlushStateMachine calls)|
|raftLogSynBatchSize|Number of raft log entries synced with each flush call|
|raftLogReadLatency|Time required to read a raft log entry from the actual raft log file and create a raft log entry (raft log read latency)|
|raftLogAppendLatency|Total time taken to append a raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
|raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft Log Worker queue|
|raftLogQueueingDelay|Time required to enqueue a raft log entry in the raft log worker queue|
|raftLogWorkerQueueSize
|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the raft log.|
|raftLogSegmentLoadLatency|Time required to load and process raft log segments during restart|
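Several of these metrics come from timing a single operation and counting how often it runs. The Java sketch below uses only the JDK and shows how raftLogSyncLatency, numRaftLogSyncOps and raftLogSynBatchSize could be derived from one timed sync call. `SyncMetrics` and `timeSync` are hypothetical. A real implementation would more likely use a metrics library with histograms rather than plain averages.

```java
import java.util.concurrent.atomic.LongAdder;

final class SyncMetrics {
  private final LongAdder syncOps = new LongAdder();        // numRaftLogSyncOps
  private final LongAdder totalSyncNanos = new LongAdder(); // basis for raftLogSyncLatency
  private final LongAdder entriesSynced = new LongAdder();  // basis for raftLogSynBatchSize

  /** Times one sync call and records how many log entries it flushed. */
  void timeSync(int batchSize, Runnable sync) {
    long start = System.nanoTime();
    try {
      sync.run();
    } finally {
      totalSyncNanos.add(System.nanoTime() - start);
      syncOps.increment();
      entriesSynced.add(batchSize);
    }
  }

  long numSyncOps() { return syncOps.sum(); }

  double avgSyncLatencyMillis() {
    long ops = syncOps.sum();
    return ops == 0 ? 0.0 : totalSyncNanos.sum() / 1_000_000.0 / ops;
  }

  double avgBatchSize() {
    long ops = syncOps.sum();
    return ops == 0 ? 0.0 : (double) entriesSynced.sum() / ops;
  }
}
```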



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-648) Add metrics related to GrpcLogAppendRequests

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-648:
-

 Summary: Add metrics related to GrpcLogAppendRequests 
 Key: RATIS-648
 URL: https://issues.apache.org/jira/browse/RATIS-648
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


The following metrics related to GrpcLogAppends would be useful for performance 
and health monitoring and tuning:
|GrpcLogAppenderLatency|Time taken to append a log entry to each follower and get an acknowledgement|
|logAppendRetryCount|Total number of retried logAppend requests|
|logAppendRequestCount|Total number of logAppend requests|
|appendEntryProcessingLatency|Time required to process an append entry request on each follower|
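The append-side metrics combine per-follower latency with request and retry counters. The following Java sketch uses only the JDK to show one way to track them. It also derives a retry ratio (logAppendRetryCount / logAppendRequestCount), which can serve as a quick health signal. `AppendMetrics` and its methods are hypothetical names, not Ratis code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class AppendMetrics {
  private final LongAdder requests = new LongAdder(); // logAppendRequestCount
  private final LongAdder retries = new LongAdder();  // logAppendRetryCount
  private final Map<String, LongAdder> latencyNanosByFollower = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> acksByFollower = new ConcurrentHashMap<>();

  void onRequestSent(boolean isRetry) {
    requests.increment();
    if (isRetry) {
      retries.increment();
    }
  }

  /** Records the time from sending an append to receiving the follower's acknowledgement. */
  void onAck(String followerId, long latencyNanos) {
    latencyNanosByFollower.computeIfAbsent(followerId, k -> new LongAdder()).add(latencyNanos);
    acksByFollower.computeIfAbsent(followerId, k -> new LongAdder()).increment();
  }

  double retryRatio() {
    long n = requests.sum();
    return n == 0 ? 0.0 : (double) retries.sum() / n;
  }

  double avgLatencyMillis(String followerId) {
    LongAdder acks = acksByFollower.get(followerId);
    if (acks == null || acks.sum() == 0) {
      return 0.0;
    }
    return latencyNanosByFollower.get(followerId).sum() / 1_000_000.0 / acks.sum();
  }
}
```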



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-647:
--
Description: 
We need the following metrics related to RaftLog and RaftLogWorker:
|raftLogSyncLatency|Time taken to sync the raft log|
|numRaftLogSyncOps|Number of raft log sync calls over time (equals the number of FlushStateMachine calls)|
|raftLogSynBatchSize|Number of raft log entries synced with each flush call|
|raftLogReadLatency|Time required to read a raft log entry from the actual raft log file and create a raft log entry (raft log read latency)|
|raftLogAppendLatency|Total time taken to append a raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
|raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft Log Worker queue|
|raftLogQueueingDelay|Time required to enqueue a raft log entry in the raft log worker queue|
|raftLogSegmentLoadLatency|Time required to load and process raft log segments during restart|
|raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the raft log.|

  was:
We need the following metrics related to RaftLog and RaftLogWorker:
|raftLogSyncLatency|Time taken to sync the raft log|
|numRaftLogSyncOps|Number of raft log sync calls over time (equals the number of FlushStateMachine calls)|
|raftLogSynBatchSize|Number of raft log entries synced with each flush call|
|raftLogReadLatency|Time required to read a raft log entry from the actual raft log file and create a raft log entry (raft log read latency)|
|raftLogAppendLatency|Total time taken to append a raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
|raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft Log Worker queue|
|raftLogQueueingDelay|Time required to enqueue a raft log entry in the raft log worker queue|
|raftLogWorkerQueueSize
|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the raft log.|
|raftLogSegmentLoadLatency|Time required to load and process raft log segments during restart|


> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync the Raft log|
> |numRaftLogSyncOps|Number of Raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of Raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a Raft log entry from the actual Raft log file and create a Raft log entry (Raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a Raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft log worker queue|
> |raftLogQueueingDelay|Time required to enqueue a Raft log entry in the Raft log worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process Raft log segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the Raft log|



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-649) Add metrics related to ClientRequests

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-649:
-

 Summary: Add metrics related to ClientRequests 
 Key: RATIS-649
 URL: https://issues.apache.org/jira/browse/RATIS-649
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


The following metrics would help measure the load and the processing time of 
client requests:

|numReadRequestCount|Number of read-type requests received on the leader|
|numWriteRequestCount|Number of write-type requests received on the leader|
|numWatchForMajorityRequestCount|Number of Watch-for-Majority requests received on the leader|
|numWatchForAllRequestCount|Number of Watch-for-All requests received on the leader|
|raftClientReadRequestLatency|Time required to process read-type requests|
|raftClientWriteRequestLatency|Time required to process write-type requests|
|raftClientWatchForMajority|Time required to process WatchForMajority requests|
|raftClientWatchForAllRequests|Time required to process WatchForAll requests|
|requestQueueLimitHitCount|Number of times the number of pending requests on the leader hit the configured limit|
|numRequestRetryCacheHitCount|Number of retry cache hits. This indicates how often Raft clients retry requests because of request timeouts or exceptions.|



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-647:
--
Description: 
We need the following metrics related to RaftLog and RaftLogWorker:
|raftLogSyncLatency|Time taken to sync the Raft log|
|numRaftLogSyncOps|Number of Raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
|raftLogSynBatchSize|Number of Raft log entries synced with each flush call|
|raftLogReadLatency|Time required to read a Raft log entry from the actual Raft log file and create a Raft log entry (Raft log read latency)|
|raftLogAppendLatency|Total time taken to append a Raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
|raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft log worker queue|
|raftLogQueueingDelay|Time required to enqueue a Raft log entry in the Raft log worker queue|
|raftLogSegmentLoadLatency|Time required to load and process Raft log segments during restart|
|raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the Raft log|
|raftLogCacheMissCount|Number of Raft log cache misses|

  was:
We need the following metrics related to RaftLog and RaftLogWorker:
|raftLogSyncLatency|Time taken to sync the Raft log|
|numRaftLogSyncOps|Number of Raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
|raftLogSynBatchSize|Number of Raft log entries synced with each flush call|
|raftLogReadLatency|Time required to read a Raft log entry from the actual Raft log file and create a Raft log entry (Raft log read latency)|
|raftLogAppendLatency|Total time taken to append a Raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
|raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft log worker queue|
|raftLogQueueingDelay|Time required to enqueue a Raft log entry in the Raft log worker queue|
|raftLogSegmentLoadLatency|Time required to load and process Raft log segments during restart|
|raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the Raft log|


> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync the Raft log|
> |numRaftLogSyncOps|Number of Raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of Raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a Raft log entry from the actual Raft log file and create a Raft log entry (Raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a Raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft log worker queue|
> |raftLogQueueingDelay|Time required to enqueue a Raft log entry in the Raft log worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process Raft log segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the Raft log|
> |raftLogCacheMissCount|Number of Raft log cache misses|



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-646) Add Metrics for Ratis Data pipeline

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-646:
--
Summary: Add Metrics for Ratis Data pipeline  (was: Add Metrics support for 
Ratis pipeline)

> Add Metrics for Ratis Data pipeline
> ---
>
> Key: RATIS-646
> URL: https://issues.apache.org/jira/browse/RATIS-646
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> For performance measurements and Ratis pipeline health diagnostics, certain 
> metrics need to be incorporated into Ratis. This Jira aims to encompass all 
> the required metrics.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-646) Add Metrics for Ratis Data pipeline

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-646:
--
Labels: ozone  (was: )

> Add Metrics for Ratis Data pipeline
> ---
>
> Key: RATIS-646
> URL: https://issues.apache.org/jira/browse/RATIS-646
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: ozone
> Fix For: 0.4.0
>
>
> For performance measurements and Ratis pipeline health diagnostics, certain 
> metrics need to be incorporated into Ratis. This Jira aims to encompass all 
> the required metrics.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-650) Add metrics related to leader and follower commits

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-650:
-

 Summary: Add metrics related to leader and follower commits
 Key: RATIS-650
 URL: https://issues.apache.org/jira/browse/RATIS-650
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


The following metrics would be useful for measuring how each server performs 
under the incoming load, and how far its commit index diverges from those of 
its peers.
|appliedIndexToCommitIndexDiff|The difference between the last applied index and the log commit index on each Ratis server. This indicates how the StateMachine is keeping up with the incoming load.|
|raftServerCommitIndexDiff|The difference between the commit index of the leader and that of each of its peers. This shows at what point in time a follower lags behind or catches up with the leader.|
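A minimal sketch of how the two values above could be computed. It assumes the server can read its own commit index and last applied index, and that the leader knows the last commit index of each follower. `CommitIndexLag` and its methods are hypothetical illustrations, not the Ratis API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Ratis) illustrating the two proposed metrics.
public final class CommitIndexLag {
  private CommitIndexLag() {}

  // appliedIndexToCommitIndexDiff: how far the StateMachine trails the log commit index.
  public static long appliedIndexToCommitIndexDiff(long commitIndex, long lastAppliedIndex) {
    return commitIndex - lastAppliedIndex;
  }

  // raftServerCommitIndexDiff: the leader's commit index minus each follower's commit index.
  public static Map<String, Long> raftServerCommitIndexDiff(
      long leaderCommitIndex, Map<String, Long> followerCommitIndices) {
    Map<String, Long> diffs = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : followerCommitIndices.entrySet()) {
      diffs.put(e.getKey(), leaderCommitIndex - e.getValue());
    }
    return diffs;
  }

  public static void main(String[] args) {
    Map<String, Long> followers = new LinkedHashMap<>();
    followers.put("s1", 100L);
    followers.put("s2", 95L);
    System.out.println(appliedIndexToCommitIndexDiff(100, 90)); // prints 10
    System.out.println(raftServerCommitIndexDiff(100, followers)); // prints {s1=0, s2=5}
  }
}
```

A growing value for a single follower points to that follower falling behind. A growing applied-to-commit difference on any server points to a slow StateMachine.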

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-651) Add metrics related to leaderElection and HeratBeat

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-651:
-

 Summary: Add metrics related to leaderElection and HeratBeat
 Key: RATIS-651
 URL: https://issues.apache.org/jira/browse/RATIS-651
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


The following metrics would help track leader election events and timeouts:

|numLeaderElections|Number of leader elections since the creation of the Ratis pipeline|
|numLeaderElectionTimeouts|Number of leader election timeouts or failures|
|LeaderElectionCompletionLatency|Time required to complete a leader election|
|MaxNoLeaderInterval|Maximum interval during which there has been no elected leader in the Raft ring|
|heartBeatMissCount|Number of times a heartbeat response from a server is missed|



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (RATIS-652) Add metrics related to snapshot and log purge

2019-08-01 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created RATIS-652:
-

 Summary: Add metrics related to snapshot and log purge
 Key: RATIS-652
 URL: https://issues.apache.org/jira/browse/RATIS-652
 Project: Ratis
  Issue Type: Sub-task
  Components: server
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


The following metrics would help determine the overall snapshot and log purge 
behaviour of a Ratis pipeline:

|takeSnapshotLatency|Time taken to take a Ratis snapshot|
|numSnapshots|Number of snapshots taken|
|purgeLogRecordLatency|Time taken to purge log records|
|numPurgeLogCalls|Number of purge log calls|
|numInstallSnnapshotOps|Number of install snapshot calls|



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (RATIS-651) Add metrics related to leaderElection and HearttBeat

2019-08-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated RATIS-651:
--
Summary: Add metrics related to leaderElection and HearttBeat  (was: Add 
metrics related to leaderElection and HeratBeat)

> Add metrics related to leaderElection and HearttBeat
> 
>
> Key: RATIS-651
> URL: https://issues.apache.org/jira/browse/RATIS-651
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> The following metrics would help track leader election events and timeouts:
>
> |numLeaderElections|Number of leader elections since the creation of the Ratis pipeline|
> |numLeaderElectionTimeouts|Number of leader election timeouts or failures|
> |LeaderElectionCompletionLatency|Time required to complete a leader election|
> |MaxNoLeaderInterval|Maximum interval during which there has been no elected leader in the Raft ring|
> |heartBeatMissCount|Number of times a heartbeat response from a server is missed|



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-578) Illegal State transition in LeaderElection

2019-08-02 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899136#comment-16899136
 ] 

Shashikant Banerjee commented on RATIS-578:
---

The patch looks good to me. +1

> Illegal State transition in LeaderElection
> --
>
> Key: RATIS-578
> URL: https://issues.apache.org/jira/browse/RATIS-578
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: ozone
> Fix For: 0.4.0
>
> Attachments: r578_20190731.patch
>
>
> Illegal State transition in LeaderElection
> {code}
> java.lang.IllegalStateException: ILLEGAL TRANSITION: In 
> 3d75e29e-ff2a-47a6-82c4-6408d200876d:group-CB73AD2587F6:LeaderElection13,
>  STARTING -> CLOSED
> java.lang.IllegalStateException: ILLEGAL TRANSITION: In 
> 3d75e29e-ff2a-47a6-82c4-6408d200876d:group-CB73AD2587F6:LeaderElection13, 
> CLOSED -> RUNNING
> java.lang.IllegalStateException: ILLEGAL TRANSITION: In 
> 37da83b0-33ff-44cf-aeb9-67a102e13468:group-9FC4313E1696:LeaderElection217, 
> RUNNING -> CLOSED
> java.lang.IllegalStateException: ILLEGAL TRANSITION: In 
> 95ef0599-6d8a-40f8-a69c-7ba0c956dc6c:group-21734B88A322:LeaderElection265, 
> RUNNING -> CLOSED
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-661) Add call in state machine to handle group removal

2019-08-23 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914197#comment-16914197
 ] 

Shashikant Banerjee commented on RATIS-661:
---

Thanks [~ljain] for working on this. The patch looks good to me, with just one 
minor comment: in the StateMachine, can we make the default implementation 
empty (i.e. just add the default keyword to the method signature)?
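The suggestion above relies on Java `default` interface methods. The sketch below shows the idea: an empty default body lets existing StateMachine implementations compile unchanged, and only the implementations that care about group removal override the hook. The interface and the method name `notifyGroupRemove` are hypothetical stand-ins, not the actual Ratis signatures.

```java
// Hypothetical, trimmed-down stand-in for a state machine interface.
interface GroupAwareStateMachine {
  // Called before the RaftGroup is removed. The empty default body means
  // implementations that do not care about group removal need no changes.
  default void notifyGroupRemove() {}
}

// An implementation that keeps the default (no-op) behaviour.
class PlainStateMachine implements GroupAwareStateMachine {}

// An implementation that overrides the hook, e.g. to clean up local state.
class CleaningStateMachine implements GroupAwareStateMachine {
  boolean cleanedUp = false;

  @Override
  public void notifyGroupRemove() {
    cleanedUp = true;
  }
}

public class DefaultMethodDemo {
  public static void main(String[] args) {
    new PlainStateMachine().notifyGroupRemove(); // no-op
    CleaningStateMachine sm = new CleaningStateMachine();
    sm.notifyGroupRemove();
    System.out.println(sm.cleanedUp); // prints true
  }
}
```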

> Add call in state machine to handle group removal
> -
>
> Key: RATIS-661
> URL: https://issues.apache.org/jira/browse/RATIS-661
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-661.001.patch, RATIS-661.002.patch
>
>
> Currently, during RaftServerProxy#groupRemoveAsync, there is no way for the 
> StateMachine to know that the RaftGroup will be removed. This Jira aims to 
> add a call in the StateMachine to handle group removal.
> It also changes the logic of the group removal API to remove the 
> RaftServerImpl from the RaftServerProxy#impls map after the shutdown is 
> complete. This is required to synchronize the removal with the corresponding 
> API, RaftServer#getGroupIds, which uses the RaftServerProxy#impls map to get 
> the groupIds.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat

2019-08-26 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915784#comment-16915784
 ] 

Shashikant Banerjee commented on RATIS-651:
---

Thanks [~avijayan] for working on this and [~elserj] for the review. I agree 
that the heartbeat miss count should be aggregated in the leader. One solution 
would be to move the heartbeat count out of the LeaderElection metrics, define 
a new metric for heartbeats, and aggregate it in the LeaderState.

[~avijayan], we could also handle heartbeats in a new Jira altogether.
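A minimal sketch of aggregating heartbeat misses on the leader, keyed by follower, along the lines suggested above. `HeartbeatMissTracker` and its methods are hypothetical. They only illustrate keeping one counter per follower in a single leader-side object (the role the comment assigns to LeaderState), not how Ratis actually implements it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical leader-side aggregator for heartbeat misses (not the Ratis API).
public final class HeartbeatMissTracker {
  // One counter per follower id. LongAdder tolerates concurrent updates
  // coming from several per-follower appender threads.
  private final Map<String, LongAdder> missesByFollower = new ConcurrentHashMap<>();

  // Record that a heartbeat response from the given follower was missed.
  public void recordMiss(String followerId) {
    missesByFollower.computeIfAbsent(followerId, id -> new LongAdder()).increment();
  }

  // Misses for one follower (0 if none were recorded).
  public long missCount(String followerId) {
    LongAdder adder = missesByFollower.get(followerId);
    return adder == null ? 0 : adder.sum();
  }

  // Total misses across all followers, as seen by the leader.
  public long totalMisses() {
    return missesByFollower.values().stream().mapToLong(LongAdder::sum).sum();
  }

  public static void main(String[] args) {
    HeartbeatMissTracker tracker = new HeartbeatMissTracker();
    tracker.recordMiss("s1");
    tracker.recordMiss("s1");
    tracker.recordMiss("s2");
    System.out.println(tracker.missCount("s1") + " " + tracker.totalMisses()); // prints 2 3
  }
}
```

Keeping the counters in one leader-side object means both the per-follower and the aggregate view come from the same source. Counters spread across individual appenders would have to be summed separately.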

> Add metrics related to leaderElection and HeartBeat
> ---
>
> Key: RATIS-651
> URL: https://issues.apache.org/jira/browse/RATIS-651
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: RATIS-651-000.patch
>
>
> The following metrics would help track leader election events and timeouts:
>
> |numLeaderElections|Number of leader elections since the creation of the Ratis pipeline|
> |numLeaderElectionTimeouts|Number of leader election timeouts or failures|
> |LeaderElectionCompletionLatency|Time required to complete a leader election|
> |MaxNoLeaderInterval|Maximum interval during which there has been no elected leader in the Raft ring|
> |heartBeatMissCount|Number of times a heartbeat response from a server is missed|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (RATIS-651) Add metrics related to leaderElection and HeartBeat

2019-08-26 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915784#comment-16915784
 ] 

Shashikant Banerjee edited comment on RATIS-651 at 8/26/19 1:20 PM:


Thanks [~avijayan] for working on this and [~elserj] for the review. I agree 
that the heartbeat miss count should be aggregated in the leader. One solution 
would be to move the heartbeat count out of the LeaderElection metrics, define 
a new metric for heartbeats, and aggregate it in the LeaderState.

[~avijayan], we could also handle heartbeats in a new Jira altogether.


was (Author: shashikant):
Thanks [~avijayan] for working on this and [~elserj] for the review. I agree 
that the heartbeat miss count should be aggregated in the leader. One solution 
would be to move the heartbeat count out of the LeaderElection metrics, define 
a new metric for heartbeats, and aggregate it in the LeaderState.

[~avijayan], we could also handle heartbeats in a new Jira altogether.

> Add metrics related to leaderElection and HeartBeat
> ---
>
> Key: RATIS-651
> URL: https://issues.apache.org/jira/browse/RATIS-651
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: RATIS-651-000.patch
>
>
> The following metrics would help track leader election events and timeouts:
>
> |numLeaderElections|Number of leader elections since the creation of the Ratis pipeline|
> |numLeaderElectionTimeouts|Number of leader election timeouts or failures|
> |LeaderElectionCompletionLatency|Time required to complete a leader election|
> |MaxNoLeaderInterval|Maximum interval during which there has been no elected leader in the Raft ring|
> |heartBeatMissCount|Number of times a heartbeat response from a server is missed|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat

2019-08-27 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916886#comment-16916886
 ] 

Shashikant Banerjee commented on RATIS-651:
---

Thanks [~avijayan] for working on this. The patch looks good to me overall. 
Can we initialize and aggregate the heartBeatMetrics in LeaderState instead of 
the LogAppender class?

> Add metrics related to leaderElection and HeartBeat
> ---
>
> Key: RATIS-651
> URL: https://issues.apache.org/jira/browse/RATIS-651
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Critical
> Attachments: RATIS-651-000.patch, RATIS-651-001.patch, 
> RATIS-651-002.patch
>
>
> The following metrics would help track leader election events and timeouts:
>
> |numLeaderElections|Number of leader elections since the creation of the Ratis pipeline|
> |numLeaderElectionTimeouts|Number of leader election timeouts or failures|
> |LeaderElectionCompletionLatency|Time required to complete a leader election|
> |MaxNoLeaderInterval|Maximum interval during which there has been no elected leader in the Raft ring|
> |heartBeatMissCount|Number of times a heartbeat response from a server is missed|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-651) Add metrics related to leaderElection and HeartBeat

2019-08-27 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917409#comment-16917409
 ] 

Shashikant Banerjee commented on RATIS-651:
---

Thanks [~avijayan] for updating the patch. The patch looks good to me. I am +1 
on this change. Will commit this shortly.

> Add metrics related to leaderElection and HeartBeat
> ---
>
> Key: RATIS-651
> URL: https://issues.apache.org/jira/browse/RATIS-651
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Critical
> Attachments: RATIS-651-000.patch, RATIS-651-001.patch, 
> RATIS-651-002.patch, RATIS-651-003.patch
>
>
> The following metrics would help track leader election events and timeouts:
>
> |numLeaderElections|Number of leader elections since the creation of the Ratis pipeline|
> |numLeaderElectionTimeouts|Number of leader election timeouts or failures|
> |LeaderElectionCompletionLatency|Time required to complete a leader election|
> |MaxNoLeaderInterval|Maximum interval during which there has been no elected leader in the Raft ring|
> |heartBeatMissCount|Number of times a heartbeat response from a server is missed|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-650) Add metric to track lag between leader and follower in Log commit

2019-09-05 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923141#comment-16923141
 ] 

Shashikant Banerjee commented on RATIS-650:
---

Thanks [~sdeka] for working on this. The patch looks good.

Since these metrics are only collected at the leader, would it make more sense 
to name the class LeaderMetrics rather than RaftGroupMetrics?

Also, the comment below needs to be fixed grammatically.
{code:java}
// Normally, leader commit index is always ahead followers.
{code}

> Add metric to track lag between leader and follower in Log commit
> -
>
> Key: RATIS-650
> URL: https://issues.apache.org/jira/browse/RATIS-650
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Supratim Deka
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-650.000.patch
>
>
> The following metric indicates how far each follower lags behind the leader 
> in terms of the log commit index.
> |raftServerCommitIndexDiff|The difference between the commit index of the leader and that of each of its peers. This shows at what point in time a follower lags behind or catches up with the leader.|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-11 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928159#comment-16928159
 ] 

Shashikant Banerjee edited comment on RATIS-670 at 9/12/19 3:15 AM:


Thanks [~sdeka] for updating the patch. StateMachineUpdater increments the 
applied index as soon as applyLogToStateMachine is called, but doesn't wait 
for the transactions to complete unless it needs to take a snapshot. Would it 
make more sense to update the applied index in the metrics by fetching it from 
the StateMachine rather than using the value maintained in StateMachineUpdater?
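The concern above can be illustrated with a small, self-contained sketch. It makes two hypothetical assumptions: an updater-side counter advances when an entry is dispatched, and the state machine records its own index only when the asynchronous transaction completes. A gauge is modelled as a `java.util.function.Supplier` that is read at reporting time. None of these names come from Ratis.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical illustration (not the Ratis API) of why a gauge backed by the
// state machine's own applied index can differ from an updater-side counter.
public final class AppliedIndexGaugeDemo {
  // Index the updater has dispatched; advanced before the transaction completes.
  static final AtomicLong dispatchedIndex = new AtomicLong();
  // Index the state machine has actually finished applying.
  static final AtomicLong completedIndex = new AtomicLong();

  // Dispatch an entry: the updater advances its counter immediately, while the
  // state machine records completion only when the async work finishes.
  static CompletableFuture<Void> apply(long index, CompletableFuture<Void> work) {
    dispatchedIndex.set(index);
    return work.thenRun(() -> completedIndex.set(index));
  }

  public static void main(String[] args) {
    // Gauges modelled as suppliers that are read whenever metrics are reported.
    Supplier<Long> updaterGauge = dispatchedIndex::get;
    Supplier<Long> stateMachineGauge = completedIndex::get;

    CompletableFuture<Void> pending = new CompletableFuture<>();
    CompletableFuture<Void> applied = apply(1, pending);
    System.out.println(updaterGauge.get() + " " + stateMachineGauge.get()); // prints 1 0

    pending.complete(null); // the transaction finishes
    applied.join();
    System.out.println(updaterGauge.get() + " " + stateMachineGauge.get()); // prints 1 1
  }
}
```

While a transaction is in flight, the two gauges disagree. Only the state-machine-backed gauge reflects work that has actually been applied.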


was (Author: shashikant):
Thanks [~sdeka] for updating the patch. StateMachineUpdater increments the 
applied index as soon as applyLogToStateMachine is called, but doesn't wait 
for the transactions to complete unless it needs to take a snapshot. Would it 
make more sense to update the applied index in the metrics by fetching it from 
the StateMachine rather than using the value maintained in StateMachineUpdater?

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch
>
>
> Plotting the log apply index (the log index applied to the StateMachine) 
> against the RaftLog commit index is useful for monitoring the performance of 
> the StateMachine.
> This Jira adds a metric/gauge that tracks the current value of the log apply 
> index.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-11 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928159#comment-16928159
 ] 

Shashikant Banerjee commented on RATIS-670:
---

Thanks [~sdeka] for updating the patch. StateMachineUpdater increments the 
applied index as soon as applyLogToStateMachine is called, but doesn't wait 
for the transactions to complete unless it needs to take a snapshot. Would it 
make more sense to update the applied index in the metrics by fetching it from 
the StateMachine rather than using the value maintained in StateMachineUpdater?

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch
>
>
> Plotting the log apply index (the log index applied to the StateMachine) 
> against the RaftLog commit index is useful for monitoring the performance of 
> the StateMachine.
> This Jira adds a metric/gauge that tracks the current value of the log apply 
> index.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-11 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928159#comment-16928159
 ] 

Shashikant Banerjee edited comment on RATIS-670 at 9/12/19 3:17 AM:


Thanks [~sdeka] for updating the patch. StateMachineUpdater increments the 
applied index as soon as applyLogToStateMachine is called, but doesn't wait 
for the transactions to complete unless it needs to take a snapshot. Would it 
make more sense to update the applied index in the metrics by fetching it from 
the StateMachine rather than using the value maintained in StateMachineUpdater?

Also, in the test, could we create a Raft cluster, send some requests, and 
check whether the applied index metric is incremented?


was (Author: shashikant):
Thanks [~sdeka] for updating the patch. StateMachineUpdater increments the 
applied index as soon as applyLogToStateMachine is called, but doesn't wait 
for the transactions to complete unless it needs to take a snapshot. Would it 
make more sense to update the applied index in the metrics by fetching it from 
the StateMachine rather than using the value maintained in StateMachineUpdater?

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch
>
>
> Plotting the log apply index (the log index applied to the StateMachine) 
> against the RaftLog commit index is useful for monitoring the performance of 
> the StateMachine.
> This Jira adds a metric/gauge that tracks the current value of the log apply 
> index.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-09-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928420#comment-16928420
 ] 

Shashikant Banerjee commented on RATIS-647:
---

Thanks [~avijayan] for working on this. The approach looks good to me. Can we 
also follow the existing Ratis naming convention for metric names rather than 
using underscores?

> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-647-000.patch
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync the Raft log|
> |numRaftLogSyncOps|Number of Raft log sync calls with respect to time (equals the number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of Raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a Raft log entry from the actual Raft log file and create a Raft log entry (Raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a Raft log entry (this also includes writeStateMachineData, which will vary depending upon the size of the data to be written as well as external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft log worker queue|
> |raftLogQueueingDelay|Time required to enqueue a Raft log entry in the Raft log worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process Raft log segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives the number of pending log entries to be committed to the Raft log|
> |raftLogCacheMissCount|Number of Raft log cache misses|



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928427#comment-16928427
 ] 

Shashikant Banerjee commented on RATIS-670:
---

Thanks [~sdeka] for working on this. The patch looks good to me. Can you check 
the checkstyle/javac failures?

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch, 
> RATIS-670.002.patch, RATIS-670.003.patch
>
>
> Plotting the Log apply index (log index applied on the StateMachine) against 
> the RaftLog commit index is useful for monitoring the performance of the 
> state machine.
> This jira adds a metric/gauge which tracks the current value of log apply 
> index.
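To make the gauge idea concrete, here is a minimal sketch, assuming the server invokes one hook when the commit index advances and another after the state machine applies an entry. `ApplyIndexGaugeSketch`, `onCommit` and `onApplied` are hypothetical names, not the Ratis API. Each index is exposed as a `LongSupplier`, so a metrics system can sample it on demand, and the difference between the two is the apply lag that the plot would show.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the Ratis API: commit and applied indexes as gauges.
class ApplyIndexGaugeSketch {
  private final AtomicLong commitIndex = new AtomicLong(-1);  // -1 means nothing committed yet
  private final AtomicLong appliedIndex = new AtomicLong(-1); // -1 means nothing applied yet

  // Hook: the RaftLog commit index advanced to `index`.
  void onCommit(long index) {
    commitIndex.accumulateAndGet(index, Math::max); // never move backwards
  }

  // Hook: the state machine finished applying the entry at `index`.
  void onApplied(long index) {
    appliedIndex.accumulateAndGet(index, Math::max);
  }

  // Gauges: a metrics reporter calls getAsLong() whenever it samples.
  LongSupplier commitIndexGauge() { return commitIndex::get; }
  LongSupplier appliedIndexGauge() { return appliedIndex::get; }

  // How far the state machine trails the commit index. The two reads are not
  // atomic together, so clamp at 0 in case the applied index moved in between.
  long applyLag() {
    return Math.max(0, commitIndex.get() - appliedIndex.get());
  }
}
```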





[jira] [Commented] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928432#comment-16928432
 ] 

Shashikant Banerjee commented on RATIS-670:
---

For the test, we just need to create a raft client in a raft cluster, send a 
message and read the gauge values from the leader. Check 
RaftBasicTests#testRequestTimeout for reference.






[jira] [Comment Edited] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928432#comment-16928432
 ] 

Shashikant Banerjee edited comment on RATIS-670 at 9/12/19 10:26 AM:
-

For the test, we just need to create a raft client in a raft cluster, send a 
message, wait for the reply and read the gauge values from the leader. Check 
RaftBasicTests#testRequestTimeout for reference.


was (Author: shashikant):
For the test, we just need to create a raft client in a raft cluster, send a 
msg and read the gauge values from the leader. 
CheckRaftBasicTests#testRequestTimeout for reference.






[jira] [Comment Edited] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928432#comment-16928432
 ] 

Shashikant Banerjee edited comment on RATIS-670 at 9/12/19 10:27 AM:
-

For the test, we just need to create a raft client in a raft cluster, send a 
message, wait for the reply and read the gauge values from the leader. Check 
RaftBasicTests#testRequestTimeout for reference.
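The test shape described in this comment (send a message, wait for the reply, then read the gauge on the leader) can be illustrated with a toy that uses no Ratis classes at all. `ToyLeader` and `ApplyIndexMetricTestSketch` are hypothetical; as a simplification, the toy completes the reply only after it has updated its applied-index gauge, which is why reading the gauge after waiting for the reply does not race with the apply.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy leader: applies each message on a background thread and
// completes the reply only after the applied-index gauge has been updated.
class ToyLeader implements AutoCloseable {
  private final ExecutorService applier = Executors.newSingleThreadExecutor();
  private final AtomicLong nextIndex = new AtomicLong(0);
  private final AtomicLong appliedIndexGauge = new AtomicLong(-1);

  // The message content is ignored in this toy; only its log index matters.
  CompletableFuture<Long> send(String message) {
    long index = nextIndex.getAndIncrement();
    return CompletableFuture.supplyAsync(() -> {
      appliedIndexGauge.set(index); // "apply" the entry, then update the gauge
      return index;                 // the reply completes after the gauge update
    }, applier);
  }

  long readAppliedIndexGauge() { return appliedIndexGauge.get(); }

  @Override
  public void close() throws InterruptedException {
    applier.shutdown();
    applier.awaitTermination(5, TimeUnit.SECONDS);
  }
}

class ApplyIndexMetricTestSketch {
  // Send a message, wait for the reply, then check the gauge on the leader.
  static void checkGaugeAfterReply() throws Exception {
    try (ToyLeader leader = new ToyLeader()) {
      long replyIndex = leader.send("hello").get(5, TimeUnit.SECONDS); // wait for the reply
      long gauge = leader.readAppliedIndexGauge();
      if (gauge < replyIndex) {
        throw new AssertionError("gauge " + gauge + " < applied index " + replyIndex);
      }
    }
  }
}
```

Reading the gauge before the future completes could observe the previous value, so a test written that way would be flaky.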


was (Author: shashikant):
For the test, we just need to create a raft client in a raft cluster, send a 
msg , wait for the reply and read the gauge values from the leader. 
CheckRaftBasicTests#testRequestTimeout for reference.






[jira] [Commented] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929101#comment-16929101
 ] 

Shashikant Banerjee commented on RATIS-670:
---

Thanks [~sdeka] for working on this. The patch overall looks good to me. A few 
comments inline:

 

1. Can we change the naming here? It looks confusing.
{code:java}
public static final String RATIS_APPLIED_INDEX_GAUGE =
"ratis_applied_index";
public static final String STATEMACHINE_APPLIED_INDEX_GAUGE =
"statemachine_applied_index";
{code}
2. Can we add a separate test that just verifies the metric value, instead of 
modifying the existing "testRequestTimeout" test?

 

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch, 
> RATIS-670.002.patch, RATIS-670.003.patch, RATIS-670.004.patch, 
> RATIS-670.005.patch
>
>
> Plotting the Log apply index (log index applied on the StateMachine) against 
> the RaftLog commit index is useful for monitoring the performance of the 
> state machine.
> This jira adds a metric/gauge which tracks the current value of log apply 
> index.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (RATIS-647) Create metrics associated with RaftLog for RaftServer

2019-09-13 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929117#comment-16929117
 ] 

Shashikant Banerjee commented on RATIS-647:
---

[~avijayan], let's follow the Hadoop convention of using camelCase for all.

> Create metrics associated with RaftLog for RaftServer
> -
>
> Key: RATIS-647
> URL: https://issues.apache.org/jira/browse/RATIS-647
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Aravindan Vijayan
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: RATIS-647-000.patch
>
>
> We need the following metrics related to RaftLog and RaftLogWorker:
> |raftLogSyncLatency|Time taken to sync raft log|
> |numRaftLogSyncOps|Number of Raft log sync calls over time (equals the 
> number of FlushStateMachine calls)|
> |raftLogSynBatchSize|Number of raft log entries synced with each flush call|
> |raftLogReadLatency|Time required to read a raft log entry from actual raft 
> log file and create a raft log entry (Raft log read latency)|
> |raftLogAppendLatency|Total time taken to append a raft log entry (this also 
> includes writeStateMachineData which will vary depending upon the size of the 
> data to be written as well as external factors)|
> |raftLogEnqueuedTime|Time a RaftLogEntry spends in the Raft Log Worker queue|
> |raftLogQueueingDelay|Time required to enqueue a raft Log entry in raft log 
> worker queue|
> |raftLogSegmentLoadLatency|Time required to load and process raft log 
> segments during restart|
> |raftLogWorkerQueueSize|Raft log worker queue size, which at any time gives 
> the number of pending log entries to be committed to the raft log.|
> |raftLogCacheMissCount|Number of RaftLog cache misses|





[jira] [Commented] (RATIS-670) Add a metric to track StateMachine Log apply index

2019-09-16 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930389#comment-16930389
 ] 

Shashikant Banerjee commented on RATIS-670:
---

Thanks [~sdeka] for working on this. The patch looks good to me. I am +1 on the 
change.

> Add a metric to track StateMachine Log apply index
> --
>
> Key: RATIS-670
> URL: https://issues.apache.org/jira/browse/RATIS-670
> Project: Ratis
>  Issue Type: Sub-task
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
> Attachments: RATIS-670.000.patch, RATIS-670.001.patch, 
> RATIS-670.002.patch, RATIS-670.003.patch, RATIS-670.004.patch, 
> RATIS-670.005.patch, RATIS-670.006.patch
>
>
> Plotting the Log apply index (log index applied on the StateMachine) against 
> the RaftLog commit index is useful for monitoring the performance of the 
> state machine.
> This jira adds a metric/gauge which tracks the current value of log apply 
> index.




