[jira] [Assigned] (RATIS-624) RaftServer should support pause/ unpause in its LifeCycle state

2020-07-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned RATIS-624:
---

Assignee: Rui Wang

> RaftServer should support pause/ unpause in its LifeCycle state
> ---
>
> Key: RATIS-624
> URL: https://issues.apache.org/jira/browse/RATIS-624
> Project: Ratis
>  Issue Type: Task
>Reporter: Hanisha Koneru
>Assignee: Rui Wang
>Priority: Major
>  Labels: ozone
> Fix For: 1.1.0
>
>
> This Jira aims to add pause and unpause support to the RaftServer LifeCycle
> state. When paused, the RaftServer should not accept any incoming append log
> entries.
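A minimal sketch of what a pausable lifecycle could look like. The names `PausableServerSketch`, `State`, and `appendEntries` are hypothetical and are not the Ratis `LifeCycle` or `RaftServer` API; the sketch only illustrates rejecting appends while paused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not the actual Ratis LifeCycle or RaftServer API.
class PausableServerSketch {
  enum State { RUNNING, PAUSED, CLOSED }

  private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
  private final List<String> log = new ArrayList<>();

  /** RUNNING -> PAUSED; returns false if the server was not running. */
  boolean pause() {
    return state.compareAndSet(State.RUNNING, State.PAUSED);
  }

  /** PAUSED -> RUNNING; returns false if the server was not paused. */
  boolean unpause() {
    return state.compareAndSet(State.PAUSED, State.RUNNING);
  }

  /** Accepts an entry only while RUNNING; a paused server rejects it. */
  synchronized boolean appendEntries(String entry) {
    if (state.get() != State.RUNNING) {
      return false;
    }
    log.add(entry);
    return true;
  }

  synchronized int logSize() {
    return log.size();
  }

  public static void main(String[] args) {
    PausableServerSketch server = new PausableServerSketch();
    System.out.println(server.appendEntries("e1")); // accepted while running
    server.pause();
    System.out.println(server.appendEntries("e2")); // rejected while paused
    server.unpause();
    System.out.println(server.appendEntries("e3")); // accepted again
  }
}
```

Using a compare-and-set transition keeps `pause()` and `unpause()` idempotent-safe: a second `pause()` on an already paused server simply returns false.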



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (RATIS-624) RaftServer should support pause/ unpause in its LifeCycle state

2020-07-30 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168377#comment-17168377
 ] 

Arpit Agarwal commented on RATIS-624:
-

[~amaliujia] I have added you as a Ratis contributor. You will be able to 
assign issues to yourself now. Welcome aboard!

> RaftServer should support pause/ unpause in its LifeCycle state
> ---
>
> Key: RATIS-624
> URL: https://issues.apache.org/jira/browse/RATIS-624
> Project: Ratis
>  Issue Type: Task
>Reporter: Hanisha Koneru
>Assignee: Rui Wang
>Priority: Major
>  Labels: ozone
> Fix For: 1.1.0
>
>
> This Jira aims to add pause and unpause support to the RaftServer LifeCycle
> state. When paused, the RaftServer should not accept any incoming append log
> entries.





[jira] [Updated] (RATIS-923) Cache evicted before RollSegment completes

2020-05-11 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-923:

Labels: pull-request-available  (was: )

> Cache evicted before RollSegment completes
> --
>
> Key: RATIS-923
> URL: https://issues.apache.org/jira/browse/RATIS-923
> Project: Ratis
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a segment is evicted from the Segment Cache before it is rolled over and
> the applyTransaction thread tries to read this segment, it can lead to a
> FileNotFoundException.
> Please refer to [~msingh]'s comment in HDDS-3382.





[jira] [Updated] (RATIS-854) Update download links

2020-04-16 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-854:

Description: The download links for signatures/checksums/KEYS should be 
updated from dist.apache.org to https://downloads.apache.org/incubator/ratis/  
(was: The download lists for signatures/checksums/KEYS should be updated from 
dist.apache.org to https://downloads.apache.org/incubator/ratis/)

> Update download links
> -
>
> Key: RATIS-854
> URL: https://issues.apache.org/jira/browse/RATIS-854
> Project: Ratis
>  Issue Type: Improvement
>  Components: website
>Reporter: Arpit Agarwal
>Priority: Major
>  Labels: newbie
>
> The download links for signatures/checksums/KEYS should be updated from 
> dist.apache.org to https://downloads.apache.org/incubator/ratis/





[jira] [Moved] (RATIS-854) Update download links

2020-04-16 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal moved HDDS-3410 to RATIS-854:
---

Component/s: (was: website)
 website
Key: RATIS-854  (was: HDDS-3410)
   Workflow: no-reopen-closed, patch-avail  (was: patch-available, re-open 
possible)
Project: Ratis  (was: Hadoop Distributed Data Store)

> Update download links
> -
>
> Key: RATIS-854
> URL: https://issues.apache.org/jira/browse/RATIS-854
> Project: Ratis
>  Issue Type: Improvement
>  Components: website
>Reporter: Arpit Agarwal
>Priority: Major
>  Labels: newbie
>
> The download lists for signatures/checksums/KEYS should be updated from 
> dist.apache.org to https://downloads.apache.org/hadoop/ozone/.





[jira] [Updated] (RATIS-854) Update download links

2020-04-16 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-854:

Description: The download lists for signatures/checksums/KEYS should be 
updated from dist.apache.org to https://downloads.apache.org/incubator/ratis/  
(was: The download lists for signatures/checksums/KEYS should be updated from 
dist.apache.org to https://downloads.apache.org/hadoop/ozone/.)

> Update download links
> -
>
> Key: RATIS-854
> URL: https://issues.apache.org/jira/browse/RATIS-854
> Project: Ratis
>  Issue Type: Improvement
>  Components: website
>Reporter: Arpit Agarwal
>Priority: Major
>  Labels: newbie
>
> The download lists for signatures/checksums/KEYS should be updated from 
> dist.apache.org to https://downloads.apache.org/incubator/ratis/





[jira] [Commented] (RATIS-816) Use peerId in error log / exception of GrpcServerProtocolClient

2020-03-11 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057089#comment-17057089
 ] 

Arpit Agarwal commented on RATIS-816:
-

The checkstyle warning looks related to the patch.

> Use peerId in error log / exception of GrpcServerProtocolClient
> ---
>
> Key: RATIS-816
> URL: https://issues.apache.org/jira/browse/RATIS-816
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
> Attachments: RATIS-816.001.patch, RATIS-816.002.patch
>
>
> GrpcServerProtocolClient is used to send out requestVote and appendLogEntry
> requests.
> I propose to persist raftPeerId in the constructor and use it in the error /
> exception message.
> This is not only about getting a more meaningful message (which is nice to
> have): in HDDS-3023 I am modifying the byte code to mock the leader->follower
> communication, and that is much easier to do if the required raftPeerId is
> available in the class.
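The proposal could be sketched roughly as follows. `PeerClientSketch` and `wrap` are hypothetical names used for illustration; this is not the real `GrpcServerProtocolClient`, only an example of storing the peer id at construction time and including it in every error message.

```java
import java.util.Objects;

// Hypothetical stand-in for a per-peer client; not the real GrpcServerProtocolClient.
class PeerClientSketch {
  private final String raftPeerId; // kept so every error can name the target peer

  PeerClientSketch(String raftPeerId) {
    this.raftPeerId = Objects.requireNonNull(raftPeerId, "raftPeerId");
  }

  /** Wraps a failure so the message says which peer the request was sent to. */
  IllegalStateException wrap(String operation, Throwable cause) {
    return new IllegalStateException(
        operation + " to peer " + raftPeerId + " failed: " + cause.getMessage(), cause);
  }

  public static void main(String[] args) {
    PeerClientSketch client = new PeerClientSketch("peer-1");
    System.out.println(client.wrap("appendEntries", new RuntimeException("timeout")).getMessage());
  }
}
```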





[jira] [Created] (RATIS-823) Add links to signatures and hashes on download page

2020-03-03 Thread Arpit Agarwal (Jira)
Arpit Agarwal created RATIS-823:
---

 Summary: Add links to signatures and hashes on download page
 Key: RATIS-823
 URL: https://issues.apache.org/jira/browse/RATIS-823
 Project: Ratis
  Issue Type: Bug
  Components: website
Reporter: Arpit Agarwal


Add the missing links to signatures and hashes here: 
https://ratis.incubator.apache.org/#download





[jira] [Updated] (RATIS-815) Log entry corrupted with 0 checksum

2020-02-13 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-815:

Priority: Blocker  (was: Major)

> Log entry corrupted with 0 checksum
> ---
>
> Key: RATIS-815
> URL: https://issues.apache.org/jira/browse/RATIS-815
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Priority: Blocker
>
> After writing a few large keys (128MB) with a very small chunk size (64KB) in
> Ozone, Ratis reports log entry corruption due to a checksum error:
> {code}
> 2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:396 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker:
>  Rolling segment log-62379_62465 to index:62465
> 2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:541 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker:
>  Rolled log segment from 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62379
>  to 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_62379-62465
> 2020-02-13 12:01:41 INFO  SegmentedRaftLogWorker:583 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236-SegmentedRaftLogWorker:
>  created new log segment 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_inprogress_62466
> 2020-02-13 12:01:41 ERROR LogAppender:81 - 
> e5e4fd1e-aa81-48a2-98f9-b1ba24531624@group-B85226EEE236->ac5b3434-874b-4375-8a03-989e8c7fb692-GrpcLogAppender-AppenderDaemon
>  failed RaftLog
> org.apache.ratis.server.raftlog.RaftLogIOException: 
> org.apache.ratis.protocol.ChecksumException: Log entry corrupted: Calculated 
> checksum is CDFED097 but read checksum is .
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:311)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:292)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:297)
>   at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:213)
>   at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:179)
>   at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:122)
>   at 
> org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:77)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ratis.protocol.ChecksumException: Log entry corrupted: 
> Calculated checksum is CDFED097 but read checksum is .
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.decodeEntry(SegmentedRaftLogReader.java:312)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.readEntry(SegmentedRaftLogReader.java:194)
>   at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:129)
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:98)
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:202)
>   at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:309)
>   ... 7 more
> {code}
> Steps to reproduce:
> 1. Configure Ozone with 64KB chunk size and slightly higher buffer sizes:
> {code}
> ozone.scm.chunk.size: 64KB
> ozone.client.stream.buffer.flush.size: 256KB
> ozone.client.stream.buffer.max.size: 1MB
> {code}
> 2. Run Freon:
> {code}
> ozone freon ockg -n 1 -t 1 -p warmup
> ozone freon ockg -p test -t 8 -s 134217728 -n 32
> {code}
> Interestingly, even {{log_5106-5509}} has an invalid entry (according to the
> log dump utility):
> {code}
> Processing Raft Log file: 
> /data/metadata/ratis/f89fc072-9ee9-459b-85d1-b85226eee236/current/log_5106-5509
>  size:1030796
> ...
> (t:1, i:5161), STATEMACHINELOGENTRY, client-296B6A48E40D, cid=3307
> Exception in thread "main" org.apache.ratis.protocol.ChecksumException: Log 
> entry corrupted: Calculated checksum is 926127AE but read checksum is 
> .
> {code}
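The exception above reports a mismatch between a checksum recomputed from the entry bytes and the checksum read from the segment file. As a rough illustration of that kind of check, the sketch below encodes a payload followed by a 4-byte checksum and shows that a checksum field left at zero fails verification. The record layout and the use of `java.util.zip.CRC32` are assumptions made for the example, not the actual Ratis segment format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative only: the entry layout (payload followed by a 4-byte checksum) and the use of
// java.util.zip.CRC32 are assumptions, not the actual Ratis segment format.
class ChecksumSketch {
  static int checksum(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return (int) crc.getValue();
  }

  /** Encodes the payload followed by its checksum. */
  static byte[] encode(byte[] payload) {
    return ByteBuffer.allocate(payload.length + 4).put(payload).putInt(checksum(payload)).array();
  }

  /** Returns true if the trailing checksum matches the payload. */
  static boolean verify(byte[] record) {
    ByteBuffer buf = ByteBuffer.wrap(record);
    byte[] payload = new byte[record.length - 4];
    buf.get(payload);
    return buf.getInt() == checksum(payload);
  }
}
```

A reader that hits such a mismatch cannot tell from the checksum alone whether the payload or the stored checksum is wrong, which is why the whole entry is reported as corrupted.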





[jira] [Commented] (RATIS-804) Race condition between cache evict and load in LogSegment

2020-01-30 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026767#comment-17026767
 ] 

Arpit Agarwal commented on RATIS-804:
-

[~msingh] [~szetszwo] any suggestions on how we should fix this?

> Race condition between cache evict and load in LogSegment
> -
>
> Key: RATIS-804
> URL: https://issues.apache.org/jira/browse/RATIS-804
> Project: Ratis
>  Issue Type: Bug
>Reporter: Marton Elek
>Priority: Critical
>
> I am stress testing Ozone. I start one Datanode in FOLLOWER mode and the load
> generator (Freon) behaves like a LEADER.
> I am sending a huge number of AppendLogEntries to the FOLLOWER without any
> throttling.
> As a result I got an NPE:
> {code:java}
> 2020-01-28 15:08:20 ERROR StateMachineUpdater:184 - 
> 3fda0c39-ce3c-4540-a804-44d9ac1f4853@group-E1B13B4CA5C0-StateMachineUpdater: 
> the StateMachineUpdater hits Throwable
> org.apache.ratis.server.raftlog.RaftLogIOException: 
> java.lang.NullPointerException
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:320)
> at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:293)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:218)
> at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:167)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at java.util.Objects.requireNonNull(Objects.java:203)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:214)
> at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:318)
> ... 4 more {code}
> It seems to be a race condition between LogSegment.evictCache() and 
> LogSegment.loadCache().
>  # StateMachineUpdater tries to update the StateMachine with the next log 
> entry
>  # It can't be found in the cache, therefore the LogSegment.loadCache() is 
> called
>  # The LogSegment.LogEntryLoader.load() reads the segment files from the disk
>  # After loading, it returns with the loaded entry
> If the gRPC thread evicts the cache between steps 3 and 4 (which is possible
> because the log segment may already be flushed and can therefore be evicted),
> an NPE is thrown.
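The steps above can be modelled in a few lines. The sketch below is a hypothetical model, not the Ratis `LogSegment` code: `loadRacy` reads the entry back from the shared cache after loading, so an eviction in between produces the NPE, while `loadSafe` answers from the freshly loaded local copy. Returning from the local copy is only one possible way to close the window and is shown as an assumption, not as the fix chosen by the project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the race; class and method names are not the Ratis LogSegment API.
class SegmentCacheSketch {
  private final AtomicReference<Map<Long, String>> cache = new AtomicReference<>();

  /** Simulates reading the whole segment file from disk. */
  private Map<Long, String> readSegmentFile() {
    Map<Long, String> loaded = new HashMap<>();
    for (long i = 0; i < 3; i++) {
      loaded.put(i, "entry-" + i);
    }
    return loaded;
  }

  void evictCache() {
    cache.set(null);
  }

  /** Racy variant: publishes to the shared cache, then reads back from it. */
  String loadRacy(long index, Runnable betweenLoadAndRead) {
    cache.set(readSegmentFile());
    betweenLoadAndRead.run(); // stands in for a concurrent evictCache()
    Map<Long, String> current = cache.get();
    return Objects.requireNonNull(current, "cache evicted").get(index);
  }

  /** Safer variant: answers from the local copy, so eviction cannot null it out. */
  String loadSafe(long index, Runnable betweenLoadAndRead) {
    Map<Long, String> loaded = readSegmentFile();
    cache.set(loaded);
    betweenLoadAndRead.run();
    return loaded.get(index);
  }
}
```

The `Runnable` parameter makes the interleaving deterministic for demonstration purposes; in a real server the eviction would come from another thread.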





[jira] [Updated] (RATIS-797) Ratis segment file corruption after server restart

2020-01-16 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-797:

Priority: Blocker  (was: Major)

> Ratis segment file corruption after server restart
> --
>
> Key: RATIS-797
> URL: https://issues.apache.org/jira/browse/RATIS-797
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Blocker
> Fix For: 0.5.0
>
>
> While testing Ozone, it was observed that Ratis segment files show corruption
> after a server restart:
> {code:java}
> 2020-01-08 02:06:46,576 INFO 
> org.apache.ratis.server.raftlog.segmented.LogSegment: Successfully read 1 
> entries from segment file 
> /metadata/hadoop-ozone/datanode/ratis/data/5e26b460-ca4e-4791-bf70-1fd535056988/current/log_inprogress_0
> 2020-01-08 02:06:46,576 WARN 
> org.apache.ratis.server.raftlog.segmented.LogSegment: Segment file is 
> corrupted: expected to have 0 entries but only 1 entries read successfully
> 2020-01-08 02:06:46,580 INFO 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: 
> 2d422fc8-f7c2-4e41-a59b-abbf76330dfe@group-1FD535056988-SegmentedRaftLogWorker:
>  flushIndex: setUnconditionally 0 -> 0
> 2020-01-08 02:06:46,618 INFO org.eclipse.jetty.util.log: Logging initialized 
> @1978ms
> 2020-01-08 02:06:46,738 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.snaps
> 2020-01-16 07:51:12,268 WARN 
> org.apache.ratis.server.raftlog.segmented.LogSegment: Segment file is 
> corrupted: expected to have -3668 entries but only 3500 entries read 
> successfully
> {code}





[jira] [Commented] (RATIS-776) Handle ResourceUnavailabeException properly in Ratis Server

2020-01-05 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008523#comment-17008523
 ] 

Arpit Agarwal commented on RATIS-776:
-

Are the unit test failures related to the patch?

> Handle ResourceUnavailabeException properly in Ratis Server
> ---
>
> Key: RATIS-776
> URL: https://issues.apache.org/jira/browse/RATIS-776
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: RATIS-776.001.patch, RATIS-776.002.patch, 
> RATIS-776.003.patch
>
>
> While processing a client request, the Ratis leader tries to create a pending
> request. If it cannot do so, it fails the request with a
> ResourceUnavailableException. However, the server keeps processing the other
> requests from the same client. The resources can be released while those
> other requests are processed, resulting in out-of-order processing of client
> requests. On failure, the server should ideally fail all of that client's
> requests that are still waiting to be processed.
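The suggested behavior, failing every later request from a client once one of its requests has failed, could be sketched as below. `OrderedRequestsSketch` is a hypothetical per-client processor and does not reflect the Ratis `SlidingWindow` API; the `resourceAvailable` flag stands in for whether a pending request could be created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-client ordered processor; not the Ratis SlidingWindow API.
class OrderedRequestsSketch {
  private boolean failed = false;
  private final List<Integer> processed = new ArrayList<>();

  /** Processes seqNum in order; after the first failure, every later request fails too. */
  boolean process(int seqNum, boolean resourceAvailable) {
    if (failed || !resourceAvailable) {
      failed = true;
      return false;
    }
    processed.add(seqNum);
    return true;
  }

  List<Integer> processed() {
    return processed;
  }
}
```

With this rule, a request with seqNum 5 can never be applied after seqNum 4 was rejected, which is the out-of-order case visible in the log that follows.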
> {code:java}
> 2019-12-10 19:50:31,846 [grpc-default-executor-5] INFO  
> ratis.ContainerStateMachine 
> (ContainerStateMachine.java:preAppendTransaction(311)) - append seqNum:2 
> WriteChunk
> 2019-12-10 19:50:31,860 [grpc-default-executor-5] INFO  
> ratis.ContainerStateMachine 
> (ContainerStateMachine.java:preAppendTransaction(311)) - append seqNum:3 
> WriteChunk
> Caused by: org.apache.ratis.protocol.exceptions.ResourceUnavailableException: 
> 164293f2-68e3-4851-bc46-4a828bd79ffa@group-03010B1A5718: Failed to acquire a 
> pending write request for 
> RaftClientRequest:client-38E7254A5AF1->164293f2-68e3-4851-bc46-4a828bd79ffa@group-03010B1A5718,
>  cid=3, seq=4, RW, Message:00b2080612343362...(size=182)Caused by: 
> org.apache.ratis.protocol.exceptions.ResourceUnavailableException: 
> 164293f2-68e3-4851-bc46-4a828bd79ffa@group-03010B1A5718: Failed to acquire a 
> pending write request for 
> RaftClientRequest:client-38E7254A5AF1->164293f2-68e3-4851-bc46-4a828bd79ffa@group-03010B1A5718,
>  cid=3, seq=4, RW, Message:00b2080612343362...(size=182) at 
> org.apache.ratis.server.impl.RaftServerImpl.appendTransaction(RaftServerImpl.java:514)
>  at 
> org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:589)
>  at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$7(RaftServerProxy.java:333)
>  at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$null$5(RaftServerProxy.java:328)
>  at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:109) at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$6(RaftServerProxy.java:328)
>  at 
> java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
>  at 
> java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
>  at 
> org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:327)
>  at 
> org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:333)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:221)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolService$OrderedRequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:327)
>  at 
> org.apache.ratis.util.SlidingWindow$Server.processRequestsFromHead(SlidingWindow.java:429)
>  at 
> org.apache.ratis.util.SlidingWindow$Server.receivedRequest(SlidingWindow.java:421)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolService$OrderedRequestStreamObserver.processClientRequest(GrpcClientProtocolService.java:346)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:241)
>  at 
> org.apache.ratis.grpc.client.GrpcClientProtocolService$RequestStreamObserver.onNext(GrpcClientProtocolService.java:168)
>  at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:251)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
>  at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:779)
>  ... 5 more
> 2019-12-10 19:50:31,860 [grpc-default-executor-5] INFO 
> ratis.ContainerStateMachine 
> (ContainerStateMachine.java:preAppendTransaction(311)) - append seqNum:5 
> WriteC

[jira] [Updated] (RATIS-727) Garbage collection due to same request retries on a follower

2019-11-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-727:

Priority: Blocker  (was: Major)

> Garbage collection due to same request retries on a follower
> 
>
> Key: RATIS-727
> URL: https://issues.apache.org/jira/browse/RATIS-727
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> In a heap dump it could be seen that a client request retries on the same 
> follower multiple times and every time the request is rejected with a 
> NotLeaderException. In case of Ozone it is a WriteChunk request which leads 
> to garbage collection of 16MB for every request. In the heap dump a client 
> request retries multiple times leading to garbage collection of ~100MB.





[jira] [Updated] (RATIS-709) RaftClient should not retry on a different leader on NotReplicated exception from leader

2019-11-24 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-709:

Priority: Blocker  (was: Major)

> RaftClient should not retry on a different leader on NotReplicated exception 
> from leader
> 
>
> Key: RATIS-709
> URL: https://issues.apache.org/jira/browse/RATIS-709
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Shashikant Banerjee
>Assignee: Sammi Chen
>Priority: Blocker
>
> Currently, when a watch request times out with a NotReplicatedException on
> the leader, the raft client starts retrying the request on a different
> server, fails with NotLeaderException, and goes into a loop. Ideally, when a
> watch request times out, it should not be retried automatically by the raft
> client, given that the timeout value in the leader is sufficiently reasonable.
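A minimal sketch of the retry decision described above, assuming a simple attempt-count policy. The exception classes and `shouldRetry` are hypothetical stand-ins defined inside the example; they are not the Ratis exception types or the Ratis `RetryPolicy` API.

```java
// Hypothetical exception types and policy; not the Ratis RetryPolicy API.
class WatchRetrySketch {
  static class NotLeaderException extends Exception {}
  static class NotReplicatedException extends Exception {}

  /** Decides whether the client should retry after a failed attempt. */
  static boolean shouldRetry(Exception failure, int attempt, int maxAttempts) {
    if (failure instanceof NotReplicatedException) {
      return false; // the watch timed out on the leader; surface it to the caller
    }
    return attempt < maxAttempts;
  }
}
```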





[jira] [Assigned] (RATIS-727) Garbage collection due to same request retries on a follower

2019-11-22 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned RATIS-727:
---

Assignee: Hanisha Koneru

> Garbage collection due to same request retries on a follower
> 
>
> Key: RATIS-727
> URL: https://issues.apache.org/jira/browse/RATIS-727
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Lokesh Jain
>Assignee: Hanisha Koneru
>Priority: Major
>
> In a heap dump it could be seen that a client request retries on the same 
> follower multiple times and every time the request is rejected with a 
> NotLeaderException. In case of Ozone it is a WriteChunk request which leads 
> to garbage collection of 16MB for every request. In the heap dump a client 
> request retries multiple times leading to garbage collection of ~100MB.





[jira] [Updated] (RATIS-458) GrpcLogAppender#shouldWait should wait on pending log entries to follower

2019-11-19 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-458:

Priority: Blocker  (was: Major)

> GrpcLogAppender#shouldWait should wait on pending log entries to follower
> -
>
> Key: RATIS-458
> URL: https://issues.apache.org/jira/browse/RATIS-458
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Blocker
>  Labels: ozone
> Attachments: RATIS-458.001.patch, RATIS-458.002.patch, 
> RATIS-458.003.patch, RATIS-458.004.patch
>
>
> In GrpcLogAppender, when an append entry times out, we remove the entry from
> pendingRequests. This decreases the size of pendingRequests, which affects
> the logic in GrpcLogAppender#shouldWait. In addition, shouldWait currently
> counts heartbeats, because heartbeats are also tracked in pendingRequests. It
> should instead wait on the number of log entries that have been appended to
> the follower but not yet processed by it.
> GrpcConfigKeys.Server.leaderOutstandingAppendsMax should also be a fraction
> of RaftServerConfigKeys.Log.queueSize. This provides flow control for the
> leader's append entries to the follower, because the number of outstanding
> append entries in the leader would then be limited by the maximum number of
> operations in the raft log worker.
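The counting rule described above could be sketched as follows. `AppendFlowControlSketch` and its methods are hypothetical and do not correspond to fields or methods of `GrpcLogAppender`; the point is only that heartbeats are not counted and that the counter drops when the follower replies, not when a request times out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical flow-control counter; names do not correspond to GrpcLogAppender fields.
class AppendFlowControlSketch {
  private final int maxOutstandingAppends;
  private final AtomicInteger outstandingLogAppends = new AtomicInteger();

  AppendFlowControlSketch(int maxOutstandingAppends) {
    this.maxOutstandingAppends = maxOutstandingAppends;
  }

  /** Called when a request is sent; heartbeats are deliberately not counted. */
  void onSend(boolean isHeartbeat) {
    if (!isHeartbeat) {
      outstandingLogAppends.incrementAndGet();
    }
  }

  /** Called only when the follower replies, not when a request merely times out. */
  void onFollowerReply(boolean isHeartbeat) {
    if (!isHeartbeat) {
      outstandingLogAppends.decrementAndGet();
    }
  }

  /** The leader should hold back new appends while the follower is at its limit. */
  boolean shouldWait() {
    return outstandingLogAppends.get() >= maxOutstandingAppends;
  }
}
```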





[jira] [Updated] (RATIS-458) GrpcLogAppender#shouldWait should wait on pending log entries to follower

2019-11-19 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-458:

Target Version/s: 0.5.0

> GrpcLogAppender#shouldWait should wait on pending log entries to follower
> -
>
> Key: RATIS-458
> URL: https://issues.apache.org/jira/browse/RATIS-458
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Blocker
>  Labels: ozone
> Attachments: RATIS-458.001.patch, RATIS-458.002.patch, 
> RATIS-458.003.patch, RATIS-458.004.patch
>
>
> In GrpcLogAppender, when an append entry times out, we remove the entry from
> pendingRequests. This decreases the size of pendingRequests, which affects
> the logic in GrpcLogAppender#shouldWait. In addition, shouldWait currently
> counts heartbeats, because heartbeats are also tracked in pendingRequests. It
> should instead wait on the number of log entries that have been appended to
> the follower but not yet processed by it.
> GrpcConfigKeys.Server.leaderOutstandingAppendsMax should also be a fraction
> of RaftServerConfigKeys.Log.queueSize. This provides flow control for the
> leader's append entries to the follower, because the number of outstanding
> append entries in the leader would then be limited by the maximum number of
> operations in the raft log worker.





[jira] [Updated] (RATIS-748) Follower might not update its commit index

2019-11-19 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-748:

Priority: Blocker  (was: Critical)

> Follower might not update its commit index
> --
>
> Key: RATIS-748
> URL: https://issues.apache.org/jira/browse/RATIS-748
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Blocker
> Attachments: RATIS-748.001.patch, RATIS-748.002.patch, 
> RATIS-748.003.patch
>
>
> While updating the commit index, the follower checks whether the majority
> index is present in the raft log. There can be cases where the leader is
> ahead of the follower and the follower does not have the entry corresponding
> to the majorityIndex. In such cases the follower commit index is not updated.
> Below is the corresponding code snippet.
> {code:java}
> public boolean updateLastCommitted(long majorityIndex, long currentTerm) {
>   try(AutoCloseableLock writeLock = writeLock()) {
>     final long oldCommittedIndex = getLastCommittedIndex();
>     if (oldCommittedIndex < majorityIndex) {
>       // Only update last committed index for current term. See §5.4.2 in
>       // paper for details.
>       final TermIndex entry = getTermIndex(majorityIndex);
>       if (entry != null && entry.getTerm() == currentTerm) {
>         final long newCommitIndex = Math.min(majorityIndex, getFlushIndex());
>         if (newCommitIndex > oldCommittedIndex) {
>           commitIndex.updateIncreasingly(newCommitIndex, traceIndexChange);
>         }
>         return true;
>       }
>     }
>   }
>   return false;
> }{code}
> The function RaftLog#updateLastCommitted is also used by the follower to
> update its commit index. The follower does not require the check
> entry.getTerm() == currentTerm, and its commitIndex can be updated to
> min(majorityIndex, getFlushIndex()). It has already verified the entries in
> the appendEntriesAsync call.
> The current behavior can cause the follower commit index to be updated in
> bursts, which can cause watch requests to fail.
> cc [~shashikant] [~szetszwo]
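The follower-side rule proposed above, advancing to min(majorityIndex, flushIndex) without the term check, can be written as a small helper. `FollowerCommitSketch.nextCommitIndex` is a hypothetical function for illustration and is not part of the `RaftLog` API; the extra `Math.max` is an assumption added so the index never moves backwards.

```java
// Hypothetical helper; mirrors the rule described above rather than the RaftLog API.
class FollowerCommitSketch {
  /** Returns the follower's new commit index; never moves it backwards. */
  static long nextCommitIndex(long oldCommitIndex, long majorityIndex, long flushIndex) {
    long candidate = Math.min(majorityIndex, flushIndex);
    return Math.max(oldCommitIndex, candidate);
  }
}
```

For example, a follower that has flushed up to index 40 while the leader reports a majority index of 100 would advance to 40 immediately, instead of waiting until entry 100 is present in its log.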





[jira] [Assigned] (RATIS-729) ClientProtoUtils#toRaftClientReplyProto should consider all RaftException types

2019-10-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned RATIS-729:
---

Assignee: Hanisha Koneru

> ClientProtoUtils#toRaftClientReplyProto should consider all RaftException 
> types
> ---
>
> Key: RATIS-729
> URL: https://issues.apache.org/jira/browse/RATIS-729
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Hanisha Koneru
>Priority: Major
>
> In one of the runs, the client received a RaftClientReply with a null
> exception and the success flag set to false. This happens because
> ClientProtoUtils#toRaftClientReplyProto currently considers only a few
> RaftException types while creating a RaftClientReplyProto. We should also add
> handling for other exception types. Similar changes will be required in
> ClientProtoUtils#toRaftClientReply.
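One way to avoid a failed reply that carries no exception is to give the mapping a generic fallback branch. The sketch below uses a hypothetical exception hierarchy and a string tag in place of the protobuf message; it is not the `ClientProtoUtils` code and only illustrates the fallback idea.

```java
// Hypothetical exception hierarchy and reply tag; not the Ratis protobuf classes.
class ReplyMappingSketch {
  static class RaftException extends Exception {
    RaftException(String message) { super(message); }
  }
  static class NotLeaderException extends RaftException {
    NotLeaderException(String message) { super(message); }
  }
  static class ResourceUnavailableException extends RaftException {
    ResourceUnavailableException(String message) { super(message); }
  }

  /** Maps an exception to a wire tag, with a generic fallback for unlisted subtypes. */
  static String toWireTag(RaftException e) {
    if (e instanceof NotLeaderException) {
      return "NOT_LEADER";
    }
    // Fallback: any other RaftException is still carried, so the reply is never empty.
    return "GENERIC:" + e.getClass().getSimpleName();
  }
}
```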





[jira] [Updated] (RATIS-711) Add ability to specify higher request timeout in watch request

2019-10-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-711:

Description: 
Currently , a watch request from raft client times out by default in 3 sec . In 
certain conditions, it may be required to have a higher watch request timeout 
value.

This will require having a separate timeout for the watch request.

  was:Currently , a watch request from raft client times out by default in 3 
sec . In certain conditions, it may be required to have a higher watch request 
timeout value.


> Add ability to specify higher request timeout in watch request
> --
>
> Key: RATIS-711
> URL: https://issues.apache.org/jira/browse/RATIS-711
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Priority: Major
>
> Currently, a watch request from the raft client times out by default in 3 sec.
> In certain conditions, it may be required to have a higher watch request
> timeout value.
> This will require having a separate timeout for the watch request.
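A separate watch timeout could be modelled as a second setting next to the general request timeout. `ClientTimeoutSketch` and its fields are hypothetical; they are not Ratis configuration keys, and the values in the usage note are illustrative.

```java
import java.time.Duration;

// Hypothetical client configuration; the field names are not Ratis configuration keys.
class ClientTimeoutSketch {
  private final Duration requestTimeout;
  private final Duration watchTimeout;

  ClientTimeoutSketch(Duration requestTimeout, Duration watchTimeout) {
    this.requestTimeout = requestTimeout;
    this.watchTimeout = watchTimeout;
  }

  /** Watch requests use their own timeout; everything else uses the general one. */
  Duration timeoutFor(boolean isWatchRequest) {
    return isWatchRequest ? watchTimeout : requestTimeout;
  }
}
```

For instance, `new ClientTimeoutSketch(Duration.ofSeconds(3), Duration.ofSeconds(30))` would keep the 3 sec default for ordinary requests while letting watch requests wait longer.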





[jira] [Updated] (RATIS-711) Add ability to specify higher request timeout in watch request

2019-10-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-711:

Target Version/s: 0.5.0

> Add ability to specify higher request timeout in watch request
> --
>
> Key: RATIS-711
> URL: https://issues.apache.org/jira/browse/RATIS-711
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Priority: Major
>
> Currently, a watch request from the Raft client times out after 3 seconds by 
> default. In certain conditions, a higher watch request timeout value may be 
> required.





[jira] [Updated] (RATIS-709) RaftClient should not retry on a different leader on NotReplicated exception from leader

2019-10-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-709:

Fix Version/s: (was: 0.5.0)

> RaftClient should not retry on a different leader on NotReplicated exception 
> from leader
> 
>
> Key: RATIS-709
> URL: https://issues.apache.org/jira/browse/RATIS-709
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Shashikant Banerjee
>Assignee: Sammi Chen
>Priority: Major
>
> Currently, when a watch request times out with a NotReplicatedException on 
> the leader, the Raft client retries the request on a different server, fails 
> with NotLeaderException, and goes into a loop. Ideally, when a watch request 
> times out, the Raft client should not retry it automatically, provided the 
> timeout value on the leader is reasonable.
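
The retry behaviour requested above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Failure` and `Action` enums and the `onFailure` method are hypothetical names, not the Ratis client's retry API.

```java
/** Hypothetical sketch of a client-side retry decision for watch requests. */
final class WatchRetrySketch {
  /** Failure kinds named after the exceptions in the issue; the enum itself is illustrative. */
  enum Failure { NOT_LEADER, NOT_REPLICATED }

  /** What the client does next. */
  enum Action { RETRY_ON_OTHER_SERVER, FAIL_TO_CALLER }

  static Action onFailure(Failure failure) {
    switch (failure) {
      case NOT_LEADER:
        // The contacted server is not the leader, so trying another server makes sense.
        return Action.RETRY_ON_OTHER_SERVER;
      case NOT_REPLICATED:
        // The leader itself timed out the watch. Another server would only answer with
        // NotLeaderException, so surface the failure instead of looping.
        return Action.FAIL_TO_CALLER;
      default:
        throw new IllegalArgumentException("unknown failure: " + failure);
    }
  }

  public static void main(String[] args) {
    System.out.println(onFailure(Failure.NOT_LEADER));
    System.out.println(onFailure(Failure.NOT_REPLICATED));
  }
}
```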





[jira] [Updated] (RATIS-709) RaftClient should not retry on a different leader on NotReplicated exception from leader

2019-10-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-709:

Target Version/s: 0.5.0

> RaftClient should not retry on a different leader on NotReplicated exception 
> from leader
> 
>
> Key: RATIS-709
> URL: https://issues.apache.org/jira/browse/RATIS-709
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Shashikant Banerjee
>Assignee: Sammi Chen
>Priority: Major
>
> Currently, when a watch request times out with a NotReplicatedException on 
> the leader, the Raft client retries the request on a different server, fails 
> with NotLeaderException, and goes into a loop. Ideally, when a watch request 
> times out, the Raft client should not retry it automatically, provided the 
> timeout value on the leader is reasonable.





[jira] [Updated] (RATIS-614) Raft leader should use state machine's last applied index for LeaderNotReady exception

2019-10-30 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-614:

Target Version/s: 0.5.0  (was: 0.4.0)

> Raft leader should use state machine's last applied index for LeaderNotReady 
> exception
> --
>
> Key: RATIS-614
> URL: https://issues.apache.org/jira/browse/RATIS-614
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Blocker
>  Labels: ozone
> Attachments: RATIS-614.001.patch
>
>
> Currently, the Raft leader uses the StateMachineUpdater's lastAppliedIndex to 
> determine whether the leader is ready to take requests. It should instead use 
> the StateMachine's lastAppliedTermIndex, which denotes the index up to which 
> transactions have already been applied, whereas the StateMachineUpdater's 
> lastAppliedIndex denotes the index up to which the applyTransaction call has 
> already been made.
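
The distinction drawn above, between the index for which applyTransaction has been called and the index that has actually been applied, can be illustrated with a small asynchronous sketch. The class and its fields are hypothetical stand-ins, not the Ratis StateMachineUpdater or StateMachine classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: "apply was called" and "apply has finished" are different indices. */
final class AppliedIndexSketch {
  private final AtomicLong submittedIndex = new AtomicLong(-1); // apply call has been made
  private final AtomicLong appliedIndex = new AtomicLong(-1);   // apply has completed

  /** Records the call immediately; records completion only when the work future finishes. */
  CompletableFuture<Void> applyTransaction(long index, CompletableFuture<Void> work) {
    submittedIndex.set(index);
    return work.thenRun(() -> appliedIndex.set(index));
  }

  long getSubmittedIndex() {
    return submittedIndex.get();
  }

  long getAppliedIndex() {
    return appliedIndex.get();
  }

  public static void main(String[] args) {
    AppliedIndexSketch sketch = new AppliedIndexSketch();
    CompletableFuture<Void> work = new CompletableFuture<>();
    sketch.applyTransaction(7, work);
    System.out.println("before completion: submitted=" + sketch.getSubmittedIndex()
        + " applied=" + sketch.getAppliedIndex());
    work.complete(null); // the state machine finishes applying the transaction
    System.out.println("after completion:  submitted=" + sketch.getSubmittedIndex()
        + " applied=" + sketch.getAppliedIndex());
  }
}
```

Until the work future completes, the submitted index runs ahead of the applied index, so a readiness check based on the former can report ready too early.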





[jira] [Commented] (RATIS-680) Fix LICENSE file issues

2019-10-25 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960207#comment-16960207
 ] 

Arpit Agarwal commented on RATIS-680:
-

Thank you for the patient and detailed reviews [~elserj]. Attached an updated 
v003 patch.

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch, RATIS-680.02.patch, 
> RATIS-680.03.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Updated] (RATIS-680) Fix LICENSE file issues

2019-10-25 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-680:

Attachment: RATIS-680.03.patch

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch, RATIS-680.02.patch, 
> RATIS-680.03.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Commented] (RATIS-680) Fix LICENSE file issues

2019-09-27 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939768#comment-16939768
 ] 

Arpit Agarwal commented on RATIS-680:
-

v02 patch adds BSD license text and clarifies which portions are covered by BSD 
license.

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch, RATIS-680.02.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Comment Edited] (RATIS-680) Fix LICENSE file issues

2019-09-27 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939765#comment-16939765
 ] 

Arpit Agarwal edited comment on RATIS-680 at 9/27/19 9:37 PM:
--

Thanks Josh, that's helpful. It looks like the BSD portion was added by 
HADOOP-7443 in 2011 (we copied this file from Hadoop as-is).

I think we can use the same scheme in our LICENSE file. I will update the patch.


was (Author: arpitagarwal):
Thanks Josh, that's helpful. It looks like the BSD portion was added by 
HADOOP-7443 in 2011 (we copied this file from Hadoop as-is).

I think we can use the same scheme in the NOTICE file. I will update the patch.

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch, RATIS-680.02.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Updated] (RATIS-680) Fix LICENSE file issues

2019-09-27 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-680:

Attachment: RATIS-680.02.patch

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch, RATIS-680.02.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Commented] (RATIS-680) Fix LICENSE file issues

2019-09-27 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939765#comment-16939765
 ] 

Arpit Agarwal commented on RATIS-680:
-

Thanks Josh, that's helpful. It looks like the BSD portion was added by 
HADOOP-7443 in 2011 (we copied this file from Hadoop as-is).

I think we can use the same scheme in the NOTICE file. I will update the patch.

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Commented] (RATIS-680) Fix LICENSE file issues

2019-09-17 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931711#comment-16931711
 ] 

Arpit Agarwal commented on RATIS-680:
-

IIUC, the MIT license doesn't need to be included in the LICENSE file since we 
are not bundling any MIT-licensed code in the source distribution.

Need confirmation from someone who knows better - perhaps [~elserj].

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (RATIS-680) Fix LICENSE file issues

2019-09-17 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-680:

Attachment: RATIS-680.01.patch

> Fix LICENSE file issues
> ---
>
> Key: RATIS-680
> URL: https://issues.apache.org/jira/browse/RATIS-680
> Project: Ratis
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-680.01.patch
>
>
> Fix Ratis LICENSE file issues raised by Justin here:
> https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E





[jira] [Created] (RATIS-680) Fix LICENSE file issues

2019-09-17 Thread Arpit Agarwal (Jira)
Arpit Agarwal created RATIS-680:
---

 Summary: Fix LICENSE file issues
 Key: RATIS-680
 URL: https://issues.apache.org/jira/browse/RATIS-680
 Project: Ratis
  Issue Type: Bug
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


Fix Ratis LICENSE file issues raised by Justin here:

https://mail-archives.apache.org/mod_mbox/incubator-general/201909.mbox/%3C573A4F4D-8303-418D-8133-03AAC8085708%40me.com%3E







[jira] [Commented] (RATIS-668) Fix NOTICE file

2019-09-11 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928019#comment-16928019
 ] 

Arpit Agarwal commented on RATIS-668:
-

Thanks for the offline discussion and suggestions [~an...@apache.org]. The 
patch reflects your suggestions.

I need a binding +1 - [~msingh] can you take a look please?


> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-668.01.patch, RATIS-668.02.patch, 
> RATIS-668.03.patch
>
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Updated] (RATIS-668) Fix NOTICE file

2019-09-11 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-668:

Attachment: RATIS-668.03.patch

> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-668.01.patch, RATIS-668.02.patch, 
> RATIS-668.03.patch
>
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Commented] (RATIS-668) Fix NOTICE file

2019-09-10 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927100#comment-16927100
 ] 

Arpit Agarwal commented on RATIS-668:
-

bq. We provide three different NOTICE files, one with source, binary and 
examples assembly, do we know which file is referred by IPMC? (though we should 
check all NOTICES to address the concern)

Thanks for the look [~an...@apache.org]. That's a great question. As far as I 
can tell the NOTICE files for binary and examples are auto-generated. I don't 
see them in the source tree.

[~msingh] how are the packages and their notice files generated?

> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-668.01.patch, RATIS-668.02.patch
>
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Commented] (RATIS-668) Fix NOTICE file

2019-09-03 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921655#comment-16921655
 ] 

Arpit Agarwal commented on RATIS-668:
-

v2 patch adds missing notices for DropWizard and JUnit.

> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-668.01.patch, RATIS-668.02.patch
>
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Updated] (RATIS-668) Fix NOTICE file

2019-09-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-668:

Attachment: RATIS-668.02.patch

> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-668.01.patch, RATIS-668.02.patch
>
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Updated] (RATIS-668) Fix NOTICE file

2019-09-03 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-668:

Attachment: RATIS-668.01.patch

> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: RATIS-668.01.patch
>
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Moved] (RATIS-668) Fix NOTICE file

2019-08-27 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/RATIS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal moved HDDS-2046 to RATIS-668:
---

              Key: RATIS-668  (was: HDDS-2046)
 Target Version/s: 0.4.0  (was: 0.4.1)
Affects Version/s: 0.4.0  (was: 0.4.1)
         Workflow: no-reopen-closed, patch-avail  (was: patch-available, re-open possible)
          Project: Ratis  (was: Hadoop Distributed Data Store)

> Fix NOTICE file
> ---
>
> Key: RATIS-668
> URL: https://issues.apache.org/jira/browse/RATIS-668
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> NOTICE file needs to be updated based on Justin's comments here:
>  
> [https://mail-archives.apache.org/mod_mbox/incubator-general/201908.mbox/%3C8EA21F57-A972-4CBE-AC2F-D3830FE6BDB4%40classsoftware.com%3E]
>  
>  





[jira] [Updated] (RATIS-586) NotifyInstallSnapshot should return only installed snapshot index

2019-08-16 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-586:

Target Version/s: 0.5.0

> NotifyInstallSnapshot should return only installed snapshot index
> -
>
> Key: RATIS-586
> URL: https://issues.apache.org/jira/browse/RATIS-586
> Project: Ratis
>  Issue Type: Bug
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: ozone
>
> On receiving install snapshot notification and installing a snapshot, the 
> follower should only return the installed snapshot index. Currently the 
> return type is termIndex but snapshots do not have a term.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (RATIS-658) ratis-assembly does not include ratis-resource-bundle

2019-08-16 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909291#comment-16909291
 ] 

Arpit Agarwal commented on RATIS-658:
-

+1, thanks for filing and fixing this [~msingh].

> ratis-assembly does not include ratis-resource-bundle
> -
>
> Key: RATIS-658
> URL: https://issues.apache.org/jira/browse/RATIS-658
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Arpit Agarwal
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: RATIS-658.001.patch
>
>
> ratis-assembly does not include ratis-resource-bundle, which results in a 
> compilation failure in the created bundle.





[jira] [Updated] (RATIS-624) RaftServer should support pause/ unpause in its LifeCycle state

2019-07-17 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-624:

Fix Version/s: 0.5.0

> RaftServer should support pause/ unpause in its LifeCycle state
> ---
>
> Key: RATIS-624
> URL: https://issues.apache.org/jira/browse/RATIS-624
> Project: Ratis
>  Issue Type: Task
>Reporter: Hanisha Koneru
>Priority: Major
> Fix For: 0.5.0
>
>
> This Jira aims to add pause and unpause support to RaftServer's LifeCycle 
> state. When paused, the RaftServer should not accept any incoming append 
> log entries.





[jira] [Updated] (RATIS-615) StateMachine#notifyExtendedNoLeader is not called when all the nodes in the raft ring are isolated

2019-07-16 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-615:

Labels: blockade ozone  (was: ozone)

> StateMachine#notifyExtendedNoLeader is not called when all the nodes in the 
> raft ring are isolated
> --
>
> Key: RATIS-615
> URL: https://issues.apache.org/jira/browse/RATIS-615
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Nanda kumar
>Priority: Blocker
>  Labels: blockade, ozone
>
> When all the nodes in the raft ring are isolated (network partition) 
> {{StateMachine#notifyExtendedNoLeader}} is not getting called.





[jira] [Assigned] (RATIS-573) Handle Raft Log Append Failure

2019-05-28 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned RATIS-573:
---

Assignee: Supratim Deka

> Handle Raft Log Append Failure
> --
>
> Key: RATIS-573
> URL: https://issues.apache.org/jira/browse/RATIS-573
> Project: Ratis
>  Issue Type: Improvement
>  Components: server
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>
> This is part of handling IO failures (HDDS-1595).
> The scope of this Jira is to handle failures in Raft log append by:
> 1. notifying the state machine of the error for consumer-specific handling
> 2. propagating the error to the initiator (to the client from the leader, to 
> the leader from the follower).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-518) Add request specific retry policy support

2019-04-15 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818562#comment-16818562
 ] 

Arpit Agarwal commented on RATIS-518:
-

Thanks [~szetszwo], filed HDDS-1441.

> Add request specific retry policy support
> -
>
> Key: RATIS-518
> URL: https://issues.apache.org/jira/browse/RATIS-518
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: r518_20190409.patch, r518_20190413.patch
>
>
> Currently, the retry policy is enforced on a Raft client that handles 
> multiple requests. The idea here is to add support for a request-specific 
> retry policy in the Raft client.





[jira] [Reopened] (RATIS-518) Add request specific retry policy support

2019-04-15 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened RATIS-518:
-

I have temporarily reverted this change as it breaks the Ozone build and seems 
to introduce an incompatibility.

The Ozone build error is:
{code}
[ERROR] /ozone/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientRatis.java:[308,22] cannot find symbol
[ERROR]   symbol:   method getRetryFailureException()
[ERROR]   location: variable reply of type org.apache.ratis.protocol.RaftClientReply
{code}

> Add request specific retry policy support
> -
>
> Key: RATIS-518
> URL: https://issues.apache.org/jira/browse/RATIS-518
> Project: Ratis
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: r518_20190409.patch, r518_20190413.patch
>
>
> Currently, the retry policy is enforced on a Raft client that handles 
> multiple requests. The idea here is to add support for a request-specific 
> retry policy in the Raft client.





[jira] [Moved] (RATIS-530) Avoid using common fork join pool

2019-04-11 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal moved HDDS-1429 to RATIS-530:
---

Target Version/s: 0.4.0  (was: 0.4.0)
        Workflow: no-reopen-closed, patch-avail  (was: patch-available, re-open possible)
             Key: RATIS-530  (was: HDDS-1429)
         Project: Ratis  (was: Hadoop Distributed Data Store)

> Avoid using common fork join pool
> -
>
> Key: RATIS-530
> URL: https://issues.apache.org/jira/browse/RATIS-530
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
>
> After enabling thread context in Ozone log messages, we see some Ratis 
> operations being executed in common fork join pool. E.g.
> {code}
> 2019-04-11 14:25:54,583 ForkJoinPool.commonPool-worker-3 INFO  
> storage.RaftLogWorker (RaftLogWorker.java:rollLogSegment(303)) - 
> 3e59bcbc-e6f0-4681-b8a6-76e163e9ff19-RaftLogWorker: Rolling segment log-0_70 
> to index:70
> {code}
> and
> {code}
> 2019-04-11 14:25:56,715 ForkJoinPool.commonPool-worker-1 INFO  
> client.GrpcClientProtocolService 
> (GrpcClientProtocolService.java:lambda$processClientRequest$0(264)) - Failed 
> RaftClientRequest:client-B5832CAE4B89->8a182bb4-96a8-42e8-a7da-549c9663fc30@group-7136CD304607,
>  cid=333, seq=0, Watch-ALL_COMMITTED(79), Message:, 
> reply=RaftClientReply:client-B5832CAE4B89->8a182bb4-96a8-42e8-a7da-549c9663fc30@group-7136CD304607,
>  cid=333, FAILED org.apache.ratis.protocol.NotLeaderException: Server 
> 8a182bb4-96a8-42e8-a7da-549c9663fc30 is not the leader 
> (beff1b4d-05f9-4f7d-a6f2-1405b0950e8c:10.22.8.149:57253). Request must be 
> sent to leader., logIndex=0, 
> commits[8a182bb4-96a8-42e8-a7da-549c9663fc30:c129, 
> 48a1cbdc-86a2-40d3-9831-13e044923ee4:c72, 
> beff1b4d-05f9-4f7d-a6f2-1405b0950e8c:c129]
> {code}
> It's better to use a dedicated ExecutorService or ForkJoinPool instead.
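
A minimal sketch of the suggestion above, assuming the asynchronous work is submitted through CompletableFuture. When no executor is passed, `CompletableFuture.supplyAsync` normally runs on `ForkJoinPool.commonPool()`; passing an explicit executor moves the work to dedicated, named threads. The class name, helper methods, and thread-name prefix are hypothetical and are not taken from the Ratis code base.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: run async work on a dedicated, named pool instead of the common pool. */
final class DedicatedPoolSketch {
  /** Creates a fixed-size pool whose daemon threads are named prefix-0, prefix-1, ... */
  static ExecutorService newDedicatedPool(String prefix, int threads) {
    AtomicInteger counter = new AtomicInteger();
    return Executors.newFixedThreadPool(threads, runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      t.setDaemon(true);
      return t;
    });
  }

  /** Runs a trivial task on the given executor and returns the name of the worker thread. */
  static String runOn(ExecutorService executor) {
    return CompletableFuture
        .supplyAsync(() -> Thread.currentThread().getName(), executor)
        .join();
  }

  public static void main(String[] args) {
    ExecutorService pool = newDedicatedPool("sketch-worker", 2);
    try {
      // The thread name shows up in log lines when thread context is enabled.
      System.out.println("task ran on: " + runOn(pool));
    } finally {
      pool.shutdown();
    }
  }
}
```

Named threads also make log lines such as the ones quoted above easier to attribute to a component than the generic `ForkJoinPool.commonPool-worker-N` names.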





[jira] [Assigned] (RATIS-530) Avoid using common fork join pool

2019-04-11 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned RATIS-530:
---

Assignee: (was: Arpit Agarwal)

> Avoid using common fork join pool
> -
>
> Key: RATIS-530
> URL: https://issues.apache.org/jira/browse/RATIS-530
> Project: Ratis
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Priority: Major
>
> After enabling thread context in Ozone log messages, we see some Ratis 
> operations being executed in common fork join pool. E.g.
> {code}
> 2019-04-11 14:25:54,583 ForkJoinPool.commonPool-worker-3 INFO  
> storage.RaftLogWorker (RaftLogWorker.java:rollLogSegment(303)) - 
> 3e59bcbc-e6f0-4681-b8a6-76e163e9ff19-RaftLogWorker: Rolling segment log-0_70 
> to index:70
> {code}
> and
> {code}
> 2019-04-11 14:25:56,715 ForkJoinPool.commonPool-worker-1 INFO  
> client.GrpcClientProtocolService 
> (GrpcClientProtocolService.java:lambda$processClientRequest$0(264)) - Failed 
> RaftClientRequest:client-B5832CAE4B89->8a182bb4-96a8-42e8-a7da-549c9663fc30@group-7136CD304607,
>  cid=333, seq=0, Watch-ALL_COMMITTED(79), Message:, 
> reply=RaftClientReply:client-B5832CAE4B89->8a182bb4-96a8-42e8-a7da-549c9663fc30@group-7136CD304607,
>  cid=333, FAILED org.apache.ratis.protocol.NotLeaderException: Server 
> 8a182bb4-96a8-42e8-a7da-549c9663fc30 is not the leader 
> (beff1b4d-05f9-4f7d-a6f2-1405b0950e8c:10.22.8.149:57253). Request must be 
> sent to leader., logIndex=0, 
> commits[8a182bb4-96a8-42e8-a7da-549c9663fc30:c129, 
> 48a1cbdc-86a2-40d3-9831-13e044923ee4:c72, 
> beff1b4d-05f9-4f7d-a6f2-1405b0950e8c:c129]
> {code}
> It's better to use a dedicated ExecutorService or ForkJoinPool instead.





[jira] [Commented] (RATIS-507) RaftServerProxy should not use common thread pool for creating raft server

2019-03-21 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798240#comment-16798240
 ] 

Arpit Agarwal commented on RATIS-507:
-

+1

> RaftServerProxy should not use common thread pool for creating raft server
> --
>
> Key: RATIS-507
> URL: https://issues.apache.org/jira/browse/RATIS-507
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: RATIS-507.001.patch
>
>
> Currently, RaftServerProxy#newRaftServerImpl uses the common thread pool to 
> create an instance of RaftServerImpl. We should use a separate executor for 
> this purpose.





[jira] [Commented] (RATIS-506) ServerRestartTests.testRestartCommitIndex may fail intermittently

2019-03-20 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797628#comment-16797628
 ] 

Arpit Agarwal commented on RATIS-506:
-

[~szetszwo], I did not understand this assert:
{code}
Assert.assertTrue(lastCommittedEntry.getIndex() <= lastAppliedTermIndex.getIndex());
{code}

Is this saying that the commit index should be less than or equal to the 
applied term index? If so, should it be the other way around?

> ServerRestartTests.testRestartCommitIndex may fail intermittently
> -
>
> Key: RATIS-506
> URL: https://issues.apache.org/jira/browse/RATIS-506
> Project: Ratis
>  Issue Type: Bug
>  Components: test
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: r506_20190320.patch, r506_20190320b.patch, 
> r506_20190320c.patch
>
>
> {code}
> java.lang.AssertionError: 
> Expected :101
> Actual   :111
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.ratis.server.ServerRestartTests.runTestRestartCommitIndex(ServerRestartTests.java:276)
> {code}
> It seems that the test runs too fast so that the last metadata entry is not 
> yet written.





[jira] [Commented] (RATIS-507) RaftServerProxy should not use common thread pool for creating raft server

2019-03-20 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797370#comment-16797370
 ] 

Arpit Agarwal commented on RATIS-507:
-

Will a single threaded executor be sufficient?
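
To make the question concrete, here is a minimal sketch of running an expensive construction step on a dedicated single-threaded executor instead of the common pool. `DedicatedExecutorSketch`, `newServer` and the thread name are hypothetical stand-ins, not Ratis code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names come from Ratis.
final class DedicatedExecutorSketch {
  /** Stand-in for the expensive server construction. */
  static String newServer(String groupId) {
    return "server-" + groupId + " built on " + Thread.currentThread().getName();
  }

  /** A single daemon thread reserved for server construction. */
  static ExecutorService newInitExecutor() {
    return Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "raft-server-init");
      t.setDaemon(true);
      return t;
    });
  }

  /** Passing an executor keeps the task off ForkJoinPool.commonPool(). */
  static String buildOn(ExecutorService executor, String groupId) {
    return CompletableFuture.supplyAsync(() -> newServer(groupId), executor).join();
  }

  public static void main(String[] args) {
    ExecutorService executor = newInitExecutor();
    try {
      System.out.println(buildOn(executor, "g1"));
    } finally {
      executor.shutdown();
    }
  }
}
```

With a single thread, constructions for several groups would run one after another, so the choice trades startup parallelism for isolation from the common pool.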

> RaftServerProxy should not use common thread pool for creating raft server
> --
>
> Key: RATIS-507
> URL: https://issues.apache.org/jira/browse/RATIS-507
> Project: Ratis
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: RATIS-507.001.patch
>
>
> Currently RaftServerProxy#newRaftServerImpl uses the common thread pool for 
> creating an instance of RaftServerImpl. We should use a separate executor for 
> this purpose.





[jira] [Commented] (RATIS-502) Commit Index less than the snapshot's commit indexes need to be ignored on restart

2019-03-19 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796450#comment-16796450
 ] 

Arpit Agarwal commented on RATIS-502:
-

Sorry I pushed over your comments [~msingh]. I didn't hit refresh and missed 
them.

Please feel free to revert the commit, or we can address your comments in a 
follow up patch.

> Commit Index less than the snapshot's commit indexes need to be ignored on 
> restart
> --
>
> Key: RATIS-502
> URL: https://issues.apache.org/jira/browse/RATIS-502
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Mukul Kumar Singh
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: r502_20190319.patch, r502_20190319b.patch
>
>
> This problem was seen with Ozone in the datanode state machine.
> Before restart of the datanode, a snapshot was taken at log index 6. Please 
> note that the commit entry for this will be after this log index in the Raft 
> Log.
> {code}
> 2019-03-19 08:56:30,225 INFO  impl.StateMachineUpdater 
> (StateMachineUpdater.java:stopAndJoin(109)) - 
> StateMachineUpdater-4c165953-147b-48fb-89e1-951579e828eb: set stopIndex = 6
> 2019-03-19 08:56:30,225 INFO  ratis.ContainerStateMachine 
> (ContainerStateMachine.java:takeSnapshot(245)) - Taking snapshot at 
> termIndex:(t:1, i:6)
> 2019-03-19 08:56:30,226 INFO  ratis.ContainerStateMachine 
> (ContainerStateMachine.java:takeSnapshot(249)) - Taking a snapshot to file 
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-0fa66624-f533-44bc-8f6f-99cf251fe4c3/datanode-0/data/ratis/34393916-e10d-4b3c-b212-5c910eea4935/sm/snapshot.1_6
> 2019-03-19 08:56:30,231 INFO  impl.RaftServerImpl 
> (ServerState.java:close(386)) - 4c165953-147b-48fb-89e1-951579e828eb closes. 
> The last applied log index is 6
> {code}
> After restart, the state machine registers 6 as the log index in the snapshot.
> {code}
> 2019-03-19 08:56:33,351 INFO  ratis.ContainerStateMachine 
> (ContainerStateMachine.java:loadSnapshot(209)) - Setting the last applied 
> index to (t:1, i:6)
> {code}
> Now, after applying all the transactions after the snapshot, the state 
> machine will encounter a commit entry for this log index (5). This hits the 
> assert in the state machine.
> {code}
> java.io.IOException: java.lang.IllegalStateException: Failed to 
> updateIncreasingly for commitIndex: 6 -> 5
> at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
> at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
> at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:283)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:295)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:417)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:182)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:165)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:334)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Failed to updateIncreasingly for 
> commitIndex: 6 -> 5
> at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:72)
> at 
> org.apache.ratis.server.storage.RaftLogIndex.updateIncreasingly(RaftLogIndex.java:60)
> at 
> org.apache.ratis.server.storage.RaftLog.lambda$open$7(RaftLog.java:245)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.server.storage.RaftLog.open(RaftLog.java:244)
> at 
> org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:191)
> at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:103)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:207)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>  

[jira] [Moved] (RATIS-505) Make Install Snapshot option configurable

2019-03-19 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal moved HDDS-1311 to RATIS-505:
---

Workflow: no-reopen-closed, patch-avail  (was: patch-available, re-open 
possible)
 Key: RATIS-505  (was: HDDS-1311)
 Project: Ratis  (was: Hadoop Distributed Data Store)

> Make Install Snapshot option configurable
> -
>
> Key: RATIS-505
> URL: https://issues.apache.org/jira/browse/RATIS-505
> Project: Ratis
>  Issue Type: New Feature
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
>
> This Jira aims to make the install snapshot command from leader to follower 
> configurable. By default, install snapshot should be enabled. 





[jira] [Commented] (RATIS-502) Commit Index less than the snapshot's commit indexes need to be ignored on restart

2019-03-19 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796398#comment-16796398
 ] 

Arpit Agarwal commented on RATIS-502:
-

+1 the UT failure looks unrelated. I will commit this shortly.

> Commit Index less than the snapshot's commit indexes need to be ignored on 
> restart
> --
>
> Key: RATIS-502
> URL: https://issues.apache.org/jira/browse/RATIS-502
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Reporter: Mukul Kumar Singh
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r502_20190319.patch, r502_20190319b.patch
>
>
> This problem was seen with Ozone in the datanode state machine.
> Before restart of the datanode, a snapshot was taken at log index 6. Please 
> note that the commit entry for this will be after this log index in the Raft 
> Log.
> {code}
> 2019-03-19 08:56:30,225 INFO  impl.StateMachineUpdater 
> (StateMachineUpdater.java:stopAndJoin(109)) - 
> StateMachineUpdater-4c165953-147b-48fb-89e1-951579e828eb: set stopIndex = 6
> 2019-03-19 08:56:30,225 INFO  ratis.ContainerStateMachine 
> (ContainerStateMachine.java:takeSnapshot(245)) - Taking snapshot at 
> termIndex:(t:1, i:6)
> 2019-03-19 08:56:30,226 INFO  ratis.ContainerStateMachine 
> (ContainerStateMachine.java:takeSnapshot(249)) - Taking a snapshot to file 
> /Users/msingh/code/apache/ozone/oz_new1/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-0fa66624-f533-44bc-8f6f-99cf251fe4c3/datanode-0/data/ratis/34393916-e10d-4b3c-b212-5c910eea4935/sm/snapshot.1_6
> 2019-03-19 08:56:30,231 INFO  impl.RaftServerImpl 
> (ServerState.java:close(386)) - 4c165953-147b-48fb-89e1-951579e828eb closes. 
> The last applied log index is 6
> {code}
> After restart, the state machine registers 6 as the log index in the snapshot.
> {code}
> 2019-03-19 08:56:33,351 INFO  ratis.ContainerStateMachine 
> (ContainerStateMachine.java:loadSnapshot(209)) - Setting the last applied 
> index to (t:1, i:6)
> {code}
> Now, after applying all the transactions after the snapshot, the state 
> machine will encounter a commit entry for this log index (5). This hits the 
> assert in the state machine.
> {code}
> java.io.IOException: java.lang.IllegalStateException: Failed to 
> updateIncreasingly for commitIndex: 6 -> 5
> at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
> at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
> at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:283)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:295)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:417)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:182)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:165)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:334)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Failed to updateIncreasingly for 
> commitIndex: 6 -> 5
> at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:72)
> at 
> org.apache.ratis.server.storage.RaftLogIndex.updateIncreasingly(RaftLogIndex.java:60)
> at 
> org.apache.ratis.server.storage.RaftLog.lambda$open$7(RaftLog.java:245)
> at java.util.Optional.ifPresent(Optional.java:159)
> at org.apache.ratis.server.storage.RaftLog.open(RaftLog.java:244)
> at 
> org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:191)
> at 
> org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:114)
> at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:103)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:207)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> {code}
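
The failure in the trace, and the behaviour the issue title asks for, can be sketched roughly as below. This is an illustration under assumptions, not the attached patch: `CommitIndexOnRestart` and `updateToMax` are hypothetical names, and only the exception message mirrors the trace.

```java
// Hypothetical illustration only; not the Ratis implementation or the attached patch.
final class CommitIndexOnRestart {
  private long commitIndex;

  /** Start from the index recorded in the loaded snapshot. */
  CommitIndexOnRestart(long snapshotIndex) {
    this.commitIndex = snapshotIndex;
  }

  /** Strict update: any decrease is treated as a bug, as in the trace. */
  void updateIncreasingly(long newIndex) {
    if (newIndex < commitIndex) {
      throw new IllegalStateException(
          "Failed to updateIncreasingly for commitIndex: " + commitIndex + " -> " + newIndex);
    }
    commitIndex = newIndex;
  }

  /** Lenient update: values at or behind the current index are ignored. */
  boolean updateToMax(long newIndex) {
    if (newIndex <= commitIndex) {
      return false;  // stale commit entry, e.g. one older than the snapshot
    }
    commitIndex = newIndex;
    return true;
  }

  long get() { return commitIndex; }
}
```

With a snapshot at index 6, `updateIncreasingly(5)` throws with the same message as the trace, while `updateToMax(5)` leaves the index at 6 and returns false.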




[jira] [Created] (RATIS-501) Update source code link on website to gitbox

2019-03-18 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created RATIS-501:
---

 Summary: Update source code link on website to gitbox
 Key: RATIS-501
 URL: https://issues.apache.org/jira/browse/RATIS-501
 Project: Ratis
  Issue Type: Bug
Reporter: Arpit Agarwal


The source code is now hosted on gitbox, so the website needs to be updated.

https://ratis.incubator.apache.org/#source






[jira] [Created] (RATIS-384) writeStateMachineData times out

2018-10-31 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created RATIS-384:
---

 Summary: writeStateMachineData times out
 Key: RATIS-384
 URL: https://issues.apache.org/jira/browse/RATIS-384
 Project: Ratis
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Nilotpal Nandi
 Fix For: 0.3.0


The datanode stopped due to the following error:

datanode.log
{noformat}
2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
[9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
Terminating with exit status 1: 
9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
i:182), STATEMACHINELOGENTRY, client-611073BBFA46, cid=127-writeStateMachineData
 at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
 at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
 at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
 at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
 at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
 ... 3 more{noformat}
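
The shape of the failure (a timed wait on a `CompletableFuture` whose `TimeoutException` is rethrown as an I/O error) can be sketched as follows. This is a simplified illustration, not the Ratis `IOUtils.getFromFuture` source: `TimedFutureSketch` and its parameters are hypothetical, and a plain `IOException` stands in for `TimeoutIOException`.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the Ratis IOUtils implementation.
final class TimedFutureSketch {
  /** Wait up to timeoutMs for the future; surface a timeout as an IOException. */
  static <T> T getFromFuture(CompletableFuture<T> future, String name, long timeoutMs)
      throws IOException {
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The original TimeoutException is kept as the cause, as in the trace.
      throw new IOException("Timeout: " + name, e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt status
      throw new IOException("Interrupted: " + name, e);
    }
  }
}
```

If the state machine's write future does not complete within the timeout, the caller sees the wrapped exception, which in the log above terminates the log worker.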





[jira] [Updated] (RATIS-329) Current Ratis heartbeats are missing for a heavily loaded cluster

2018-10-31 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-329:

Target Version/s:   (was: 0.3.0)

> Current Ratis heartbeats are missing for a heavily loaded cluster
> -
>
> Key: RATIS-329
> URL: https://issues.apache.org/jira/browse/RATIS-329
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: ozone
> Fix For: 0.3.0
>
>
> Currently, while running Ratis with Ozone, frequent leader elections can be 
> noticed in the datanode logs. This happens because of missing heartbeats 
> from the leader to the follower.





[jira] [Updated] (RATIS-382) writeStateMachineData times out

2018-10-31 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/RATIS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated RATIS-382:

Fix Version/s: 0.3.0

> writeStateMachineData times out
> ---
>
> Key: RATIS-382
> URL: https://issues.apache.org/jira/browse/RATIS-382
> Project: Ratis
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> The datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}





[jira] [Comment Edited] (RATIS-348) TimeoutScheduler and SlidingWindow should use daemon threads

2018-10-11 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647000#comment-16647000
 ] 

Arpit Agarwal edited comment on RATIS-348 at 10/11/18 9:11 PM:
---

-findbugs- checkstyle warnings look okay. -findbugs- checkstyle is just being 
annoying.


was (Author: arpitagarwal):
findbugs warnings look okay. findbugs is just being annoying.

> TimeoutScheduler and SlidingWindow should use daemon threads
> 
>
> Key: RATIS-348
> URL: https://issues.apache.org/jira/browse/RATIS-348
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: r348_20181011b.patch
>
>
> In HDDS-625, we found that the Ozone client does not terminate.  The 
> SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up 
> process termination.
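
A minimal sketch of the general remedy: give the scheduler a thread factory that marks its threads as daemon, so they cannot keep the JVM alive on their own. `DaemonThreads` and its method names are hypothetical and do not reflect the attached patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from the Ratis patch.
final class DaemonThreads {
  /** Creates named daemon threads; daemon threads do not block JVM exit. */
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger count = new AtomicInteger();
    return r -> {
      Thread t = new Thread(r, prefix + "-" + count.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
  }

  /** A single-threaded scheduler whose worker is a daemon thread. */
  static ScheduledExecutorService newDaemonScheduler(String prefix) {
    return Executors.newSingleThreadScheduledExecutor(daemonFactory(prefix));
  }
}
```

The default thread factory used by `Executors` creates non-daemon threads, which is why an idle scheduler that is never shut down can hold up process termination.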





[jira] [Commented] (RATIS-348) TimeoutScheduler and SlidingWindow should use daemon threads

2018-10-11 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647000#comment-16647000
 ] 

Arpit Agarwal commented on RATIS-348:
-

findbugs warnings look okay. findbugs is just being annoying.

> TimeoutScheduler and SlidingWindow should use daemon threads
> 
>
> Key: RATIS-348
> URL: https://issues.apache.org/jira/browse/RATIS-348
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r348_20181011b.patch
>
>
> In HDDS-625, we found that the Ozone client does not terminate.  The 
> SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up 
> process termination.





[jira] [Commented] (RATIS-348) TimeoutScheduler and SlidingWindow should use daemon threads

2018-10-11 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646985#comment-16646985
 ] 

Arpit Agarwal commented on RATIS-348:
-

Not sure if the test failures are new or unrelated.

> TimeoutScheduler and SlidingWindow should use daemon threads
> 
>
> Key: RATIS-348
> URL: https://issues.apache.org/jira/browse/RATIS-348
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r348_20181011b.patch
>
>
> In HDDS-625, we found that the Ozone client does not terminate.  The 
> SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up 
> process termination.





[jira] [Commented] (RATIS-348) TimeoutScheduler and SlidingWindow should use daemon threads

2018-10-11 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646781#comment-16646781
 ] 

Arpit Agarwal commented on RATIS-348:
-

+1

Looks like LOG_REPEATEDLY is always false for now.

> TimeoutScheduler and SlidingWindow should use daemon threads
> 
>
> Key: RATIS-348
> URL: https://issues.apache.org/jira/browse/RATIS-348
> Project: Ratis
>  Issue Type: Bug
>  Components: client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: r348_20181011b.patch
>
>
> In HDDS-625, we found that the Ozone client does not terminate.  The 
> SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up 
> process termination.





[jira] [Commented] (RATIS-286) Add information about raft peers and rpc delay in ServerInformationReply

2018-07-30 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562436#comment-16562436
 ] 

Arpit Agarwal commented on RATIS-286:
-

+1 the change lgtm, but it would probably be a good idea to have a look from 
[~szetszwo] or [~elek] before committing.

> Add information about raft peers and rpc delay in ServerInformationReply
> 
>
> Key: RATIS-286
> URL: https://issues.apache.org/jira/browse/RATIS-286
> Project: Ratis
>  Issue Type: Bug
>  Components: server
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: RATIS-286.001.patch
>
>
> To detect slowness of a follower/leader node in a Ratis ring, the 
> delay in RPC communication between nodes should be tracked. This Jira 
> proposes to add new fields to ServerInformationReply to return this 
> information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (RATIS-284) FollowerInfo#toString should print the elapsed time from last rpc

2018-07-30 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/RATIS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562422#comment-16562422
 ] 

Arpit Agarwal commented on RATIS-284:
-

+1 (binding)

> FollowerInfo#toString should print the elapsed time from last rpc
> -
>
> Key: RATIS-284
> URL: https://issues.apache.org/jira/browse/RATIS-284
> Project: Ratis
>  Issue Type: Bug
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: RATIS-284.001.patch
>
>
> FollowerInfo#toString currently prints the absolute time of the last RPC. However, 
> while debugging Ratis issues it would be useful to have the elapsed time since the 
> last RPC.
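
A minimal sketch of the idea, assuming the last-RPC time is kept as a monotonic `System.nanoTime()` reading. `LastRpcSketch` and its output format are hypothetical, not the `FollowerInfo` code or the attached patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the Ratis FollowerInfo class.
final class LastRpcSketch {
  private final String id;
  private volatile long lastRpcNanos;  // monotonic timestamp of the last RPC

  LastRpcSketch(String id, long nowNanos) {
    this.id = id;
    this.lastRpcNanos = nowNanos;
  }

  void updateLastRpc(long nowNanos) {
    lastRpcNanos = nowNanos;
  }

  /** Milliseconds elapsed between the last RPC and the given instant. */
  long elapsedMs(long nowNanos) {
    return TimeUnit.NANOSECONDS.toMillis(nowNanos - lastRpcNanos);
  }

  /** Taking the clock reading as a parameter keeps the output testable. */
  String toString(long nowNanos) {
    return id + "(lastRpcElapsed=" + elapsedMs(nowNanos) + "ms)";
  }

  @Override
  public String toString() {
    return toString(System.nanoTime());
  }
}
```

An elapsed value can be read directly in a log line, whereas an absolute timestamp has to be compared against the log line's own time.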





[jira] [Commented] (RATIS-6) Project logo

2017-11-20 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/RATIS-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259691#comment-16259691
 ] 

Arpit Agarwal commented on RATIS-6:
---

+1 on this one from [~w...@hortonworks.com] 
-!https://image.ibb.co/fiqCdR/logo_vote.png! 

> Project logo
> 
>
> Key: RATIS-6
> URL: https://issues.apache.org/jira/browse/RATIS-6
> Project: Ratis
>  Issue Type: Task
>Reporter: Enis Soztutar
>Assignee: Will Xu
> Attachments: Artboard 2.png, Ratis-Logo.png, Ratis.png, 
> logo-finalist.png
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)