[jira] [Created] (KUDU-2938) built-in NTP client: implement handling of KoD packets
Alexey Serbin created KUDU-2938:
-----------------------------------

             Summary: built-in NTP client: implement handling of KoD packets
                 Key: KUDU-2938
                 URL: https://issues.apache.org/jira/browse/KUDU-2938
             Project: Kudu
          Issue Type: Sub-task
            Reporter: Alexey Serbin


To be RFC-compliant, the built-in NTP client has to properly handle KoD (Kiss-o'-Death) packets, as described in [RFC5905|https://tools.ietf.org/html/rfc5905], section {{7.4. The Kiss-o'-Death Packet}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
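A minimal sketch of the kiss-code dispatch that RFC 5905 section 7.4 calls for might look as follows. This is a hypothetical illustration, not Kudu's actual implementation; the enum and function names are assumptions.

```cpp
#include <cstdint>
#include <string>

// Per RFC 5905 section 7.4, a Kiss-o'-Death packet is a server reply with
// stratum 0; the reference ID field then carries a four-character ASCII
// "kiss code" telling the client what to do.
enum class KodAction {
  kNone,        // not a KoD packet (non-zero stratum)
  kStopUsing,   // DENY/RSTR: stop sending to this server entirely
  kReducePoll,  // RATE: increase the poll interval before retrying
  kIgnore,      // unrecognized kiss code: ignore per RFC 5905
};

// 'stratum' and 'refid' are fields already parsed from the NTP reply;
// 'refid' holds the four kiss-code characters when stratum == 0.
KodAction ClassifyKissOfDeath(uint8_t stratum, const std::string& refid) {
  if (stratum != 0) return KodAction::kNone;
  if (refid == "DENY" || refid == "RSTR") return KodAction::kStopUsing;
  if (refid == "RATE") return KodAction::kReducePoll;
  return KodAction::kIgnore;
}
```

The key point is that DENY/RSTR demand the client demobilize the association, while RATE only asks it to back off its poll interval.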
[jira] [Updated] (KUDU-2936) built-in NTP client: simplified implementation to work with well behaved/trusted servers
     [ https://issues.apache.org/jira/browse/KUDU-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2936:
--------------------------------
    Summary: built-in NTP client: simplified implementation to work with well behaved/trusted servers  (was: Implement simplified built-in NTP client to work with well behaved servers)

> built-in NTP client: simplified implementation to work with well behaved/trusted servers
> ----------------------------------------------------------------------------------------
>
>                 Key: KUDU-2936
>                 URL: https://issues.apache.org/jira/browse/KUDU-2936
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: clock, ntp
>
> The first implementation should be able to work with well-behaved NTP
> servers, without being super strict about following all RFC5905 provisions. It must
> accept a list of NTP servers to work with; by default this should be a list
> of publicly available NTP servers.
> The client must not latch onto non-synchronized NTP servers or a set of servers
> whose notions of true time are too far from each other.
> [~tlipcon] has posted a WIP for such an implementation already: see the 'Code
> Review' link.
[jira] [Created] (KUDU-2937) built-in NTP client: implement 'iburst'
Alexey Serbin created KUDU-2937:
-----------------------------------

             Summary: built-in NTP client: implement 'iburst'
                 Key: KUDU-2937
                 URL: https://issues.apache.org/jira/browse/KUDU-2937
             Project: Kudu
          Issue Type: Sub-task
            Reporter: Alexey Serbin


For fast initial synchronization with NTP sources it's necessary to implement iburst-like behavior.

From [RFC5905|https://tools.ietf.org/html/rfc5905]:

{noformat}
If the BURST flag is lit and the server is reachable and a valid
source of synchronization is available, the client sends a burst of
BCOUNT (8) packets at each poll interval.  The interval between
packets in the burst is two seconds.  This is useful to accurately
measure jitter with long poll intervals.  If the IBURST flag is lit
and this is the first packet sent when the server has been
unreachable, the client sends a burst.  This is useful to quickly
reduce the synchronization distance below the distance threshold and
synchronize the clock.
{noformat}
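The scheduling rule from the RFC excerpt above can be sketched in a few lines. This is a hypothetical helper under assumed names, not Kudu's actual code: it returns the offsets (seconds from now) at which the next packets should be sent.

```cpp
#include <vector>

constexpr int kBurstCount = 8;       // BCOUNT in RFC 5905
constexpr int kBurstSpacingSec = 2;  // inter-packet spacing within a burst

// On the first exchange with a so-far-unreachable server, iburst sends a
// burst of kBurstCount packets spaced kBurstSpacingSec apart; otherwise the
// client sends a single packet after the nominal poll interval.
std::vector<int> NextSendOffsets(bool iburst_enabled,
                                 bool server_reachable,
                                 int poll_interval_sec) {
  if (iburst_enabled && !server_reachable) {
    std::vector<int> offsets;
    for (int i = 0; i < kBurstCount; ++i) {
      offsets.push_back(i * kBurstSpacingSec);  // 0, 2, 4, ..., 14
    }
    return offsets;
  }
  return {poll_interval_sec};
}
```

Eight samples two seconds apart let the client collapse the synchronization distance in roughly 16 seconds instead of waiting out eight full poll intervals.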
[jira] [Updated] (KUDU-2936) Implement simplified built-in NTP client to work with well behaved servers
     [ https://issues.apache.org/jira/browse/KUDU-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2936:
--------------------------------
    Status: In Review  (was: In Progress)

> Implement simplified built-in NTP client to work with well behaved servers
> --------------------------------------------------------------------------
>
>                 Key: KUDU-2936
>                 URL: https://issues.apache.org/jira/browse/KUDU-2936
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: clock, ntp
>
> The first implementation should be able to work with well-behaved NTP
> servers, without being super strict about following all RFC5905 provisions. It must
> accept a list of NTP servers to work with; by default this should be a list
> of publicly available NTP servers.
> The client must not latch onto non-synchronized NTP servers or a set of servers
> whose notions of true time are too far from each other.
> [~tlipcon] has posted a WIP for such an implementation already: see the 'Code
> Review' link.
[jira] [Created] (KUDU-2936) Implement simplified built-in NTP client to work with well behaved servers
Alexey Serbin created KUDU-2936:
-----------------------------------

             Summary: Implement simplified built-in NTP client to work with well behaved servers
                 Key: KUDU-2936
                 URL: https://issues.apache.org/jira/browse/KUDU-2936
             Project: Kudu
          Issue Type: Sub-task
            Reporter: Alexey Serbin
            Assignee: Alexey Serbin


The first implementation should be able to work with well-behaved NTP servers, without being super strict about following all RFC5905 provisions. It must accept a list of NTP servers to work with; by default this should be a list of publicly available NTP servers.

The client must not latch onto non-synchronized NTP servers or a set of servers whose notions of true time are too far from each other.

[~tlipcon] has posted a WIP for such an implementation already: see the 'Code Review' link.
[jira] [Created] (KUDU-2935) Implement built-in NTP client
Alexey Serbin created KUDU-2935:
-----------------------------------

             Summary: Implement built-in NTP client
                 Key: KUDU-2935
                 URL: https://issues.apache.org/jira/browse/KUDU-2935
             Project: Kudu
          Issue Type: New Feature
          Components: clock, master, tserver
    Affects Versions: 1.11.0
            Reporter: Alexey Serbin


It would be nice to add a stripped-down implementation of a built-in NTP client without any reliance on the kernel NTP discipline. The built-in client should keep the wall clock synchronized with NTP servers, and calling {{WalltimeWithError()}} should return a wall clock timestamp along with an estimate of its error/offset from true time.

Having a built-in NTP client would provide more control over the clock error and jitter acceptable for HybridTime timestamp generation. From the operability perspective, it would make it easier to run Kudu in containerized environments and overall make it easier for users to configure NTP even if they don't have superuser privileges on a node.

The very first implementation should be good enough to work with properly configured and well-behaved NTP servers, not necessarily being a full-featured and 100% RFC-compliant NTP client. Later on, we can add more features and constraints to protect against misbehaving and rogue NTP servers.
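The {{WalltimeWithError()}} contract described above could take roughly the following shape: the client remembers the offset and error bound from the last successful NTP exchange, and the reported error grows with elapsed time at an assumed maximum local-drift rate. This is a sketch under assumed names and an assumed 500 ppm drift bound, not Kudu's actual implementation.

```cpp
#include <cstdint>

// Assumed maximum drift rate of the local clock, in parts per million
// (500 ppm == 500 microseconds of drift per second of elapsed time).
constexpr int64_t kMaxDriftPpm = 500;

struct WalltimeSample {
  int64_t wall_micros;   // estimated true wall-clock time
  int64_t error_micros;  // bound on |estimate - true time|
};

// 'local_now_micros' is the raw local clock reading; 'offset_micros' and
// 'error_at_sync_micros' come from the last successful NTP exchange, which
// happened at local time 'last_sync_micros'.
WalltimeSample WalltimeWithError(int64_t local_now_micros,
                                 int64_t last_sync_micros,
                                 int64_t offset_micros,
                                 int64_t error_at_sync_micros) {
  const int64_t elapsed = local_now_micros - last_sync_micros;
  // Widen the error bound by the worst-case drift since the last sync.
  const int64_t drift = elapsed * kMaxDriftPpm / 1000000;
  return {local_now_micros + offset_micros, error_at_sync_micros + drift};
}
```

Because the bound widens monotonically between syncs, HybridTime generation can decide whether the current uncertainty is still acceptable without consulting the kernel NTP discipline.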
[jira] [Resolved] (KUDU-2920) Block cache capacity validator shouldn't run on an NVM block cache
     [ https://issues.apache.org/jira/browse/KUDU-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo resolved KUDU-2920.
------------------------------
    Fix Version/s: 1.11.0
       Resolution: Fixed

Fixed in commit 324b8f2. Thanks, [~vladimir_committer]!

> Block cache capacity validator shouldn't run on an NVM block cache
> ------------------------------------------------------------------
>
>                 Key: KUDU-2920
>                 URL: https://issues.apache.org/jira/browse/KUDU-2920
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Assignee: Vladimir Verjovkin
>            Priority: Major
>              Labels: newbie
>             Fix For: 1.11.0
>
>
> As part of KUDU-2318, we added a validator to enforce that the block cache
> capacity does not exceed the process's memory pressure threshold (which defaults to
> 60% of the overall memory limit). This makes sense for a DRAM-based block
> cache, which competes for DRAM with Kudu (and thus its capacity should be a
> fraction of the overall limit).
> However, this doesn't make sense for NVM-based block caches, because the pool
> of available NVM is distinct from the pool of available DRAM. In this case,
> there really shouldn't be any relationship between the overall memory limit
> and the block cache capacity.
> We should relax the validator for NVM-based block caches.
[jira] [Updated] (KUDU-2800) Avoid 'unintended' re-replication of long-bootstrapping tablet replicas
     [ https://issues.apache.org/jira/browse/KUDU-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2800:
--------------------------------
    Description: 
As implemented in https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576, the logic for tracking the 'health' of tablet replicas cannot differentiate between bootstrapping and failed replicas.

As a result, if a tablet replica is bootstrapping for longer than the interval specified by the {{--follower_unavailable_considered_failed_sec}} run-time flag, the system can start re-replicating the tablet elsewhere.

One option might be sending a specific error with {{ConsensusResponsePB}} in response to a Raft message sent by a leader replica, maybe adding extra information on the current progress of the replica bootstrap process. As long as such a bootstrapping follower replica isn't falling behind the leader's WAL GC threshold, the leader replica will not evict it. But if the bootstrapping follower replica falls behind the WAL GC threshold, the leader replica will evict it and the system will start re-replicating it elsewhere. In cases where the number of Raft transactions for a tablet is low, this approach would allow for longer bootstrapping times of tablet replicas. That might be especially beneficial when a tablet server with IO-heavy tablet replicas is being restarted and there aren't many incoming updates/inserts for the tablets it hosts.

However, the approach above requires the Raft consensus object for a bootstrapping replica to be at least partially functional, so it entails reading at least some information about a replica from the on-disk consensus metadata prior to proper bootstrapping of a tablet replica by a tablet server.
  was:
As implemented in https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576, the logic for tracking the 'health' of tablet replicas cannot differentiate between bootstrapping and failed replicas.

As a result, if a tablet replica is bootstrapping for longer than the interval specified by the {{--follower_unavailable_considered_failed_sec}} run-time flag, the system can start re-replicating the tablet elsewhere.

One option might be sending a special {{PeerStatus}} for a bootstrapping replica with a response to a Raft message sent by a leader replica and updating the logic referenced above. The response might also include additional information on the current progress of the bootstrap process. Probably, we need to add a separate timeout for tracking a stale bootstrapping replica, so its health would be reported as FAILED after the leader observes the replica being stuck in bootstrapping with no forward progress for an interval longer than the new timeout.

However, the approach above requires the Raft consensus object for a bootstrapping replica to be at least partially functional, so it entails reading at least some information about a replica from the on-disk consensus metadata prior to proper bootstrapping of a tablet replica by a tablet server.
> Avoid 'unintended' re-replication of long-bootstrapping tablet replicas
> ------------------------------------------------------------------------
>
>                 Key: KUDU-2800
>                 URL: https://issues.apache.org/jira/browse/KUDU-2800
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus, tserver
>    Affects Versions: 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.9.1, 1.10.0
>            Reporter: Alexey Serbin
>            Assignee: Vladimir Verjovkin
>            Priority: Major
>              Labels: newbie
>
> As implemented in
> https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576
> , the logic for tracking the 'health' of tablet replicas cannot differentiate
> between bootstrapping and failed replicas.
> As a result, if a tablet replica is bootstrapping for longer than the
> interval specified by the {{--follower_unavailable_considered_failed_sec}}
> run-time flag, the system can start re-replicating the tablet elsewhere.
> One option might be sending a specific error with {{ConsensusResponsePB}} in
> response to a Raft message sent by a leader replica, maybe adding extra
> information on the current progress of the replica bootstrap process. As
> long as such a bootstrapping follower replica isn't falling behind the
> leader's WAL GC threshold, the leader replica will not evict it. But if the
> bootstrapping follower replica falls behind the WAL GC threshold, the leader
> replica will evict it and the system will start re-replicating it elsewhere.
> In cases where the number of Raft transactions for a tablet is low,
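The eviction policy proposed in this ticket's description can be sketched as a single decision function. This is a hypothetical illustration under assumed names, not Kudu's actual consensus code: a bootstrapping follower is kept healthy regardless of the unavailability timeout, and is evicted only once it falls behind the leader's WAL GC threshold.

```cpp
#include <cstdint>

enum class PeerHealth { kHealthy, kFailed };

// 'bootstrapping' would be reported by the follower (e.g. via a status in
// its consensus response); 'behind_wal_gc_threshold' means the leader has
// already GCed WAL segments the follower still needs.
PeerHealth EvaluateFollower(bool bootstrapping,
                            bool behind_wal_gc_threshold,
                            int64_t unresponsive_sec,
                            int64_t failed_timeout_sec) {
  if (behind_wal_gc_threshold) {
    // The leader can no longer supply the missing ops, so the replica must
    // be re-replicated elsewhere whether it is bootstrapping or not.
    return PeerHealth::kFailed;
  }
  if (bootstrapping) {
    // Still working through local bootstrap and still catch-up-able: keep
    // waiting instead of triggering re-replication.
    return PeerHealth::kHealthy;
  }
  // Ordinary followers keep the existing timeout-based rule
  // (--follower_unavailable_considered_failed_sec).
  return unresponsive_sec > failed_timeout_sec ? PeerHealth::kFailed
                                               : PeerHealth::kHealthy;
}
```

With low Raft traffic the follower stays ahead of WAL GC for a long time, which is exactly why this policy tolerates long bootstraps on restarted, IO-heavy tablet servers.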