[jira] [Created] (KUDU-2938) built-in NTP client: implement handling of KoD packets

2019-09-13 Thread Alexey Serbin (Jira)
Alexey Serbin created KUDU-2938:
---

 Summary: built-in NTP client: implement handling of KoD packets
 Key: KUDU-2938
 URL: https://issues.apache.org/jira/browse/KUDU-2938
 Project: Kudu
  Issue Type: Sub-task
Reporter: Alexey Serbin


To be RFC-compliant, the built-in NTP client has to properly handle KoD 
packets, as described in [RFC5905|https://tools.ietf.org/html/rfc5905], 
section {{7.4.  The Kiss-o'-Death Packet}}.
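For reference, here is a minimal sketch (editor's illustration, not Kudu code) of classifying kiss codes per that section: a KoD packet carries stratum 0, and its 32-bit reference ID spells a 4-character ASCII kiss code such as DENY, RSTR, or RATE.

{code:cpp}
// Editor's sketch, NOT Kudu code: classifying RFC 5905 kiss codes.
#include <cstdint>
#include <string>

enum class KodAction { kNone, kStopUsingServer, kReducePollRate };

// 'stratum' and 'ref_id' are assumed to be parsed already from the NTP
// packet header ('ref_id' converted to host byte order).
KodAction HandleKissCode(uint8_t stratum, uint32_t ref_id) {
  if (stratum != 0) {
    return KodAction::kNone;  // Not a KoD packet: KoD implies stratum 0.
  }
  // The reference ID of a KoD packet spells a 4-character ASCII kiss code.
  const std::string code{
      static_cast<char>(ref_id >> 24), static_cast<char>(ref_id >> 16),
      static_cast<char>(ref_id >> 8),  static_cast<char>(ref_id)};
  if (code == "DENY" || code == "RSTR") {
    return KodAction::kStopUsingServer;  // Access denied: stop sending.
  }
  if (code == "RATE") {
    return KodAction::kReducePollRate;   // Rate exceeded: back off polling.
  }
  return KodAction::kNone;  // Other kiss codes are informational only.
}
{code}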





[jira] [Updated] (KUDU-2936) built-in NTP client: simplified implementation to work with well behaved/trusted servers

2019-09-13 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2936:

Summary: built-in NTP client: simplified implementation to work with well 
behaved/trusted servers  (was: Implement simplified built-in NTP client to work 
with well behaved servers)

> built-in NTP client: simplified implementation to work with well 
> behaved/trusted servers
> 
>
> Key: KUDU-2936
> URL: https://issues.apache.org/jira/browse/KUDU-2936
> Project: Kudu
>  Issue Type: Sub-task
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>  Labels: clock, ntp
>
> The first implementation should be able to work with well-behaved NTP 
> servers without being strict about following every RFC5905 provision.  It 
> must accept a list of NTP servers to work with, defaulting to a list of 
> publicly available NTP servers.
> The client must not latch onto non-synchronized NTP servers or onto a set 
> of servers whose true times are too far from each other.
> [~tlipcon] has posted a WIP for such an implementation already: see the 'Code 
> Review' link. 





[jira] [Created] (KUDU-2937) built-in NTP client: implement 'iburst'

2019-09-13 Thread Alexey Serbin (Jira)
Alexey Serbin created KUDU-2937:
---

 Summary: built-in NTP client: implement 'iburst'
 Key: KUDU-2937
 URL: https://issues.apache.org/jira/browse/KUDU-2937
 Project: Kudu
  Issue Type: Sub-task
Reporter: Alexey Serbin


For fast initial synchronization with NTP sources, it's necessary to implement 
iburst-like behavior.

From [RFC5905|https://tools.ietf.org/html/rfc5905]:
{noformat}
   If the BURST flag is lit and the server is reachable and a valid
   source of synchronization is available, the client sends a burst of
   BCOUNT (8) packets at each poll interval.  The interval between
   packets in the burst is two seconds.  This is useful to accurately
   measure jitter with long poll intervals.  If the IBURST flag is lit
   and this is the first packet sent when the server has been
   unreachable, the client sends a burst.  This is useful to quickly
   reduce the synchronization distance below the distance threshold and
   synchronize the clock.
{noformat}
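As a rough illustration of that behavior (hypothetical names, not Kudu's actual scheduler), the send scheduling could look roughly like this:

{code:cpp}
// Editor's sketch of iburst-like scheduling; names are illustrative.
#include <chrono>

struct PeerState {
  bool iburst_enabled = true;
  bool reachable = false;      // Any valid reply within the reach window?
  int burst_packets_left = 0;  // Packets remaining in the current burst.
};

constexpr int kBurstCount = 8;  // BCOUNT from RFC 5905.
constexpr std::chrono::seconds kBurstSpacing{2};
constexpr std::chrono::seconds kPollInterval{64};  // Illustrative default.

// Returns how long to wait before sending the next packet to this peer.
std::chrono::seconds NextSendDelay(PeerState* peer) {
  if (peer->burst_packets_left > 0) {
    // Within a burst: RFC 5905 spaces the packets two seconds apart.
    --peer->burst_packets_left;
    return kBurstSpacing;
  }
  if (peer->iburst_enabled && !peer->reachable) {
    // First packet while the server is unreachable: start a burst to
    // quickly drive the synchronization distance below the threshold.
    peer->burst_packets_left = kBurstCount - 1;
    return kBurstSpacing;
  }
  return kPollInterval;  // Normal polling cadence otherwise.
}
{code}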





[jira] [Updated] (KUDU-2936) Implement simplified built-in NTP client to work with well behaved servers

2019-09-13 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2936:

Status: In Review  (was: In Progress)

> Implement simplified built-in NTP client to work with well behaved servers
> --
>
> Key: KUDU-2936
> URL: https://issues.apache.org/jira/browse/KUDU-2936
> Project: Kudu
>  Issue Type: Sub-task
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>  Labels: clock, ntp
>
> The first implementation should be able to work with well-behaved NTP 
> servers without being strict about following every RFC5905 provision.  It 
> must accept a list of NTP servers to work with, defaulting to a list of 
> publicly available NTP servers.
> The client must not latch onto non-synchronized NTP servers or onto a set 
> of servers whose true times are too far from each other.
> [~tlipcon] has posted a WIP for such an implementation already: see the 'Code 
> Review' link. 





[jira] [Created] (KUDU-2936) Implement simplified built-in NTP client to work with well behaved servers

2019-09-13 Thread Alexey Serbin (Jira)
Alexey Serbin created KUDU-2936:
---

 Summary: Implement simplified built-in NTP client to work with 
well behaved servers
 Key: KUDU-2936
 URL: https://issues.apache.org/jira/browse/KUDU-2936
 Project: Kudu
  Issue Type: Sub-task
Reporter: Alexey Serbin
Assignee: Alexey Serbin


The first implementation should be able to work with well-behaved NTP servers 
without being strict about following every RFC5905 provision.  It must accept 
a list of NTP servers to work with, defaulting to a list of publicly available 
NTP servers.

The client must not latch onto non-synchronized NTP servers or onto a set of 
servers whose true times are too far from each other.

[~tlipcon] has posted a WIP for such an implementation already: see the 'Code 
Review' link. 
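As one hedged sketch of those two constraints (illustrative types and names only), the client could discard unsynchronized samples and require that the servers' correctness intervals overlap, in the spirit of Marzullo's algorithm:

{code:cpp}
// Editor's sketch: accept only synchronized servers and require their
// correctness intervals to share a common point.
#include <algorithm>
#include <vector>

struct ServerSample {
  int leap = 0;          // Leap indicator; 3 means "clock unsynchronized".
  int stratum = 16;      // Stratum 0 or 16 also means unsynchronized.
  double offset_us = 0;  // Estimated offset from true time, microseconds.
  double error_us = 0;   // Bound on the error of that estimate.
};

bool ServersAgreeOnTrueTime(const std::vector<ServerSample>& samples) {
  double lo = -1e18;
  double hi = 1e18;
  int accepted = 0;
  for (const auto& s : samples) {
    if (s.leap == 3 || s.stratum == 0 || s.stratum >= 16) {
      continue;  // Skip unsynchronized servers: never latch onto them.
    }
    // The true offset should lie within [offset - error, offset + error].
    lo = std::max(lo, s.offset_us - s.error_us);
    hi = std::min(hi, s.offset_us + s.error_us);
    ++accepted;
  }
  // At least one good server exists and all intervals overlap, i.e. the
  // servers' notions of true time are mutually consistent.
  return accepted > 0 && lo <= hi;
}
{code}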





[jira] [Created] (KUDU-2935) Implement built-in NTP client

2019-09-13 Thread Alexey Serbin (Jira)
Alexey Serbin created KUDU-2935:
---

 Summary: Implement built-in NTP client
 Key: KUDU-2935
 URL: https://issues.apache.org/jira/browse/KUDU-2935
 Project: Kudu
  Issue Type: New Feature
  Components: clock, master, tserver
Affects Versions: 1.11.0
Reporter: Alexey Serbin


It would be nice to add a stripped-down implementation of a built-in NTP 
client without any reliance on the kernel NTP discipline.  The built-in client 
should keep the wall clock synchronized with NTP servers, and calling 
{{WalltimeWithError()}} should return a wall-clock timestamp along with an 
estimate of its error/offset from true time.  Having a built-in NTP client 
would provide more control over the clock error and jitter acceptable for 
HybridTime timestamp generation.

From the operability perspective, it would make it easier to run Kudu in 
containerized environments and, overall, make it easier for users to configure 
NTP even if they don't have superuser privileges on a node.

The very first implementation should be good enough to work with properly 
configured, well-behaved NTP servers; it need not be a full-featured, 100% 
RFC-compliant NTP client.  Later on, we can add more features and constraints 
to protect against misbehaving and rogue NTP servers.
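For illustration, the contract sketched above might look roughly like this (editor's sketch of the interface only; the actual signature in Kudu may differ):

{code:cpp}
// Editor's sketch of the WalltimeWithError() contract; not Kudu's code.
#include <cstdint>

class BuiltinNtpClient {
 public:
  // On success, sets *now_usec to the current wall-clock time (microseconds
  // since the Unix epoch) and *error_usec to the maximum estimated error of
  // that reading relative to true time.  Returns false if the client hasn't
  // yet synchronized with any of its configured NTP servers.
  bool WalltimeWithError(uint64_t* now_usec, uint64_t* error_usec);
};
{code}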





[jira] [Resolved] (KUDU-2920) Block cache capacity validator shouldn't run on an NVM block cache

2019-09-13 Thread Adar Dembo (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo resolved KUDU-2920.
--
Fix Version/s: 1.11.0
   Resolution: Fixed

Fixed in commit 324b8f2. Thanks, [~vladimir_committer]!

> Block cache capacity validator shouldn't run on an NVM block cache
> --
>
> Key: KUDU-2920
> URL: https://issues.apache.org/jira/browse/KUDU-2920
> Project: Kudu
>  Issue Type: Bug
>  Components: cfile
>Affects Versions: 1.11.0
>Reporter: Adar Dembo
>Assignee: Vladimir Verjovkin
>Priority: Major
>  Labels: newbie
> Fix For: 1.11.0
>
>
> As part of KUDU-2318, we added a validator to enforce that the block cache 
> capacity does not exceed the process' memory pressure threshold (defaults to 
> 60% of the overall memory limit). This makes sense for a DRAM-based block 
> cache, which competes for DRAM with Kudu (and thus its capacity should be a 
> fraction of the overall limit).
> However, this doesn't make sense for NVM-based block caches, because the pool 
> of available NVM is distinct from the pool of available DRAM. In this case, 
> there really shouldn't be any relationship between the overall memory limit 
> and the block cache capacity.
> We should relax the validator for NVM-based block caches.
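A simplified sketch of the relaxed check (editor's illustration only; the real validator is a gflags validator wired into Kudu's memory tracking):

{code:cpp}
// Editor's sketch: only enforce the DRAM-based memory-pressure threshold
// when the block cache itself lives in DRAM.
#include <cstdint>
#include <string>

bool ValidateBlockCacheCapacity(const std::string& block_cache_type,
                                int64_t cache_capacity_bytes,
                                int64_t memory_pressure_bytes) {
  if (block_cache_type == "NVM") {
    // NVM capacity comes from a pool distinct from DRAM, so there is no
    // relationship to enforce against the process memory limit.
    return true;
  }
  // A DRAM cache competes with the rest of Kudu for memory: its capacity
  // must stay below the memory-pressure threshold.
  return cache_capacity_bytes <= memory_pressure_bytes;
}
{code}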





[jira] [Updated] (KUDU-2800) Avoid 'unintended' re-replication of long-bootstrapping tablet replicas

2019-09-13 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2800:

Description: 
As implemented in
https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576
 , the logic for tracking 'health' of tablet replicas cannot differentiate 
between bootstrapping and failed replicas.

As a result, if a tablet replica stays bootstrapping for longer than the 
interval specified by the {{--follower_unavailable_considered_failed_sec}} 
run-time flag, the system can start re-replicating the tablet replica 
elsewhere.

One option might be sending a specific error with {{ConsensusResponsePB}} in 
response to a Raft message sent by a leader replica, perhaps with extra 
information on the current progress of the replica bootstrap process.  As long 
as such a bootstrapping follower replica isn't falling behind the leader's WAL 
GC threshold, the leader replica will not evict it.  But if the bootstrapping 
follower replica falls behind the WAL GC threshold, the leader replica will 
evict it and the system will start re-replicating it elsewhere.  In cases when 
the amount of Raft transactions for a tablet is low, this approach would allow 
for longer bootstrapping times of tablet replicas.  That might be especially 
beneficial when a tablet server with IO-heavy tablet replicas is being 
restarted and there aren't many incoming updates/inserts for the tablets it 
hosts.

However, the approach above requires the Raft consensus object for a 
bootstrapping replica to be at least partially functional, so it entails 
reading at least some information about a replica from the on-disk consensus 
metadata prior to proper bootstrapping of a tablet replica by a tablet server.
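To illustrate the proposed leader-side policy (editor's sketch with hypothetical structures, not actual Kudu code):

{code:cpp}
// Editor's sketch of the eviction policy described above.
#include <cstdint>

struct PeerHealth {
  bool bootstrapping = false;       // e.g. reported via ConsensusResponsePB.
  int64_t last_received_index = 0;  // Last WAL index the follower holds.
};

bool ShouldEvictFollower(const PeerHealth& peer,
                         int64_t min_retained_wal_index,
                         bool unavailable_past_timeout) {
  if (peer.last_received_index < min_retained_wal_index) {
    // The follower fell behind the leader's WAL GC threshold: the leader
    // can no longer catch it up, so re-replicate elsewhere.
    return true;
  }
  if (peer.bootstrapping) {
    // Still bootstrapping but within retained WAL: tolerate it even past
    // --follower_unavailable_considered_failed_sec.
    return false;
  }
  return unavailable_past_timeout;
}
{code}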


  was:
As implemented in
https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576
 , the logic for tracking 'health' of tablet replicas cannot differentiate 
between bootstrapping and failed replicas.

As a result, if a tablet replica is bootstrapping for times longer than the 
interval specified by {{--follower_unavailable_considered_failed_sec}} run-time 
flag, the system can start the process of re-replication of the tablet replica 
elsewhere.

One option might be sending a special {{PeerStatus}} for a bootstrapping 
replica with a response to a Raft message sent by a leader replica and updating 
the logic referenced above.  The response might also include additional 
information on the current progress of the bootstrap process.  Probably, we 
need to add a separate timeout to track a stale bootstrapping replica, so its 
health would be reported as FAILED after the leader observes the replica being 
stuck in bootstrapping with no forward progress for a time interval longer than 
the timeout specified by the new parameter.

However, the approach above requires the Raft consensus object for a 
bootstrapping replica to be at least partially functional, so it entails 
reading at least some information about a replica from the on-disk consensus 
metadata prior to proper bootstrapping of a tablet replica by a tablet server.




> Avoid 'unintended' re-replication of long-bootstrapping tablet replicas
> ---
>
> Key: KUDU-2800
> URL: https://issues.apache.org/jira/browse/KUDU-2800
> Project: Kudu
>  Issue Type: Improvement
>  Components: consensus, tserver
>Affects Versions: 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.9.1, 1.10.0
>Reporter: Alexey Serbin
>Assignee: Vladimir Verjovkin
>Priority: Major
>  Labels: newbie
>
> As implemented in
> https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576
>  , the logic for tracking 'health' of tablet replicas cannot differentiate 
> between bootstrapping and failed replicas.
> As a result, if a tablet replica stays bootstrapping for longer than the 
> interval specified by the {{--follower_unavailable_considered_failed_sec}} 
> run-time flag, the system can start re-replicating the tablet replica 
> elsewhere.
> One option might be sending a specific error with {{ConsensusResponsePB}} in 
> response to a Raft message sent by a leader replica, perhaps with extra 
> information on the current progress of the replica bootstrap process.  As 
> long as such a bootstrapping follower replica isn't falling behind the 
> leader's WAL GC threshold, the leader replica will not evict it.  But if the 
> bootstrapping follower replica falls behind the WAL GC threshold, the leader 
> replica will evict it and the system will start re-replicating it elsewhere.  
> In cases when the amount of Raft transactions for a tablet is low, this 
> approach would allow for longer bootstrapping times of tablet replicas.