[jira] [Commented] (KUDU-2323) NON_VOTER replica flapping (repeatedly added and evicted)

2018-04-13 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438143#comment-16438143
 ] 

Mike Percy commented on KUDU-2323:
--

Based on a conversation with Todd on Slack, the underlying issue still looks 
like a race between the master adding and removing a replica in the cluster. 
A delayed or retried DeleteTablet() RPC seems more likely under heavy load, 
and the master re-adding the same node it just evicted seems more likely the 
smaller the cluster is. However, we have still not been able to reproduce this 
particular bug.
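As a back-of-the-envelope illustration of why re-adding the same node is more likely in a smaller cluster, here is a sketch that assumes (hypothetically) that the master picks the replacement NON_VOTER uniformly at random among the servers not already in the config, and that the tablet has 3 voters. Kudu's real placement logic takes load into account, so the uniform model and the function names below are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical model, not Kudu's placement policy: after a NON_VOTER is
// evicted from a tablet with `voters` voting replicas, the replacement is
// chosen uniformly at random among servers not already in the config.
// The just-evicted server is one of those candidates.
double ProbabilityOfReaddingEvictedServer(int num_servers, int voters) {
  assert(num_servers > voters);  // at least one candidate must exist
  const int candidates = num_servers - voters;
  return 1.0 / candidates;
}

// Prints the probability for every cluster size from voters + 1 upward.
void PrintReaddProbabilities(int voters, int max_servers) {
  for (int n = voters + 1; n <= max_servers; ++n) {
    std::printf("%2d servers: P(re-add evicted server) = %.2f\n", n,
                ProbabilityOfReaddingEvictedServer(n, voters));
  }
}
```

Under this model, a 5-server cluster (as in the original report) re-picks the evicted server half the time, and a 4-server cluster leaves it as the only candidate.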

> NON_VOTER replica flapping (repeatedly added and evicted)
> -
>
> Key: KUDU-2323
> URL: https://issues.apache.org/jira/browse/KUDU-2323
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus
>Affects Versions: 1.7.0
>Reporter: Todd Lipcon
>Assignee: Alexey Serbin
>Priority: Major
>
> While running a YCSB stress workload, I saw a tablet get into a state where 
> the master flapped back and forth, adding and then removing a replica as a 
> NON_VOTER:
> {code}
> I0221 21:54:35.341892 28047 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:35.360297 28045 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:35.612417 28048 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:35.713057 28045 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:35.725723 28045 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:35.752959 28052 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:35.767974 28047 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:35.772202 28045 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.291569 28046 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.296468 28046 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.328945 28045 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.339675 28045 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.387465 28045 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.394716 28047 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.398644 28047 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.405082 28047 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.409888 28048 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.414216 28046 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.417915 28048 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.423548 28048 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:54:36.453407 28045 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:54:36.552772 28048 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 21:58:01.300199 28053 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> I0221 21:58:01.426921 28046 catalog_manager.cc:3162] Sending 
> ChangeConfig:ADD_PEER:NON_VOTER on tablet 4ffc930e9f2148c38e441f19aa230872 
> (attempt 1)
> I0221 22:01:37.779790 28051 catalog_manager.cc:3274] Sending 
> ChangeConfig:REMOVE_PEER on tablet 4ffc930e9f2148c38e441f19aa230872 (attempt 
> 1)
> {code}
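To spot this pattern in other master logs, a small scanner can count, per tablet, how often a REMOVE_PEER is followed by an ADD_PEER:NON_VOTER. This is a sketch: the function name is hypothetical, it matches only the message text shown in the excerpt above (other Kudu versions may log differently), and each input string is assumed to be one unwrapped log line.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Counts, per tablet, how many times a "ChangeConfig:REMOVE_PEER" line is
// followed by a "ChangeConfig:ADD_PEER:NON_VOTER" line for the same tablet.
// A high count in a short time window suggests add/evict flapping.
std::map<std::string, int> CountRemoveAddCycles(
    const std::vector<std::string>& lines) {
  const std::string kRemove = "ChangeConfig:REMOVE_PEER on tablet ";
  const std::string kAdd = "ChangeConfig:ADD_PEER:NON_VOTER on tablet ";
  std::map<std::string, bool> pending_remove;  // tablet -> last op was REMOVE
  std::map<std::string, int> cycles;
  for (const std::string& line : lines) {
    bool is_remove = false;
    std::size_t pos = line.find(kRemove);
    if (pos != std::string::npos) {
      is_remove = true;
      pos += kRemove.size();
    } else {
      pos = line.find(kAdd);
      if (pos == std::string::npos) continue;  // unrelated log line
      pos += kAdd.size();
    }
    // The tablet ID runs up to the next space (or the end of the line).
    const std::size_t end = line.find(' ', pos);
    const std::string tablet = line.substr(pos, end - pos);
    if (is_remove) {
      pending_remove[tablet] = true;
    } else if (pending_remove[tablet]) {
      ++cycles[tablet];
      pending_remove[tablet] = false;
    }
  }
  return cycles;
}
```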

[jira] [Commented] (KUDU-2323) NON_VOTER replica flapping (repeatedly added and evicted)

2018-03-27 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416530#comment-16416530
 ] 

Mike Percy commented on KUDU-2323:
--

The analysis in my previous comment was wrong. We actually Close() the 
PeerManager and then reinitialize it when we change the config in 
RaftConsensus::RefreshConsensusQueueAndPeersUnlocked(). So I'm not sure what 
could cause the behavior described in this ticket.
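The point of this correction is that if per-peer state is torn down and rebuilt from the new config on every config change, an evicted peer cannot carry stale bookkeeping into a later re-add. The toy model below illustrates that idea; SimplePeerTracker and RefreshPeers are hypothetical names, not Kudu's PeerManager or RaftConsensus API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified per-peer state.
struct PeerState {
  long long last_communication_ms = 0;
};

class SimplePeerTracker {
 public:
  // Drops every tracked peer, loosely mirroring the Close() step.
  void Close() { peers_.clear(); }

  // Tracks every peer in `config` with fresh state (callers are expected
  // to Close() first when replacing a config).
  void Init(const std::vector<std::string>& config, long long now_ms) {
    for (const std::string& uuid : config) {
      peers_[uuid].last_communication_ms = now_ms;
    }
  }

  bool IsTracked(const std::string& uuid) const {
    return peers_.count(uuid) > 0;
  }

  const PeerState& Get(const std::string& uuid) const {
    assert(IsTracked(uuid));
    return peers_.at(uuid);
  }

 private:
  std::map<std::string, PeerState> peers_;
};

// Applies a config change as Close() followed by re-initialization, so
// peers absent from `new_config` (e.g. an evicted NON_VOTER) leave no
// stale state behind.
void RefreshPeers(SimplePeerTracker* tracker,
                  const std::vector<std::string>& new_config,
                  long long now_ms) {
  tracker->Close();
  tracker->Init(new_config, now_ms);
}
```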


[jira] [Commented] (KUDU-2323) NON_VOTER replica flapping (repeatedly added and evicted)

2018-03-26 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414801#comment-16414801
 ] 

Mike Percy commented on KUDU-2323:
--

The patch for KUDU-2320 likely fixed how fast this cycle repeats, but it 
appears there is no code path to remove a TrackedPeer when it gets evicted. 
In a 3-2-3 world (evict a replica first, then add a replacement), this could 
cause a minor resource leak until the leader was evicted. In a 3-4-3 world 
(add a NON_VOTER first, then evict), however, the stale TrackedPeer affects 
last_communication_time and can therefore cause a downed NON_VOTER to be 
considered FAILED as soon as it is added to the config.
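A toy model of this hypothesis follows (a later comment in this thread, dated 2018-03-27, says the analysis turned out to be wrong, so this only illustrates the suspected mechanism). ToyQueue and ToyTrackedPeer are hypothetical names, not Kudu classes. If eviction never untracks a peer and re-tracking keeps an existing entry, the re-added peer inherits an old last-communication timestamp and looks FAILED immediately.

```cpp
#include <cassert>
#include <map>
#include <string>

// Each tracked peer remembers when the leader last heard from it.
struct ToyTrackedPeer {
  long long last_communication_ms;
};

class ToyQueue {
 public:
  explicit ToyQueue(bool untrack_on_evict)
      : untrack_on_evict_(untrack_on_evict) {}

  // Starts tracking a peer. std::map::emplace keeps an existing entry
  // unchanged, which is how a stale timestamp could survive a re-add.
  void TrackPeer(const std::string& uuid, long long now_ms) {
    peers_.emplace(uuid, ToyTrackedPeer{now_ms});
  }

  void OnHeardFrom(const std::string& uuid, long long now_ms) {
    peers_.at(uuid).last_communication_ms = now_ms;
  }

  // Called when the peer is removed from the config.
  void EvictPeer(const std::string& uuid) {
    if (untrack_on_evict_) peers_.erase(uuid);  // the suspected missing step
  }

  // A peer counts as FAILED if the leader has not heard from it in time.
  bool IsFailed(const std::string& uuid, long long now_ms,
                long long timeout_ms) const {
    assert(peers_.count(uuid) > 0);
    return now_ms - peers_.at(uuid).last_communication_ms > timeout_ms;
  }

 private:
  bool untrack_on_evict_;
  std::map<std::string, ToyTrackedPeer> peers_;
};
```

In the scenario below, the peer is heard from at t = 1 s, evicted, and re-added much later; only the variant that untracks on eviction treats the re-added peer as healthy.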

This may also be interacting with KUDU-2354, in which certain cases can cause 
a catalog manager task to retry adding a new replica endlessly.


[jira] [Commented] (KUDU-2323) NON_VOTER replica flapping (repeatedly added and evicted)

2018-03-20 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407272#comment-16407272
 ] 

Mike Percy commented on KUDU-2323:
--

It seems this should have been partially fixed by the patch for KUDU-2320.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2323) NON_VOTER replica flapping (repeatedly added and evicted)

2018-02-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373030#comment-16373030
 ] 

Jean-Daniel Cryans commented on KUDU-2323:
--

Or [~aserbin].



[jira] [Commented] (KUDU-2323) NON_VOTER replica flapping (repeatedly added and evicted)

2018-02-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372471#comment-16372471
 ] 

Todd Lipcon commented on KUDU-2323:
---

I'm not 100% sure I follow what's going on, but it seems that the following 
happened:
- the NON_VOTER peer had fallen behind the leader's WAL retention
- it got evicted by the leader
- the master decided to add back the same node that had just been evicted:
-- it sends a DELETE request to the TS to delete the old replica, which can 
no longer catch up
-- simultaneously, it sends an ADD_PEER request to the leader to add the same 
peer back as a NON_VOTER

The two requests race, and in this case, for some reason, the TS Admin Service 
was fairly backed up and the DELETE request took 10+ seconds to get through. 
Thus, the leader received the ADD_PEER before the replica had been deleted, 
causing it to add back the same stale replica as before. Of course, as soon 
as it connected, it saw that the replica was stale and decided to evict it.
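The race can be reduced to a timing comparison. The sketch below is a hypothetical model, not Kudu code: both requests are sent at t = 0, the leader acts on ADD_PEER after one latency, and the tablet server finishes the DELETE after another. It assumes that when the DELETE wins, the leader finds no stale replica and can presumably bring up a fresh copy; ties are resolved arbitrarily.

```cpp
#include <cassert>
#include <vector>

enum class Outcome {
  kStaleReplicaReadded,  // leader reaches the old replica and evicts it again
  kFreshReplicaAdded,    // old replica already deleted; a new copy can start
};

// If the leader processes ADD_PEER before the stale replica is deleted, it
// finds a replica that has fallen behind WAL retention and cannot catch up.
Outcome SimulateReadd(long long delete_latency_ms,
                      long long add_peer_latency_ms) {
  assert(delete_latency_ms >= 0 && add_peer_latency_ms >= 0);
  return add_peer_latency_ms < delete_latency_ms
             ? Outcome::kStaleReplicaReadded
             : Outcome::kFreshReplicaAdded;
}

// Counts add/evict cycles before a re-add succeeds, given the DELETE
// latency observed on each successive attempt.
int CyclesUntilFreshReplica(const std::vector<long long>& delete_latencies_ms,
                            long long add_peer_latency_ms) {
  int cycles = 0;
  for (long long d : delete_latencies_ms) {
    if (SimulateReadd(d, add_peer_latency_ms) == Outcome::kFreshReplicaAdded) {
      break;
    }
    ++cycles;
  }
  return cycles;
}
```

With a backed-up admin service (DELETE taking about 10 s) and a fast ADD_PEER, every attempt in this model lands on the stale replica, so flapping continues until the DELETE backlog clears.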

What I'm not 100% sure of is why it kept choosing the same peer over and over 
again. The cluster has 5 servers, so it seems there should have been another 
choice. Perhaps the load-balancing algorithm chose this one overwhelmingly 
often?
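One possible explanation, offered only as a hypothesis: assuming a replication factor of 3, a 5-server cluster leaves just two servers outside the config. If replica placement used a power-of-two-choices heuristic on replica counts (an assumption here, not confirmed by this thread), and the just-evicted server looked less loaded to the master, it would win every time. Candidate and PickByTwoChoices are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

struct Candidate {
  std::string uuid;
  int num_replicas;  // load as seen by the master
};

// Samples two distinct candidates at random and returns the less-loaded one.
std::string PickByTwoChoices(const std::vector<Candidate>& candidates,
                             std::mt19937* rng) {
  assert(!candidates.empty());
  if (candidates.size() == 1) return candidates[0].uuid;
  std::uniform_int_distribution<std::size_t> dist(0, candidates.size() - 1);
  const std::size_t a = dist(*rng);
  std::size_t b = dist(*rng);
  while (b == a) b = dist(*rng);  // ensure two distinct candidates
  return candidates[a].num_replicas <= candidates[b].num_replicas
             ? candidates[a].uuid
             : candidates[b].uuid;
}
```

With only two eligible servers, the random sampling has no effect: both are always compared, and the less-loaded one always wins, which would match the "same peer over and over" pattern.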

[~mpercy], do you mind taking a look, since this might be a regression from 1.6?
