[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-22 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664277#comment-13664277
 ] 

Neha Narkhede commented on KAFKA-911:
-

The testShutdownBroker() test case in AdminTest will fail with this patch, 
since it assumes that the controlled shutdown logic shrinks the ISR 
proactively. I will fix the test if this patch's change of not shrinking the 
ISR is acceptable.

> Bug in controlled shutdown logic in controller leads to controller not 
> sending out some state change request 
> -
>
> Key: KAFKA-911
> URL: https://issues.apache.org/jira/browse/KAFKA-911
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Assignee: Neha Narkhede
> Priority: Blocker
> Labels: kafka-0.8, p1
> Attachments: kafka-911-v1.patch
>
>
> The controlled shutdown logic in the controller first tries to move the 
> leaders from the broker being shut down. Then it tries to remove the broker 
> from the ISR list. During that operation, it does not synchronize on the 
> controllerLock. This causes a race condition while dispatching data using the 
> controller's channel manager.
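
As a rough illustration of why the missing lock matters, the sketch below 
models a shared request batch that is filled and flushed under a single lock, 
so that concurrent callers cannot clear each other's queued requests. It is 
not the controller's actual code: ControllerBatchSketch, pendingRequests, 
sendStateChange and runDemo are hypothetical names, and only the name 
controllerLock is taken from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the controller; none of this is Kafka's real API.
public class ControllerBatchSketch {
    private final Object controllerLock = new Object();
    // Shared, non-thread-safe batch: broker id -> requests waiting to be sent.
    private final Map<Integer, List<String>> pendingRequests = new HashMap<>();
    private int sentCount = 0;

    // Queue one request and flush the batch as a single atomic step.
    public void sendStateChange(int brokerId, String request) {
        synchronized (controllerLock) {
            pendingRequests.computeIfAbsent(brokerId, id -> new ArrayList<>()).add(request);
            flushLocked();
        }
    }

    // Must only be called while holding controllerLock.
    private void flushLocked() {
        for (List<String> requests : pendingRequests.values()) {
            sentCount += requests.size(); // stand-in for handing off to the channel manager
        }
        pendingRequests.clear();
    }

    public int sentCount() {
        synchronized (controllerLock) {
            return sentCount;
        }
    }

    // Runs two concurrent callers and returns how many requests were sent in total.
    public static int runDemo(int requestsPerThread) {
        ControllerBatchSketch controller = new ControllerBatchSketch();
        Thread shutdownPath = new Thread(() -> {
            for (int i = 0; i < requestsPerThread; i++) {
                controller.sendStateChange(1, "isr-change-" + i);
            }
        });
        Thread otherPath = new Thread(() -> {
            for (int i = 0; i < requestsPerThread; i++) {
                controller.sendStateChange(2, "leader-change-" + i);
            }
        });
        shutdownPath.start();
        otherPath.start();
        try {
            shutdownPath.join();
            otherPath.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return controller.sentCount();
    }

    public static void main(String[] args) {
        System.out.println("requests sent: " + runDemo(10000));
    }
}
```

In this model, removing the synchronized block from sendStateChange would let 
one thread clear the map while the other is still adding to it, which is one 
way queued requests could be lost or the map corrupted.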

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-23 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665258#comment-13665258
 ] 

Jun Rao commented on KAFKA-911:
---

If we just stop the replica being shut down without sending a reduced ISR to 
the leader, it will take replicaLagTimeMaxMs (defaults to 10s) before the 
leader realizes that the follower is gone. Until then, no new messages can be 
committed. The idea of letting the controller send a reduced ISR to the leader 
is to allow the leader to commit new messages sooner. I'm not sure whether the 
existing logic does this effectively, though. It seems to me that it would be 
better to stop the replicas being shut down one at a time, after the leader is 
moved. Maybe Joel can comment?
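
The stall described above can be sketched with a simplified model in which 
the leader's commit point is the smallest log end offset across the ISR. This 
is an illustration only: IsrCommitSketch and highWatermark are hypothetical 
names, the offsets are made up, and the real leader logic has more cases than 
this.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of a partition leader; all names here are hypothetical.
public class IsrCommitSketch {

    // A message counts as committed once every ISR member has it, so the
    // commit point is the smallest log end offset across the ISR.
    public static long highWatermark(Set<Integer> isr, Map<Integer, Long> logEndOffsets) {
        if (isr.isEmpty()) {
            throw new IllegalArgumentException("ISR must contain at least the leader");
        }
        long hw = Long.MAX_VALUE;
        for (int brokerId : isr) {
            hw = Math.min(hw, logEndOffsets.get(brokerId));
        }
        return hw;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEndOffsets = new HashMap<>();
        logEndOffsets.put(1, 120L); // leader
        logEndOffsets.put(2, 120L); // live follower
        logEndOffsets.put(3, 100L); // follower that stopped fetching at shutdown

        Set<Integer> fullIsr = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> reducedIsr = new HashSet<>(Arrays.asList(1, 2));

        System.out.println("commit point with broker 3 in ISR: "
                + highWatermark(fullIsr, logEndOffsets));
        System.out.println("commit point with reduced ISR: "
                + highWatermark(reducedIsr, logEndOffsets));
    }
}
```

In this model the commit point stays at 100 for as long as broker 3 remains in 
the ISR and moves to 120 as soon as the leader is given the reduced ISR, 
rather than after the lag timeout expires.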



[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-24 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666445#comment-13666445
 ] 

Neha Narkhede commented on KAFKA-911:
-

You are right that we can send the reduced ISR request to the leader, but that 
is independent of removing the shutting-down broker from the ISR in ZooKeeper. 
I'm arguing that the ZooKeeper write is unnecessary. To handle the issue you 
described, we can send a leader and ISR request, with the reduced ISR, just to 
the leader.
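
A minimal sketch of that proposal, assuming an in-memory view of partition 
state: for every partition where the shutting-down broker is an in-sync 
follower, build one request addressed only to the leader and carrying the 
reduced ISR, with no write to any persistent store. ReducedIsrSketch, 
PartitionState, LeaderAndIsrRequest and planForShutdown are hypothetical 
names, not the controller's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical planning helper; it only computes requests and sends nothing.
public class ReducedIsrSketch {

    // In-memory view of one partition's leader and ISR.
    public static final class PartitionState {
        public final String partition;
        public final int leader;
        public final Set<Integer> isr;

        public PartitionState(String partition, int leader, Set<Integer> isr) {
            this.partition = partition;
            this.leader = leader;
            this.isr = isr;
        }
    }

    // A request addressed to exactly one broker (the partition leader).
    public static final class LeaderAndIsrRequest {
        public final int targetBroker;
        public final String partition;
        public final Set<Integer> isr;

        public LeaderAndIsrRequest(int targetBroker, String partition, Set<Integer> isr) {
            this.targetBroker = targetBroker;
            this.partition = partition;
            this.isr = isr;
        }
    }

    public static List<LeaderAndIsrRequest> planForShutdown(
            int shuttingDownBroker, List<PartitionState> partitions) {
        List<LeaderAndIsrRequest> requests = new ArrayList<>();
        for (PartitionState p : partitions) {
            // Skip partitions where the broker is not in sync, and partitions
            // it still leads (leadership has to be moved first).
            if (!p.isr.contains(shuttingDownBroker) || p.leader == shuttingDownBroker) {
                continue;
            }
            Set<Integer> reducedIsr = new LinkedHashSet<>(p.isr);
            reducedIsr.remove(shuttingDownBroker);
            requests.add(new LeaderAndIsrRequest(p.leader, p.partition, reducedIsr));
        }
        return requests;
    }

    public static void main(String[] args) {
        List<PartitionState> partitions = Arrays.asList(
                new PartitionState("t-0", 1, new LinkedHashSet<>(Arrays.asList(1, 2, 3))),
                new PartitionState("t-1", 3, new LinkedHashSet<>(Arrays.asList(3, 1))),
                new PartitionState("t-2", 2, new LinkedHashSet<>(Arrays.asList(2, 1))));
        for (LeaderAndIsrRequest r : planForShutdown(3, partitions)) {
            System.out.println("to broker " + r.targetBroker + ": " + r.partition + " isr=" + r.isr);
        }
    }
}
```

With broker 3 shutting down, only t-0 yields a request in this example: t-1 is 
still led by broker 3, and t-2 does not have broker 3 in its ISR.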



[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-24 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666532#comment-13666532
 ] 

Joel Koshy commented on KAFKA-911:
--

I had to revisit the notes from KAFKA-340. I think this was touched upon 
there, i.e., the fact that the current implementation's attempt to shrink the 
ISR may be ineffective for partitions whose leadership has been moved from the 
current broker: 
https://issues.apache.org/jira/browse/KAFKA-340?focusedCommentId=13483478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13483478


> 3.4 What is the point of sending leader and isr request at the end of 
> shutdownBroker, since the OfflineReplica state change would've taken care 
> of that anyway. It seems like you just need to send the stop replica 
> request with the delete partitions flag turned off, no? 

I still need (as an optimization) to send the leader and isr request to the 
leaders of all partitions that are present on the shutting down broker so it 
can remove the shutting down broker from its inSyncReplicas cache (in 
Partition.scala) so it no longer waits for acks from the shutting down broker 
if a producer request's num-acks is set to -1. Otherwise, we have to wait for 
the leader to "organically" shrink the ISR. 

This also applies to partitions which are moved (i.e., partitions for which 
the shutting down broker was the leader): the ControlledShutdownLeaderSelector 
needs to send the updated leaderAndIsr request to the shutting down broker as 
well (to tell it that it is no longer the leader) at which point it will start 
up a replica fetcher and re-enter the ISR. So in fact, there is actually not 
much point in removing the "current leader" from the ISR in the 
ControlledShutdownLeaderSelector.selectLeader. 


and 

https://issues.apache.org/jira/browse/KAFKA-340?focusedCommentId=13484727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13484727
(I don't think I actually filed that jira though.)




[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-05-28 Thread Sriram Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668466#comment-13668466
 ] 

Sriram Subramanian commented on KAFKA-911:
--

I suggest we wait for my patch. It changes quite a bit of this logic, and 
committing this one first would just add to the merge problem.

> Attachments: kafka-911-v1.patch, kafka-911-v2.patch


[jira] [Commented] (KAFKA-911) Bug in controlled shutdown logic in controller leads to controller not sending out some state change request

2013-07-03 Thread Sriram Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699500#comment-13699500
 ] 

Sriram Subramanian commented on KAFKA-911:
--

This has been fixed.
