Dasheng,

If you still have the controller and broker-1 logs from the time of the
first reassignment that caused this issue, that would be helpful. Otherwise
it will be a little hard to investigate.
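
In case it helps when digging through those logs, here is a minimal Python
sketch for pulling out the lines that mention one partition. It assumes the
log lines start with a "[YYYY-MM-DD HH:MM:SS,mmm]" timestamp and name the
partition as "[topic,partition]", as in the controller log quoted further
down. The helper name lines_for_partition and the "since" cutoff are only
illustrative, not part of any Kafka tooling.

```python
import re

# Timestamp prefix as it appears in the controller log lines quoted in this thread.
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def lines_for_partition(lines, topic, partition, since=None):
    """Yield log lines that mention [topic,partition].

    `since` is an optional "YYYY-MM-DD HH:MM:SS" string; timestamped lines
    older than it are skipped (plain string comparison works for this format).
    """
    needle = "[%s,%d]" % (topic, partition)
    for line in lines:
        if needle not in line:
            continue
        match = TS.match(line)
        if since and match and match.group(1) < since:
            continue
        yield line.rstrip("\n")
```

You would feed it an open log file, e.g. the controller log on the broker
that was controller at the time (the file name and location depend on your
log4j setup). Note that it will not catch the listener lines that carry the
partition only inside the JSON payload.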

For now I think you can try to bounce broker 1, and see if that resolves
the situation.
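
To check whether the bounce helped, something like the following Python
sketch could be run over the topic describe output. It flags the kinds of
inconsistency seen in this thread: no leader, empty ISR, and a leader or ISR
that is not in the replicas list. The function name suspicious_partitions
and the reason strings are my own invention, and it assumes the describe
output has the "Partition: N Leader: N Replicas: ... Isr: ..." layout quoted
below, with each partition on one line.

```python
import re

# One partition row of the describe output, as quoted in this thread.
ROW = re.compile(
    r"Partition:[ \t]*(\d+)[ \t]+Leader:[ \t]*(-?\d+)"
    r"[ \t]+Replicas:[ \t]*([\d,]*)[ \t]+Isr:[ \t]*([\d,]*)"
)


def parse_ids(text):
    """Turn '6,1' into [6, 1]; an empty field becomes []."""
    return [int(x) for x in text.split(",") if x]


def suspicious_partitions(describe_output, replication_factor):
    """Return {partition: [reasons]} for rows that look inconsistent."""
    problems = {}
    for m in ROW.finditer(describe_output):
        partition, leader = int(m.group(1)), int(m.group(2))
        replicas, isr = parse_ids(m.group(3)), parse_ids(m.group(4))
        reasons = []
        if leader == -1:
            reasons.append("no leader")
        elif leader not in replicas:
            reasons.append("leader not in replicas")
        if not isr:
            reasons.append("empty ISR")
        elif not set(isr) <= set(replicas):
            reasons.append("ISR not subset of replicas")
        if len(replicas) != replication_factor:
            reasons.append("replica count != replication factor")
        if reasons:
            problems[partition] = reasons
    return problems
```

On the output from your first mail this should report partitions 10 and 12
(leader and ISR outside the replicas list) as well as 25 and 31 (no leader,
empty ISR, two replicas with replication factor 1).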

Guozhang


On Wed, Jul 9, 2014 at 6:58 PM, DashengJu <dashen...@gmail.com> wrote:

> Guozhang,
>
> I will try to reproduce the problem in our test environment.
>
> The problem has now existed in our production environment for 10 days. Are
> there any logs, memory dumps, or ZooKeeper information that would be useful
> for you to analyze the problem?
> Also, any ideas on how to recover from this situation? Restart the cluster?
> Delete the topic manually?
>
> thx
>
>
> On Thu, Jul 10, 2014 at 5:27 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Dasheng,
> >
> > This is indeed weird. Could you easily reproduce the problem, i.e.,
> > start the cluster with a topic whose replication factor is > 1, then
> > change it to 1, and see if this issue shows up again?
> >
> > Guozhang
> >
> >
> > On Wed, Jul 9, 2014 at 12:09 PM, DashengJu <dashen...@gmail.com> wrote:
> >
> > > Thanks for your finding; it is weird.
> > >
> > > I checked broker 1. It is working fine, with no ERROR or WARN log entries.
> > >
> > > I checked the data folder and found that partition 10's replica, ISR, and
> > > leader are actually on broker 2, which works fine.
> > > Just now I executed a reassignment operation to move partition 10's
> > > replicas to broker 2. The controller log says "Partition
> > > [org.mobile_nginx,10] to be reassigned is already assigned to replicas 2.
> > > ignoring request for partition reassignment". After that, describing the
> > > topic shows that partition 10's replicas became 2.
> > > On Jul 9, 2014 at 11:31 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:
> > >
> > > > It seems your broker 1 is in a bad state. Besides these two partitions,
> > > > you also have partition 10, whose ISR/leader is not part of the replicas
> > > > list:
> > > >
> > > > Topic: org.mobile_nginx Partition: 10   Leader: 2       Replicas: 1     Isr: 2
> > > >
> > > > Maybe you can go to broker 1 and check its logs first.
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Wed, Jul 9, 2014 at 7:32 AM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > It's weird that you have replication factor 1, but two of the
> > > > > partitions, 25 and 31, have 2 assigned replicas. What command did you
> > > > > use for reassignment?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升 <dashen...@gmail.com> wrote:
> > > > >
> > > > > > @Jun Rao, Kafka version: 0.8.1.1
> > > > > >
> > > > > > @Guozhang Wang, I cannot find the original controller log, but I can
> > > > > > give the controller log after executing
> > > > > > ./bin/kafka-reassign-partitions.sh
> > > > > > and ./bin/kafka-preferred-replica-election.sh
> > > > > >
> > > > > > Now I do not know how to recover the leader for partitions 25 and 31.
> > > > > > Any idea?
> > > > > >
> > > > > > ----------------- controller log for ./bin/kafka-reassign-partitions.sh -----------------
> > > > > > [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on
> > 5]:
> > > > > > Partitions reassigned listener fired for path
> > > > /admin/reassign_partitions.
> > > > > > Record partitions to be reassigned
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > > > > (kafka.controller.PartitionsReassignedListener)
> > > > > > [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add
> > > > > Partition
> > > > > > triggered
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > > > > for path /brokers/topics/org.mobile_nginx
> > > > > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > > > > [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on
> > 5]:
> > > > > > Partitions reassigned listener fired for path
> > > > /admin/reassign_partitions.
> > > > > > Record partitions to be reassigned
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > > > > (kafka.controller.PartitionsReassignedListener)
> > > > > > [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add
> > > > > Partition
> > > > > > triggered
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > > > > for path /brokers/topics/org.mobile_nginx
> > > > > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > > > >
> > > > > > ----------------- controller log for ./bin/kafka-preferred-replica-election.sh -----------------
> > > > > > [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener
> > on
> > > > 5]:
> > > > > > Preferred replica election listener fired for path
> > > > > > /admin/preferred_replica_election. Record partitions to undergo
> > > > preferred
> > > > > > replica election
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> > > > > > (kafka.controller.PreferredReplicaElectionListener)
> > > > > > [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred
> > > > replica
> > > > > > leader election for partitions
> > > > > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > > > > > (kafka.controller.KafkaController)
> > > > > > [2014-07-09 15:07:02,969] INFO [Partition state machine on
> > Controller
> > > > 5]:
> > > > > > Invoking state change to OnlinePartition for partitions
> > > > > > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > > > > > (kafka.controller.PartitionStateMachine)
> > > > > > [2014-07-09 15:07:02,972] INFO
> > > > [PreferredReplicaPartitionLeaderSelector]:
> > > > > > Current leader -1 for partition [org.mobile_nginx,25] is not the
> > > > > preferred
> > > > > > replica. Trigerring preferred replica leader election
> > > > > > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > > > > > [2014-07-09 15:07:02,973] INFO
> > > > [PreferredReplicaPartitionLeaderSelector]:
> > > > > > Current leader -1 for partition [org.mobile_nginx,31] is not the
> > > > > preferred
> > > > > > replica. Trigerring preferred replica leader election
> > > > > > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > > > > > [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> > > > > > [org.mobile_nginx,25] failed to complete preferred replica leader
> > > > > election.
> > > > > > Leader is -1 (kafka.controller.KafkaController)
> > > > > > [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> > > > > > [org.mobile_nginx,31] failed to complete preferred replica leader
> > > > > election.
> > > > > > Leader is -1 (kafka.controller.KafkaController)
> > > > > >
> > > > > >
> > > > > > On Sun, Jul 6, 2014 at 11:47 PM, Jun Rao <jun...@gmail.com> wrote:
> > > > > >
> > > > > > > Also, which version of Kafka are you using?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jul 3, 2014 at 2:26 AM, 鞠大升 <dashen...@gmail.com> wrote:
> > > > > > >
> > > > > > > > hi, all
> > > > > > > >
> > > > > > > > I have a topic with 32 partitions. After some reassignment
> > > > > > > > operations, 2 partitions ended up with no leader and no ISR.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> -------------------------------------------------------------------------------------------------------------------
> > > > > > > > Topic:org.mobile_nginx  PartitionCount:32
> > > ReplicationFactor:1
> > > > > > > > Configs:
> > > > > > > >         Topic: org.mobile_nginx Partition: 0    Leader: 3
> > > > > > > Replicas: 3
> > > > > > > >     Isr: 3
> > > > > > > >         Topic: org.mobile_nginx Partition: 1    Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 2    Leader: 5
> > > > > > > Replicas: 5
> > > > > > > >     Isr: 5
> > > > > > > >         Topic: org.mobile_nginx Partition: 3    Leader: 6
> > > > > > > Replicas: 6
> > > > > > > >     Isr: 6
> > > > > > > >         Topic: org.mobile_nginx Partition: 4    Leader: 3
> > > > > > > Replicas: 3
> > > > > > > >     Isr: 3
> > > > > > > >         Topic: org.mobile_nginx Partition: 5    Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 6    Leader: 5
> > > > > > > Replicas: 5
> > > > > > > >     Isr: 5
> > > > > > > >         Topic: org.mobile_nginx Partition: 7    Leader: 6
> > > > > > > Replicas: 6
> > > > > > > >     Isr: 6
> > > > > > > >         Topic: org.mobile_nginx Partition: 8    Leader: 3
> > > > > > > Replicas: 3
> > > > > > > >     Isr: 3
> > > > > > > >         Topic: org.mobile_nginx Partition: 9    Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 10   Leader: 2
> > > > > > > Replicas: 1
> > > > > > > >     Isr: 2
> > > > > > > >         Topic: org.mobile_nginx Partition: 11   Leader: 2
> > > > > > > Replicas: 2
> > > > > > > >     Isr: 2
> > > > > > > >         Topic: org.mobile_nginx Partition: 12   Leader: 3
> > > > > > > Replicas: 1
> > > > > > > >     Isr: 3
> > > > > > > >         Topic: org.mobile_nginx Partition: 13   Leader: 2
> > > > > > > Replicas: 2
> > > > > > > >     Isr: 2
> > > > > > > >         Topic: org.mobile_nginx Partition: 14   Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 15   Leader: 2
> > > > > > > Replicas: 2
> > > > > > > >     Isr: 2
> > > > > > > >         Topic: org.mobile_nginx Partition: 16   Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 17   Leader: 5
> > > > > > > Replicas: 5
> > > > > > > >     Isr: 5
> > > > > > > >         Topic: org.mobile_nginx Partition: 18   Leader: 6
> > > > > > > Replicas: 6
> > > > > > > >     Isr: 6
> > > > > > > >         Topic: org.mobile_nginx Partition: 19   Leader: 5
> > > > > > > Replicas: 5
> > > > > > > >     Isr: 5
> > > > > > > >         Topic: org.mobile_nginx Partition: 20   Leader: 2
> > > > > > > Replicas: 2
> > > > > > > >     Isr: 2
> > > > > > > >         Topic: org.mobile_nginx Partition: 21   Leader: 3
> > > > > > > Replicas: 3
> > > > > > > >     Isr: 3
> > > > > > > >         Topic: org.mobile_nginx Partition: 22   Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 23   Leader: 5
> > > > > > > Replicas: 5
> > > > > > > >     Isr: 5
> > > > > > > >         Topic: org.mobile_nginx Partition: 24   Leader: 6
> > > > > > > Replicas: 6
> > > > > > > >     Isr: 6
> > > > > > > >         Topic: org.mobile_nginx Partition: 25   Leader: -1
> > > > > >  Replicas:
> > > > > > > > 6,1   Isr:
> > > > > > > >         Topic: org.mobile_nginx Partition: 26   Leader: 2
> > > > > > > Replicas: 2
> > > > > > > >     Isr: 2
> > > > > > > >         Topic: org.mobile_nginx Partition: 27   Leader: 3
> > > > > > > Replicas: 3
> > > > > > > >     Isr: 3
> > > > > > > >         Topic: org.mobile_nginx Partition: 28   Leader: 4
> > > > > > > Replicas: 4
> > > > > > > >     Isr: 4
> > > > > > > >         Topic: org.mobile_nginx Partition: 29   Leader: 5
> > > > > > > Replicas: 5
> > > > > > > >     Isr: 5
> > > > > > > >         Topic: org.mobile_nginx Partition: 30   Leader: 6
> > > > > > > Replicas: 6
> > > > > > > >     Isr: 6
> > > > > > > >         Topic: org.mobile_nginx Partition: 31   Leader: -1
> > > > > >  Replicas:
> > > > > > > > 3,1   Isr:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> -------------------------------------------------------------------------------------------------------------------
> > > > > > > > partition-25 and partition-31 have no leader and no ISR.
> > > > > > > > For 4 days now, neither a reassignment nor a leader election
> > > > > > > > operation has been able to reduce the number of replicas or elect
> > > > > > > > a leader.
> > > > > > > >
> > > > > > > > Does anyone have any idea how to resolve this problem?
> > > > > > > >
> > > > > > > > --
> > > > > > > > dashengju
> > > > > > > > +86 13810875910
> > > > > > > > dashen...@gmail.com
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > dashengju
> > > > > > +86 13810875910
> > > > > > dashen...@gmail.com
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> dashengju
> +86 13810875910
> dashen...@gmail.com
>



-- 
-- Guozhang
