Re: topic's partition have no leader and isr

2014-07-09 Thread 鞠大升
@Jun Rao,   Kafka version: 0.8.1.1

@Guozhang Wang, I cannot find the original controller log, but I can give
the controller log after executing ./bin/kafka-reassign-partitions.sh
and ./bin/kafka-preferred-replica-election.sh

Now I do not know how to recover the leader for partitions 25 and 31, any idea?

- controller log for ./bin/kafka-reassign-partitions.sh
---
[2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
Partitions reassigned listener fired for path /admin/reassign_partitions.
Record partitions to be reassigned
{"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
(kafka.controller.PartitionsReassignedListener)
[2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add Partition
triggered
{"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
for path /brokers/topics/org.mobile_nginx
(kafka.controller.PartitionStateMachine$AddPartitionsListener)
[2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
Partitions reassigned listener fired for path /admin/reassign_partitions.
Record partitions to be reassigned
{"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
(kafka.controller.PartitionsReassignedListener)
[2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add Partition
triggered
{"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
for path /brokers/topics/org.mobile_nginx
(kafka.controller.PartitionStateMachine$AddPartitionsListener)

- controller log for ./bin/kafka-preferred-replica-election.sh
---
[2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on 5]:
Preferred replica election listener fired for path
/admin/preferred_replica_election. Record partitions to undergo preferred
replica election
{"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
(kafka.controller.PreferredReplicaElectionListener)
[2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred replica
leader election for partitions [org.mobile_nginx,25],[org.mobile_nginx,31]
(kafka.controller.KafkaController)
[2014-07-09 15:07:02,969] INFO [Partition state machine on Controller 5]:
Invoking state change to OnlinePartition for partitions
[org.mobile_nginx,25],[org.mobile_nginx,31]
(kafka.controller.PartitionStateMachine)
[2014-07-09 15:07:02,972] INFO [PreferredReplicaPartitionLeaderSelector]:
Current leader -1 for partition [org.mobile_nginx,25] is not the preferred
replica. Trigerring preferred replica leader election
(kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2014-07-09 15:07:02,973] INFO [PreferredReplicaPartitionLeaderSelector]:
Current leader -1 for partition [org.mobile_nginx,31] is not the preferred
replica. Trigerring preferred replica leader election
(kafka.controller.PreferredReplicaPartitionLeaderSelector)
[2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
[org.mobile_nginx,25] failed to complete preferred replica leader election.
Leader is -1 (kafka.controller.KafkaController)
[2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
[org.mobile_nginx,31] failed to complete preferred replica leader election.
Leader is -1 (kafka.controller.KafkaController)
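
For reference, the reassignment recorded in the controller log above
corresponds to a JSON file and invocations roughly like the following
(illustrative only; these are not the exact files/commands used here, and the
option names should be double-checked against each tool's --help output):

reassign.json:
{"version":1,"partitions":[
  {"topic":"org.mobile_nginx","partition":25,"replicas":[6]},
  {"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}

./bin/kafka-reassign-partitions.sh --zookeeper <zk-host>:2181 \
  --reassignment-json-file reassign.json --execute

preferred.json:
{"version":1,"partitions":[
  {"topic":"org.mobile_nginx","partition":25},
  {"topic":"org.mobile_nginx","partition":31}]}

./bin/kafka-preferred-replica-election.sh --zookeeper <zk-host>:2181 \
  --path-to-json-file preferred.json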


On Sun, Jul 6, 2014 at 11:47 PM, Jun Rao  wrote:

> Also, which version of Kafka are you using?
>
> Thanks,
>
> Jun
>
>
> On Thu, Jul 3, 2014 at 2:26 AM, 鞠大升  wrote:
>
> > hi, all
> >
> > I have a topic with 32 partitions; after some reassign operations, 2
> > partitions ended up with no leader and no ISR.
> >
> >
> ---
> > Topic:org.mobile_nginx  PartitionCount:32   ReplicationFactor:1
> > Configs:
> > Topic: org.mobile_nginx Partition: 0Leader: 3
> Replicas: 3
> > Isr: 3
> > Topic: org.mobile_nginx Partition: 1Leader: 4
> Replicas: 4
> > Isr: 4
> > Topic: org.mobile_nginx Partition: 2Leader: 5
> Replicas: 5
> > Isr: 5
> > Topic: org.mobile_nginx Partition: 3Leader: 6
> Replicas: 6
> > Isr: 6
> > Topic: org.mobile_nginx Partition: 4Leader: 3
> Replicas: 3
> > Isr: 3
> > Topic: org.mobile_nginx Partition: 5Leader: 4
> Replicas: 4
> > Isr: 4
> > Topic: org.mobile_nginx Partition

Re: topic's partition have no leader and isr

2014-07-09 Thread 鞠大升
I think @chenlax has encountered the same problem as me on
us...@kafka.apache.org, in the thread titled "How recover leader when broker restart".
CCing us...@kafka.apache.org.


On Wed, Jul 9, 2014 at 3:10 PM, 鞠大升  wrote:

> @Jun Rao,   Kafka version: 0.8.1.1
>
> @Guozhang Wang, I cannot find the original controller log, but I can
> give the controller log after executing ./bin/kafka-reassign-partitions.sh
> and ./bin/kafka-preferred-replica-election.sh
>
> Now I do not know how to recover the leader for partitions 25 and 31, any idea?
>
> - controller log for ./bin/kafka-reassign-partitions.sh
> ---
> [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
> Partitions reassigned listener fired for path /admin/reassign_partitions.
> Record partitions to be reassigned
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> (kafka.controller.PartitionsReassignedListener)
> [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add Partition
> triggered
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> for path /brokers/topics/org.mobile_nginx
> (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
> Partitions reassigned listener fired for path /admin/reassign_partitions.
> Record partitions to be reassigned
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> (kafka.controller.PartitionsReassignedListener)
> [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add Partition
> triggered
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> for path /brokers/topics/org.mobile_nginx
> (kafka.controller.PartitionStateMachine$AddPartitionsListener)
>
> - controller log for ./bin/kafka-preferred-replica-election.sh
> ---
> [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on 5]:
> Preferred replica election listener fired for path
> /admin/preferred_replica_election. Record partitions to undergo preferred
> replica election
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> (kafka.controller.PreferredReplicaElectionListener)
> [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred replica
> leader election for partitions [org.mobile_nginx,25],[org.mobile_nginx,31]
> (kafka.controller.KafkaController)
> [2014-07-09 15:07:02,969] INFO [Partition state machine on Controller 5]:
> Invoking state change to OnlinePartition for partitions
> [org.mobile_nginx,25],[org.mobile_nginx,31]
> (kafka.controller.PartitionStateMachine)
> [2014-07-09 15:07:02,972] INFO [PreferredReplicaPartitionLeaderSelector]:
> Current leader -1 for partition [org.mobile_nginx,25] is not the preferred
> replica. Trigerring preferred replica leader election
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2014-07-09 15:07:02,973] INFO [PreferredReplicaPartitionLeaderSelector]:
> Current leader -1 for partition [org.mobile_nginx,31] is not the preferred
> replica. Trigerring preferred replica leader election
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> [org.mobile_nginx,25] failed to complete preferred replica leader election.
> Leader is -1 (kafka.controller.KafkaController)
> [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> [org.mobile_nginx,31] failed to complete preferred replica leader election.
> Leader is -1 (kafka.controller.KafkaController)
>
>
> On Sun, Jul 6, 2014 at 11:47 PM, Jun Rao  wrote:
>
>> Also, which version of Kafka are you using?
>>
>> Thanks,
>>
>> Jun
>>
>>
>> On Thu, Jul 3, 2014 at 2:26 AM, 鞠大升  wrote:
>>
>> > hi, all
>> >
>> > I have a topic with 32 partitions; after some reassign operations, 2
>> > partitions ended up with no leader and no ISR.
>> >
>> >
>> ---
>> > Topic:org.mobile_nginx  PartitionCount:32   ReplicationFactor:1
>> > Configs:
>> > Topic: org.mobile_nginx Partition: 0Leader: 3
>> Replicas: 3
>> > Isr: 3
>> > Topic: org.mobile_nginx Partition: 1Leader: 4
>> Replicas: 4
>> > Isr: 4
>> > Topic: org.mobile_nginx Partition: 2  

[jira] [Created] (KAFKA-1531) zookeeper.connection.timeout.ms is set to 10000000 in configuration file in Kafka tarball

2014-07-09 Thread JIRA
Michał Michalski created KAFKA-1531:
---

 Summary: zookeeper.connection.timeout.ms is set to 10000000 in 
configuration file in Kafka tarball
 Key: KAFKA-1531
 URL: https://issues.apache.org/jira/browse/KAFKA-1531
 Project: Kafka
  Issue Type: Bug
  Components: config
Affects Versions: 0.8.1.1, 0.8.0
Reporter: Michał Michalski


I've noticed that the Kafka tarball comes with 
zookeeper.connection.timeout.ms=10000000 in server.properties, while 
https://kafka.apache.org/08/documentation.html says the default is 6000. This 
setting was introduced in the configuration file in 46b6144a, so quite a long time 
ago (3 years), so it looks intentional, but as per Jun Rao's comment on IRC, 
6000 sounds more reasonable, so that entry should probably be changed or 
removed from the config altogether.
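
For reference, the entry in question and the documented default, side by side
(values as described above):

{code}
# shipped in config/server.properties
zookeeper.connection.timeout.ms=10000000

# default according to https://kafka.apache.org/08/documentation.html
zookeeper.connection.timeout.ms=6000
{code}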



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1531) zookeeper.connection.timeout.ms is set to 10000000 in configuration file in Kafka tarball

2014-07-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Michalski updated KAFKA-1531:


Description: I've noticed that the Kafka tarball comes with 
zookeeper.connection.timeout.ms=10000000 in server.properties, while 
https://kafka.apache.org/08/documentation.html says the default is 6000. This 
setting was introduced in the configuration file in 46b6144a, so quite a long time 
ago (3 years), which makes it look intentional, but as per Jun Rao's comment on 
IRC, 6000 sounds more reasonable, so that entry should probably be changed or 
removed from the config altogether.  (was: I've noticed that the Kafka tarball comes with 
zookeeper.connection.timeout.ms=10000000 in server.properties, while 
https://kafka.apache.org/08/documentation.html says the default is 6000. This 
setting was introduced in the configuration file in 46b6144a, so quite a long time 
ago (3 years), so it looks intentional, but as per Jun Rao's comment on IRC, 
6000 sounds more reasonable, so that entry should probably be changed or 
removed from the config altogether.)

> zookeeper.connection.timeout.ms is set to 10000000 in configuration file in 
> Kafka tarball
> -
>
> Key: KAFKA-1531
> URL: https://issues.apache.org/jira/browse/KAFKA-1531
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.8.0, 0.8.1.1
>Reporter: Michał Michalski
>
> I've noticed that the Kafka tarball comes with 
> zookeeper.connection.timeout.ms=10000000 in server.properties, while 
> https://kafka.apache.org/08/documentation.html says the default is 6000. This 
> setting was introduced in the configuration file in 46b6144a, so quite a long 
> time ago (3 years), which makes it look intentional, but as per Jun Rao's 
> comment on IRC, 6000 sounds more reasonable, so that entry should probably be 
> changed or removed from the config altogether.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1530) howto update continuously

2014-07-09 Thread Stanislav Gilmulin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056049#comment-14056049
 ] 

Stanislav Gilmulin commented on KAFKA-1530:
---

Thank you,
I'd like to ask some questions.
If a cluster has a lot of nodes and a few of them are lagging or down, we can't
guarantee we would stop and restart the nodes properly and in the right order.
Is there any recommended way to manage this?
Or is there an already existing script or tool for it?
Replication factor = 3.
Version 0.8.1.1

> howto update continuously
> -
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
>  Issue Type: Wish
>Reporter: Stanislav Gilmulin
>Priority: Minor
>  Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1414) Speedup broker startup after hard reset

2014-07-09 Thread Alexey Ozeritskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Ozeritskiy updated KAFKA-1414:
-

Attachment: parallel-dir-loading-trunk-threadpool.patch

> Speedup broker startup after hard reset
> ---
>
> Key: KAFKA-1414
> URL: https://issues.apache.org/jira/browse/KAFKA-1414
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2, 0.9.0, 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Jay Kreps
> Attachments: parallel-dir-loading-0.8.patch, 
> parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk.patch
>
>
> After a hard reset due to power failure the broker takes way too much time 
> recovering unflushed segments in a single thread. This could be easily 
> improved by launching multiple threads (one per data directory, assuming that 
> typically each data directory is on a dedicated drive). Locally we tried this 
> simple patch to LogManager.loadLogs and it seems to work; however, I'm too new 
> to Scala, so do not take it literally:
> {code}
>   /**
>* Recover and load all logs in the given data directories
>*/
>   private def loadLogs(dirs: Seq[File]) {
> val threads : Array[Thread] = new Array[Thread](dirs.size)
> var i: Int = 0
> val me = this
> for(dir <- dirs) {
>   val thread = new Thread( new Runnable {
> def run()
> {
>   val recoveryPoints = me.recoveryPointCheckpoints(dir).read
>   /* load the logs */
>   val subDirs = dir.listFiles()
>   if(subDirs != null) {
> val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
> if(cleanShutDownFile.exists())
>   info("Found clean shutdown file. Skipping recovery for all logs 
> in data directory '%s'".format(dir.getAbsolutePath))
> for(dir <- subDirs) {
>   if(dir.isDirectory) {
> info("Loading log '" + dir.getName + "'")
> val topicPartition = Log.parseTopicPartitionName(dir.getName)
> val config = topicConfigs.getOrElse(topicPartition.topic, 
> defaultConfig)
> val log = new Log(dir,
>   config,
>   recoveryPoints.getOrElse(topicPartition, 0L),
>   scheduler,
>   time)
> val previous = addLogWithLock(topicPartition, log)
> if(previous != null)
>   throw new IllegalArgumentException("Duplicate log 
> directories found: %s, %s!".format(log.dir.getAbsolutePath, 
> previous.dir.getAbsolutePath))
>   }
> }
> cleanShutDownFile.delete()
>   }
> }
>   })
>   thread.start()
>   threads(i) = thread
>   i = i + 1
> }
> for(thread <- threads) {
>   thread.join()
> }
>   }
>   def addLogWithLock(topicPartition: TopicAndPartition, log: Log): Log = {
> logCreationOrDeletionLock synchronized {
>   this.logs.put(topicPartition, log)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1414) Speedup broker startup after hard reset

2014-07-09 Thread Alexey Ozeritskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056065#comment-14056065
 ] 

Alexey Ozeritskiy commented on KAFKA-1414:
--

You are right. I have made a new version with a thread pool. It works with all Scala 
versions (since 2.8.0).

> Speedup broker startup after hard reset
> ---
>
> Key: KAFKA-1414
> URL: https://issues.apache.org/jira/browse/KAFKA-1414
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2, 0.9.0, 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Jay Kreps
> Attachments: parallel-dir-loading-0.8.patch, 
> parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk.patch
>
>
> After a hard reset due to power failure the broker takes way too much time 
> recovering unflushed segments in a single thread. This could be easily 
> improved by launching multiple threads (one per data directory, assuming that 
> typically each data directory is on a dedicated drive). Locally we tried this 
> simple patch to LogManager.loadLogs and it seems to work; however, I'm too new 
> to Scala, so do not take it literally:
> {code}
>   /**
>* Recover and load all logs in the given data directories
>*/
>   private def loadLogs(dirs: Seq[File]) {
> val threads : Array[Thread] = new Array[Thread](dirs.size)
> var i: Int = 0
> val me = this
> for(dir <- dirs) {
>   val thread = new Thread( new Runnable {
> def run()
> {
>   val recoveryPoints = me.recoveryPointCheckpoints(dir).read
>   /* load the logs */
>   val subDirs = dir.listFiles()
>   if(subDirs != null) {
> val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
> if(cleanShutDownFile.exists())
>   info("Found clean shutdown file. Skipping recovery for all logs 
> in data directory '%s'".format(dir.getAbsolutePath))
> for(dir <- subDirs) {
>   if(dir.isDirectory) {
> info("Loading log '" + dir.getName + "'")
> val topicPartition = Log.parseTopicPartitionName(dir.getName)
> val config = topicConfigs.getOrElse(topicPartition.topic, 
> defaultConfig)
> val log = new Log(dir,
>   config,
>   recoveryPoints.getOrElse(topicPartition, 0L),
>   scheduler,
>   time)
> val previous = addLogWithLock(topicPartition, log)
> if(previous != null)
>   throw new IllegalArgumentException("Duplicate log 
> directories found: %s, %s!".format(log.dir.getAbsolutePath, 
> previous.dir.getAbsolutePath))
>   }
> }
> cleanShutDownFile.delete()
>   }
> }
>   })
>   thread.start()
>   threads(i) = thread
>   i = i + 1
> }
> for(thread <- threads) {
>   thread.join()
> }
>   }
>   def addLogWithLock(topicPartition: TopicAndPartition, log: Log): Log = {
> logCreationOrDeletionLock synchronized {
>   this.logs.put(topicPartition, log)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1406) Fix scaladoc/javadoc warnings

2014-07-09 Thread Alan Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056147#comment-14056147
 ] 

Alan Lee commented on KAFKA-1406:
-

Would it be okay if I work on this?

> Fix scaladoc/javadoc warnings
> -
>
> Key: KAFKA-1406
> URL: https://issues.apache.org/jira/browse/KAFKA-1406
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Joel Koshy
>  Labels: build
> Fix For: 0.8.2
>
>
> ./gradlew docsJarAll
> You will see a bunch of warnings mainly due to typos/incorrect use of 
> javadoc/scaladoc



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Julian Morrison (JIRA)
Julian Morrison created KAFKA-1532:
--

 Summary: Move CRC32 to AFTER the payload
 Key: KAFKA-1532
 URL: https://issues.apache.org/jira/browse/KAFKA-1532
 Project: Kafka
  Issue Type: Improvement
  Components: core, producer 
Reporter: Julian Morrison
Assignee: Jun Rao
Priority: Minor


To support streaming a message of known length but unknown content, take the 
CRC32 out of the message header and make it a message trailer. Then client 
libraries can calculate it after streaming the message to Kafka, without 
materializing the whole message in RAM.
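
As a rough sketch of why a trailing checksum helps (illustrative only, not
Kafka code): with the CRC32 at the end, a client can update the checksum chunk
by chunk while streaming the payload and emit it last, e.g.:

{code}
import java.util.zip.CRC32

// Illustrative sketch, not Kafka code: stream payload chunks while
// accumulating the CRC, then hand back the value to append as a trailer.
def streamWithTrailingCrc(chunks: Iterator[Array[Byte]],
                          write: Array[Byte] => Unit): Long = {
  val crc = new CRC32()
  for (chunk <- chunks) {
    crc.update(chunk, 0, chunk.length) // incremental update, no full buffer
    write(chunk)                       // forward the chunk downstream
  }
  crc.getValue                         // finalized only after the payload is sent
}
{code}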



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056258#comment-14056258
 ] 

Manikumar Reddy commented on KAFKA-1325:


Created reviewboard  against branch origin/trunk

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.
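
As a concrete illustration of the mismatch (values are hypothetical, both
meaning 7 days):

{code}
# broker-wide default, expressed in minutes (server.properties)
log.retention.minutes=10080

# per-topic override, expressed in milliseconds
retention.ms=604800000
{code}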



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1325:
---

Attachment: KAFKA-1325.patch

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1325:
---

Attachment: (was: KAFKA-1325.patch)

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: topic's partition have no leader and isr

2014-07-09 Thread Jun Rao
It's weird that you have replication factor 1, but two  of the partitions
25 and 31 have 2 assigned replicas. What's the command you used for
reassignment?

Thanks,

Jun


On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升  wrote:

> @Jun Rao,   Kafka version: 0.8.1.1
>
> @Guozhang Wang, I cannot find the original controller log, but I can give
> the controller log after executing ./bin/kafka-reassign-partitions.sh
> and ./bin/kafka-preferred-replica-election.sh
>
> Now I do not know how to recover the leader for partitions 25 and 31, any idea?
>
> - controller log for ./bin/kafka-reassign-partitions.sh
> ---
> [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
> Partitions reassigned listener fired for path /admin/reassign_partitions.
> Record partitions to be reassigned
>
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> (kafka.controller.PartitionsReassignedListener)
> [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add Partition
> triggered
>
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> for path /brokers/topics/org.mobile_nginx
> (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
> Partitions reassigned listener fired for path /admin/reassign_partitions.
> Record partitions to be reassigned
>
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> (kafka.controller.PartitionsReassignedListener)
> [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add Partition
> triggered
>
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> for path /brokers/topics/org.mobile_nginx
> (kafka.controller.PartitionStateMachine$AddPartitionsListener)
>
> - controller log for ./bin/kafka-preferred-replica-election.sh
> ---
> [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on 5]:
> Preferred replica election listener fired for path
> /admin/preferred_replica_election. Record partitions to undergo preferred
> replica election
>
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> (kafka.controller.PreferredReplicaElectionListener)
> [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred replica
> leader election for partitions [org.mobile_nginx,25],[org.mobile_nginx,31]
> (kafka.controller.KafkaController)
> [2014-07-09 15:07:02,969] INFO [Partition state machine on Controller 5]:
> Invoking state change to OnlinePartition for partitions
> [org.mobile_nginx,25],[org.mobile_nginx,31]
> (kafka.controller.PartitionStateMachine)
> [2014-07-09 15:07:02,972] INFO [PreferredReplicaPartitionLeaderSelector]:
> Current leader -1 for partition [org.mobile_nginx,25] is not the preferred
> replica. Trigerring preferred replica leader election
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2014-07-09 15:07:02,973] INFO [PreferredReplicaPartitionLeaderSelector]:
> Current leader -1 for partition [org.mobile_nginx,31] is not the preferred
> replica. Trigerring preferred replica leader election
> (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> [org.mobile_nginx,25] failed to complete preferred replica leader election.
> Leader is -1 (kafka.controller.KafkaController)
> [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> [org.mobile_nginx,31] failed to complete preferred replica leader election.
> Leader is -1 (kafka.controller.KafkaController)
>
>
> On Sun, Jul 6, 2014 at 11:47 PM, Jun Rao  wrote:
>
> > Also, which version of Kafka are you using?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Jul 3, 2014 at 2:26 AM, 鞠大升  wrote:
> >
> > > hi, all
> > >
> > > I have a topic with 32 partitions; after some reassign operations, 2
> > > partitions ended up with no leader and no ISR.
> > >
> > >
> >
> ---
> > > Topic:org.mobile_nginx  PartitionCount:32   ReplicationFactor:1
> > > Configs:
> > > Topic: org.mobile_nginx Partition: 0Leader: 3
> > Replicas: 3
> > > Isr: 3
> > > Topic: org.mobile_nginx Partition: 1Leader: 4
> > Replicas: 4
> > > Isr: 4
> > > 

Review Request 23362: Patch for KAFKA-1325

2014-07-09 Thread Manikumar Reddy O

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23362/
---

Review request for kafka.


Bugs: KAFKA-1325
https://issues.apache.org/jira/browse/KAFKA-1325


Repository: kafka


Description
---

KAFKA-1325 Changed per-topic configs segment.ms -> segment.hours, retention.ms 
-> retention.minutes to be consistent with the server defaults


Diffs
-

  core/src/main/scala/kafka/log/LogConfig.scala 
5746ad4767589594f904aa085131dd95e56d72bb 
  core/src/test/scala/unit/kafka/admin/AdminTest.scala 
e28979827110dfbbb92fe5b152e7f1cc973de400 

Diff: https://reviews.apache.org/r/23362/diff/


Testing
---


Thanks,

Manikumar Reddy O



[jira] [Updated] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1325:
---

Attachment: KAFKA-1325.patch

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch, KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1406) Fix scaladoc/javadoc warnings

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056299#comment-14056299
 ] 

Jun Rao commented on KAFKA-1406:


Alan,

That would be great. Thanks for your help.

> Fix scaladoc/javadoc warnings
> -
>
> Key: KAFKA-1406
> URL: https://issues.apache.org/jira/browse/KAFKA-1406
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Joel Koshy
>  Labels: build
> Fix For: 0.8.2
>
>
> ./gradlew docsJarAll
> You will see a bunch of warnings mainly due to typos/incorrect use of 
> javadoc/scaladoc



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056300#comment-14056300
 ] 

Manikumar Reddy commented on KAFKA-1325:


Created reviewboard https://reviews.apache.org/r/23362/diff/
 against branch origin/trunk

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch, KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1325:
---

Attachment: (was: KAFKA-1325.patch)

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056319#comment-14056319
 ] 

Jun Rao commented on KAFKA-1325:


Thanks for the patch. It seems to me that ms granularity is more general. 
Perhaps we can add log.retention.ms and log.roll.ms in the server config and 
leave the per topic config unchanged. This will also make the change backward 
compatible.

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1528) Normalize all the line endings

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056323#comment-14056323
 ] 

Jun Rao commented on KAFKA-1528:


Thanks for the patch. It doesn't apply to trunk though. Are there changes other 
than adding the .gitattributes file?

> Normalize all the line endings
> --
>
> Key: KAFKA-1528
> URL: https://issues.apache.org/jira/browse/KAFKA-1528
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Evgeny Vereshchagin
>Priority: Trivial
> Attachments: KAFKA-1528.patch, KAFKA-1528_2014-07-06_16:20:28.patch, 
> KAFKA-1528_2014-07-06_16:23:21.patch
>
>
> Hi!
> I added a .gitattributes file and removed all '\r' characters from some .bat files.
> See http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/ and 
> https://help.github.com/articles/dealing-with-line-endings for an explanation.
> Maybe add https://help.github.com/articles/dealing-with-line-endings to the 
> contributing/committing guide?
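
A minimal .gitattributes along these lines (illustrative; not necessarily
identical to the attached patch) would be:

{code}
# normalize line endings of text files in the repository
* text=auto
# Windows batch files keep CRLF on checkout
*.bat text eol=crlf
{code}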



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1414) Speedup broker startup after hard reset

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056338#comment-14056338
 ] 

Jun Rao commented on KAFKA-1414:


Thanks for the patch. A few comments.

1. If there are lots of partitions in a broker, having a thread per partition 
may be too much. Perhaps we can add a new config like log.recovery.threads. It 
can default to 1 to preserve the current behavior.

2. If we hit any exception when loading the logs, the current behavior is that 
we will shut down the broker. We need to preserve this behavior when running 
log loading  in separate threads. So, we need a way to propagate the exceptions 
from those threads to the caller in KafkaServer.

> Speedup broker startup after hard reset
> ---
>
> Key: KAFKA-1414
> URL: https://issues.apache.org/jira/browse/KAFKA-1414
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2, 0.9.0, 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Jay Kreps
> Attachments: parallel-dir-loading-0.8.patch, 
> parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk.patch
>
>
> After a hard reset due to power failure the broker takes way too much time 
> recovering unflushed segments in a single thread. This could be easily 
> improved by launching multiple threads (one per data directory, assuming that 
> typically each data directory is on a dedicated drive). Locally we tried this 
> simple patch to LogManager.loadLogs and it seems to work; however, I'm too new 
> to Scala, so do not take it literally:
> {code}
>   /**
>* Recover and load all logs in the given data directories
>*/
>   private def loadLogs(dirs: Seq[File]) {
> val threads : Array[Thread] = new Array[Thread](dirs.size)
> var i: Int = 0
> val me = this
> for(dir <- dirs) {
>   val thread = new Thread( new Runnable {
> def run()
> {
>   val recoveryPoints = me.recoveryPointCheckpoints(dir).read
>   /* load the logs */
>   val subDirs = dir.listFiles()
>   if(subDirs != null) {
> val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
> if(cleanShutDownFile.exists())
>   info("Found clean shutdown file. Skipping recovery for all logs 
> in data directory '%s'".format(dir.getAbsolutePath))
> for(dir <- subDirs) {
>   if(dir.isDirectory) {
> info("Loading log '" + dir.getName + "'")
> val topicPartition = Log.parseTopicPartitionName(dir.getName)
> val config = topicConfigs.getOrElse(topicPartition.topic, 
> defaultConfig)
> val log = new Log(dir,
>   config,
>   recoveryPoints.getOrElse(topicPartition, 0L),
>   scheduler,
>   time)
> val previous = addLogWithLock(topicPartition, log)
> if(previous != null)
>   throw new IllegalArgumentException("Duplicate log 
> directories found: %s, %s!".format(log.dir.getAbsolutePath, 
> previous.dir.getAbsolutePath))
>   }
> }
> cleanShutDownFile.delete()
>   }
> }
>   })
>   thread.start()
>   threads(i) = thread
>   i = i + 1
> }
> for(thread <- threads) {
>   thread.join()
> }
>   }
>   def addLogWithLock(topicPartition: TopicAndPartition, log: Log): Log = {
> logCreationOrDeletionLock synchronized {
>   this.logs.put(topicPartition, log)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1406) Fix scaladoc/javadoc warnings

2014-07-09 Thread Alan Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Lee updated KAFKA-1406:


Attachment: kafka-1406-v1.patch

I've attached a patch file for the fix.
There are still a few (non variable type-argument ...) warnings during the scaladoc 
build, but they do not seem to be related to documentation.

> Fix scaladoc/javadoc warnings
> -
>
> Key: KAFKA-1406
> URL: https://issues.apache.org/jira/browse/KAFKA-1406
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Joel Koshy
>  Labels: build
> Fix For: 0.8.2
>
> Attachments: kafka-1406-v1.patch
>
>
> ./gradlew docsJarAll
> You will see a bunch of warnings mainly due to typos/incorrect use of 
> javadoc/scaladoc



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1530) howto update continuously

2014-07-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056339#comment-14056339
 ] 

Guozhang Wang commented on KAFKA-1530:
--

Not sure I understand your question, what do you mean by "we can't guarantee we 
would stop and restart nodes properly and in the right order "?

> howto update continuously
> -
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
>  Issue Type: Wish
>Reporter: Stanislav Gilmulin
>Priority: Minor
>  Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: topic's partition have no leader and isr

2014-07-09 Thread Guozhang Wang
It seems your broker 1 is in a bad state; besides these two partitions, you
also have partition 10 whose Isr/Leader is not part of the replicas list:

Topic: org.mobile_nginx Partition: 10   Leader: 2   Replicas: 1
Isr: 2

Maybe you can go to broker 1 and check its logs first.

Guozhang


On Wed, Jul 9, 2014 at 7:32 AM, Jun Rao  wrote:

> It's weird that you have replication factor 1, but two  of the partitions
> 25 and 31 have 2 assigned replicas. What's the command you used for
> reassignment?
>
> Thanks,
>
> Jun
>
>
> On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升  wrote:
>
> > @Jun Rao,   Kafka version: 0.8.1.1
> >
> > @Guozhang Wang, I cannot find the original controller log, but I can give
> > the controller log after executing ./bin/kafka-reassign-partitions.sh
> > and ./bin/kafka-preferred-replica-election.sh
> >
> > Now I do not know how to recover the leader for partitions 25 and 31, any
> > idea?
> >
> > - controller log for ./bin/kafka-reassign-partitions.sh
> > ---
> > [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
> > Partitions reassigned listener fired for path /admin/reassign_partitions.
> > Record partitions to be reassigned
> >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > (kafka.controller.PartitionsReassignedListener)
> > [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add
> Partition
> > triggered
> >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > for path /brokers/topics/org.mobile_nginx
> > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
> > Partitions reassigned listener fired for path /admin/reassign_partitions.
> > Record partitions to be reassigned
> >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > (kafka.controller.PartitionsReassignedListener)
> > [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add
> Partition
> > triggered
> >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > for path /brokers/topics/org.mobile_nginx
> > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> >
> > - controller log for ./bin/kafka-preferred-replica-election.sh
> > ---
> > [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on 5]:
> > Preferred replica election listener fired for path
> > /admin/preferred_replica_election. Record partitions to undergo preferred
> > replica election
> >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> > (kafka.controller.PreferredReplicaElectionListener)
> > [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred replica
> > leader election for partitions
> [org.mobile_nginx,25],[org.mobile_nginx,31]
> > (kafka.controller.KafkaController)
> > [2014-07-09 15:07:02,969] INFO [Partition state machine on Controller 5]:
> > Invoking state change to OnlinePartition for partitions
> > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > (kafka.controller.PartitionStateMachine)
> > [2014-07-09 15:07:02,972] INFO [PreferredReplicaPartitionLeaderSelector]:
> > Current leader -1 for partition [org.mobile_nginx,25] is not the
> preferred
> > replica. Trigerring preferred replica leader election
> > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > [2014-07-09 15:07:02,973] INFO [PreferredReplicaPartitionLeaderSelector]:
> > Current leader -1 for partition [org.mobile_nginx,31] is not the
> preferred
> > replica. Trigerring preferred replica leader election
> > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> > [org.mobile_nginx,25] failed to complete preferred replica leader
> election.
> > Leader is -1 (kafka.controller.KafkaController)
> > [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> > [org.mobile_nginx,31] failed to complete preferred replica leader
> election.
> > Leader is -1 (kafka.controller.KafkaController)
> >
> >
> > On Sun, Jul 6, 2014 at 11:47 PM, Jun Rao  wrote:
> >
> > > Also, which version of Kafka are you using?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Jul 3, 2014 at 2:26 AM, 鞠大升  wrote:
> > >
> > > > hi, all
> > > >

[jira] [Commented] (KAFKA-1414) Speedup broker startup after hard reset

2014-07-09 Thread Alexey Ozeritskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056358#comment-14056358
 ] 

Alexey Ozeritskiy commented on KAFKA-1414:
--

1. Ok
2. If any thread fails we get ExecutionException at
{code}
jobs.foreach(_.get())
{code}
We can get the original exception by calling e.getCause() and rethrowing it. Is
this OK?
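
For concreteness, a sketch of the thread-pool variant with the original
exception rethrown to the caller (illustrative, not the actual patch):

{code}
import java.io.File
import java.util.concurrent.{Callable, ExecutionException, Executors}

// Illustrative sketch (not the actual patch): load each data directory in a
// fixed-size pool and surface the first failure to the caller.
def loadDirsInParallel(dirs: Seq[File], numThreads: Int)(loadDir: File => Unit): Unit = {
  val pool = Executors.newFixedThreadPool(numThreads)
  try {
    val jobs = dirs.map { dir =>
      pool.submit(new Callable[Unit] { override def call(): Unit = loadDir(dir) })
    }
    try {
      jobs.foreach(_.get()) // a failed task raises ExecutionException here
    } catch {
      case e: ExecutionException => throw e.getCause // rethrow the original error
    }
  } finally {
    pool.shutdown()
  }
}
{code}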


> Speedup broker startup after hard reset
> ---
>
> Key: KAFKA-1414
> URL: https://issues.apache.org/jira/browse/KAFKA-1414
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2, 0.9.0, 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Jay Kreps
> Attachments: parallel-dir-loading-0.8.patch, 
> parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk.patch
>
>
> After a hard reset due to power failure the broker takes way too much time 
> recovering unflushed segments in a single thread. This could be easily 
> improved by launching multiple threads (one per data directory, assuming that 
> typically each data directory is on a dedicated drive). Locally we tried this 
> simple patch to LogManager.loadLogs and it seems to work; however, I'm too new 
> to Scala, so do not take it literally:
> {code}
>   /**
>* Recover and load all logs in the given data directories
>*/
>   private def loadLogs(dirs: Seq[File]) {
> val threads : Array[Thread] = new Array[Thread](dirs.size)
> var i: Int = 0
> val me = this
> for(dir <- dirs) {
>   val thread = new Thread( new Runnable {
> def run()
> {
>   val recoveryPoints = me.recoveryPointCheckpoints(dir).read
>   /* load the logs */
>   val subDirs = dir.listFiles()
>   if(subDirs != null) {
> val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
> if(cleanShutDownFile.exists())
>   info("Found clean shutdown file. Skipping recovery for all logs 
> in data directory '%s'".format(dir.getAbsolutePath))
> for(dir <- subDirs) {
>   if(dir.isDirectory) {
> info("Loading log '" + dir.getName + "'")
> val topicPartition = Log.parseTopicPartitionName(dir.getName)
> val config = topicConfigs.getOrElse(topicPartition.topic, 
> defaultConfig)
> val log = new Log(dir,
>   config,
>   recoveryPoints.getOrElse(topicPartition, 0L),
>   scheduler,
>   time)
> val previous = addLogWithLock(topicPartition, log)
> if(previous != null)
>   throw new IllegalArgumentException("Duplicate log 
> directories found: %s, %s!".format(log.dir.getAbsolutePath, 
> previous.dir.getAbsolutePath))
>   }
> }
> cleanShutDownFile.delete()
>   }
> }
>   })
>   thread.start()
>   threads(i) = thread
>   i = i + 1
> }
> for(thread <- threads) {
>   thread.join()
> }
>   }
>   def addLogWithLock(topicPartition: TopicAndPartition, log: Log): Log = {
> logCreationOrDeletionLock synchronized {
>   this.logs.put(topicPartition, log)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 23363: Patch for KAFKA-1325

2014-07-09 Thread Manikumar Reddy O

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23363/
---

Review request for kafka.


Bugs: KAFKA-1325
https://issues.apache.org/jira/browse/KAFKA-1325


Repository: kafka


Description
---

Added log.retention.ms and log.roll.ms to the server config. If both 
log.retention.ms and log.retention.minutes are given, then log.retention.ms will 
be given priority. If both log.roll.ms and log.roll.hours are given, then the
log.roll.ms property will be given priority.


Diffs
-

  core/src/main/scala/kafka/server/KafkaConfig.scala 
ef75b67b67676ae5b8931902cbc8c0c2cc72c0d3 
  core/src/main/scala/kafka/server/KafkaServer.scala 
c22e51e0412843ec993721ad3230824c0aadd2ba 
  core/src/test/scala/unit/kafka/server/KafkaConfigTest.scala 
6f4809da9968de293f365307ffd9cfe1d5c34ce0 

Diff: https://reviews.apache.org/r/23363/diff/


Testing
---


Thanks,

Manikumar Reddy O



[jira] [Updated] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1325:
---

Attachment: KAFKA-1325.patch

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch, KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056402#comment-14056402
 ] 

Manikumar Reddy commented on KAFKA-1325:


Created reviewboard https://reviews.apache.org/r/23363/diff/
 against branch origin/trunk

> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch, KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1325) Fix inconsistent per topic log configs

2014-07-09 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056410#comment-14056410
 ] 

Manikumar Reddy commented on KAFKA-1325:


I added log.retention.ms and log.roll.ms to the server config.
Among log.retention.ms, log.retention.minutes, and log.retention.hours,
log.retention.ms takes precedence.
Between log.roll.ms and log.roll.hours, log.roll.ms takes precedence.
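
A sketch of the intended precedence (illustrative; the method and parameter
names are placeholders, not the actual KafkaConfig code):

{code}
// Finer-grained millisecond settings win whenever they are present.
def retentionMillis(msOpt: Option[Long], minutesOpt: Option[Int], hours: Int): Long =
  msOpt.getOrElse(
    minutesOpt.map(_ * 60L * 1000L)
      .getOrElse(hours * 60L * 60L * 1000L))

def rollMillis(msOpt: Option[Long], hours: Int): Long =
  msOpt.getOrElse(hours * 60L * 60L * 1000L)
{code}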


> Fix inconsistent per topic log configs
> --
>
> Key: KAFKA-1325
> URL: https://issues.apache.org/jira/browse/KAFKA-1325
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1
>Reporter: Neha Narkhede
>  Labels: usability
> Attachments: KAFKA-1325.patch, KAFKA-1325.patch
>
>
> Related thread from the user mailing list - 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201403.mbox/%3Cd8f6f3857b82c4ccd8725aba6fd68cb8%40cweb01.nmdf.nhnsystem.com%3E
> Our documentation is a little confusing on the log configs. 
> The log property for retention.ms is in millis but the server default it maps 
> to is in minutes.
> Same is true for segment.ms as well. We could either improve the docs or
> change the per-topic configs to be consistent with the server defaults.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1530) howto update continuously

2014-07-09 Thread Stanislav Gilmulin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056411#comment-14056411
 ] 

Stanislav Gilmulin commented on KAFKA-1530:
---

It means we have the risks.
You're right. My question wasn't clear. Let me try to explain.

First of all, according to business requirements we can't stop the service, so
we can't stop all nodes before updating.
And, as you've advised, our option would be updating step by step.
But if we update without using the right procedure, we could lose an unknown
amount of messages, as in the example case presented below.

Let's consider this case as an example.
We have 3 replicas of one partition, with 2 of them lagging behind. Then we
restart the leader. At that very moment one of the two lagging replicas
becomes the new leader. After that, when the used-to-be-leader replica starts
working again (and which in fact has the newest data), it truncates all the
newest data to match the newly elected leader.
This situation happens quite often when we restart a highly loaded Kafka
cluster, so we lose part of our data.



> howto update continuously
> -
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
>  Issue Type: Wish
>Reporter: Stanislav Gilmulin
>Priority: Minor
>  Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1414) Speedup broker startup after hard reset

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056417#comment-14056417
 ] 

Jun Rao commented on KAFKA-1414:


2. Yes, that's probably right. Could you verify that the original exception is 
indeed preserved? Also, should we use newFixedThreadPool instead of 
newScheduledThreadPool, since we want to run those tasks immediately?
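
A minimal sketch of that fixed-pool variant (illustrative only, not the committed patch; loadLogsInParallel and loadDir are hypothetical names standing in for the per-directory recovery logic in LogManager.loadLogs):

{code}
import java.io.File
import java.util.concurrent.{Callable, Executors, TimeUnit}
import scala.collection.JavaConverters._

// Run one recovery task per data directory on a fixed-size pool. Future.get
// rethrows a task failure wrapped in an ExecutionException, so the original
// exception is preserved as the cause.
def loadLogsInParallel(dirs: Seq[File])(loadDir: File => Unit): Unit = {
  val pool = Executors.newFixedThreadPool(math.max(1, dirs.size))
  try {
    val tasks = dirs.map { dir =>
      new Callable[Unit] { def call(): Unit = loadDir(dir) }
    }
    pool.invokeAll(tasks.asJava).asScala.foreach(_.get())
  } finally {
    pool.shutdown()
    pool.awaitTermination(Long.MaxValue, TimeUnit.SECONDS)
  }
}
{code}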

> Speedup broker startup after hard reset
> ---
>
> Key: KAFKA-1414
> URL: https://issues.apache.org/jira/browse/KAFKA-1414
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.2, 0.9.0, 0.8.1.1
>Reporter: Dmitry Bugaychenko
>Assignee: Jay Kreps
> Attachments: parallel-dir-loading-0.8.patch, 
> parallel-dir-loading-trunk-threadpool.patch, parallel-dir-loading-trunk.patch
>
>
> After a hard reset due to power failure, the broker takes far too much time 
> recovering unflushed segments in a single thread. This could easily be 
> improved by launching multiple threads (one per data directory, assuming that 
> typically each data directory is on a dedicated drive). Locally we tried this 
> simple patch to LogManager.loadLogs and it seems to work; however, I'm too new 
> to Scala, so do not take it literally:
> {code}
>   /**
>    * Recover and load all logs in the given data directories
>    */
>   private def loadLogs(dirs: Seq[File]) {
>     val threads: Array[Thread] = new Array[Thread](dirs.size)
>     var i: Int = 0
>     val me = this
>     for (dir <- dirs) {
>       val thread = new Thread(new Runnable {
>         def run() {
>           val recoveryPoints = me.recoveryPointCheckpoints(dir).read
>           /* load the logs */
>           val subDirs = dir.listFiles()
>           if (subDirs != null) {
>             val cleanShutDownFile = new File(dir, Log.CleanShutdownFile)
>             if (cleanShutDownFile.exists())
>               info("Found clean shutdown file. Skipping recovery for all logs in data directory '%s'".format(dir.getAbsolutePath))
>             for (dir <- subDirs) {
>               if (dir.isDirectory) {
>                 info("Loading log '" + dir.getName + "'")
>                 val topicPartition = Log.parseTopicPartitionName(dir.getName)
>                 val config = topicConfigs.getOrElse(topicPartition.topic, defaultConfig)
>                 val log = new Log(dir,
>                                   config,
>                                   recoveryPoints.getOrElse(topicPartition, 0L),
>                                   scheduler,
>                                   time)
>                 val previous = addLogWithLock(topicPartition, log)
>                 if (previous != null)
>                   throw new IllegalArgumentException("Duplicate log directories found: %s, %s!".format(log.dir.getAbsolutePath, previous.dir.getAbsolutePath))
>               }
>             }
>             cleanShutDownFile.delete()
>           }
>         }
>       })
>       thread.start()
>       threads(i) = thread
>       i = i + 1
>     }
>     for (thread <- threads) {
>       thread.join()
>     }
>   }
>   def addLogWithLock(topicPartition: TopicAndPartition, log: Log): Log = {
>     logCreationOrDeletionLock synchronized {
>       this.logs.put(topicPartition, log)
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1530) howto update continuously

2014-07-09 Thread Oleg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056422#comment-14056422
 ] 

Oleg commented on KAFKA-1530:
-

Another situation which happened in our production:
We have replication factor 3. One of the 3 replicas started lagging behind (due 
to network connectivity problems, etc.). Then, while upgrading/restarting Kafka, 
we restart the whole cluster. After the upgrade Kafka starts electing leaders 
for each partition, and it is quite likely to elect the lagging replica as the 
leader, which leads to truncating the two other replicas. In this case we lose 
data.

So we are seeking a means of restarting/upgrading Kafka without data loss.

> howto update continuously
> -
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
>  Issue Type: Wish
>Reporter: Stanislav Gilmulin
>Priority: Minor
>  Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1507) Using GetOffsetShell against non-existent topic creates the topic unintentionally

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056424#comment-14056424
 ] 

Jun Rao commented on KAFKA-1507:


Thanks for the patch. I am currently working on KAFKA-1462, which I hope will 
standardize how we support multiple versions of a request on the server. 
Perhaps, you can wait until KAFKA-1462 is done, hopefully in a week or so.

> Using GetOffsetShell against non-existent topic creates the topic 
> unintentionally
> -
>
> Key: KAFKA-1507
> URL: https://issues.apache.org/jira/browse/KAFKA-1507
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1.1
> Environment: centos
>Reporter: Luke Forehand
>Assignee: Sriharsha Chintalapani
>Priority: Minor
>  Labels: newbie
> Attachments: KAFKA-1507.patch
>
>
> A typo when using the GetOffsetShell command can cause a
> topic to be created which cannot be deleted (because deletion is still in
> progress)
> ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list
> kafka10:9092,kafka11:9092,kafka12:9092,kafka13:9092 --topic typo --time 1
> ./kafka-topics.sh --zookeeper stormqa1/kafka-prod --describe --topic typo
> Topic:typo  PartitionCount:8ReplicationFactor:1 Configs:
>  Topic: typo Partition: 0Leader: 10  Replicas: 10
>   Isr: 10
> ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (KAFKA-1530) howto update continuously

2014-07-09 Thread Oleg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056422#comment-14056422
 ] 

Oleg edited comment on KAFKA-1530 at 7/9/14 4:33 PM:
-

Another situation which happened in our production:
We have replication factor 3. One of the 3 replicas started lagging behind (due 
to network connectivity problems, etc.). Then, while upgrading/restarting Kafka, 
we restart the whole cluster. After the upgrade the first node to start becomes 
the leader, and it may well be the one hosting the lagging replica, which leads 
to truncating the two other replicas. In this case we lose data.

So we are seeking a means of restarting/upgrading Kafka without data loss.


was (Author: ovgolovin):
Another situation which happened in our production:
We have replication factor 3. One of the 3 replicas started lagging behind (due 
to network connectivity problems, etc.). Then, while upgrading/restarting Kafka, 
we restart the whole cluster. After the upgrade Kafka starts electing leaders 
for each partition, and it is quite likely to elect the lagging replica as the 
leader, which leads to truncating the two other replicas. In this case we lose 
data.

So we are seeking a means of restarting/upgrading Kafka without data loss.

> howto update continuously
> -
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
>  Issue Type: Wish
>Reporter: Stanislav Gilmulin
>Priority: Minor
>  Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (KAFKA-1515) Wake-up Sender upon blocked on fetching leader metadata

2014-07-09 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps resolved KAFKA-1515.
--

Resolution: Fixed

> Wake-up Sender upon blocked on fetching leader metadata
> ---
>
> Key: KAFKA-1515
> URL: https://issues.apache.org/jira/browse/KAFKA-1515
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.9.0
>
> Attachments: KAFKA-1515_2014-07-03_10:19:28.patch, 
> KAFKA-1515_2014-07-03_16:43:05.patch, KAFKA-1515_2014-07-07_10:55:58.patch, 
> KAFKA-1515_2014-07-08_11:35:59.patch
>
>
> Currently the new KafkaProducer will not wake up the sender thread upon 
> forcing a metadata fetch, and hence if the sender is polling with a long 
> timeout (e.g. the metadata.age period) this wait will usually time out and 
> fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1019) kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist

2014-07-09 Thread Mickael Hemri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056440#comment-14056440
 ] 

Mickael Hemri commented on KAFKA-1019:
--

Hi Jun,

We use ZK 3.4.6; could that be causing this issue?
How can we check that the topic watcher is registered by the controller?

Thanks

> kafka-preferred-replica-election.sh will fail without clear error message if 
> /brokers/topics/[topic]/partitions does not exist
> --
>
> Key: KAFKA-1019
> URL: https://issues.apache.org/jira/browse/KAFKA-1019
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Guozhang Wang
>  Labels: newbie
> Fix For: 0.8.2
>
>
> From Libo Yu:
> I tried to run kafka-preferred-replica-election.sh on our kafka cluster.
> But I got this exception:
> Failed to start preferred replica election
> org.I0Itec.zkclient.exception.ZkNoNodeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /brokers/topics/uattoqaaa.default/partitions
> I checked zookeeper and there is no 
> /brokers/topics/uattoqaaa.default/partitions. All I found is
> /brokers/topics/uattoqaaa.default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056441#comment-14056441
 ] 

Jay Kreps commented on KAFKA-1532:
--

This is a good idea, but unfortunately would break compatibility with existing 
clients. It would be good to save up enough of these fixes and do them all at 
once to minimize breakage.

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1530) howto update continuously

2014-07-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056448#comment-14056448
 ] 

Guozhang Wang commented on KAFKA-1530:
--

Hi Stanislav/Oleg,

The Kafka server has a config called "controlled.shutdown.enable". When it is 
turned on, the shutdown process first waits for all leaders hosted on the node 
being shut down to migrate to other nodes before the server actually stops 
(http://kafka.apache.org/documentation.html#brokerconfigs).

For your first case, where the node being shut down is the only replica in ISR, 
the shutdown process will block until other nodes are back in ISR and can take 
over the partitions; for your second case, where more than one node is in ISR, 
it is guaranteed that the leaders on the node being shut down will be moved to 
another ISR node.
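
For reference, a minimal server.properties sketch of that setting (only the flag mentioned above is shown; other shutdown-related settings are left at their defaults):

{code}
# Migrate leaders off a broker before it exits, so a rolling restart
# does not abruptly drop partition leadership.
controlled.shutdown.enable=true
{code}

With that enabled, a rolling restart (one broker at a time, waiting for it to rejoin the ISR before moving on) should avoid the truncation scenarios described above.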

> howto update continuously
> -
>
> Key: KAFKA-1530
> URL: https://issues.apache.org/jira/browse/KAFKA-1530
> Project: Kafka
>  Issue Type: Wish
>Reporter: Stanislav Gilmulin
>Priority: Minor
>  Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056444#comment-14056444
 ] 

Jay Kreps commented on KAFKA-1532:
--

Also since Kafka itself fully materializes messages in memory, fixing this 
won't actually let clients send arbitrarily large messages, as the broker will 
still choke.

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1019) kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist

2014-07-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056456#comment-14056456
 ] 

Jun Rao commented on KAFKA-1019:


I am not sure how reliable ZK 3.4.6 is. If you check the ZK admin page, it 
shows the command for listing all registered watchers. You want to check whether 
there is a child watcher on /brokers/topics.

Another thing is to avoid ZK session expiration, since it may expose some 
corner-case bugs.
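
For example, assuming the ZooKeeper four-letter-word commands are reachable on the client port (host and port below are placeholders), something like this lists the registered watches:

{code}
# Summary of watch counts on this ZooKeeper server
echo wchs | nc zk-host 2181
# Watches grouped by znode path; look for a child watch on /brokers/topics
echo wchp | nc zk-host 2181
{code}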

> kafka-preferred-replica-election.sh will fail without clear error message if 
> /brokers/topics/[topic]/partitions does not exist
> --
>
> Key: KAFKA-1019
> URL: https://issues.apache.org/jira/browse/KAFKA-1019
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Guozhang Wang
>  Labels: newbie
> Fix For: 0.8.2
>
>
> From Libo Yu:
> I tried to run kafka-preferred-replica-election.sh on our kafka cluster.
> But I got this exception:
> Failed to start preferred replica election
> org.I0Itec.zkclient.exception.ZkNoNodeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /brokers/topics/uattoqaaa.default/partitions
> I checked zookeeper and there is no 
> /brokers/topics/uattoqaaa.default/partitions. All I found is
> /brokers/topics/uattoqaaa.default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1528) Normalize all the line endings

2014-07-09 Thread Evgeny Vereshchagin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056521#comment-14056521
 ] 

Evgeny Vereshchagin commented on KAFKA-1528:


[~junrao], what command do you run to apply the patch?

git apply --stat xyz-v1.patch ?

I want to reproduce it.
It looks like I sent an invalid patch.

> Normalize all the line endings
> --
>
> Key: KAFKA-1528
> URL: https://issues.apache.org/jira/browse/KAFKA-1528
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Evgeny Vereshchagin
>Priority: Trivial
> Attachments: KAFKA-1528.patch, KAFKA-1528_2014-07-06_16:20:28.patch, 
> KAFKA-1528_2014-07-06_16:23:21.patch
>
>
> Hi!
> I added a .gitattributes file and removed all '\r' characters from some .bat files.
> See http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/ and 
> https://help.github.com/articles/dealing-with-line-endings for an explanation.
> Maybe add https://help.github.com/articles/dealing-with-line-endings to the 
> contributing/committing guide?
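
As a rough illustration of what such a .gitattributes file could contain (an assumption for illustration; the attached patch may differ):

{code}
# Normalize text files to LF in the repository
* text=auto
# Keep Windows batch files checked out with CRLF line endings
*.bat text eol=crlf
{code}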



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: topic's partition have no leader and isr

2014-07-09 Thread DashengJu
That's because the topic had replication factor 2 at first; then I wanted to
reassign all partitions from 2 replicas to 1 replica, and all partitions changed
successfully except 25 and 31.

I think partitions 25 and 31 already had no leader before the reassign
operation, which caused the reassignment to fail.
On Jul 9, 2014 at 10:33 PM, "Jun Rao" wrote:

> It's weird that you have replication factor 1, but two  of the partitions
> 25 and 31 have 2 assigned replicas. What's the command you used for
> reassignment?
>
> Thanks,
>
> Jun
>
>
> On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升  wrote:
>
> > @Jun Rao,   Kafka version: 0.8.1.1
> >
> > @Guozhang Wang, I can not found the original controller log, but I can
> give
> > the controller log after execute ./bin/kafka-reassign-partitions.sh
> > and ./bin/kafka-preferred-replica-election.sh
> >
> > Now I do not known how to recover leader for partition 25 and 31, any
> idea?
> >
> > - controller log for ./bin/kafka-reassign-partitions.sh
> > ---
> > [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
> > Partitions reassigned listener fired for path /admin/reassign_partitions.
> > Record partitions to be reassigned
> >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > (kafka.controller.PartitionsReassignedListener)
> > [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add
> Partition
> > triggered
> >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > for path /brokers/topics/org.mobile_nginx
> > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
> > Partitions reassigned listener fired for path /admin/reassign_partitions.
> > Record partitions to be reassigned
> >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > (kafka.controller.PartitionsReassignedListener)
> > [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add
> Partition
> > triggered
> >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > for path /brokers/topics/org.mobile_nginx
> > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> >
> > - controller log for ./bin/kafka-reassign-partitions.sh
> > ---
> > [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on 5]:
> > Preferred replica election listener fired for path
> > /admin/preferred_replica_election. Record partitions to undergo preferred
> > replica election
> >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> > (kafka.controller.PreferredReplicaElectionListener)
> > [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred replica
> > leader election for partitions
> [org.mobile_nginx,25],[org.mobile_nginx,31]
> > (kafka.controller.KafkaController)
> > [2014-07-09 15:07:02,969] INFO [Partition state machine on Controller 5]:
> > Invoking state change to OnlinePartition for partitions
> > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > (kafka.controller.PartitionStateMachine)
> > [2014-07-09 15:07:02,972] INFO [PreferredReplicaPartitionLeaderSelector]:
> > Current leader -1 for partition [org.mobile_nginx,25] is not the
> preferred
> > replica. Trigerring preferred replica leader election
> > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > [2014-07-09 15:07:02,973] INFO [PreferredReplicaPartitionLeaderSelector]:
> > Current leader -1 for partition [org.mobile_nginx,31] is not the
> preferred
> > replica. Trigerring preferred replica leader election
> > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> > [org.mobile_nginx,25] failed to complete preferred replica leader
> election.
> > Leader is -1 (kafka.controller.KafkaController)
> > [2014-07-09 15:07:02,973] WARN [Controller 5]: Partition
> > [org.mobile_nginx,31] failed to complete preferred replica leader
> election.
> > Leader is -1 (kafka.controller.KafkaController)
> >
> >
> > On Sun, Jul 6, 2014 at 11:47 PM, Jun Rao  wrote:
> >
> > > Also, which version of Kafka are you using?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Thu, Jul 3, 2014 at 2:26 AM, 鞠大升  wrote:
> > >
> > > > hi, all
> > > >
> > > >

Re: topic's partition have no leader and isr

2014-07-09 Thread DashengJu
Thanks for your finding; it is weird.

I checked my broker 1: it is working fine, with no error or warn logs.

I checked the data folder and found that partition 10's replica, ISR and
leader are actually on broker 2, which works fine.
Just now I executed a reassignment operation to move partition 10's replicas
to broker 2; the controller log says "Partition [org.mobile_nginx,10] to be
reassigned is already assigned to replicas 2. ignoring request for
partition reassignment". Describing the topic then shows partition 10's
replicas as 2.
On Jul 9, 2014 at 11:31 PM, "Guozhang Wang" wrote:

> It seems your broker 1 is in a bad state, besides these two partitions you
> also have partition 10 whose Isr/Leader is not part of the replicas list:
>
> Topic: org.mobile_nginx Partition: 10   Leader: 2   Replicas: 1
> Isr: 2
>
> Maybe you can go to broker 1 and check its logs first.
>
> Guozhang
>
>
> On Wed, Jul 9, 2014 at 7:32 AM, Jun Rao  wrote:
>
> > It's weird that you have replication factor 1, but two  of the partitions
> > 25 and 31 have 2 assigned replicas. What's the command you used for
> > reassignment?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升  wrote:
> >
> > > @Jun Rao,   Kafka version: 0.8.1.1
> > >
> > > @Guozhang Wang, I can not found the original controller log, but I can
> > give
> > > the controller log after execute ./bin/kafka-reassign-partitions.sh
> > > and ./bin/kafka-preferred-replica-election.sh
> > >
> > > Now I do not known how to recover leader for partition 25 and 31, any
> > idea?
> > >
> > > - controller log for ./bin/kafka-reassign-partitions.sh
> > > ---
> > > [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
> > > Partitions reassigned listener fired for path
> /admin/reassign_partitions.
> > > Record partitions to be reassigned
> > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > (kafka.controller.PartitionsReassignedListener)
> > > [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add
> > Partition
> > > triggered
> > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > for path /brokers/topics/org.mobile_nginx
> > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
> > > Partitions reassigned listener fired for path
> /admin/reassign_partitions.
> > > Record partitions to be reassigned
> > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > (kafka.controller.PartitionsReassignedListener)
> > > [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add
> > Partition
> > > triggered
> > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > for path /brokers/topics/org.mobile_nginx
> > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > >
> > > - controller log for ./bin/kafka-reassign-partitions.sh
> > > ---
> > > [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on
> 5]:
> > > Preferred replica election listener fired for path
> > > /admin/preferred_replica_election. Record partitions to undergo
> preferred
> > > replica election
> > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> > > (kafka.controller.PreferredReplicaElectionListener)
> > > [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred
> replica
> > > leader election for partitions
> > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > > (kafka.controller.KafkaController)
> > > [2014-07-09 15:07:02,969] INFO [Partition state machine on Controller
> 5]:
> > > Invoking state change to OnlinePartition for partitions
> > > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > > (kafka.controller.PartitionStateMachine)
> > > [2014-07-09 15:07:02,972] INFO
> [PreferredReplicaPartitionLeaderSelector]:
> > > Current leader -1 for partition [org.mobile_nginx,25] is not the
> > preferred
> > > replica. Trigerring preferred replica leader election
> > > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > > [2014-07-09 15:07:02,973] INFO
> [PreferredReplicaPartitionLeaderSelector]:
> > > Current leader -1 for partition [org.mobile_nginx,31] is no

[jira] [Commented] (KAFKA-1528) Normalize all the line endings

2014-07-09 Thread Evgeny Vereshchagin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056720#comment-14056720
 ] 

Evgeny Vereshchagin commented on KAFKA-1528:


kafka-patch-review.py squashes the two commits into one giant diff, which is 
wrong for these changes. Maybe format-patch and apply-patch would be better?
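
A sketch of that alternative flow (branch and directory names are placeholders):

{code}
# Produce one patch file per commit instead of a single squashed diff
git format-patch origin/trunk -o patches/
# Apply them on the other side, preserving the individual commits
git am patches/*.patch
{code}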

> Normalize all the line endings
> --
>
> Key: KAFKA-1528
> URL: https://issues.apache.org/jira/browse/KAFKA-1528
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Evgeny Vereshchagin
>Priority: Trivial
> Attachments: KAFKA-1528.patch, KAFKA-1528_2014-07-06_16:20:28.patch, 
> KAFKA-1528_2014-07-06_16:23:21.patch
>
>
> Hi!
> I added a .gitattributes file and removed all '\r' characters from some .bat files.
> See http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/ and 
> https://help.github.com/articles/dealing-with-line-endings for an explanation.
> Maybe add https://help.github.com/articles/dealing-with-line-endings to the 
> contributing/committing guide?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: topic's partition have no leader and isr

2014-07-09 Thread Guozhang Wang
Dasheng,

This is indeed weird. Could you easily reproduce the problem, i.e. start
the cluster with a topic whose replication factor is > 1, then change it
to 1, and see if this issue shows up again?

Guozhang


On Wed, Jul 9, 2014 at 12:09 PM, DashengJu  wrote:

> thanks for your found, it is weird.
>
> i checked my broker 1, it is working fine, no error or warn log.
>
> I checked the data folder, and found replication 10's replicas, isr and
> leader is actually on broker 2, works fine.
> just now I execute a reassignment operation to move partition 10's replicas
> to broker 2, the controller log says"Partition [org.mobile_nginx,10] to be
> reassigned is already assigned to replicas 2. ignoring request for
> partition reassignment". then describe the topic shows partition10's
> replicas became to 2.
> On Jul 9, 2014 at 11:31 PM, "Guozhang Wang" wrote:
>
> > It seems your broker 1 is in a bad state, besides these two partitions
> you
> > also have partition 10 whose Isr/Leader is not part of the replicas list:
> >
> > Topic: org.mobile_nginx Partition: 10   Leader: 2   Replicas: 1
> > Isr: 2
> >
> > Maybe you can go to broker 1 and check its logs first.
> >
> > Guozhang
> >
> >
> > On Wed, Jul 9, 2014 at 7:32 AM, Jun Rao  wrote:
> >
> > > It's weird that you have replication factor 1, but two  of the
> partitions
> > > 25 and 31 have 2 assigned replicas. What's the command you used for
> > > reassignment?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升  wrote:
> > >
> > > > @Jun Rao,   Kafka version: 0.8.1.1
> > > >
> > > > @Guozhang Wang, I can not found the original controller log, but I
> can
> > > give
> > > > the controller log after execute ./bin/kafka-reassign-partitions.sh
> > > > and ./bin/kafka-preferred-replica-election.sh
> > > >
> > > > Now I do not known how to recover leader for partition 25 and 31, any
> > > idea?
> > > >
> > > > - controller log for
> ./bin/kafka-reassign-partitions.sh
> > > > ---
> > > > [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on 5]:
> > > > Partitions reassigned listener fired for path
> > /admin/reassign_partitions.
> > > > Record partitions to be reassigned
> > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > > (kafka.controller.PartitionsReassignedListener)
> > > > [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add
> > > Partition
> > > > triggered
> > > >
> > > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > > for path /brokers/topics/org.mobile_nginx
> > > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > > [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on 5]:
> > > > Partitions reassigned listener fired for path
> > /admin/reassign_partitions.
> > > > Record partitions to be reassigned
> > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > > (kafka.controller.PartitionsReassignedListener)
> > > > [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add
> > > Partition
> > > > triggered
> > > >
> > > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > > for path /brokers/topics/org.mobile_nginx
> > > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > >
> > > > - controller log for
> ./bin/kafka-reassign-partitions.sh
> > > > ---
> > > > [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener on
> > 5]:
> > > > Preferred replica election listener fired for path
> > > > /admin/preferred_replica_election. Record partitions to undergo
> > preferred
> > > > replica election
> > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25},{"topic":"org.mobile_nginx","partition":31}]}
> > > > (kafka.controller.PreferredReplicaElectionListener)
> > > > [2014-07-09 15:07:02,969] INFO [Controller 5]: Starting preferred
> > replica
> > > > leader election for partitions
> > > [org.mobile_nginx,25],[org.mobile_nginx,31]
> > > > (kafka.controller.KafkaController)
> > > > [2014-07-09 15:07:02,969] INFO [Partition state machine on Controller
> > 5]:
> > > > Invoking state change to OnlinePartition for partitions
> > > > [org.mobile_nginx,25],[org.mobile_ngi

[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Julian Morrison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056792#comment-14056792
 ] 

Julian Morrison commented on KAFKA-1532:


Is "Kafka itself fully materializes messages in memory" unavoidable? (I am 
assuming writing it to file buffers is obvious.)

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056802#comment-14056802
 ] 

Jay Kreps commented on KAFKA-1532:
--

It's not unavoidable, but it is pretty hard. To make this really work we have 
to be able to stream partial requests from the network layer, to the 
api/processing layer, and to the log layer. This is even more complicated 
because additional appends to the log have to be blocked until the message is 
completely written.

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056806#comment-14056806
 ] 

Jay Kreps commented on KAFKA-1532:
--

Your approach where you buffer the message with a backing file and then do the 
full write to the log would also work and might be easier. But still a pretty 
big change.

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Julian Morrison (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056816#comment-14056816
 ] 

Julian Morrison commented on KAFKA-1532:


Streaming it to a scratch file (never sync'd, unlinked after use) avoids 
blocking the log - the blocking operation becomes "it's verified in the scratch 
file, copy it to the log."
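
A minimal sketch of that scratch-file idea (all names and signatures here are hypothetical, not Kafka APIs; it only illustrates accumulating the CRC while streaming and handing the data over once the trailer matches):

{code}
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
import java.util.zip.CRC32

// Stream chunks to a temp file while accumulating a CRC32; only hand the file
// over once the trailing checksum matches. A real broker would append to the
// log rather than replace a file.
def stageMessage(chunks: Iterator[Array[Byte]], expectedCrc: Long, target: File): Boolean = {
  val scratch = File.createTempFile("staged-message", ".tmp")
  val out = Files.newOutputStream(scratch.toPath)
  val crc = new CRC32()
  try {
    chunks.foreach { chunk =>
      crc.update(chunk)
      out.write(chunk)
    }
  } finally {
    out.close()
  }
  val ok = crc.getValue == expectedCrc
  if (ok)
    Files.copy(scratch.toPath, target.toPath, StandardCopyOption.REPLACE_EXISTING)
  scratch.delete()
  ok
}
{code}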

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1532) Move CRC32 to AFTER the payload

2014-07-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056833#comment-14056833
 ] 

Jay Kreps commented on KAFKA-1532:
--

Yeah agreed. I think that is doable.

> Move CRC32 to AFTER the payload
> ---
>
> Key: KAFKA-1532
> URL: https://issues.apache.org/jira/browse/KAFKA-1532
> Project: Kafka
>  Issue Type: Improvement
>  Components: core, producer 
>Reporter: Julian Morrison
>Assignee: Jun Rao
>Priority: Minor
>
> To support streaming a message of known length but unknown content, take the 
> CRC32 out of the message header and make it a message trailer. Then client 
> libraries can calculate it after streaming the message to Kafka, without 
> materializing the whole message in RAM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: topic's partition have no leader and isr

2014-07-09 Thread DashengJu
Guozhang,

I will try to reproduce the problem in our test environment.

Also, the problem has now existed in our production environment for 10 days. Is
there any log, memory dump or ZooKeeper information that would help you
analyze the problem?
And are there any ideas for recovering from this situation? Restart the
cluster? Delete the topic manually?

thx


On Thu, Jul 10, 2014 at 5:27 AM, Guozhang Wang  wrote:

> Dasheng,
>
> This is indeed wired. Could you easily re-produce the problem, i.e.
> starting the cluster with a topic' replication factor > 1, then change it
> to 1. And see if this issue shows up again.
>
> Guozhang
>
>
> On Wed, Jul 9, 2014 at 12:09 PM, DashengJu  wrote:
>
> > thanks for your found, it is weird.
> >
> > i checked my broker 1, it is working fine, no error or warn log.
> >
> > I checked the data folder, and found replication 10's replicas, isr and
> > leader is actually on broker 2, works fine.
> > just now I execute a reassignment operation to move partition 10's
> replicas
> > to broker 2, the controller log says"Partition [org.mobile_nginx,10] to
> be
> > reassigned is already assigned to replicas 2. ignoring request for
> > partition reassignment". then describe the topic shows partition10's
> > replicas became to 2.
> > > On Jul 9, 2014 at 11:31 PM, "Guozhang Wang" wrote:
> >
> > > It seems your broker 1 is in a bad state, besides these two partitions
> > you
> > > also have partition 10 whose Isr/Leader is not part of the replicas
> list:
> > >
> > > Topic: org.mobile_nginx Partition: 10   Leader: 2   Replicas: 1
> > > Isr: 2
> > >
> > > Maybe you can go to broker 1 and check its logs first.
> > >
> > > Guozhang
> > >
> > >
> > > On Wed, Jul 9, 2014 at 7:32 AM, Jun Rao  wrote:
> > >
> > > > It's weird that you have replication factor 1, but two  of the
> > partitions
> > > > 25 and 31 have 2 assigned replicas. What's the command you used for
> > > > reassignment?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Wed, Jul 9, 2014 at 12:10 AM, 鞠大升  wrote:
> > > >
> > > > > @Jun Rao,   Kafka version: 0.8.1.1
> > > > >
> > > > > @Guozhang Wang, I can not found the original controller log, but I
> > can
> > > > give
> > > > > the controller log after execute ./bin/kafka-reassign-partitions.sh
> > > > > and ./bin/kafka-preferred-replica-election.sh
> > > > >
> > > > > Now I do not known how to recover leader for partition 25 and 31,
> any
> > > > idea?
> > > > >
> > > > > - controller log for
> > ./bin/kafka-reassign-partitions.sh
> > > > > ---
> > > > > [2014-07-09 15:01:31,552] DEBUG [PartitionsReassignedListener on
> 5]:
> > > > > Partitions reassigned listener fired for path
> > > /admin/reassign_partitions.
> > > > > Record partitions to be reassigned
> > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":25,"replicas":[6]},{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > > > (kafka.controller.PartitionsReassignedListener)
> > > > > [2014-07-09 15:01:31,579] INFO [AddPartitionsListener on 5]: Add
> > > > Partition
> > > > > triggered
> > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > > > for path /brokers/topics/org.mobile_nginx
> > > > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > > > [2014-07-09 15:01:31,587] DEBUG [PartitionsReassignedListener on
> 5]:
> > > > > Partitions reassigned listener fired for path
> > > /admin/reassign_partitions.
> > > > > Record partitions to be reassigned
> > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":[{"topic":"org.mobile_nginx","partition":31,"replicas":[3]}]}
> > > > > (kafka.controller.PartitionsReassignedListener)
> > > > > [2014-07-09 15:01:31,590] INFO [AddPartitionsListener on 5]: Add
> > > > Partition
> > > > > triggered
> > > > >
> > > > >
> > > >
> > >
> >
> {"version":1,"partitions":{"12":[1],"8":[3],"19":[5],"23":[5],"4":[3],"15":[2],"11":[2],"9":[4],"22":[4],"26":[2],"13":[2],"24":[6],"16":[4],"5":[4],"10":[1],"21":[3],"6":[5],"1":[4],"17":[5],"25":[6,1],"14":[4],"31":[3,1],"0":[3],"20":[2],"27":[3],"2":[5],"18":[6],"30":[6],"7":[6],"29":[5],"3":[6],"28":[4]}}
> > > > > for path /brokers/topics/org.mobile_nginx
> > > > > (kafka.controller.PartitionStateMachine$AddPartitionsListener)
> > > > >
> > > > > - controller log for
> > ./bin/kafka-reassign-partitions.sh
> > > > > ---
> > > > > [2014-07-09 15:07:02,968] DEBUG [PreferredReplicaElectionListener
> on
> > > 5]:
> > > > > Preferred replica election listener fired for path
> > > > > /admin/preferred_replica_election. Record partitions to undergo
> > > preferred
> > >