date:20140903


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25155/
---

(Updated Sept. 3, 2014, 7:52 p.m.)


Review request for kafka.


Bugs: KAFKA-1616
https://issues.apache.org/jira/browse/KAFKA-1616


Repository: kafka


Description (updated)
---

Incorporated Jun's comments round four


Diffs (updated)
-

  core/src/main/scala/kafka/server/RequestPurgatory.scala 
ce06d2c381348deef8559374869fcaed923da1d1 
  core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
168712de241125982d556c188c76514fceb93779 

Diff: https://reviews.apache.org/r/25155/diff/


Testing
---


Thanks,

Guozhang Wang

[jira] [Commented] (KAFKA-1616) Purgatory Size and Num.Delayed.Request metrics are incorrect


[ 
https://issues.apache.org/jira/browse/KAFKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120328#comment-14120328
 ] 

Guozhang Wang commented on KAFKA-1616:
--

Updated reviewboard https://reviews.apache.org/r/25155/diff/
 against branch origin/trunk

> Purgatory Size and Num.Delayed.Request metrics are incorrect
> 
>
> Key: KAFKA-1616
> URL: https://issues.apache.org/jira/browse/KAFKA-1616
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.9.0
>
> Attachments: KAFKA-1616.patch, KAFKA-1616_2014-08-28_10:12:17.patch, 
> KAFKA-1616_2014-09-01_14:41:56.patch, KAFKA-1616_2014-09-02_12:58:07.patch, 
> KAFKA-1616_2014-09-02_13:23:13.patch, KAFKA-1616_2014-09-03_12:53:09.patch
>
>
> The request purgatory used two atomic integers "watched" and "unsatisfied" to 
> record the purgatory size ( = watched + unsatisfied) and number of delayed 
> requests ( = unsatisfied). But due to some race conditions these two atomic 
> integers are not updated correctly, result in incorrect metrics.
> Proposed solution: to have a cleaner semantics, we can define the "purgatory 
> size" to be just the number of elements in the watched lists, and the "number 
> of delayed requests" to be just the length of the expiry queue. And instead 
> of using two atomic integeres we just compute the size of the lists / queue 
> on the fly each time the metrics are pulled. This may use some more CPU 
> cycles for these two metrics but should be minor, and the correctness is 
> guaranteed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (KAFKA-1616) Purgatory Size and Num.Delayed.Request metrics are incorrect


 [ 
https://issues.apache.org/jira/browse/KAFKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1616:
-
Attachment: KAFKA-1616_2014-09-03_12:53:09.patch

> Purgatory Size and Num.Delayed.Request metrics are incorrect
> 
>
> Key: KAFKA-1616
> URL: https://issues.apache.org/jira/browse/KAFKA-1616
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.9.0
>
> Attachments: KAFKA-1616.patch, KAFKA-1616_2014-08-28_10:12:17.patch, 
> KAFKA-1616_2014-09-01_14:41:56.patch, KAFKA-1616_2014-09-02_12:58:07.patch, 
> KAFKA-1616_2014-09-02_13:23:13.patch, KAFKA-1616_2014-09-03_12:53:09.patch
>
>
> The request purgatory used two atomic integers "watched" and "unsatisfied" to 
> record the purgatory size ( = watched + unsatisfied) and number of delayed 
> requests ( = unsatisfied). But due to some race conditions these two atomic 
> integers are not updated correctly, result in incorrect metrics.
> Proposed solution: to have a cleaner semantics, we can define the "purgatory 
> size" to be just the number of elements in the watched lists, and the "number 
> of delayed requests" to be just the length of the expiry queue. And instead 
> of using two atomic integeres we just compute the size of the lists / queue 
> on the fly each time the metrics are pulled. This may use some more CPU 
> cycles for these two metrics but should be minor, and the correctness is 
> guaranteed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 25136: Patch for KAFKA-1610


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25136/#review52234
---

Ship it!


Ship It!

- Guozhang Wang


On Sept. 3, 2014, 6:27 p.m., Mayuresh Gharat wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25136/
> ---
> 
> (Updated Sept. 3, 2014, 6:27 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1610
> https://issues.apache.org/jira/browse/KAFKA-1610
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Added comments explaining the changes and reverted back some changes as per 
> comments on the reviewboard
> 
> 
> Removed the unnecessary import
> 
> 
> Made changes to comments as per the suggestions on the reviewboard
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/ReassignPartitionsCommand.scala 
> 691d69a49a240f38883d2025afaec26fd61281b5 
>   core/src/main/scala/kafka/controller/KafkaController.scala 
> 8ab4a1b8072c9dd187a9a6e94138b725d1f1b153 
>   core/src/main/scala/kafka/server/DelayedFetch.scala 
> e0f14e25af03e6d4344386dcabc1457ee784d345 
>   core/src/main/scala/kafka/server/DelayedProduce.scala 
> 9481508fc2d6140b36829840c337e557f3d090da 
>   core/src/main/scala/kafka/server/KafkaApis.scala 
> c584b559416b3ee4bcbec5966be4891e0a03eefb 
>   core/src/main/scala/kafka/server/KafkaServer.scala 
> 28711182aaa70eaa623de858bc063cb2613b2a4d 
>   core/src/main/scala/kafka/tools/ReplicaVerificationTool.scala 
> af4783646803e58714770c21f8c3352370f26854 
>   core/src/test/scala/unit/kafka/server/LeaderElectionTest.scala 
> c2ba07c5fdbaf0e65ca033b2e4d88f45a8a15b2e 
> 
> Diff: https://reviews.apache.org/r/25136/diff/
> 
> 
> Testing
> ---
> 
> Ran the unit tests and everything passed and the build succeeeded
> 
> 
> Thanks,
> 
> Mayuresh Gharat
> 
>

[jira] [Commented] (KAFKA-1591) Clean-up Unnecessary stack trace in error/warn logs


[ 
https://issues.apache.org/jira/browse/KAFKA-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120435#comment-14120435
 ] 

Guozhang Wang commented on KAFKA-1591:
--

Thanks for the patch. Could you go through the kafka server code base for all 
INFO entries and double check if they are necessarily INFO level logs?

> Clean-up Unnecessary stack trace in error/warn logs
> ---
>
> Key: KAFKA-1591
> URL: https://issues.apache.org/jira/browse/KAFKA-1591
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: Jira-1591-SocketConnection-Warning.patch
>
>
> Some of the unnecessary stack traces in error / warning log entries can 
> easily pollute the log files. Examples include KAFKA-1066, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Java Custom SLF4J Implementation and Kafka Lib Logs

2014-09-03 Thread Bhavesh Mistry

Hi Dev,

We are in process of upgrading the from 0.8.1 to trunk.  I have noticed
that Java New Kafka Producer Lib uses SLF4J api to log kafka logs.   We
have our own library which send all LOGGER.logXXX("MSG")" using the Kafka
Lib so what is best way avoid it.  One of way is to avoid all producer side
logs ignore package "kafka.producer.*" and
"org.apache.kafka.clients.producer".   I have to do this order to avoid
potential DEADLOCK with Kafka Threads and send methods..

https://github.com/apache/kafka/blob/0.8.1/clients/src/main/java/org/apache/kafka/clients/producer/Producer.java

Our use cases is to send all the logs to brokers. Let me know your
suggestions.


Thanks,

Bhavesh

[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper


[ 
https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120664#comment-14120664
 ] 

Guozhang Wang commented on KAFKA-686:
-

Thanks [~phargett] and [~viktortnk] for your patches.

A general thought is that we may probably want to go one step further to avoid 
the brokers to be unnecessarily fallen into the bad state. One example I have 
seen before is that a 0.7 consumer mistakenly writes to a 0.8 Zookeeper may 
cause the controller to fail and the whole cluster in a bad state. To be more 
specific, according to the ZK data structure:

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper

1. The controller would log the error and "exclude" the information when 
encounter error in parsing a) consumer offsets, b) admin paths since these data 
can be written by clients or admin tools.
2. The controller would fail gracefully (i.e. log the error and shutdown the 
server itself) when encounter error in parsing a) broker registration info, b) 
controller epoch, c) controller registration, d) topic registration info, and 
e) partition state info,  since these data can only be written by the brokers 
themselves.
3. The broker should fail gracefully when encounter error in parsing a) 
partition state info.

Some other comments:

1. readDataMaybeNull is used in other places of ZkUtils and other classes, and 
they may also be modified accordingly.
2. I am wondering does Exception.failAsValue exist in scala version 2.8/9 also? 
We are planning to retire scala 2.8 support but have not finished yet.

Will you have time recently to continue working on this ticket?

> 0.8 Kafka broker should give a better error message when running against 0.7 
> zookeeper
> --
>
> Key: KAFKA-686
> URL: https://issues.apache.org/jira/browse/KAFKA-686
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
>Priority: Blocker
>  Labels: newbie, patch
> Fix For: 0.8.2
>
> Attachments: KAFAK-686-null-pointer-fix.patch, 
> KAFKA-686-null-pointer-fix-2.patch
>
>
> People will not know that the zookeeper paths are not compatible. When you 
> try to start the 0.8 broker pointed at a 0.7 zookeeper you get a 
> NullPointerException. We should detect this and give a more sane error.
> Error:
> kafka.common.KafkaException: Can't parse json string: null
> at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
> at kafka.utils.Json$.parseFull(Json.scala:16)
> at 
> kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
> at 
> kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
> at 
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
> at scala.collection.immutable.List.foreach(List.scala:45)
> at 
> kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
> at 
> kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
> at 
> kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
> at 
> kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
> at 
> kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
> at 
> kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
> at kafka.controller.KafkaController.startup(KafkaController.scala:381)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
> at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
> at kafka.Kafka$.main(Kafka.scala:46)
> at kafka.Kafka.main(Kafka.scala)
> Caused by: java.lang.NullPointerException
> at 
> scala.util.parsing.combinator.lexical.Scanners$Scanner.(Scanners.scala:52)
> at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
> at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
> at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
> ... 16 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Review Request 24676: Fix KAFKA-1583



> On Sept. 3, 2014, 6:23 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/server/DelayedFetch.scala, line 100
> > 
> >
> > Should we log topicAndPartition as well?

fetchMetadata includes the fetchPartitionStatus, that includes the mapping of 
topicAndPartition to fetchPartitionStatus.


- Guozhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24676/#review52092
---


On Sept. 2, 2014, 8:37 p.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24676/
> ---
> 
> (Updated Sept. 2, 2014, 8:37 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1583
> https://issues.apache.org/jira/browse/KAFKA-1583
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Incorporated Jun's comments round three
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/api/FetchRequest.scala 
> 51cdccf7f90eb530cc62b094ed822b8469d50b12 
>   core/src/main/scala/kafka/api/FetchResponse.scala 
> af9308737bf7832eca018c2b3ede703f7d1209f1 
>   core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
> 861a6cf11dc6b6431fcbbe9de00c74a122f204bd 
>   core/src/main/scala/kafka/api/ProducerRequest.scala 
> b2366e7eedcac17f657271d5293ff0bef6f3cbe6 
>   core/src/main/scala/kafka/api/ProducerResponse.scala 
> a286272c834b6f40164999ff8b7f8998875f2cfe 
>   core/src/main/scala/kafka/cluster/Partition.scala 
> ff106b47e6ee194cea1cf589474fef975b9dd7e2 
>   core/src/main/scala/kafka/common/ErrorMapping.scala 
> 3fae7910e4ce17bc8325887a046f383e0c151d44 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/main/scala/kafka/network/BoundedByteBufferSend.scala 
> a624359fb2059340bb8dc1619c5b5f226e26eb9b 
>   core/src/main/scala/kafka/server/DelayedFetch.scala 
> e0f14e25af03e6d4344386dcabc1457ee784d345 
>   core/src/main/scala/kafka/server/DelayedProduce.scala 
> 9481508fc2d6140b36829840c337e557f3d090da 
>   core/src/main/scala/kafka/server/FetchRequestPurgatory.scala 
> ed1318891253556cdf4d908033b704495acd5724 
>   core/src/main/scala/kafka/server/KafkaApis.scala 
> c584b559416b3ee4bcbec5966be4891e0a03eefb 
>   core/src/main/scala/kafka/server/OffsetManager.scala 
> 43eb2a35bb54d32c66cdb94772df657b3a104d1a 
>   core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala 
> d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> 68758e35d496a4659819960ae8e809d6e215568e 
>   core/src/main/scala/kafka/server/RequestPurgatory.scala 
> ce06d2c381348deef8559374869fcaed923da1d1 
>   core/src/main/scala/kafka/utils/DelayedItem.scala 
> d7276494072f14f1cdf7d23f755ac32678c5675c 
>   core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 
> 03a424d45215e1e7780567d9559dae4d0ae6fc29 
>   core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
> cd302aa51eb8377d88b752d48274e403926439f2 
>   core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala 
> a9c4ddc78df0b3695a77a12cf8cf25521a203122 
>   core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
> 168712de241125982d556c188c76514fceb93779 
>   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 
> ab60e9b3a4d063c838bdc7f97b3ac7d2ede87072 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 
> 
> Diff: https://reviews.apache.org/r/24676/diff/
> 
> 
> Testing
> ---
> 
> Unit tests
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>

Re: Review Request 24676: Fix KAFKA-1583



> On Sept. 3, 2014, 6:23 p.m., Jun Rao wrote:
> > Looks good. I only have the following minor comments.

Thanks Jun. If there is no more comments for now I will wait for KAFKA-1616 to 
be checked in first, and then do the rebase and the class / function renaming 
(which will make the diff file quite hard to review) as well as comments 
modification accordingly.


- Guozhang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24676/#review52092
---


On Sept. 2, 2014, 8:37 p.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24676/
> ---
> 
> (Updated Sept. 2, 2014, 8:37 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1583
> https://issues.apache.org/jira/browse/KAFKA-1583
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Incorporated Jun's comments round three
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/api/FetchRequest.scala 
> 51cdccf7f90eb530cc62b094ed822b8469d50b12 
>   core/src/main/scala/kafka/api/FetchResponse.scala 
> af9308737bf7832eca018c2b3ede703f7d1209f1 
>   core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
> 861a6cf11dc6b6431fcbbe9de00c74a122f204bd 
>   core/src/main/scala/kafka/api/ProducerRequest.scala 
> b2366e7eedcac17f657271d5293ff0bef6f3cbe6 
>   core/src/main/scala/kafka/api/ProducerResponse.scala 
> a286272c834b6f40164999ff8b7f8998875f2cfe 
>   core/src/main/scala/kafka/cluster/Partition.scala 
> ff106b47e6ee194cea1cf589474fef975b9dd7e2 
>   core/src/main/scala/kafka/common/ErrorMapping.scala 
> 3fae7910e4ce17bc8325887a046f383e0c151d44 
>   core/src/main/scala/kafka/log/Log.scala 
> 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/main/scala/kafka/network/BoundedByteBufferSend.scala 
> a624359fb2059340bb8dc1619c5b5f226e26eb9b 
>   core/src/main/scala/kafka/server/DelayedFetch.scala 
> e0f14e25af03e6d4344386dcabc1457ee784d345 
>   core/src/main/scala/kafka/server/DelayedProduce.scala 
> 9481508fc2d6140b36829840c337e557f3d090da 
>   core/src/main/scala/kafka/server/FetchRequestPurgatory.scala 
> ed1318891253556cdf4d908033b704495acd5724 
>   core/src/main/scala/kafka/server/KafkaApis.scala 
> c584b559416b3ee4bcbec5966be4891e0a03eefb 
>   core/src/main/scala/kafka/server/OffsetManager.scala 
> 43eb2a35bb54d32c66cdb94772df657b3a104d1a 
>   core/src/main/scala/kafka/server/ProducerRequestPurgatory.scala 
> d4a7d4a79b44263a1f7e1a92874dd36aa06e5a3f 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> 68758e35d496a4659819960ae8e809d6e215568e 
>   core/src/main/scala/kafka/server/RequestPurgatory.scala 
> ce06d2c381348deef8559374869fcaed923da1d1 
>   core/src/main/scala/kafka/utils/DelayedItem.scala 
> d7276494072f14f1cdf7d23f755ac32678c5675c 
>   core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala 
> 03a424d45215e1e7780567d9559dae4d0ae6fc29 
>   core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
> cd302aa51eb8377d88b752d48274e403926439f2 
>   core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala 
> a9c4ddc78df0b3695a77a12cf8cf25521a203122 
>   core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
> 168712de241125982d556c188c76514fceb93779 
>   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala 
> ab60e9b3a4d063c838bdc7f97b3ac7d2ede87072 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> 09ed8f5a7a414ae139803bf82d336c2d80bf4ac5 
> 
> Diff: https://reviews.apache.org/r/24676/diff/
> 
> 
> Testing
> ---
> 
> Unit tests
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>

[jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability


[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120810#comment-14120810
 ] 

Joe Stein commented on KAFKA-1555:
--

Where have we ended up with this? Do we have an agreed to solution? I think 
different people throwing up patches would be a bad way to proceed.  Can we 
come up with a 1) how we want to-do this and 2) have someone(s) take this on 
with an expected timeframe to completion.  having it it in 0.8.2 would be great 
but at least a patch would be fantastic we could apply, please!  Thanks!!!

> provide strong consistency with reasonable availability
> ---
>
> Key: KAFKA-1555
> URL: https://issues.apache.org/jira/browse/KAFKA-1555
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Jiang Wu
>Assignee: Neha Narkhede
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases such as two brokers are down, service can be blocked, but 
> no message loss happens.
> We found that current kafka versoin (0.8.1.1) cannot achieve the requirements 
> due to its three behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with less 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has less messages than the leader.
> 3. ISR can contains only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (KAFKA-1555) provide strong consistency with reasonable availability


 [ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1555:
-
Fix Version/s: 0.8.2

> provide strong consistency with reasonable availability
> ---
>
> Key: KAFKA-1555
> URL: https://issues.apache.org/jira/browse/KAFKA-1555
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Jiang Wu
>Assignee: Neha Narkhede
> Fix For: 0.8.2
>
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases such as two brokers are down, service can be blocked, but 
> no message loss happens.
> We found that current kafka versoin (0.8.1.1) cannot achieve the requirements 
> due to its three behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with less 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has less messages than the leader.
> 3. ISR can contains only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability

2014-09-03 Thread Gwen Shapira (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120815#comment-14120815
 ] 

Gwen Shapira commented on KAFKA-1555:
-

If there's a consensus (or even a lazy consensus) around the min.isr approach, 
I'll be happy to take a stab at it.

> provide strong consistency with reasonable availability
> ---
>
> Key: KAFKA-1555
> URL: https://issues.apache.org/jira/browse/KAFKA-1555
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Jiang Wu
>Assignee: Neha Narkhede
> Fix For: 0.8.2
>
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases such as two brokers are down, service can be blocked, but 
> no message loss happens.
> We found that current kafka versoin (0.8.1.1) cannot achieve the requirements 
> due to its three behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with less 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has less messages than the leader.
> 3. ISR can contains only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[DISCUSS] 0.8.2 release branch, "unofficial" release candidates(s), 0.8.1.2 release

2014-09-03 Thread Joe Stein

Hey, I wanted to take a quick pulse to see if we are getting closer to a
branch for 0.8.2.

1) There still seems to be a lot of open issues
https://issues.apache.org/jira/browse/KAFKA/fixforversion/12326167/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
and our 30 day summary is showing issues: 51 created and *34* resolved and not
sure how much of that we could really just decide to push off to 0.8.3 or
0.9.0 vs working on 0.8.2 as stable for release. There is already so much
goodness on trunk. I appreciate the double commit pain especially as trunk
and branch drift (ugh).

2) Also, I wanted to float the idea of after making the 0.8.2 branch that I
would do some unofficial release candidates for folks to test prior to
release and vote. What I was thinking was I would build, upload and stage
like I was preparing artifacts for vote but let the community know to go in
and "have at it" well prior to the vote release. We don't get a lot of
community votes during a release but issues after (which is natural because
of how things are done). I have seen four Apache projects doing this very
successfully not only have they had less iterations of RC votes (sensitive
to that myself) but the community kicked back issues they saw by giving
them some "pre release" time to go through their own test and staging
environments as the release are coming about.

3) Checking again on "should we have a 0.8.1.2" release if folks in the
community find important features (this might be best asked on the user
list maybe not sure) they don't want/can't wait for which wouldn't be too
much pain/dangerous to back port. Two things that spring to the top of my
head are 2.11 Scala support and fixing the source jars. Both of these are
easy to patch personally I don't mind but want to gauge more from the
community on this too. I have heard gripes ad hoc from folks in direct
communication but no complains really in the public forum and wanted to
open the floor if folks had a need.

4) 0.9 work I feel is being held up some (or at least resourcing it from my
perspective). We decided to hold up including SSL (even though we have a
path for it). Jay did a nice update recently to the Security wiki which I
think we should move forward with. I have some more to add/change/update
and want to start getting down to more details and getting specific people
working on specific tasks but without knowing what we are doing when it is
hard to manage.

5) I just updated https://issues.apache.org/jira/browse/KAFKA-1555 I think
it is a really important feature update doesn't have to be in 0.8.2 but we
need consensus (no pun intended). It fundamentally allows for data in min
two rack requirement which A LOT of data requires for successful save to
occur.

/***
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
/

[jira] [Assigned] (KAFKA-1555) provide strong consistency with reasonable availability


 [ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein reassigned KAFKA-1555:


Assignee: Gwen Shapira  (was: Neha Narkhede)

> provide strong consistency with reasonable availability
> ---
>
> Key: KAFKA-1555
> URL: https://issues.apache.org/jira/browse/KAFKA-1555
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.8.1.1
>Reporter: Jiang Wu
>Assignee: Gwen Shapira
> Fix For: 0.8.2
>
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases such as two brokers are down, service can be blocked, but 
> no message loss happens.
> We found that current kafka versoin (0.8.1.1) cannot achieve the requirements 
> due to its three behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with less 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has less messages than the leader.
> 3. ISR can contains only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability