[jira] [Created] (KAFKA-1322) slow start on unclean shutdown

2014-03-22 Thread Alexey Ozeritskiy (JIRA)
Alexey Ozeritskiy created KAFKA-1322:


 Summary: slow start on unclean shutdown
 Key: KAFKA-1322
 URL: https://issues.apache.org/jira/browse/KAFKA-1322
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Alexey Ozeritskiy
Priority: Critical


Kafka 0.8.1 checks all segments on unclean shutdown. 
0.8.0 checks only the latest segment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1322) slow start on unclean shutdown

2014-03-22 Thread Alexey Ozeritskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Ozeritskiy updated KAFKA-1322:
-

Status: Patch Available  (was: Open)

> slow start on unclean shutdown
> --
>
> Key: KAFKA-1322
> URL: https://issues.apache.org/jira/browse/KAFKA-1322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Alexey Ozeritskiy
>Priority: Critical
>
> Kafka 0.8.1 checks all segments on unclean shutdown. 
> 0.8.0 checks only the latest segment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1322) slow start on unclean shutdown

2014-03-22 Thread Alexey Ozeritskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Ozeritskiy updated KAFKA-1322:
-

Attachment: kafka.patch

> slow start on unclean shutdown
> --
>
> Key: KAFKA-1322
> URL: https://issues.apache.org/jira/browse/KAFKA-1322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Alexey Ozeritskiy
>Priority: Critical
> Attachments: kafka.patch
>
>
> Kafka 0.8.1 checks all segments on unclean shutdown. 
> 0.8.0 checks only the latest segment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1322) slow startup after unclean shutdown

2014-03-22 Thread Alexey Ozeritskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Ozeritskiy updated KAFKA-1322:
-

Summary: slow startup after unclean shutdown  (was: slow start on unclean 
shutdown)

> slow startup after unclean shutdown
> ---
>
> Key: KAFKA-1322
> URL: https://issues.apache.org/jira/browse/KAFKA-1322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Alexey Ozeritskiy
>Priority: Critical
> Attachments: kafka.patch
>
>
> Kafka 0.8.1 checks all segments on unclean shutdown. 
> 0.8.0 checks only the latest segment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1322) slow startup after unclean shutdown

2014-03-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944231#comment-13944231
 ] 

Guozhang Wang commented on KAFKA-1322:
--

Hi Alexey,

This is not expected in 0.8.1, which used a checkpoint file to avoid recovering 
all the latest segments. Did you see the checkpoint file after unclean shutdown?

> slow startup after unclean shutdown
> ---
>
> Key: KAFKA-1322
> URL: https://issues.apache.org/jira/browse/KAFKA-1322
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.1
>Reporter: Alexey Ozeritskiy
>Priority: Critical
> Attachments: kafka.patch
>
>
> Kafka 0.8.1 checks all segments on unclean shutdown. 
> 0.8.0 checks only the latest segment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (KAFKA-1029) Zookeeper leader election stuck in ephemeral node retry loop

2014-03-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944233#comment-13944233
 ] 

Guozhang Wang commented on KAFKA-1029:
--

This is a little wired. The patch should have been in 0.8 already. Could you 
upload the related logs here?

> Zookeeper leader election stuck in ephemeral node retry loop
> 
>
> Key: KAFKA-1029
> URL: https://issues.apache.org/jira/browse/KAFKA-1029
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.0
>Reporter: Sam Meder
>Assignee: Sam Meder
>Priority: Blocker
> Fix For: 0.8.0
>
> Attachments: 
> 0002-KAFKA-1029-Use-brokerId-instead-of-leaderId-when-tri.patch
>
>
> We're seeing the following log statements (over and over):
> [2013-08-27 07:21:49,538] INFO conflict in /controller data: { "brokerid":3, 
> "timestamp":"1377587945206", "version":1 } stored data: { "brokerid":2, 
> "timestamp":"1377587460904", "version":1 } (kafka.utils.ZkUtils$)
> [2013-08-27 07:21:49,559] INFO I wrote this conflicted ephemeral node [{ 
> "brokerid":3, "timestamp":"1377587945206", "version":1 }] at /controller a 
> while back in a different session, hence I will backoff for this node to be 
> deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
> where the broker is essentially stuck in the loop that is trying to deal with 
> left-over ephemeral nodes. The code looks a bit racy to me. In particular:
> ZookeeperLeaderElector:
>   def elect: Boolean = {
> controllerContext.zkClient.subscribeDataChanges(electionPath, 
> leaderChangeListener)
> val timestamp = SystemTime.milliseconds.toString
> val electString = ...
> try {
>   
> createEphemeralPathExpectConflictHandleZKBug(controllerContext.zkClient, 
> electionPath, electString, leaderId,
> (controllerString : String, leaderId : Any) => 
> KafkaController.parseControllerId(controllerString) == 
> leaderId.asInstanceOf[Int],
> controllerContext.zkSessionTimeout)
> leaderChangeListener is registered before the create call (by the way, it 
> looks like a new registration will be added every elect call - shouldn't it 
> register in startup()?) so can update leaderId to the current leader before 
> the call to create. If that happens then we will continuously get node exists 
> exceptions and the checker function will always return true, i.e. we will 
> never get out of the while(true) loop.
> I think the right fix here is to pass brokerId instead of leaderId when 
> calling create, i.e.
> createEphemeralPathExpectConflictHandleZKBug(controllerContext.zkClient, 
> electionPath, electString, brokerId,
> (controllerString : String, leaderId : Any) => 
> KafkaController.parseControllerId(controllerString) == 
> leaderId.asInstanceOf[Int],
> controllerContext.zkSessionTimeout)
> The loop dealing with the ephemeral node bug is now only triggered for the 
> broker that owned the node previously, although I am still not 100% sure if 
> that is sufficient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 18299: Fix KAFKA-1253

2014-03-22 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18299/
---

(Updated March 23, 2014, 12:53 a.m.)


Review request for kafka.


Bugs: KAFKA-1253
https://issues.apache.org/jira/browse/KAFKA-1253


Repository: kafka


Description (updated)
---

Minor change in ConsoleProducer


Add estimation conservativeness factor


Dummy


Incorporate Jay and Timothy's comments


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Incorporate Jun's comments round four


Dummy


KAFKA-1253.v5


Dummy


Dummy


Dummy


Small fix on ProducerPerformance


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Incorporate Jun's comments one


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy


Dummy 2


Dummy


KAFKA-1253


Diffs (updated)
-

  build.gradle 84fa0d6b5f7405af755c5d7ff7bdd7592bb8668f 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
1ac69436f117800815b8d50f042e9e2a29364b43 
  clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
32e12ad149f6d70c96a498d0a390976f77bf9e2a 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java
 b69866a9fb9a8b4e1e78d304a20eda3cbf178c6f 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 673b2962771c28ceb3c7a6c0fd6f69521bd7ed16 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
 038a05a94b795ec0a95b2d40a89222394b5a74c4 
  clients/src/main/java/org/apache/kafka/clients/tools/ProducerPerformance.java 
3ebbb804242be6a001b3bae6524afccc85a87602 
  
clients/src/main/java/org/apache/kafka/common/record/ByteBufferInputStream.java 
PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/record/ByteBufferOutputStream.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/record/CompressionType.java 
906da02d02c03aadd8ab73ed2fc9a1898acb8d72 
  clients/src/main/java/org/apache/kafka/common/record/Compressor.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java 
9d8935fa3beeb2a78b109a41ed76fd4374239560 
  clients/src/main/java/org/apache/kafka/common/record/Record.java 
f1dc9778502cbdfe982254fb6e25947842622239 
  clients/src/main/java/org/apache/kafka/common/utils/Crc32.java 
153c5a6d345293aa0ba2cf513373323a6e9f2467 
  clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
0c6b3656375721a718fb4de10118170aacce0ea9 
  clients/src/test/java/org/apache/kafka/common/record/MemoryRecordsTest.java 
b0745b528cef929c4273f7e2ac4de1476cfc25ad 
  clients/src/test/java/org/apache/kafka/common/record/RecordTest.java 
ae54d67da9907b0a043180c7395a1370b3d0528d 
  clients/src/test/java/org/apache/kafka/common/utils/CrcTest.java PRE-CREATION 
  clients/src/test/java/org/apache/kafka/test/TestUtils.java 
36cfc0fda742eb501af2c2c0330e3f461cf1f40c 
  core/src/main/scala/kafka/producer/ConsoleProducer.scala 
dd39ff22c918fe5b05f04582b748e32349b2055f 
  core/src/test/scala/integration/kafka/api/ProducerCompressionTest.scala 
PRE-CREATION 
  core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
c002f5ea38ece66ad559fadb18ffaf40ac2026aa 
  core/src/test/scala/integration/kafka/api/ProducerSendTest.scala 
66ea76b9b6c0f8839f715c845fb9b9671b8f35c1 
  perf/src/main/scala/kafka/perf/ProducerPerformance.scala 
3df0d130308a35fca96184adc212eea1488f 

Diff: https://reviews.apache.org/r/18299/diff/


Testing
---

integration tests

unit tests

stress tests (1K message size, 1M messages in producer performance with ack = 
1, linger time = 0ms/500ms, random bit/all-ones messages)

snappy dynamic load test


Thanks,

Guozhang Wang



[jira] [Commented] (KAFKA-1253) Implement compression in new producer

2014-03-22 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944287#comment-13944287
 ] 

Guozhang Wang commented on KAFKA-1253:
--

Updated reviewboard https://reviews.apache.org/r/18299/
 against branch origin/trunk

> Implement compression in new producer
> -
>
> Key: KAFKA-1253
> URL: https://issues.apache.org/jira/browse/KAFKA-1253
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Jay Kreps
>Assignee: Guozhang Wang
> Attachments: KAFKA-1253.patch, KAFKA-1253_2014-02-21_16:15:21.patch, 
> KAFKA-1253_2014-02-21_17:55:52.patch, KAFKA-1253_2014-02-24_13:31:50.patch, 
> KAFKA-1253_2014-02-26_17:31:30.patch, KAFKA-1253_2014-03-06_17:48:11.patch, 
> KAFKA-1253_2014-03-07_16:34:33.patch, KAFKA-1253_2014-03-10_14:35:56.patch, 
> KAFKA-1253_2014-03-10_14:39:58.patch, KAFKA-1253_2014-03-10_15:27:47.patch, 
> KAFKA-1253_2014-03-14_13:46:40.patch, KAFKA-1253_2014-03-14_17:39:53.patch, 
> KAFKA-1253_2014-03-17_15:56:04.patch, KAFKA-1253_2014-03-18_17:10:10.patch, 
> KAFKA-1253_2014-03-19_16:31:39.patch, KAFKA-1253_2014-03-22_17:53:44.patch, 
> compression-fix.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1253) Implement compression in new producer

2014-03-22 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1253:
-

Attachment: KAFKA-1253_2014-03-22_17:53:44.patch

> Implement compression in new producer
> -
>
> Key: KAFKA-1253
> URL: https://issues.apache.org/jira/browse/KAFKA-1253
> Project: Kafka
>  Issue Type: Sub-task
>  Components: producer 
>Reporter: Jay Kreps
>Assignee: Guozhang Wang
> Attachments: KAFKA-1253.patch, KAFKA-1253_2014-02-21_16:15:21.patch, 
> KAFKA-1253_2014-02-21_17:55:52.patch, KAFKA-1253_2014-02-24_13:31:50.patch, 
> KAFKA-1253_2014-02-26_17:31:30.patch, KAFKA-1253_2014-03-06_17:48:11.patch, 
> KAFKA-1253_2014-03-07_16:34:33.patch, KAFKA-1253_2014-03-10_14:35:56.patch, 
> KAFKA-1253_2014-03-10_14:39:58.patch, KAFKA-1253_2014-03-10_15:27:47.patch, 
> KAFKA-1253_2014-03-14_13:46:40.patch, KAFKA-1253_2014-03-14_17:39:53.patch, 
> KAFKA-1253_2014-03-17_15:56:04.patch, KAFKA-1253_2014-03-18_17:10:10.patch, 
> KAFKA-1253_2014-03-19_16:31:39.patch, KAFKA-1253_2014-03-22_17:53:44.patch, 
> compression-fix.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 18299: Fix KAFKA-1253

2014-03-22 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18299/
---

(Updated March 23, 2014, 12:55 a.m.)


Review request for kafka.


Bugs: KAFKA-1253
https://issues.apache.org/jira/browse/KAFKA-1253


Repository: kafka


Description (updated)
---

Incorporated Jay's comments

In-place compression with

1) Dynamic reallocation in the underlying byte buffer
2) Written bytes estimate to reduce reallocation probabilities
3) Deallocation in buffer pool following the original capacity


Diffs
-

  build.gradle 84fa0d6b5f7405af755c5d7ff7bdd7592bb8668f 
  clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
1ac69436f117800815b8d50f042e9e2a29364b43 
  clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
32e12ad149f6d70c96a498d0a390976f77bf9e2a 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/BufferPool.java
 b69866a9fb9a8b4e1e78d304a20eda3cbf178c6f 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 673b2962771c28ceb3c7a6c0fd6f69521bd7ed16 
  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
 038a05a94b795ec0a95b2d40a89222394b5a74c4 
  clients/src/main/java/org/apache/kafka/clients/tools/ProducerPerformance.java 
3ebbb804242be6a001b3bae6524afccc85a87602 
  
clients/src/main/java/org/apache/kafka/common/record/ByteBufferInputStream.java 
PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/record/ByteBufferOutputStream.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/record/CompressionType.java 
906da02d02c03aadd8ab73ed2fc9a1898acb8d72 
  clients/src/main/java/org/apache/kafka/common/record/Compressor.java 
PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/record/MemoryRecords.java 
9d8935fa3beeb2a78b109a41ed76fd4374239560 
  clients/src/main/java/org/apache/kafka/common/record/Record.java 
f1dc9778502cbdfe982254fb6e25947842622239 
  clients/src/main/java/org/apache/kafka/common/utils/Crc32.java 
153c5a6d345293aa0ba2cf513373323a6e9f2467 
  clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
0c6b3656375721a718fb4de10118170aacce0ea9 
  clients/src/test/java/org/apache/kafka/common/record/MemoryRecordsTest.java 
b0745b528cef929c4273f7e2ac4de1476cfc25ad 
  clients/src/test/java/org/apache/kafka/common/record/RecordTest.java 
ae54d67da9907b0a043180c7395a1370b3d0528d 
  clients/src/test/java/org/apache/kafka/common/utils/CrcTest.java PRE-CREATION 
  clients/src/test/java/org/apache/kafka/test/TestUtils.java 
36cfc0fda742eb501af2c2c0330e3f461cf1f40c 
  core/src/main/scala/kafka/producer/ConsoleProducer.scala 
dd39ff22c918fe5b05f04582b748e32349b2055f 
  core/src/test/scala/integration/kafka/api/ProducerCompressionTest.scala 
PRE-CREATION 
  core/src/test/scala/integration/kafka/api/ProducerFailureHandlingTest.scala 
c002f5ea38ece66ad559fadb18ffaf40ac2026aa 
  core/src/test/scala/integration/kafka/api/ProducerSendTest.scala 
66ea76b9b6c0f8839f715c845fb9b9671b8f35c1 
  perf/src/main/scala/kafka/perf/ProducerPerformance.scala 
3df0d130308a35fca96184adc212eea1488f 

Diff: https://reviews.apache.org/r/18299/diff/


Testing (updated)
---

shallow cluster tests with rolling bounces

integration tests

unit tests

stress tests (1K message size, 1M messages in producer performance with ack = 
1, linger time = 0ms/500ms, random bit/all-ones messages)

snappy dynamic load test


Thanks,

Guozhang Wang