[jira] [Created] (KAFKA-1623) kafka is sometimes slow to accept connections

2014-09-04 Thread Shlomi Hazan (JIRA)
Shlomi Hazan created KAFKA-1623:
---

 Summary: kafka is sometimes slow to accept connections
 Key: KAFKA-1623
 URL: https://issues.apache.org/jira/browse/KAFKA-1623
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Shlomi Hazan


from SocketServer.scala:144
the Acceptor can wait up to 500 ms before processing the accumulated file descriptors. 
Also, the backlog of the acceptor socket does not appear to be set, which may be 
problematic if the full 500 ms elapse before the thread wakes. Setting the 
backlog is doable using the proper ServerSocket constructor, and it would 
probably be better to make it configurable.
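
For reference, a minimal sketch of binding the acceptor socket with an explicit,
configurable backlog (the value 200 and the idea of reading it from a new broker
property are assumptions, not part of this report):

  import java.net.InetSocketAddress
  import java.nio.channels.ServerSocketChannel

  object AcceptorBacklogSketch {
    def main(args: Array[String]): Unit = {
      val host = "0.0.0.0"
      val port = 9092
      val backlog = 200 // would presumably come from a new, configurable broker property
      val serverChannel = ServerSocketChannel.open()
      serverChannel.configureBlocking(false)
      // ServerSocket.bind(SocketAddress, backlog) sets the accept queue length
      // explicitly instead of leaving it at the platform default; the plain
      // ServerSocket(port, backlog) constructor mentioned above does the same.
      serverChannel.socket().bind(new InetSocketAddress(host, port), backlog)
      serverChannel.close()
    }
  }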




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] 0.8.2 release branch, unofficial release candidates(s), 0.8.1.2 release

2014-09-04 Thread Guozhang Wang
Just made a pass over the unresolved tickets tagged for 0.8.2, I think many
of them can be pushed to 0.8.3 / 0.9.


On Wed, Sep 3, 2014 at 8:05 PM, Jonathan Weeks jonathanbwe...@gmail.com
wrote:


 +1 on a 0.8.1.2 release as described.

 I manually applied patches to cobble together a working gradle build for
 Kafka for Scala 2.11, but would really appreciate an official release,
 i.e. 0.8.1.2, as we also depend on other libraries (e.g. akka-kafka) that
 would be much easier to migrate and support if the build were public and
 official.

 There were at least several others on the “users” list who expressed
 interest in Scala 2.11 support; who knows how many more “lurkers” are out
 there.

 Best Regards,

 -Jonathan

  Hey, I wanted to take a quick pulse to see if we are getting closer to a
  branch for 0.8.2.
 
  1) There still seem to be a lot of open issues
 
 https://issues.apache.org/jira/browse/KAFKA/fixforversion/12326167/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel
  and our 30-day summary shows 51 issues created and *34* resolved. I'm not
  sure how much of that we could really just decide to push off to 0.8.3 or
  0.9.0 versus working on 0.8.2 as a stable release. There is already so much
  goodness on trunk. I appreciate the double-commit pain, especially as trunk
  and the branch drift (ugh).
 
  2) Also, I wanted to float the idea that, after making the 0.8.2 branch, I
  would do some unofficial release candidates for folks to test prior to the
  release and vote. What I was thinking is that I would build, upload and stage
  the artifacts as if I were preparing them for a vote, but let the community
  know to go in and have at it well before the vote itself. We don't get a lot
  of community votes during a release, but we do get issues after (which is
  natural because of how things are done). I have seen four Apache projects do
  this very successfully: not only have they had fewer iterations of RC votes
  (sensitive to that myself), but the community kicked back issues they saw,
  because the pre-release time let them run the candidate through their own
  test and staging environments as the release came about.
 
  3) Checking again on whether we should have a 0.8.1.2 release if folks in the
  community find important features (this might be best asked on the user list,
  maybe, not sure) they don't want to or can't wait for, and which wouldn't be
  too much pain/danger to back-port. Two things that spring to the top of my
  head are Scala 2.11 support and fixing the source jars. Both of these are
  easy to patch; personally I don't mind, but I want to gauge more from the
  community on this too. I have heard ad hoc gripes from folks in direct
  communication but no real complaints in the public forum, and wanted to
  open the floor if folks have a need.
 
  4) The 0.9 work I feel is being held up some (or at least the resourcing of
  it, from my perspective). We decided to hold off on including SSL (even
  though we have a path for it). Jay recently made a nice update to the
  Security wiki which I think we should move forward with. I have some more to
  add/change/update and want to start getting down to more details and getting
  specific people working on specific tasks, but without knowing what we are
  doing when, it is hard to manage.
 
  5) I just updated https://issues.apache.org/jira/browse/KAFKA-1555 and I
  think it is a really important feature update. It doesn't have to be in
  0.8.2, but we need consensus (no pun intended). It fundamentally allows
  requiring data to be in a minimum of two racks, which A LOT of data requires
  before a save can be considered successful.
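
  (For context: if KAFKA-1555 lands as proposed, enforcing that kind of
  requirement would presumably look like combining a minimum in-sync replica
  count with full acknowledgements; the property names below are my assumption
  of how it would surface, not anything settled in that ticket:

    # topic/broker-level: require at least 2 in-sync replicas before a write is accepted
    min.insync.replicas=2
    # 0.8 producer config: -1 waits for acknowledgement from all in-sync replicas
    request.required.acks=-1
  )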
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
   ****/




-- 
-- Guozhang


[jira] [Updated] (KAFKA-1623) kafka is sometimes slow to accept connections

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1623:
-
Affects Version/s: (was: 0.8.1.1)
   0.9.0

 kafka is sometimes slow to accept connections
 -

 Key: KAFKA-1623
 URL: https://issues.apache.org/jira/browse/KAFKA-1623
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0
Reporter: Shlomi Hazan
  Labels: performance

 from SocketServer.scala:144
 the Acceptor can wait up to 500 ms before processing the accumulated file descriptors. 
 Also, the backlog of the acceptor socket does not appear to be set, which may 
 be problematic if the full 500 ms elapse before the thread wakes. 
 Setting the backlog is doable using the proper ServerSocket constructor, and 
 it would probably be better to make it configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1590) Binarize trace level request logging along with debug level text logging

2014-09-04 Thread Abhishek Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121497#comment-14121497
 ] 

Abhishek Sharma commented on KAFKA-1590:


Please assign it to me. I would like to give this a try.

 Binarize trace level request logging along with debug level text logging
 

 Key: KAFKA-1590
 URL: https://issues.apache.org/jira/browse/KAFKA-1590
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0


 With trace-level logging, the request handling logs can grow very fast 
 depending on the client behavior (e.g. a consumer with 0 maxWait that keeps 
 sending fetch requests). Previously we changed it to debug level, which 
 only provides a summary of the requests, omitting request details. However, 
 this does not work perfectly, since summaries are not sufficient for 
 troubleshooting, and turning on trace level once an issue occurs is too late.
 The proposed solution here is to default to debug-level logging, with trace-level 
 logging printed in a binary format at the same time. The generated binary 
 files can then be further compressed / rolled. When needed, we can then 
 decompress / parse the trace logs into text.
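
 As a rough illustration of that idea (the class, file format, and names here
 are purely illustrative, not the actual proposal):

   import java.io.{BufferedOutputStream, DataOutputStream, FileOutputStream}

   // Illustrative only: emit a one-line debug summary per request while appending
   // the full request bytes to a binary trace file that can be compressed/rolled
   // later and parsed back into text when troubleshooting.
   class RequestTraceWriter(path: String) {
     private val out =
       new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path, true)))

     def log(summary: String, requestBytes: Array[Byte]): Unit = {
       println(s"DEBUG $summary")                 // stands in for the debug-level text log
       out.writeLong(System.currentTimeMillis())  // record timestamp
       out.writeInt(requestBytes.length)          // length-prefix the payload
       out.write(requestBytes)                    // full request details in binary
     }

     def close(): Unit = out.close()
   }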



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] Subscription: outstanding kafka patches

2014-09-04 Thread jira
Issue Subscription
Filter: outstanding kafka patches (127 issues)
The list of outstanding kafka patches
Subscriber: kafka-mailing-list

Key Summary
KAFKA-1620  Make kafka api protocol implementation public
https://issues.apache.org/jira/browse/KAFKA-1620
KAFKA-1619  perf dir can be removed
https://issues.apache.org/jira/browse/KAFKA-1619
KAFKA-1616  Purgatory Size and Num.Delayed.Request metrics are incorrect
https://issues.apache.org/jira/browse/KAFKA-1616
KAFKA-1614  Partition log directory name and segments information exposed via 
JMX
https://issues.apache.org/jira/browse/KAFKA-1614
KAFKA-1611  Improve system test configuration
https://issues.apache.org/jira/browse/KAFKA-1611
KAFKA-1610  Local modifications to collections generated from mapValues will be 
lost
https://issues.apache.org/jira/browse/KAFKA-1610
KAFKA-1604  System Test for Transaction Management
https://issues.apache.org/jira/browse/KAFKA-1604
KAFKA-1601  ConsoleConsumer/SimpleConsumerPerformance should be 
transaction-aware
https://issues.apache.org/jira/browse/KAFKA-1601
KAFKA-1600  Controller failover not working correctly.
https://issues.apache.org/jira/browse/KAFKA-1600
KAFKA-1597  New metrics: ResponseQueueSize and BeingSentResponses
https://issues.apache.org/jira/browse/KAFKA-1597
KAFKA-1596  Exception in KafkaScheduler while shutting down
https://issues.apache.org/jira/browse/KAFKA-1596
KAFKA-1591  Clean-up Unnecessary stack trace in error/warn logs
https://issues.apache.org/jira/browse/KAFKA-1591
KAFKA-1586  support sticky partitioning in the new producer
https://issues.apache.org/jira/browse/KAFKA-1586
KAFKA-1585  Client: Infinite conflict in /consumers/
https://issues.apache.org/jira/browse/KAFKA-1585
KAFKA-1583  Kafka API Refactoring
https://issues.apache.org/jira/browse/KAFKA-1583
KAFKA-1569  Tool for performance and correctness of transactions end-to-end
https://issues.apache.org/jira/browse/KAFKA-1569
KAFKA-1561  Data Loss for Incremented Replica Factor and Leader Election
https://issues.apache.org/jira/browse/KAFKA-1561
KAFKA-1543  Changing replication factor
https://issues.apache.org/jira/browse/KAFKA-1543
KAFKA-1541   Add transactional request definitions to clients package
https://issues.apache.org/jira/browse/KAFKA-1541
KAFKA-1528  Normalize all the line endings
https://issues.apache.org/jira/browse/KAFKA-1528
KAFKA-1527  SimpleConsumer should be transaction-aware
https://issues.apache.org/jira/browse/KAFKA-1527
KAFKA-1526  Producer performance tool should have an option to enable 
transactions
https://issues.apache.org/jira/browse/KAFKA-1526
KAFKA-1525  DumpLogSegments should print transaction IDs
https://issues.apache.org/jira/browse/KAFKA-1525
KAFKA-1524  Implement transactional producer
https://issues.apache.org/jira/browse/KAFKA-1524
KAFKA-1523  Implement transaction manager module
https://issues.apache.org/jira/browse/KAFKA-1523
KAFKA-1517  Messages is a required argument to Producer Performance Test
https://issues.apache.org/jira/browse/KAFKA-1517
KAFKA-1510  Force offset commits when migrating consumer offsets from zookeeper 
to kafka
https://issues.apache.org/jira/browse/KAFKA-1510
KAFKA-1509  Restart of destination broker after unreplicated partition move 
leaves partitions without leader
https://issues.apache.org/jira/browse/KAFKA-1509
KAFKA-1507  Using GetOffsetShell against non-existent topic creates the topic 
unintentionally
https://issues.apache.org/jira/browse/KAFKA-1507
KAFKA-1500  adding new consumer requests using the new protocol
https://issues.apache.org/jira/browse/KAFKA-1500
KAFKA-1499  Broker-side compression configuration
https://issues.apache.org/jira/browse/KAFKA-1499
KAFKA-1496  Using batch message in sync producer only sends the first message 
if we use a Scala Stream as the argument 
https://issues.apache.org/jira/browse/KAFKA-1496
KAFKA-1481  Stop using dashes AND underscores as separators in MBean names
https://issues.apache.org/jira/browse/KAFKA-1481
KAFKA-1477  add authentication layer and initial JKS x509 implementation for 
brokers, producers and consumer for network communication
https://issues.apache.org/jira/browse/KAFKA-1477
KAFKA-1476  Get a list of consumer groups
https://issues.apache.org/jira/browse/KAFKA-1476
KAFKA-1475  Kafka consumer stops LeaderFinder/FetcherThreads, but application 
does not know
https://issues.apache.org/jira/browse/KAFKA-1475
KAFKA-1471  Add Producer Unit Tests for LZ4 and LZ4HC compression
https://issues.apache.org/jira/browse/KAFKA-1471
KAFKA-1468  Improve perf tests

[jira] [Commented] (KAFKA-328) Write unit test for kafka server startup and shutdown API

2014-09-04 Thread BalajiSeshadri (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121932#comment-14121932
 ] 

BalajiSeshadri commented on KAFKA-328:
--

[~nehanarkhede], could you please advise on my previous comment?

 Write unit test for kafka server startup and shutdown API 
 --

 Key: KAFKA-328
 URL: https://issues.apache.org/jira/browse/KAFKA-328
 Project: Kafka
  Issue Type: Bug
Reporter: Neha Narkhede
Assignee: BalajiSeshadri
  Labels: newbie

 Background discussion in KAFKA-320.
 People often try to embed KafkaServer in an application that ends up calling 
 startup() and shutdown() repeatedly and sometimes in odd ways. To ensure this 
 works correctly we have to be very careful about cleaning up resources. This 
 is a good practice for making unit tests reliable anyway.
 A good first step would be to add some unit tests on startup and shutdown to 
 cover various cases (a sketch follows the list below):
 1. A Kafka server can start up if it is not already starting up, is not 
 currently being shut down, and has not already been started.
 2. A Kafka server can shut down if it is not already shutting down, is not 
 currently starting up, and has not already been shut down. 
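
 A sketch of what such a test could exercise (the server factory is assumed to
 be provided by the existing test utilities; nothing here reflects the eventual
 patch):

   // Sketch only: exercises repeated startup()/shutdown() given some factory
   // that builds a fresh KafkaServer from a test config.
   object StartupShutdownSketch {
     def exercise(newServer: () => kafka.server.KafkaServer): Unit = {
       val server = newServer()
       server.startup()
       server.shutdown()
       server.shutdown()   // a second shutdown should be a safe no-op
       val again = newServer()
       again.startup()     // a fresh instance should start cleanly after a prior shutdown
       again.shutdown()
     }
   }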



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri reassigned KAFKA-1618:
-

Assignee: BalajiSeshadri  (was: Gwen Shapira)

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: BalajiSeshadri

 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)
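
 For illustration, a tolerant parse along these lines would avoid the exception;
 the default port of 9092 and the helper below are assumptions, not the contents
 of any attached patch:

   // Illustrative broker-list parsing that falls back to a default port when an
   // entry has no ":port" suffix; 9092 is an assumed default, not the patch's choice.
   object BrokerListSketch {
     val DefaultPort = 9092

     def parse(brokerList: String): Seq[(String, Int)] =
       brokerList.split(",").toSeq.map { entry =>
         entry.trim.split(":") match {
           case Array(host)       => (host, DefaultPort)
           case Array(host, port) => (host, port.toInt)
           case _                 => throw new IllegalArgumentException(s"Invalid broker entry: $entry")
         }
       }
   }

   // e.g. BrokerListSketch.parse("localhost") == Seq(("localhost", 9092))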



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1616) Purgatory Size and Num.Delayed.Request metrics are incorrect

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121935#comment-14121935
 ] 

Guozhang Wang commented on KAFKA-1616:
--

Updated reviewboard https://reviews.apache.org/r/25155/diff/
 against branch origin/trunk

 Purgatory Size and Num.Delayed.Request metrics are incorrect
 

 Key: KAFKA-1616
 URL: https://issues.apache.org/jira/browse/KAFKA-1616
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1616.patch, KAFKA-1616_2014-08-28_10:12:17.patch, 
 KAFKA-1616_2014-09-01_14:41:56.patch, KAFKA-1616_2014-09-02_12:58:07.patch, 
 KAFKA-1616_2014-09-02_13:23:13.patch, KAFKA-1616_2014-09-03_12:53:09.patch, 
 KAFKA-1616_2014-09-04_13:26:02.patch


 The request purgatory used two atomic integers, watched and unsatisfied, to 
 record the purgatory size ( = watched + unsatisfied) and the number of delayed 
 requests ( = unsatisfied). But due to some race conditions these two atomic 
 integers are not updated correctly, resulting in incorrect metrics.
 Proposed solution: to have cleaner semantics, we can define the purgatory 
 size to be just the number of elements in the watched lists, and the number 
 of delayed requests to be just the length of the expiry queue. And instead 
 of using two atomic integers we just compute the size of the lists / queue 
 on the fly each time the metrics are pulled. This may use some more CPU 
 cycles for these two metrics but should be minor, and correctness is 
 guaranteed.
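
 Schematically, the proposal amounts to something like the following (the data
 structures are simple stand-ins for the real purgatory's per-key watcher lists
 and expiry queue):

   import java.util.concurrent.{ConcurrentHashMap, DelayQueue, Delayed}
   import scala.collection.JavaConverters._

   // Stand-in structures only: per-key watcher lists plus a single expiry queue.
   class PurgatoryMetricsSketch[K, R <: Delayed] {
     private val watchersForKey = new ConcurrentHashMap[K, java.util.List[R]]()
     private val expiryQueue    = new DelayQueue[R]()

     // "Purgatory size": total elements across all watched lists, computed on
     // demand when the metric is pulled, instead of maintaining an atomic counter.
     def purgatorySize: Int = watchersForKey.values().asScala.map(_.size).sum

     // "Delayed requests": the current length of the expiry queue, also on demand.
     def numDelayedRequests: Int = expiryQueue.size()
   }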



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25155: Fix KAFKA-1616

2014-09-04 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25155/
---

(Updated Sept. 4, 2014, 8:26 p.m.)


Review request for kafka.


Bugs: KAFKA-1616
https://issues.apache.org/jira/browse/KAFKA-1616


Repository: kafka


Description (updated)
---

Purgatory size to be the sum of watched list sizes; delayed request to be the 
expiry queue length; remove atomic integers for metrics; add a unit test for 
watched list sizes and enqueued requests


Diffs (updated)
-

  core/src/main/scala/kafka/server/RequestPurgatory.scala 
ce06d2c381348deef8559374869fcaed923da1d1 
  core/src/test/scala/unit/kafka/server/RequestPurgatoryTest.scala 
168712de241125982d556c188c76514fceb93779 

Diff: https://reviews.apache.org/r/25155/diff/


Testing
---


Thanks,

Guozhang Wang



[jira] [Updated] (KAFKA-1616) Purgatory Size and Num.Delayed.Request metrics are incorrect

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1616:
-
Attachment: KAFKA-1616_2014-09-04_13:26:02.patch

 Purgatory Size and Num.Delayed.Request metrics are incorrect
 

 Key: KAFKA-1616
 URL: https://issues.apache.org/jira/browse/KAFKA-1616
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1616.patch, KAFKA-1616_2014-08-28_10:12:17.patch, 
 KAFKA-1616_2014-09-01_14:41:56.patch, KAFKA-1616_2014-09-02_12:58:07.patch, 
 KAFKA-1616_2014-09-02_13:23:13.patch, KAFKA-1616_2014-09-03_12:53:09.patch, 
 KAFKA-1616_2014-09-04_13:26:02.patch


 The request purgatory used two atomic integers, watched and unsatisfied, to 
 record the purgatory size ( = watched + unsatisfied) and the number of delayed 
 requests ( = unsatisfied). But due to some race conditions these two atomic 
 integers are not updated correctly, resulting in incorrect metrics.
 Proposed solution: to have cleaner semantics, we can define the purgatory 
 size to be just the number of elements in the watched lists, and the number 
 of delayed requests to be just the length of the expiry queue. And instead 
 of using two atomic integers we just compute the size of the lists / queue 
 on the fly each time the metrics are pulled. This may use some more CPU 
 cycles for these two metrics but should be minor, and correctness is 
 guaranteed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1533) transient unit test failure in ProducerFailureHandlingTest

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122008#comment-14122008
 ] 

Guozhang Wang commented on KAFKA-1533:
--

Hi [~junrao], do you have time to fix this minor thing before the 0.8.2 release?

 transient unit test failure in ProducerFailureHandlingTest
 --

 Key: KAFKA-1533
 URL: https://issues.apache.org/jira/browse/KAFKA-1533
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: Guozhang Wang
 Fix For: 0.8.2

 Attachments: KAFKA-1533.patch, KAFKA-1533.patch, KAFKA-1533.patch, 
 KAFKA-1533_2014-07-21_15:45:58.patch, kafka.threads, stack.out


 Occasionally, we saw the test hang on teardown. The following is the stack 
 trace.
 Test worker prio=5 tid=7f9246956000 nid=0x10e078000 in Object.wait() 
 [10e075000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 7f4e69578 (a org.apache.zookeeper.ClientCnxn$Packet)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1344)
 - locked 7f4e69578 (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:732)
 at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:91)
 at org.I0Itec.zkclient.ZkClient$8.call(ZkClient.java:720)
 at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
 at org.I0Itec.zkclient.ZkClient.delete(ZkClient.java:716)
 at kafka.utils.ZkUtils$.deletePath(ZkUtils.scala:416)
 at kafka.utils.ZkUtils$.deregisterBrokerInZk(ZkUtils.scala:184)
 at kafka.server.KafkaHealthcheck.shutdown(KafkaHealthcheck.scala:50)
 at 
 kafka.server.KafkaServer$$anonfun$shutdown$2.apply$mcV$sp(KafkaServer.scala:243)
 at kafka.utils.Utils$.swallow(Utils.scala:172)
 at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
 at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
 at kafka.utils.Logging$class.swallow(Logging.scala:94)
 at kafka.utils.Utils$.swallow(Utils.scala:45)
 at kafka.server.KafkaServer.shutdown(KafkaServer.scala:243)
 at 
 kafka.api.ProducerFailureHandlingTest.tearDown(ProducerFailureHandlingTest.scala:90)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1352) Reduce logging on the server

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122009#comment-14122009
 ] 

Guozhang Wang commented on KAFKA-1352:
--

We will be fixing this issue together with 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Error+Handling+and+Logging
 in 0.9.

 Reduce logging on the server
 

 Key: KAFKA-1352
 URL: https://issues.apache.org/jira/browse/KAFKA-1352
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0, 0.8.1
Reporter: Neha Narkhede
Assignee: Ivan Lyutov
  Labels: newbie, usability
 Fix For: 0.8.2

 Attachments: KAFKA-1352.patch, KAFKA-1352.patch, 
 KAFKA-1352_2014-04-04_21:20:31.patch


 We have excessive logging in the server, making the logs unreadable and also 
 affecting the performance of the server in practice. We need to clean the 
 logs to address these issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1352) Reduce logging on the server

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1352:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Reduce logging on the server
 

 Key: KAFKA-1352
 URL: https://issues.apache.org/jira/browse/KAFKA-1352
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0, 0.8.1
Reporter: Neha Narkhede
Assignee: Ivan Lyutov
  Labels: newbie, usability
 Fix For: 0.9.0

 Attachments: KAFKA-1352.patch, KAFKA-1352.patch, 
 KAFKA-1352_2014-04-04_21:20:31.patch


 We have excessive logging in the server, making the logs unreadable and also 
 affecting the performance of the server in practice. We need to clean the 
 logs to address these issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1377) transient unit test failure in LogOffsetTest

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122012#comment-14122012
 ] 

Guozhang Wang commented on KAFKA-1377:
--

Pushing this to 0.9 for now. [~omkreddy], is this still a consistently reproducible 
issue on your side?

 transient unit test failure in LogOffsetTest
 

 Key: KAFKA-1377
 URL: https://issues.apache.org/jira/browse/KAFKA-1377
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Jun Rao
 Fix For: 0.8.2

 Attachments: KAFKA-1377.patch, KAFKA-1377_2014-04-11_17:42:13.patch, 
 KAFKA-1377_2014-04-11_18:14:45.patch


 Saw the following transient unit test failure.
 kafka.server.LogOffsetTest > testGetOffsetsBeforeEarliestTime FAILED
 junit.framework.AssertionFailedError: expected:<List(0)> but 
 was:<Vector()>
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.failNotEquals(Assert.java:277)
 at junit.framework.Assert.assertEquals(Assert.java:64)
 at junit.framework.Assert.assertEquals(Assert.java:71)
 at 
 kafka.server.LogOffsetTest.testGetOffsetsBeforeEarliestTime(LogOffsetTest.scala:198)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1377) transient unit test failure in LogOffsetTest

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1377:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 transient unit test failure in LogOffsetTest
 

 Key: KAFKA-1377
 URL: https://issues.apache.org/jira/browse/KAFKA-1377
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Jun Rao
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1377.patch, KAFKA-1377_2014-04-11_17:42:13.patch, 
 KAFKA-1377_2014-04-11_18:14:45.patch


 Saw the following transient unit test failure.
 kafka.server.LogOffsetTest > testGetOffsetsBeforeEarliestTime FAILED
 junit.framework.AssertionFailedError: expected:<List(0)> but 
 was:<Vector()>
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.failNotEquals(Assert.java:277)
 at junit.framework.Assert.assertEquals(Assert.java:64)
 at junit.framework.Assert.assertEquals(Assert.java:71)
 at 
 kafka.server.LogOffsetTest.testGetOffsetsBeforeEarliestTime(LogOffsetTest.scala:198)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1377) transient unit test failure in LogOffsetTest

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1377:
-
Labels: newbie  (was: )

 transient unit test failure in LogOffsetTest
 

 Key: KAFKA-1377
 URL: https://issues.apache.org/jira/browse/KAFKA-1377
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Jun Rao
  Labels: newbie
 Fix For: 0.9.0

 Attachments: KAFKA-1377.patch, KAFKA-1377_2014-04-11_17:42:13.patch, 
 KAFKA-1377_2014-04-11_18:14:45.patch


 Saw the following transient unit test failure.
 kafka.server.LogOffsetTest > testGetOffsetsBeforeEarliestTime FAILED
 junit.framework.AssertionFailedError: expected:<List(0)> but 
 was:<Vector()>
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.failNotEquals(Assert.java:277)
 at junit.framework.Assert.assertEquals(Assert.java:64)
 at junit.framework.Assert.assertEquals(Assert.java:71)
 at 
 kafka.server.LogOffsetTest.testGetOffsetsBeforeEarliestTime(LogOffsetTest.scala:198)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1618:
--
Comment: was deleted

(was: Please review  patch that defaults to 9091 when no port is specified.)

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.8.2


 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1618:
--
Status: Open  (was: Patch Available)

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.8.2


 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1618:
--
Fix Version/s: 0.8.2
 Reviewer: Joe Stein
   Labels: newbie  (was: )
Affects Version/s: 0.8.1.1
   Status: Patch Available  (was: Open)

Please review the patch that defaults to 9091 when no port is specified.

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.8.2


 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1618:
--
Reviewer: Joe Stein  (was: Joe Stein)
  Status: Patch Available  (was: Open)

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1618.patch


 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1618:
--
Attachment: KAFKA-1618.patch

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1618.patch


 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122018#comment-14122018
 ] 

Guozhang Wang commented on KAFKA-686:
-

This is a nice-to-have for 0.8.2; if [~phargett] and [~viktortnk] do not have 
time now, we can push it to 0.9.

 0.8 Kafka broker should give a better error message when running against 0.7 
 zookeeper
 --

 Key: KAFKA-686
 URL: https://issues.apache.org/jira/browse/KAFKA-686
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Priority: Blocker
  Labels: newbie, patch
 Fix For: 0.8.2

 Attachments: KAFAK-686-null-pointer-fix.patch, 
 KAFKA-686-null-pointer-fix-2.patch


 People will not know that the zookeeper paths are not compatible. When you 
 try to start the 0.8 broker pointed at a 0.7 zookeeper you get a 
 NullPointerException. We should detect this and give a more sane error.
 Error:
 kafka.common.KafkaException: Can't parse json string: null
 at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
 at kafka.utils.Json$.parseFull(Json.scala:16)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
 at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
 at scala.collection.immutable.List.foreach(List.scala:45)
 at 
 kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
 at 
 kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
 at 
 kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
 at 
 kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
 at 
 kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
 at 
 kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
 at kafka.controller.KafkaController.startup(KafkaController.scala:381)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
 at 
 kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
 at kafka.Kafka$.main(Kafka.scala:46)
 at kafka.Kafka.main(Kafka.scala)
 Caused by: java.lang.NullPointerException
 at 
 scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52)
 at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
 at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
 at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
 ... 16 more
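
 A sketch of the kind of guard being asked for (the method and wording are
 illustrative only):

   // Illustrative guard: fail with a descriptive error instead of an NPE when a
   // znode's data is null, as happens when an 0.8 broker is pointed at a ZooKeeper
   // chroot written by Kafka 0.7.
   object ZkCompatSketch {
     def requireTopicJson(topic: String, jsonOrNull: String): String = {
       if (jsonOrNull == null)
         throw new RuntimeException(
           s"No replica assignment data found in ZooKeeper for topic '$topic'. " +
             "The paths look like they were written by Kafka 0.7, which is not " +
             "compatible with 0.8.")
       jsonOrNull
     }
   }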



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1618) Exception thrown when running console producer with no port number for the broker

2014-09-04 Thread BalajiSeshadri (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BalajiSeshadri updated KAFKA-1618:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Exception thrown when running console producer with no port number for the 
 broker
 -

 Key: KAFKA-1618
 URL: https://issues.apache.org/jira/browse/KAFKA-1618
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1.1
Reporter: Gwen Shapira
Assignee: BalajiSeshadri
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1618.patch


 When running the console producer with just localhost as the broker list, I get 
 an ArrayIndexOutOfBoundsException.
 I expect either a clearer error about the arguments or for the producer to 
 guess a default port.
 [root@shapira-1 bin]# ./kafka-console-producer.sh  --topic rufus1 
 --broker-list localhost
 java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
   at 
 kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:97)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:97)
   at 
 kafka.producer.BrokerPartitionInfo.init(BrokerPartitionInfo.scala:32)
   at 
 kafka.producer.async.DefaultEventHandler.init(DefaultEventHandler.scala:41)
   at kafka.producer.Producer.init(Producer.scala:59)
   at kafka.producer.ConsoleProducer$.main(ConsoleProducer.scala:158)
   at kafka.producer.ConsoleProducer.main(ConsoleProducer.scala)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1190) create a draw performance graph script

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122020#comment-14122020
 ] 

Guozhang Wang commented on KAFKA-1190:
--

Moving out of 0.8.2 (again...)

 create a draw performance graph script
 --

 Key: KAFKA-1190
 URL: https://issues.apache.org/jira/browse/KAFKA-1190
 Project: Kafka
  Issue Type: Improvement
Reporter: Joe Stein
 Fix For: 0.9.0

 Attachments: KAFKA-1190.patch


 This will be an R script to draw relevant graphs given a bunch of csv files 
 from the above tools. The output of this script will be a bunch of png files 
 that can be combined with some html to act as a perf report.
 Here are the graphs that would be good to see:
 * Latency histogram for producer
 * MB/sec and messages/sec produced
 * MB/sec and messages/sec consumed
 * Flush time
 * Errors (should not be any)
 * Consumer cache hit ratio (both the bytes and count, specifically 
   1 - #physical_reads / #requests and 1 - physical_bytes_read / bytes_read)
 * Write merge ratio (num_physical_writes/num_produce_requests and 
 avg_request_size/avg_physical_write_size)
 * CPU, network, io, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1190) create a draw performance graph script

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1190:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 create a draw performance graph script
 --

 Key: KAFKA-1190
 URL: https://issues.apache.org/jira/browse/KAFKA-1190
 Project: Kafka
  Issue Type: Improvement
Reporter: Joe Stein
 Fix For: 0.9.0

 Attachments: KAFKA-1190.patch


 This will be an R script to draw relevant graphs given a bunch of csv files 
 from the above tools. The output of this script will be a bunch of png files 
 that can be combined with some html to act as a perf report.
 Here are the graphs that would be good to see:
 * Latency histogram for producer
 * MB/sec and messages/sec produced
 * MB/sec and messages/sec consumed
 * Flush time
 * Errors (should not be any)
 * Consumer cache hit ratio (both the bytes and count, specifically 
   1 - #physical_reads / #requests and 1 - physical_bytes_read / bytes_read)
 * Write merge ratio (num_physical_writes/num_produce_requests and 
 avg_request_size/avg_physical_write_size)
 * CPU, network, io, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1420) Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with TestUtils.createTopic in unit tests

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122022#comment-14122022
 ] 

Guozhang Wang commented on KAFKA-1420:
--

Can some committer take another look and commit this one? It looks good to me 
now.

 Replace AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK with 
 TestUtils.createTopic in unit tests
 --

 Key: KAFKA-1420
 URL: https://issues.apache.org/jira/browse/KAFKA-1420
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Jun Rao
  Labels: newbie
 Fix For: 0.8.2

 Attachments: KAFKA-1420.patch, KAFKA-1420_2014-07-30_11:18:26.patch, 
 KAFKA-1420_2014-07-30_11:24:55.patch, KAFKA-1420_2014-08-02_11:04:15.patch, 
 KAFKA-1420_2014-08-10_14:12:05.patch, KAFKA-1420_2014-08-10_23:03:46.patch


 This is a follow-up JIRA from KAFKA-1389.
 There are a bunch of places in the unit tests where we misuse 
 AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK to create topics, 
 where TestUtils.createTopic needs to be used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1300) Added WaitForReplaction admin tool.

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1300:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Added WaitForReplaction admin tool.
 ---

 Key: KAFKA-1300
 URL: https://issues.apache.org/jira/browse/KAFKA-1300
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.8.0
 Environment: Ubuntu 12.04
Reporter: Brenden Matthews
  Labels: patch
 Fix For: 0.9.0

 Attachments: 0001-Added-WaitForReplaction-admin-tool.patch


 I have created a tool similar to the broker shutdown tool for doing rolling 
 restarts of Kafka clusters.
 The tool watches the max replica lag of the specified broker, and waits until 
 the lag drops to 0 before exiting.
 To do a rolling restart, here's the process we use:
 for (broker <- brokers) {
   run shutdown tool for broker
   terminate broker
   start new broker
   run wait for replication tool on new broker
 }
 Here's an example command line use:
 ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper 
 zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0
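
 The core of such a tool presumably reduces to a retry loop like this
 (maxReplicaLag is a placeholder for however the tool actually reads the
 broker's max lag, e.g. via JMX; it is not the patch's code):

   // Sketch of the wait loop: poll the broker's max replica lag until it drops
   // to 0, giving up after numRetries attempts spaced retryIntervalMs apart.
   object WaitForReplicationSketch {
     def waitForCatchUp(maxReplicaLag: () => Long, numRetries: Int, retryIntervalMs: Long): Boolean = {
       var attempt = 0
       while (attempt < numRetries) {
         if (maxReplicaLag() == 0L) return true
         Thread.sleep(retryIntervalMs)
         attempt += 1
       }
       false
     }
   }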



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1300) Added WaitForReplaction admin tool.

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122023#comment-14122023
 ] 

Guozhang Wang commented on KAFKA-1300:
--

Moving out of 0.8.2 as for now.

 Added WaitForReplaction admin tool.
 ---

 Key: KAFKA-1300
 URL: https://issues.apache.org/jira/browse/KAFKA-1300
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.8.0
 Environment: Ubuntu 12.04
Reporter: Brenden Matthews
  Labels: patch
 Fix For: 0.9.0

 Attachments: 0001-Added-WaitForReplaction-admin-tool.patch


 I have created a tool similar to the broker shutdown tool for doing rolling 
 restarts of Kafka clusters.
 The tool watches the max replica lag of the specified broker, and waits until 
 the lag drops to 0 before exiting.
 To do a rolling restart, here's the process we use:
 for (broker <- brokers) {
   run shutdown tool for broker
   terminate broker
   start new broker
   run wait for replication tool on new broker
 }
 Here's an example command line use:
 ./kafka-run-class.sh kafka.admin.WaitForReplication --zookeeper 
 zk.host.com:2181 --num.retries 100 --retry.interval.ms 6 --broker 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1215) Rack-Aware replica assignment option

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122028#comment-14122028
 ] 

Guozhang Wang commented on KAFKA-1215:
--

Moving out of 0.8.2 for now..

 Rack-Aware replica assignment option
 

 Key: KAFKA-1215
 URL: https://issues.apache.org/jira/browse/KAFKA-1215
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0
Reporter: Joris Van Remoortere
Assignee: Jun Rao
 Fix For: 0.9.0

 Attachments: rack_aware_replica_assignment_v1.patch, 
 rack_aware_replica_assignment_v2.patch


 Adding a rack-id to the kafka config. This rack-id can be used during replica 
 assignment by using the max-rack-replication argument in the admin scripts 
 (create topic, etc.). By default the original replication assignment 
 algorithm is used because max-rack-replication defaults to -1. 
 max-rack-replication > -1 is not honored if you are doing manual replica 
 assignment (preferred).
 If this looks good I can add some test cases specific to the rack-aware 
 assignment.
 I can also port this to trunk. We are currently running 0.8.0 in production 
 and need this, so I wrote the patch against that.
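
 For intuition only, a toy version of rack-aware placement could spread each
 partition's replicas across distinct racks first (this is a simplification of
 the idea, not the attached patch):

   // Toy rack-aware placement: walk racks round-robin per replica so a partition's
   // replicas land on distinct racks whenever replicationFactor <= number of racks.
   object RackAwareSketch {
     def assignReplicas(brokerRacks: Map[Int, String],
                        partitions: Int,
                        replicationFactor: Int): Map[Int, Seq[Int]] = {
       // Group broker ids by rack, in a stable order.
       val racks: Vector[Vector[Int]] =
         brokerRacks.groupBy(_._2).toVector.sortBy(_._1).map(_._2.keys.toVector.sorted)
       (0 until partitions).map { p =>
         val replicas = (0 until replicationFactor).map { r =>
           val rack = racks((p + r) % racks.size)  // distinct racks while r < racks.size
           rack((p / racks.size) % rack.size)      // then pick a broker within that rack
         }
         p -> replicas
       }.toMap
     }
   }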



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122025#comment-14122025
 ] 

Guozhang Wang commented on KAFKA-1481:
--

[~junrao] do you think we can check in this ticket in time for 0.8.2?

 Stop using dashes AND underscores as separators in MBean names
 --

 Key: KAFKA-1481
 URL: https://issues.apache.org/jira/browse/KAFKA-1481
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Otis Gospodnetic
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-1481_2014-06-06_13-06-35.patch


 MBeans should not use dashes or underscores as separators because these 
 characters are allowed in hostnames, topics, group and consumer IDs, etc., 
 and these are embedded in MBean names, making it impossible to parse out 
 the individual bits from the MBeans.
 Perhaps a pipe character should be used to avoid the conflict. 
 This looks like a major blocker because it means nobody can write Kafka 0.8.x 
 monitoring tools unless they are doing it for themselves AND do not use 
 dashes AND do not use underscores.
 See: http://search-hadoop.com/m/4TaT4lonIW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1215) Rack-Aware replica assignment option

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1215:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Rack-Aware replica assignment option
 

 Key: KAFKA-1215
 URL: https://issues.apache.org/jira/browse/KAFKA-1215
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0
Reporter: Joris Van Remoortere
Assignee: Jun Rao
 Fix For: 0.9.0

 Attachments: rack_aware_replica_assignment_v1.patch, 
 rack_aware_replica_assignment_v2.patch


 Adding a rack-id to the kafka config. This rack-id can be used during replica 
 assignment by using the max-rack-replication argument in the admin scripts 
 (create topic, etc.). By default the original replication assignment 
 algorithm is used because max-rack-replication defaults to -1. 
 max-rack-replication > -1 is not honored if you are doing manual replica 
 assignment (preferred).
 If this looks good I can add some test cases specific to the rack-aware 
 assignment.
 I can also port this to trunk. We are currently running 0.8.0 in production 
 and need this, so I wrote the patch against that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-404) When using chroot path, create chroot on startup if it doesn't exist

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122030#comment-14122030
 ] 

Guozhang Wang commented on KAFKA-404:
-

[~marek.dolgos], do you have some time to finish this before the 0.8.2 release?

 When using chroot path, create chroot on startup if it doesn't exist
 

 Key: KAFKA-404
 URL: https://issues.apache.org/jira/browse/KAFKA-404
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.1
 Environment: CentOS 5.5, Linux 2.6.18-194.32.1.el5 x86_64 GNU/Linux
Reporter: Jonathan Creasy
  Labels: newbie, patch
 Fix For: 0.8.2

 Attachments: KAFKA-404-0.7.1.patch, KAFKA-404-0.8.patch, 
 KAFKA-404-auto-create-zookeeper-chroot-on-start-up-i.patch, 
 KAFKA-404-auto-create-zookeeper-chroot-on-start-up-v2.patch, 
 KAFKA-404-auto-create-zookeeper-chroot-on-start-up-v3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122033#comment-14122033
 ] 

Guozhang Wang commented on KAFKA-1305:
--

Moving out of  0.8.2 for now.

 Controller can hang on controlled shutdown with auto leader balance enabled
 ---

 Key: KAFKA-1305
 URL: https://issues.apache.org/jira/browse/KAFKA-1305
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Neha Narkhede
Priority: Blocker
 Fix For: 0.8.2


 This is relatively easy to reproduce especially when doing a rolling bounce.
 What happened here is as follows:
 1. The previous controller was bounced and broker 265 became the new 
 controller.
 2. I went on to do a controlled shutdown of broker 265 (the new controller).
 3. In the meantime, the automatically scheduled preferred replica leader 
 election process started doing its thing and began sending 
 LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers).  
 (t@113 below).
 4. While that's happening, the controlled shutdown process on 265 succeeds 
 and proceeds to deregister itself from ZooKeeper and shuts down the socket 
 server.
 5. (ReplicaStateMachine actually removes deregistered brokers from the 
 controller channel manager's list of brokers to send requests to.  However, 
 that removal cannot take place (t@18 below) because preferred replica leader 
 election task owns the controller lock.)
 6. So the request thread to broker 265 gets into infinite retries.
 7. The entire broker shutdown process is blocked on controller shutdown for 
 the same reason (it needs to acquire the controller lock).
 Relevant portions from the thread-dump:
 Controller-265-to-broker-265-send-thread - Thread t@113
java.lang.Thread.State: TIMED_WAITING
   at java.lang.Thread.sleep(Native Method)
   at 
 kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:143)
   at kafka.utils.Utils$.swallow(Utils.scala:167)
   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
   at kafka.utils.Logging$class.swallow(Logging.scala:94)
   at kafka.utils.Utils$.swallow(Utils.scala:46)
   at 
 kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:143)
   at 
 kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
   - locked java.lang.Object@6dbf14a7
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
Locked ownable synchronizers:
   - None
 ...
 Thread-4 - Thread t@17
java.lang.Thread.State: WAITING on 
 java.util.concurrent.locks.ReentrantLock$NonfairSync@4836840 owned by: 
 kafka-scheduler-0
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
   at 
 java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
   at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
   at kafka.utils.Utils$.inLock(Utils.scala:536)
   at kafka.controller.KafkaController.shutdown(KafkaController.scala:642)
   at 
 kafka.server.KafkaServer$$anonfun$shutdown$9.apply$mcV$sp(KafkaServer.scala:242)
   at kafka.utils.Utils$.swallow(Utils.scala:167)
   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
   at kafka.utils.Logging$class.swallow(Logging.scala:94)
   at kafka.utils.Utils$.swallow(Utils.scala:46)
   at kafka.server.KafkaServer.shutdown(KafkaServer.scala:242)
   at 
 kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:46)
   at kafka.Kafka$$anon$1.run(Kafka.scala:42)
 ...
 kafka-scheduler-0 - Thread t@117
java.lang.Thread.State: WAITING on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1dc407fc
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
   at 
 kafka.controller.ControllerChannelManager.sendRequest(ControllerChannelManager.scala:57)
   - locked java.lang.Object@578b748f
  

[jira] [Updated] (KAFKA-1305) Controller can hang on controlled shutdown with auto leader balance enabled

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1305:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Controller can hang on controlled shutdown with auto leader balance enabled
 ---

 Key: KAFKA-1305
 URL: https://issues.apache.org/jira/browse/KAFKA-1305
 Project: Kafka
  Issue Type: Bug
Reporter: Joel Koshy
Assignee: Neha Narkhede
Priority: Blocker
 Fix For: 0.9.0


 This is relatively easy to reproduce especially when doing a rolling bounce.
 What happened here is as follows:
 1. The previous controller was bounced and broker 265 became the new 
 controller.
 2. I went on to do a controlled shutdown of broker 265 (the new controller).
 3. In the mean time the automatically scheduled preferred replica leader 
 election process started doing its thing and starts sending 
 LeaderAndIsrRequests/UpdateMetadataRequests to itself (and other brokers).  
 (t@113 below).
 4. While that's happening, the controlled shutdown process on 265 succeeds 
 and proceeds to deregister itself from ZooKeeper and shuts down the socket 
 server.
 5. (ReplicaStateMachine actually removes deregistered brokers from the 
 controller channel manager's list of brokers to send requests to.  However, 
 that removal cannot take place (t@18 below) because preferred replica leader 
 election task owns the controller lock.)
 6. So the request thread to broker 265 gets into infinite retries.
 7. The entire broker shutdown process is blocked on controller shutdown for 
 the same reason (it needs to acquire the controller lock).
 Relevant portions from the thread-dump:
 Controller-265-to-broker-265-send-thread - Thread t@113
java.lang.Thread.State: TIMED_WAITING
   at java.lang.Thread.sleep(Native Method)
   at 
 kafka.controller.RequestSendThread$$anonfun$liftedTree1$1$1.apply$mcV$sp(ControllerChannelManager.scala:143)
   at kafka.utils.Utils$.swallow(Utils.scala:167)
   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
   at kafka.utils.Logging$class.swallow(Logging.scala:94)
   at kafka.utils.Utils$.swallow(Utils.scala:46)
   at 
 kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:143)
   at 
 kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
   - locked java.lang.Object@6dbf14a7
   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
Locked ownable synchronizers:
   - None
 ...
 Thread-4 - Thread t@17
java.lang.Thread.State: WAITING on 
 java.util.concurrent.locks.ReentrantLock$NonfairSync@4836840 owned by: 
 kafka-scheduler-0
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
   at 
 java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
   at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
   at kafka.utils.Utils$.inLock(Utils.scala:536)
   at kafka.controller.KafkaController.shutdown(KafkaController.scala:642)
   at 
 kafka.server.KafkaServer$$anonfun$shutdown$9.apply$mcV$sp(KafkaServer.scala:242)
   at kafka.utils.Utils$.swallow(Utils.scala:167)
   at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
   at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
   at kafka.utils.Logging$class.swallow(Logging.scala:94)
   at kafka.utils.Utils$.swallow(Utils.scala:46)
   at kafka.server.KafkaServer.shutdown(KafkaServer.scala:242)
   at 
 kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:46)
   at kafka.Kafka$$anon$1.run(Kafka.scala:42)
 ...
 kafka-scheduler-0 - Thread t@117
java.lang.Thread.State: WAITING on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1dc407fc
   at sun.misc.Unsafe.park(Native Method)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:306)
   at 
 kafka.controller.ControllerChannelManager.sendRequest(ControllerChannelManager.scala:57)
   - locked java.lang.Object@578b748f
   at 
 

[jira] [Updated] (KAFKA-1342) Slow controlled shutdowns can result in stale shutdown requests

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1342:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Slow controlled shutdowns can result in stale shutdown requests
 ---

 Key: KAFKA-1342
 URL: https://issues.apache.org/jira/browse/KAFKA-1342
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Joel Koshy
Priority: Blocker
 Fix For: 0.9.0


 I don't think this is a bug introduced in 0.8.1, but triggered by the fact
 that controlled shutdown seems to have become slower in 0.8.1 (will file a
 separate ticket to investigate that). When doing a rolling bounce, it is
 possible for a bounced broker to stop all its replica fetchers since the
 previous PID's shutdown requests are still being processed.
 - 515 is the controller
 - Controlled shutdown initiated for 503
 - Controller starts controlled shutdown for 503
 - The controlled shutdown takes a long time in moving leaders and moving
   follower replicas on 503 to the offline state.
 - So 503's read from the shutdown channel times out and a new channel is
   created. It issues another shutdown request.  This request (since it is a
   new channel) is accepted at the controller's socket server but then waits
   on the broker shutdown lock held by the previous controlled shutdown which
   is still in progress.
 - The above step repeats for the remaining retries (six more requests).
 - 503 hits SocketTimeout exception on reading the response of the last
   shutdown request and proceeds to do an unclean shutdown.
 - The controller's onBrokerFailure call-back fires and moves 503's replicas
   to offline (not too important in this sequence).
 - 503 is brought back up.
 - The controller's onBrokerStartup call-back fires and moves its replicas
   (and partitions) to online state. 503 starts its replica fetchers.
 - Unfortunately, the (phantom) shutdown requests are still being handled and
   the controller sends StopReplica requests to 503.
 - The first shutdown request finally finishes (after 76 minutes in my case!).
 - The remaining shutdown requests also execute and do the same thing (sends
   StopReplica requests for all partitions to
   503).
 - The remaining requests complete quickly because they end up not having to
   touch zookeeper paths - no leaders left on the broker and no need to
   shrink ISR in zookeeper since it has already been done by the first
   shutdown request.
 - So in the end-state 503 is up, but effectively idle due to the previous
   PID's shutdown requests.
 There are some obvious fixes that can be made to controlled shutdown to help
 address the above issue. E.g., we don't really need to move follower
 partitions to Offline. We did that as an optimization so the broker falls
 out of ISR sooner - which is helpful when producers set required.acks to -1.
 However it adds a lot of latency to controlled shutdown. Also, (more
 importantly) we should have a mechanism to abort any stale shutdown process.
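
 One way to picture the "abort stale shutdown" idea (an illustration under assumed names, not a committed design) is a generation counter bumped on broker re-registration, so requests queued by the previous process id are dropped instead of replayed:
 {code}
import java.util.concurrent.atomic.AtomicLong

class ShutdownGenerations {
  private val current = new AtomicLong(0L)

  // bump when a broker (re)registers; anything queued before this is stale
  def brokerRestarted(): Long = current.incrementAndGet()

  // check before acting on a queued controlled-shutdown request
  def isStale(requestGeneration: Long): Boolean = requestGeneration < current.get()
}
 {code}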



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1493) Use a well-documented LZ4 compression format and remove redundant LZ4HC option

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122037#comment-14122037
 ] 

Guozhang Wang commented on KAFKA-1493:
--

Could we still have that for 0.8.2?

 Use a well-documented LZ4 compression format and remove redundant LZ4HC option
 --

 Key: KAFKA-1493
 URL: https://issues.apache.org/jira/browse/KAFKA-1493
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.8.2
Reporter: James Oliver
Priority: Blocker
 Fix For: 0.8.2






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122041#comment-14122041
 ] 

Guozhang Wang commented on KAFKA-1490:
--

Should this be done before 0.8.2?

 remove gradlew initial setup output from source distribution
 

 Key: KAFKA-1490
 URL: https://issues.apache.org/jira/browse/KAFKA-1490
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2


 Our current source releases contains lots of stuff in the gradle folder we do 
 not need



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper

2014-09-04 Thread Viktor Taranenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122042#comment-14122042
 ] 

Viktor Taranenko commented on KAFKA-686:


[~guozhang] I'm happy to put some effort there over this weekend. When do you 
guys want to release 0.8.2?

I see that 'Exception.failAsValue' does not exist in Scala 2.8. I can make it 
compatible if required.

 0.8 Kafka broker should give a better error message when running against 0.7 
 zookeeper
 --

 Key: KAFKA-686
 URL: https://issues.apache.org/jira/browse/KAFKA-686
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Priority: Blocker
  Labels: newbie, patch
 Fix For: 0.8.2

 Attachments: KAFAK-686-null-pointer-fix.patch, 
 KAFKA-686-null-pointer-fix-2.patch


 People will not know that the zookeeper paths are not compatible. When you 
 try to start the 0.8 broker pointed at a 0.7 zookeeper you get a 
 NullPointerException. We should detect this and give a more sane error.
 Error:
 kafka.common.KafkaException: Can't parse json string: null
 at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
 at kafka.utils.Json$.parseFull(Json.scala:16)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
 at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
 at scala.collection.immutable.List.foreach(List.scala:45)
 at 
 kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
 at 
 kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
 at 
 kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
 at 
 kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
 at 
 kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
 at 
 kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
 at kafka.controller.KafkaController.startup(KafkaController.scala:381)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
 at 
 kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
 at kafka.Kafka$.main(Kafka.scala:46)
 at kafka.Kafka.main(Kafka.scala)
 Caused by: java.lang.NullPointerException
 at 
 scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52)
 at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
 at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
 at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
 ... 16 more
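
 A minimal sketch of the kind of guard being asked for (illustrative only, not the attached patches): detect the null payload a 0.7-style ZooKeeper layout produces and raise a descriptive error instead of the NullPointerException above.
 {code}
def requireParsableJson(jsonString: String, zkPath: String): String = {
  if (jsonString == null)
    throw new kafka.common.KafkaException(
      "No JSON data found at ZooKeeper path " + zkPath + ". " +
      "This usually means the broker is pointed at a ZooKeeper ensemble written " +
      "by Kafka 0.7; the 0.7 and 0.8 ZooKeeper layouts are not compatible.")
  jsonString
}
 {code}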



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-703) A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-703.
-
Resolution: Fixed

 A fetch request in Fetch Purgatory can double count the bytes from the same 
 delayed produce request
 ---

 Key: KAFKA-703
 URL: https://issues.apache.org/jira/browse/KAFKA-703
 Project: Kafka
  Issue Type: Bug
  Components: purgatory
Affects Versions: 0.8.1
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
Priority: Blocker
 Fix For: 0.8.2


 When a producer request is handled, the fetch purgatory is checked to ensure 
 any fetch requests are satisfied. When the produce request is satisfied we do 
 the check again and if the same fetch request was still in the fetch 
 purgatory it would end up double counting the bytes received.
 Possible Solutions
 1. In the delayed produce request case, do the check only after the produce 
 request is satisfied. This could potentially delay the fetch request from 
 being satisfied.
 2. Remove dependency of fetch request on produce request and just look at the 
 last logical log offset (which should mostly be cached). This would need the 
 replica.fetch.min.bytes to be number of messages rather than bytes. This also 
 helps KAFKA-671 in that we would no longer need to pass the ProduceRequest 
 object to the producer purgatory and hence not have to consume any memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-703) A fetch request in Fetch Purgatory can double count the bytes from the same delayed produce request

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122043#comment-14122043
 ] 

Guozhang Wang commented on KAFKA-703:
-

This problem is resolved in the purgatory / API redesign: KAFKA-1583. Closing 
now.

 A fetch request in Fetch Purgatory can double count the bytes from the same 
 delayed produce request
 ---

 Key: KAFKA-703
 URL: https://issues.apache.org/jira/browse/KAFKA-703
 Project: Kafka
  Issue Type: Bug
  Components: purgatory
Affects Versions: 0.8.1
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
Priority: Blocker
 Fix For: 0.8.2


 When a producer request is handled, the fetch purgatory is checked to ensure 
 any fetch requests are satisfied. When the produce request is satisfied we do 
 the check again and if the same fetch request was still in the fetch 
 purgatory it would end up double counting the bytes received.
 Possible Solutions
 1. In the delayed produce request case, do the check only after the produce 
 request is satisfied. This could potentially delay the fetch request from 
 being satisfied.
 2. Remove dependency of fetch request on produce request and just look at the 
 last logical log offset (which should mostly be cached). This would need the 
 replica.fetch.min.bytes to be number of messages rather than bytes. This also 
 helps KAFKA-671 in that we would no longer need to pass the ProduceRequest 
 object to the producer purgatory and hence not have to consume any memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper

2014-09-04 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122049#comment-14122049
 ] 

Joe Stein commented on KAFKA-686:
-

0.8.2 drops support for Scala 2.8 

 0.8 Kafka broker should give a better error message when running against 0.7 
 zookeeper
 --

 Key: KAFKA-686
 URL: https://issues.apache.org/jira/browse/KAFKA-686
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Priority: Blocker
  Labels: newbie, patch
 Fix For: 0.8.2

 Attachments: KAFAK-686-null-pointer-fix.patch, 
 KAFKA-686-null-pointer-fix-2.patch


 People will not know that the zookeeper paths are not compatible. When you 
 try to start the 0.8 broker pointed at a 0.7 zookeeper you get a 
 NullPointerException. We should detect this and give a more sane error.
 Error:
 kafka.common.KafkaException: Can't parse json string: null
 at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
 at kafka.utils.Json$.parseFull(Json.scala:16)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
 at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
 at scala.collection.immutable.List.foreach(List.scala:45)
 at 
 kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
 at 
 kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
 at 
 kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
 at 
 kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
 at 
 kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
 at 
 kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
 at kafka.controller.KafkaController.startup(KafkaController.scala:381)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
 at 
 kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
 at kafka.Kafka$.main(Kafka.scala:46)
 at kafka.Kafka.main(Kafka.scala)
 Caused by: java.lang.NullPointerException
 at 
 scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52)
 at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
 at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
 at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
 ... 16 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1585.
--
Resolution: Fixed

 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Fix For: 0.8.2

 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling between "conflict in /consumers/..." and 
 "I wrote this conflicted ephemeral node..." log messages. 
 Please see attached log extract.
 After restarting the process kafka consumers are working perfectly. 
 We are using Zookeeper 3.4.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1585) Client: Infinite conflict in /consumers/

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122048#comment-14122048
 ] 

Guozhang Wang commented on KAFKA-1585:
--

I think this issue is already resolved by KAFKA-1451. Please re-open it with 
0.9.0 version if you still see it.

 Client: Infinite conflict in /consumers/
 --

 Key: KAFKA-1585
 URL: https://issues.apache.org/jira/browse/KAFKA-1585
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.8.1.1
Reporter: Artur Denysenko
Priority: Critical
 Fix For: 0.8.2

 Attachments: kafka_consumer_ephemeral_node_extract.zip


 Periodically we have kafka consumers cycling between "conflict in /consumers/..." and 
 "I wrote this conflicted ephemeral node..." log messages. 
 Please see attached log extract.
 After restarting the process kafka consumers are working perfectly. 
 We are using Zookeeper 3.4.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-665) Outgoing responses delayed on a busy Kafka broker

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-665:

Fix Version/s: (was: 0.8.2)
   0.9.0

 Outgoing responses delayed on a busy Kafka broker 
 --

 Key: KAFKA-665
 URL: https://issues.apache.org/jira/browse/KAFKA-665
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Neha Narkhede
Priority: Critical
  Labels: replication-performance
 Fix For: 0.9.0


 In a long running test, I observed that after a few hours of operation, a few 
 requests start timing out, mainly because they spent a very long time sitting 
 in the response queue -
 [2012-12-07 22:05:56,670] TRACE Completed request with correlation id 3965966 
 and client : TopicMetadataRequest:4009, queueTime:1, localTime:28, 
 remoteTime:0, sendTime:3980 (kafka.network.RequestChannel$)
 [2012-12-07 22:04:12,046] TRACE Completed request with correlation id 3962561 
 and client : TopicMetadataRequest:3449, queueTime:0, localTime:29, 
 remoteTime:0, sendTime:3420 (kafka.network.RequestChannel$)
 [2012-12-07 22:05:56,670] TRACE Completed request with correlation id 3965966 
 and client : TopicMetadataRequest:4009, queueTime:1, localTime:28, 
 remoteTime:0, sendTime:3980 (kafka.network.RequestChannel$)
 We might have a problem in the way we process outgoing responses. Basically, 
 if the processor thread blocks on enqueuing requests in the request queue, it 
 doesn't come around to processing its responses which are ready to go out. 
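
 A minimal sketch of the concern (not SocketServer itself; the queue and method names are made up for illustration): if the processor parks indefinitely in a blocking put, pending responses never get written, whereas a bounded offer lets it drain responses between attempts.
 {code}
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

class ProcessorLoopSketch(requestQueue: ArrayBlockingQueue[AnyRef]) {
  def enqueueWithoutStarvingResponses(request: AnyRef, drainResponses: () => Unit): Unit = {
    // keep trying to hand the request off, but never stop writing responses
    while (!requestQueue.offer(request, 300, TimeUnit.MILLISECONDS)) {
      drainResponses()
    }
  }
}
 {code}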



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122050#comment-14122050
 ] 

Guozhang Wang commented on KAFKA-1313:
--

Moving to 0.9.

 Support adding replicas to existing topic partitions
 

 Key: KAFKA-1313
 URL: https://issues.apache.org/jira/browse/KAFKA-1313
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.8.0
Reporter: Marc Labbe
Priority: Critical
 Fix For: 0.9.0


 There is currently no easy way to add replicas to an existing topic 
 partitions.
 For example, topic create-test has been created with ReplicationFactor=1: 
 Topic:create-test   PartitionCount:3   ReplicationFactor:1   Configs:
 Topic: create-test   Partition: 0   Leader: 1   Replicas: 1   Isr: 1
 Topic: create-test   Partition: 1   Leader: 2   Replicas: 2   Isr: 2
 Topic: create-test   Partition: 2   Leader: 3   Replicas: 3   Isr: 3
 I would like to increase the ReplicationFactor=2 (or more) so it shows up 
 like this instead.
 Topic:create-test   PartitionCount:3   ReplicationFactor:2   Configs:
 Topic: create-test   Partition: 0   Leader: 1   Replicas: 1,2   Isr: 1,2
 Topic: create-test   Partition: 1   Leader: 2   Replicas: 2,3   Isr: 2,3
 Topic: create-test   Partition: 2   Leader: 3   Replicas: 3,1   Isr: 3,1
 Use cases for this:
 - adding brokers and thus increase fault tolerance
 - fixing human errors for topics created with wrong values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1194) The kafka broker cannot delete the old log files after the configured time

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1194:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 The kafka broker cannot delete the old log files after the configured time
 --

 Key: KAFKA-1194
 URL: https://issues.apache.org/jira/browse/KAFKA-1194
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.1
 Environment: window
Reporter: Tao Qin
Assignee: Jay Kreps
  Labels: features, patch
 Fix For: 0.9.0

 Attachments: KAFKA-1194.patch, kafka-1194-v1.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 We tested it in a Windows environment, and set log.retention.hours to 24 
 hours.
 # The minimum age of a log file to be eligible for deletion
 log.retention.hours=24
 After several days, the kafka broker still cannot delete the old log file. 
 And we get the following exceptions:
 [2013-12-19 01:57:38,528] ERROR Uncaught exception in scheduled task 
 'kafka-log-retention' (kafka.utils.KafkaScheduler)
 kafka.common.KafkaStorageException: Failed to change the log file suffix from 
  to .deleted for log segment 1516723
  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:249)
  at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:638)
  at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:629)
  at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
  at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
  at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
  at scala.collection.immutable.List.foreach(List.scala:76)
  at kafka.log.Log.deleteOldSegments(Log.scala:418)
  at 
 kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:284)
  at 
 kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:316)
  at 
 kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:314)
  at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
  at scala.collection.Iterator$class.foreach(Iterator.scala:772)
  at 
 scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
  at 
 scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:615)
  at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
  at kafka.log.LogManager.cleanupLogs(LogManager.scala:314)
  at 
 kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:143)
  at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
  at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
  at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
  at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:724)
 I think this error happens because kafka tries to rename the log file while it 
 is still open, so we should close the file before renaming it.
 The index file uses a special data structure, the MappedByteBuffer. Javadoc 
 describes it as:
 A mapped byte buffer and the file mapping that it represents remain valid 
 until the buffer itself is garbage-collected.
 Fortunately, I found a forceUnmap function in the kafka code, and perhaps it can 
 be used to free the MappedByteBuffer.
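
 A sketch of the workaround described above (an illustration assuming a sun.misc.Cleaner-backed JRE, not the eventual fix): force the mapping to be released before renaming, since Windows will not rename a file whose mapping is still open.
 {code}
import java.nio.MappedByteBuffer

def forceUnmap(buffer: MappedByteBuffer): Unit = {
  try {
    // DirectByteBuffer exposes a public cleaner() on Oracle/OpenJDK <= 8
    val cleanerMethod = buffer.getClass.getMethod("cleaner")
    cleanerMethod.setAccessible(true)
    val cleaner = cleanerMethod.invoke(buffer).asInstanceOf[sun.misc.Cleaner]
    if (cleaner != null) cleaner.clean()
  } catch {
    case _: Throwable => // best effort; GC will eventually release the mapping
  }
}
 {code}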



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1194) The kafka broker cannot delete the old log files after the configured time

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122053#comment-14122053
 ] 

Guozhang Wang commented on KAFKA-1194:
--

Moving again to 0.9 for now.

 The kafka broker cannot delete the old log files after the configured time
 --

 Key: KAFKA-1194
 URL: https://issues.apache.org/jira/browse/KAFKA-1194
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.1
 Environment: window
Reporter: Tao Qin
Assignee: Jay Kreps
  Labels: features, patch
 Fix For: 0.9.0

 Attachments: KAFKA-1194.patch, kafka-1194-v1.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 We tested it in a Windows environment, and set log.retention.hours to 24 
 hours.
 # The minimum age of a log file to be eligible for deletion
 log.retention.hours=24
 After several days, the kafka broker still cannot delete the old log file. 
 And we get the following exceptions:
 [2013-12-19 01:57:38,528] ERROR Uncaught exception in scheduled task 
 'kafka-log-retention' (kafka.utils.KafkaScheduler)
 kafka.common.KafkaStorageException: Failed to change the log file suffix from 
  to .deleted for log segment 1516723
  at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:249)
  at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:638)
  at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:629)
  at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
  at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
  at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
  at scala.collection.immutable.List.foreach(List.scala:76)
  at kafka.log.Log.deleteOldSegments(Log.scala:418)
  at 
 kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:284)
  at 
 kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:316)
  at 
 kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:314)
  at 
 scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
  at scala.collection.Iterator$class.foreach(Iterator.scala:772)
  at 
 scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
  at 
 scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:615)
  at 
 scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
  at kafka.log.LogManager.cleanupLogs(LogManager.scala:314)
  at 
 kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:143)
  at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
  at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
  at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
  at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:724)
 I think this error happens because kafka tries to rename the log file while it 
 is still open, so we should close the file before renaming it.
 The index file uses a special data structure, the MappedByteBuffer. Javadoc 
 describes it as:
 A mapped byte buffer and the file mapping that it represents remain valid 
 until the buffer itself is garbage-collected.
 Fortunately, I found a forceUnmap function in the kafka code, and perhaps it can 
 be used to free the MappedByteBuffer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1193) Data loss if broker is killed using kill -9

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1193.
--
Resolution: Fixed

 Data loss if broker is killed using kill -9
 ---

 Key: KAFKA-1193
 URL: https://issues.apache.org/jira/browse/KAFKA-1193
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 0.8.0, 0.8.1
 Environment: Centos 6.3
Reporter: Hanish Bansal
 Fix For: 0.8.2


 We are having kafka cluster of 2 nodes. (Using Kafka 0.8.0 version)
 Replication Factor: 2
 Number of partitions: 2
 Actual Behaviour:
 -
 Out of the two nodes, if the leader node goes down then data loss happens.
 Steps to Reproduce:
 --
 1. Create a 2 node kafka cluster with replication factor 2
 2. Start the Kafka cluster
 3. Create a topic lets say test-trunk111
 4. Restart any one node.
 5. Check topic status using kafka-list-topic tool.
 topic isr status is:
 topic: test-trunk111   partition: 0   leader: 0   replicas: 1,0   isr: 0,1
 topic: test-trunk111   partition: 1   leader: 0   replicas: 0,1   isr: 0,1
 If there is only one broker node in isr list then wait for some time and 
 again check isr status of topic. There should be 2 brokers in isr list.
 6. Start producing the data.
 7. Kill the leader node (broker-0 in our case) while data is being produced.
 8. After all data is produced start consumer.
 9. Observe the behaviour. There is data loss.
 After leader goes down, topic isr status is:
 topic: test-trunk111   partition: 0   leader: 1   replicas: 1,0   isr: 1
 topic: test-trunk111   partition: 1   leader: 1   replicas: 0,1   isr: 1
 We have tried below things to avoid data loss:
 
 1. Configured request.required.acks=-1 in producer configuration because as 
 mentioned in documentation 
 http://kafka.apache.org/documentation.html#producerconfigs, setting this 
 value to -1 provides guarantee that no messages will be lost.
 2. Increased the message.send.max.retries from 3 to 10 in producer 
 configuration.
 3. Set controlled.shutdown.enable to true in broker configuration.
 4. Tested with Kafka-0.8.1 after applying patch KAFKA-1188.patch available on 
 https://issues.apache.org/jira/browse/KAFKA-1188 
 None of the above worked in the case where the leader node is killed using 
 kill -9 pid.
 Expected Behaviour:
 
 No data should be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper

2014-09-04 Thread Viktor Taranenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122054#comment-14122054
 ] 

Viktor Taranenko commented on KAFKA-686:


So can I consider 2.9 as the minimal version while solving this issue?

 0.8 Kafka broker should give a better error message when running against 0.7 
 zookeeper
 --

 Key: KAFKA-686
 URL: https://issues.apache.org/jira/browse/KAFKA-686
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Priority: Blocker
  Labels: newbie, patch
 Fix For: 0.8.2

 Attachments: KAFAK-686-null-pointer-fix.patch, 
 KAFKA-686-null-pointer-fix-2.patch


 People will not know that the zookeeper paths are not compatible. When you 
 try to start the 0.8 broker pointed at a 0.7 zookeeper you get a 
 NullPointerException. We should detect this and give a more sane error.
 Error:
 kafka.common.KafkaException: Can't parse json string: null
 at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
 at kafka.utils.Json$.parseFull(Json.scala:16)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
 at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
 at scala.collection.immutable.List.foreach(List.scala:45)
 at 
 kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
 at 
 kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
 at 
 kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
 at 
 kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
 at 
 kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
 at 
 kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
 at kafka.controller.KafkaController.startup(KafkaController.scala:381)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
 at 
 kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
 at kafka.Kafka$.main(Kafka.scala:46)
 at kafka.Kafka.main(Kafka.scala)
 Caused by: java.lang.NullPointerException
 at 
 scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52)
 at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
 at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
 at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
 ... 16 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions

2014-09-04 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122056#comment-14122056
 ] 

Joe Stein commented on KAFKA-1313:
--

Don't we already have this feature with the reassign-partitions tool? Or am I 
missing something? If so, we can close this.

 Support adding replicas to existing topic partitions
 

 Key: KAFKA-1313
 URL: https://issues.apache.org/jira/browse/KAFKA-1313
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.8.0
Reporter: Marc Labbe
Priority: Critical
 Fix For: 0.9.0


 There is currently no easy way to add replicas to an existing topic 
 partitions.
 For example, topic create-test has been created with ReplicationFactor=1: 
 Topic:create-test   PartitionCount:3   ReplicationFactor:1   Configs:
 Topic: create-test   Partition: 0   Leader: 1   Replicas: 1   Isr: 1
 Topic: create-test   Partition: 1   Leader: 2   Replicas: 2   Isr: 2
 Topic: create-test   Partition: 2   Leader: 3   Replicas: 3   Isr: 3
 I would like to increase the ReplicationFactor=2 (or more) so it shows up 
 like this instead.
 Topic:create-test   PartitionCount:3   ReplicationFactor:2   Configs:
 Topic: create-test   Partition: 0   Leader: 1   Replicas: 1,2   Isr: 1,2
 Topic: create-test   Partition: 1   Leader: 2   Replicas: 2,3   Isr: 2,3
 Topic: create-test   Partition: 2   Leader: 3   Replicas: 3,1   Isr: 3,1
 Use cases for this:
 - adding brokers and thus increase fault tolerance
 - fixing human errors for topics created with wrong values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1193) Data loss if broker is killed using kill -9

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122055#comment-14122055
 ] 

Guozhang Wang commented on KAFKA-1193:
--

[~hanish.bansal.agarwal] I am closing this ticket now. Please feel free to 
re-open it if you observe this issue again.

 Data loss if broker is killed using kill -9
 ---

 Key: KAFKA-1193
 URL: https://issues.apache.org/jira/browse/KAFKA-1193
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 0.8.0, 0.8.1
 Environment: Centos 6.3
Reporter: Hanish Bansal
 Fix For: 0.8.2


 We are having kafka cluster of 2 nodes. (Using Kafka 0.8.0 version)
 Replication Factor: 2
 Number of partitions: 2
 Actual Behaviour:
 -
 Out of the two nodes, if the leader node goes down then data loss happens.
 Steps to Reproduce:
 --
 1. Create a 2 node kafka cluster with replication factor 2
 2. Start the Kafka cluster
 3. Create a topic lets say test-trunk111
 4. Restart any one node.
 5. Check topic status using kafka-list-topic tool.
 topic isr status is:
 topic: test-trunk111   partition: 0   leader: 0   replicas: 1,0   isr: 0,1
 topic: test-trunk111   partition: 1   leader: 0   replicas: 0,1   isr: 0,1
 If there is only one broker node in isr list then wait for some time and 
 again check isr status of topic. There should be 2 brokers in isr list.
 6. Start producing the data.
 7. Kill the leader node (broker-0 in our case) while data is being produced.
 8. After all data is produced start consumer.
 9. Observe the behaviour. There is data loss.
 After leader goes down, topic isr status is:
 topic: test-trunk111   partition: 0   leader: 1   replicas: 1,0   isr: 1
 topic: test-trunk111   partition: 1   leader: 1   replicas: 0,1   isr: 1
 We have tried below things to avoid data loss:
 
 1. Configured request.required.acks=-1 in producer configuration because as 
 mentioned in documentation 
 http://kafka.apache.org/documentation.html#producerconfigs, setting this 
 value to -1 provides guarantee that no messages will be lost.
 2. Increased the message.send.max.retries from 3 to 10 in producer 
 configuration.
 3. Set controlled.shutdown.enable to true in broker configuration.
 4. Tested with Kafka-0.8.1 after applying patch KAFKA-1188.patch available on 
 https://issues.apache.org/jira/browse/KAFKA-1188 
 None of the above worked in the case where the leader node is killed using 
 kill -9 pid.
 Expected Behaviour:
 
 No data should be lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-686) 0.8 Kafka broker should give a better error message when running against 0.7 zookeeper

2014-09-04 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122058#comment-14122058
 ] 

Joe Stein commented on KAFKA-686:
-

2.9.1 yes

 0.8 Kafka broker should give a better error message when running against 0.7 
 zookeeper
 --

 Key: KAFKA-686
 URL: https://issues.apache.org/jira/browse/KAFKA-686
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
Priority: Blocker
  Labels: newbie, patch
 Fix For: 0.8.2

 Attachments: KAFAK-686-null-pointer-fix.patch, 
 KAFKA-686-null-pointer-fix-2.patch


 People will not know that the zookeeper paths are not compatible. When you 
 try to start the 0.8 broker pointed at a 0.7 zookeeper you get a 
 NullPointerException. We should detect this and give a more sane error.
 Error:
 kafka.common.KafkaException: Can't parse json string: null
 at kafka.utils.Json$.liftedTree1$1(Json.scala:20)
 at kafka.utils.Json$.parseFull(Json.scala:16)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:498)
 at 
 kafka.utils.ZkUtils$$anonfun$getReplicaAssignmentForTopics$2.apply(ZkUtils.scala:494)
 at 
 scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
 at scala.collection.immutable.List.foreach(List.scala:45)
 at 
 kafka.utils.ZkUtils$.getReplicaAssignmentForTopics(ZkUtils.scala:494)
 at 
 kafka.controller.KafkaController.initializeControllerContext(KafkaController.scala:446)
 at 
 kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:220)
 at 
 kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:85)
 at 
 kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:53)
 at 
 kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:43)
 at kafka.controller.KafkaController.startup(KafkaController.scala:381)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:90)
 at 
 kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
 at kafka.Kafka$.main(Kafka.scala:46)
 at kafka.Kafka.main(Kafka.scala)
 Caused by: java.lang.NullPointerException
 at 
 scala.util.parsing.combinator.lexical.Scanners$Scanner.init(Scanners.scala:52)
 at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
 at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
 at kafka.utils.Json$.liftedTree1$1(Json.scala:17)
 ... 16 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1057) Trim whitespaces from user specified configs

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122059#comment-14122059
 ] 

Guozhang Wang commented on KAFKA-1057:
--

Moving out of 0.8.2 for now.

 Trim whitespaces from user specified configs
 

 Key: KAFKA-1057
 URL: https://issues.apache.org/jira/browse/KAFKA-1057
 Project: Kafka
  Issue Type: Bug
  Components: config
Reporter: Neha Narkhede
Assignee: Neha Narkhede
  Labels: newbie
 Fix For: 0.9.0


 Whitespaces in configs are a common problem that leads to config errors. It 
 will be nice if Kafka can trim the whitespaces from configs automatically
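
 A minimal sketch of what the trimming could look like (illustrative; the real change would live wherever the configs are parsed): copy the user-supplied Properties with keys and values trimmed.
 {code}
import java.util.Properties
import scala.collection.JavaConverters._

def trimmed(props: Properties): Properties = {
  val clean = new Properties()
  props.stringPropertyNames().asScala.foreach { key =>
    clean.setProperty(key.trim, props.getProperty(key).trim)
  }
  clean
}
 {code}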



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1057) Trim whitespaces from user specified configs

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1057:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Trim whitespaces from user specified configs
 

 Key: KAFKA-1057
 URL: https://issues.apache.org/jira/browse/KAFKA-1057
 Project: Kafka
  Issue Type: Bug
  Components: config
Reporter: Neha Narkhede
Assignee: Neha Narkhede
  Labels: newbie
 Fix For: 0.9.0


 Whitespaces in configs are a common problem that leads to config errors. It 
 will be nice if Kafka can trim the whitespaces from configs automatically



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1108:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
 Fix For: 0.9.0


 In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and 
 then if there's a failure, it will retry the controlledShutdown.
 Looking at the code, there are 2 ways a retry could fail, one with an error 
 response from the controller, and this messaging code:
 {code}
 info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
 info("Error code from controller: %d".format(shutdownResponse.errorCode))
 {code}
 Alternatively, there could be an IOException, with this code executed:
 {code}
 catch {
   case ioe: java.io.IOException =>
 channel.disconnect()
 channel = null
 // ignore and try again
 }
 {code}
 And then finally, in either case:
 {code}
   if (!shutdownSuceeded) {
 Thread.sleep(config.controlledShutdownRetryBackoffMs)
  warn("Retrying controlled shutdown after the previous attempt failed...")
   }
 {code}
 It would be nice if the nature of the IOException were logged in either case 
 (I'd be happy with an ioe.getMessage() instead of a full stack trace, as 
 kafka in general tends to be too willing to dump IOException stack traces!).
 I suspect, in my case, the actual IOException is a socket timeout (as the 
 time between the initial "Starting controlled shutdown" and the first 
 "Retrying..." message is usually about 35 seconds: the socket timeout + the 
 controlled shutdown retry backoff).  So, it would seem that really, the issue 
 in this case is that controlled shutdown is taking too long.  It would seem 
 sensible instead to have the controller report back to the server (before the 
 socket timeout) that more time is needed, etc.
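
 A self-contained sketch of the requested logging (illustrative only, with println standing in for the broker's warn logger): surface ioe.getMessage before backing off and retrying, instead of swallowing it.
 {code}
def attemptControlledShutdown(sendShutdownRequest: () => Boolean,
                              retryBackoffMs: Long,
                              retries: Int): Boolean = {
  var succeeded = false
  var remaining = retries
  while (!succeeded && remaining > 0) {
    try {
      succeeded = sendShutdownRequest()
    } catch {
      case ioe: java.io.IOException =>
        // the one-line reason this ticket wants recorded
        println("WARN Controlled shutdown attempt failed: " + ioe.getMessage)
    }
    if (!succeeded) {
      Thread.sleep(retryBackoffMs)
      remaining -= 1
    }
  }
  succeeded
}
 {code}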



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1147) Consumer socket timeout should be greater than fetch max wait

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122061#comment-14122061
 ] 

Guozhang Wang commented on KAFKA-1147:
--

This is quite an easy fix and could be reviewed / checked in by any committer I 
think.

 Consumer socket timeout should be greater than fetch max wait
 -

 Key: KAFKA-1147
 URL: https://issues.apache.org/jira/browse/KAFKA-1147
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Joel Koshy
Assignee: Guozhang Wang
 Fix For: 0.8.2, 0.9.0

 Attachments: KAFKA-1147.patch, KAFKA-1147_2013-12-07_18:22:18.patch, 
 KAFKA-1147_2013-12-09_09:14:24.patch, KAFKA-1147_2013-12-10_14:31:46.patch


 From the mailing list:
 The consumer-config documentation states that "The actual timeout set
 will be max.fetch.wait + socket.timeout.ms" - however, that change
 seems to have been lost in the code a while ago - we should either fix the 
 doc or re-introduce the addition.
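
 A minimal sketch of the documented behaviour (property names taken from the 0.8 consumer config; the defaults here are illustrative): derive the socket timeout that is actually applied from the configured socket timeout plus the fetch max wait.
 {code}
import java.util.Properties

def effectiveConsumerSocketTimeoutMs(props: Properties): Int = {
  val socketTimeoutMs = props.getProperty("socket.timeout.ms", "30000").toInt
  val fetchMaxWaitMs  = props.getProperty("fetch.wait.max.ms", "100").toInt
  socketTimeoutMs + fetchMaxWaitMs   // what the doc says the actual timeout should be
}
 {code}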



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1108) when controlled shutdown attempt fails, the reason is not always logged

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122062#comment-14122062
 ] 

Guozhang Wang commented on KAFKA-1108:
--

Moving to 0.9 for now.

 when controlled shutdown attempt fails, the reason is not always logged
 ---

 Key: KAFKA-1108
 URL: https://issues.apache.org/jira/browse/KAFKA-1108
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Rosenberg
 Fix For: 0.9.0


 In KafkaServer.controlledShutdown(), it initiates a controlled shutdown, and 
 then if there's a failure, it will retry the controlledShutdown.
 Looking at the code, there are 2 ways a retry could fail, one with an error 
 response from the controller, and this messaging code:
 {code}
 info("Remaining partitions to move: %s".format(shutdownResponse.partitionsRemaining.mkString(",")))
 info("Error code from controller: %d".format(shutdownResponse.errorCode))
 {code}
 Alternatively, there could be an IOException, with this code executed:
 {code}
 catch {
   case ioe: java.io.IOException =>
 channel.disconnect()
 channel = null
 // ignore and try again
 }
 {code}
 And then finally, in either case:
 {code}
   if (!shutdownSuceeded) {
 Thread.sleep(config.controlledShutdownRetryBackoffMs)
  warn("Retrying controlled shutdown after the previous attempt failed...")
   }
 {code}
 It would be nice if the nature of the IOException were logged in either case 
 (I'd be happy with an ioe.getMessage() instead of a full stack trace, as 
 kafka in general tends to be too willing to dump IOException stack traces!).
 I suspect, in my case, the actual IOException is a socket timeout (as the 
 time between the initial "Starting controlled shutdown" and the first 
 "Retrying..." message is usually about 35 seconds: the socket timeout + the 
 controlled shutdown retry backoff).  So, it would seem that really, the issue 
 in this case is that controlled shutdown is taking too long.  It would seem 
 sensible instead to have the controller report back to the server (before the 
 socket timeout) that more time is needed, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1056) Evenly Distribute Intervals in OffsetIndex

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122066#comment-14122066
 ] 

Guozhang Wang commented on KAFKA-1056:
--

Moving to 0.9 for now.

 Evenly Distribute Intervals in OffsetIndex
 --

 Key: KAFKA-1056
 URL: https://issues.apache.org/jira/browse/KAFKA-1056
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Guozhang Wang
Assignee: Guozhang Wang
  Labels: newbie++
 Fix For: 0.8.2


 Today a new entry will be created in OffsetIndex for each produce request 
 regardless of the number of messages it contains. It is better to evenly 
 distribute the intervals between index entries for index search efficiency.
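
 A minimal sketch of the proposal (illustrative, not the actual log append path): count bytes since the last index entry and only add a new entry once a configured interval is exceeded, so entry spacing no longer depends on produce request boundaries.
 {code}
class IndexIntervalTracker(indexIntervalBytes: Int) {
  private var bytesSinceLastEntry = 0

  // returns true when an offset-index entry should be written for this append
  def onAppend(messageSetSizeInBytes: Int): Boolean = {
    bytesSinceLastEntry += messageSetSizeInBytes
    if (bytesSinceLastEntry >= indexIntervalBytes) {
      bytesSinceLastEntry = 0
      true
    } else {
      false
    }
  }
}
 {code}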



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1054) Elimiate Compilation Warnings for 0.8 Final Release

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122068#comment-14122068
 ] 

Guozhang Wang commented on KAFKA-1054:
--

Moving to 0.9.

 Elimiate Compilation Warnings for 0.8 Final Release
 ---

 Key: KAFKA-1054
 URL: https://issues.apache.org/jira/browse/KAFKA-1054
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0


 Currently we have a total number of 38 warnings for source code compilation 
 of 0.8.
 1) 3 from Unchecked type pattern
 2) 6 from Unchecked conversion
 3) 29 from Deprecated Hadoop API functions
 It's better we finish these before the final release of 0.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1054) Elimiate Compilation Warnings for 0.8 Final Release

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1054:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Elimiate Compilation Warnings for 0.8 Final Release
 ---

 Key: KAFKA-1054
 URL: https://issues.apache.org/jira/browse/KAFKA-1054
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0


 Currently we have a total number of 38 warnings for source code compilation 
 of 0.8.
 1) 3 from Unchecked type pattern
 2) 6 from Unchecked conversion
 3) 29 from Deprecated Hadoop API functions
 It's better we finish these before the final release of 0.8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1056) Evenly Distribute Intervals in OffsetIndex

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1056:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Evenly Distribute Intervals in OffsetIndex
 --

 Key: KAFKA-1056
 URL: https://issues.apache.org/jira/browse/KAFKA-1056
 Project: Kafka
  Issue Type: Improvement
  Components: log
Reporter: Guozhang Wang
Assignee: Guozhang Wang
  Labels: newbie++
 Fix For: 0.9.0


 Today a new entry will be created in OffsetIndex for each produce request 
 regardless of the number of messages it contains. It is better to evenly 
 distribute the intervals between index entries for index search efficiency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122070#comment-14122070
 ] 

Guozhang Wang commented on KAFKA-1043:
--

Moving to 0.9.

 Time-consuming FetchRequest could block other request in the response queue
 ---

 Key: KAFKA-1043
 URL: https://issues.apache.org/jira/browse/KAFKA-1043
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.8.2


 Since in SocketServer the processor that takes a request is also responsible 
 for writing the response for that request, we make each processor own its 
 own response queue. If a FetchRequest takes an irregularly long time to 
 write to the channel buffer, it can block all other responses in the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1043) Time-consuming FetchRequest could block other request in the response queue

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1043:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Time-consuming FetchRequest could block other request in the response queue
 ---

 Key: KAFKA-1043
 URL: https://issues.apache.org/jira/browse/KAFKA-1043
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0


 Since in SocketServer the processor that takes a request is also responsible 
 for writing the response for that request, we make each processor own its 
 own response queue. If a FetchRequest takes an irregularly long time to 
 write to the channel buffer, it can block all other responses in the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1354) Failed to load class org.slf4j.impl.StaticLoggerBinder

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122076#comment-14122076
 ] 

Guozhang Wang commented on KAFKA-1354:
--

Moving to 0.9 for tracking.

 Failed to load class org.slf4j.impl.StaticLoggerBinder
 

 Key: KAFKA-1354
 URL: https://issues.apache.org/jira/browse/KAFKA-1354
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.1
 Environment: RHEL
Reporter: RakeshAcharya
Assignee: Jay Kreps
  Labels: newbie, patch, usability
 Fix For: 0.9.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Getting the errors below during Kafka startup:
 SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
 details.
 [2014-03-31 18:55:36,488] INFO Will not load MX4J, mx4j-tools.jar is not in 
 the classpath (kafka.utils.Mx4jLoader$)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1354) Failed to load class org.slf4j.impl.StaticLoggerBinder

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1354:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Failed to load class org.slf4j.impl.StaticLoggerBinder
 

 Key: KAFKA-1354
 URL: https://issues.apache.org/jira/browse/KAFKA-1354
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.8.1
 Environment: RHEL
Reporter: RakeshAcharya
Assignee: Jay Kreps
  Labels: newbie, patch, usability
 Fix For: 0.9.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Getting the errors below during Kafka startup:
 SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
 details.
 [2014-03-31 18:55:36,488] INFO Will not load MX4J, mx4j-tools.jar is not in 
 the classpath (kafka.utils.Mx4jLoader$)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1348) Producer's Broker Discovery Interface

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122085#comment-14122085
 ] 

Guozhang Wang commented on KAFKA-1348:
--

Closing this ticket now.

 Producer's Broker Discovery Interface
 -

 Key: KAFKA-1348
 URL: https://issues.apache.org/jira/browse/KAFKA-1348
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Jay Bae
Assignee: Jun Rao
 Fix For: 0.8.2


 The producer has a static 'broker.list' configuration property. I have a 
 requirement to be able to override this behavior, for example with the 
 Netflix Eureka discovery module. Let me contribute this, and please add it 
 to the 0.8.1.1 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1348) Producer's Broker Discovery Interface

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1348.
--
Resolution: Not a Problem

 Producer's Broker Discovery Interface
 -

 Key: KAFKA-1348
 URL: https://issues.apache.org/jira/browse/KAFKA-1348
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Jay Bae
Assignee: Jun Rao
 Fix For: 0.8.2


 The producer has a static 'broker.list' configuration property. I have a 
 requirement to be able to override this behavior, for example with the 
 Netflix Eureka discovery module. Let me contribute this, and please add it 
 to the 0.8.1.1 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1019) kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1019:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 kafka-preferred-replica-election.sh will fail without clear error message if 
 /brokers/topics/[topic]/partitions does not exist
 --

 Key: KAFKA-1019
 URL: https://issues.apache.org/jira/browse/KAFKA-1019
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0


 From Libo Yu:
 I tried to run kafka-preferred-replica-election.sh on our kafka cluster.
 But I got this exception:
 Failed to start preferred replica election
 org.I0Itec.zkclient.exception.ZkNoNodeException: 
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /brokers/topics/uattoqaaa.default/partitions
 I checked zookeeper and there is no 
 /brokers/topics/uattoqaaa.default/partitions. All I found is
 /brokers/topics/uattoqaaa.default.
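 As an illustrative sketch only (the helper names here are hypothetical, not 
 the actual tool code), the tool could check for the partitions path up front 
 and fail with a clear message instead of a raw ZkNoNodeException:
 {code}
 def partitionsPath(topic: String): String =
   "/brokers/topics/%s/partitions".format(topic)

 // pathExists would be backed by the ZkClient in the real tool
 def checkPartitionsPath(pathExists: String => Boolean, topic: String): Unit =
   if (!pathExists(partitionsPath(topic)))
     throw new IllegalStateException(
       ("Topic %s exists but %s is missing in ZooKeeper; " +
        "cannot run preferred replica election for it")
         .format(topic, partitionsPath(topic)))
 {code}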



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1019) kafka-preferred-replica-election.sh will fail without clear error message if /brokers/topics/[topic]/partitions does not exist

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122087#comment-14122087
 ] 

Guozhang Wang commented on KAFKA-1019:
--

Moving to 0.9 now.

 kafka-preferred-replica-election.sh will fail without clear error message if 
 /brokers/topics/[topic]/partitions does not exist
 --

 Key: KAFKA-1019
 URL: https://issues.apache.org/jira/browse/KAFKA-1019
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Guozhang Wang
  Labels: newbie
 Fix For: 0.9.0


 From Libo Yu:
 I tried to run kafka-preferred-replica-election.sh on our kafka cluster.
 But I got this exception:
 Failed to start preferred replica election
 org.I0Itec.zkclient.exception.ZkNoNodeException: 
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /brokers/topics/uattoqaaa.default/partitions
 I checked zookeeper and there is no 
 /brokers/topics/uattoqaaa.default/partitions. All I found is
 /brokers/topics/uattoqaaa.default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1034) Improve partition reassignment to optimize writes to zookeeper

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122088#comment-14122088
 ] 

Guozhang Wang commented on KAFKA-1034:
--

[~nehanarkhede] Could you provide some more context on this one? And if it is 
still an issue shall we move to 0.9?

 Improve partition reassignment to optimize writes to zookeeper
 --

 Key: KAFKA-1034
 URL: https://issues.apache.org/jira/browse/KAFKA-1034
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Sriram Subramanian
Assignee: Sriram Subramanian
 Fix For: 0.8.2


 For ReassignPartition tool, check if optimizing the writes to ZK after every 
 replica reassignment is possible



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1002) Delete aliveLeaders field from LeaderAndIsrRequest

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122090#comment-14122090
 ] 

Guozhang Wang commented on KAFKA-1002:
--

I think we still need aliveLeaders today, so this may not be an issue any more. 
[~junrao] [~swapnilghike] Could you comment on this one?

 Delete aliveLeaders field from LeaderAndIsrRequest
 --

 Key: KAFKA-1002
 URL: https://issues.apache.org/jira/browse/KAFKA-1002
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Swapnil Ghike
 Fix For: 0.8.2


 After KAFKA-999 is committed, we don't need aliveLeaders in 
 LeaderAndIsrRequest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-853) Allow OffsetFetchRequest to initialize offsets

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-853:

Fix Version/s: (was: 0.8.2)
   0.9.0

 Allow OffsetFetchRequest to initialize offsets
 --

 Key: KAFKA-853
 URL: https://issues.apache.org/jira/browse/KAFKA-853
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.8.1
Reporter: David Arthur
 Fix For: 0.9.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 It would be nice for the OffsetFetchRequest API to have the option to 
 initialize offsets instead of returning unknown_topic_or_partition. It could 
 mimic the Offsets API by adding the time field and then follow the same 
 code path on the server as the Offset API. 
 In this case, the response would need to include a boolean to indicate 
 whether the returned offset was initialized or fetched from ZK.
 This would simplify the client logic when dealing with new topics.
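 Purely as an illustration of the shape such a response could take (these 
 field names are hypothetical, not the actual wire protocol):
 {code}
 // One partition's entry in a hypothetical OffsetFetch response that can
 // report whether the offset was newly initialized or read from ZK.
 case class OffsetFetchResponsePartition(
   offset: Long,
   initialized: Boolean, // true if the offset was initialized rather than fetched
   errorCode: Short
 )
 {code}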



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-853) Allow OffsetFetchRequest to initialize offsets

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122093#comment-14122093
 ] 

Guozhang Wang commented on KAFKA-853:
-

Moving to 0.9 now.

 Allow OffsetFetchRequest to initialize offsets
 --

 Key: KAFKA-853
 URL: https://issues.apache.org/jira/browse/KAFKA-853
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.8.1
Reporter: David Arthur
 Fix For: 0.9.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 It would be nice for the OffsetFetchRequest API to have the option to 
 initialize offsets instead of returning unknown_topic_or_partition. It could 
 mimic the Offsets API by adding the time field and then follow the same 
 code path on the server as the Offset API. 
 In this case, the response would need to include a boolean to indicate 
 whether the returned offset was initialized or fetched from ZK.
 This would simplify the client logic when dealing with new topics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1439) transient unit test failure in testDeleteTopicDuringAddPartition

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1439.
--
Resolution: Duplicate

 transient unit test failure in testDeleteTopicDuringAddPartition
 

 Key: KAFKA-1439
 URL: https://issues.apache.org/jira/browse/KAFKA-1439
 Project: Kafka
  Issue Type: Bug
Reporter: Jun Rao
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.2


 Saw the following transient unit test failure.
 kafka.admin.DeleteTopicTest  testDeleteTopicDuringAddPartition FAILED
 junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test 
 path not deleted even after a replica is restarted
 at junit.framework.Assert.fail(Assert.java:47)
 at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578)
 at 
 kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333)
 at 
 kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1235) Enable server to indefinitely retry on controlled shutdown

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122095#comment-14122095
 ] 

Guozhang Wang commented on KAFKA-1235:
--

Moving to 0.9.

 Enable server to indefinitely retry on controlled shutdown
 --

 Key: KAFKA-1235
 URL: https://issues.apache.org/jira/browse/KAFKA-1235
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
 Fix For: 0.9.0

 Attachments: KAFKA-1235.patch, KAFKA-1235_2014-02-20_11:28:36.patch, 
 KAFKA-1235_2014-02-20_13:08:23.patch


 Today the Kafka server can exit silently if it hits an exception that is 
 swallowed during controlled shutdown, or if controlled.shutdown.max.retries 
 has been exhausted. It would be better to add an option to let it retry 
 indefinitely.
 This will also fix some other loose-check bugs in the socket-closing logic.
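 As a rough sketch of the retry behavior being proposed (the flag name below 
 is made up for illustration and is not an actual broker config):
 {code}
 // Retries the controlled-shutdown attempt until it succeeds, either up to
 // maxRetries times or forever if retryForever is set.
 def shutdownWithRetries(maxRetries: Int, retryForever: Boolean)
                        (attempt: () => Boolean): Boolean = {
   var retries = 0
   var done = attempt()
   while (!done && (retryForever || retries < maxRetries)) {
     retries += 1
     done = attempt()
   }
   done
 }
 {code}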



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-451) follower replica may need to backoff the fetching if leader is not ready yet

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-451:

Fix Version/s: (was: 0.8.2)
   0.9.0

 follower replica may need to backoff the fetching if leader is not ready yet
 

 Key: KAFKA-451
 URL: https://issues.apache.org/jira/browse/KAFKA-451
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jun Rao
Assignee: Prashanth Menon
  Labels: optimization
 Fix For: 0.9.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, when a follower starts fetching from a new leader, it just keeps 
 sending fetch requests even if the requests fail because the leader is not 
 ready yet. We should probably let the follower back off a bit in this case.
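 For illustration only (the parameter names and values below are made up, not 
 actual broker configs), a bounded backoff for consecutive failed fetches 
 could look like:
 {code}
 // Exponential backoff capped at maxMs, keyed off the number of consecutive
 // failed fetch requests to the not-yet-ready leader.
 def backoffMs(consecutiveFailures: Int,
               baseMs: Long = 200L,
               maxMs: Long = 5000L): Long =
   math.min(maxMs, baseMs * (1L << math.min(consecutiveFailures, 10)))
 {code}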



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-570) Kafka should not need snappy jar at runtime

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122099#comment-14122099
 ] 

Guozhang Wang commented on KAFKA-570:
-

This issue will be automatically fixed when the server code moves to the clients' 
common libraries; moving to 0.9 for now.

 Kafka should not need snappy jar at runtime
 ---

 Key: KAFKA-570
 URL: https://issues.apache.org/jira/browse/KAFKA-570
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swapnil Ghike
  Labels: bugs
 Fix For: 0.9.0


 CompressionFactory imports snappy jar in a pattern match. The purpose of 
 importing it this way seems to be avoiding the import unless snappy 
 compression is actually required. However, kafka throws a 
 ClassNotFoundException if snappy jar is removed at runtime from lib_managed. 
 This exception can be easily seen by producing some data with the console 
 producer.
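 A hedged sketch of one way to avoid the hard runtime dependency (illustrative 
 only, not the actual CompressionFactory code; the reflectively loaded class 
 is the upstream snappy-java stream class):
 {code}
 import java.io.OutputStream
 import java.util.zip.GZIPOutputStream

 // Only touch the Snappy classes when snappy compression is actually
 // requested, so the jar is not needed on the classpath for other codecs.
 def compressedStream(codec: String, out: OutputStream): OutputStream =
   codec match {
     case "snappy" =>
       Class.forName("org.xerial.snappy.SnappyOutputStream")
         .getConstructor(classOf[OutputStream])
         .newInstance(out)
         .asInstanceOf[OutputStream]
     case "gzip" => new GZIPOutputStream(out)
     case _      => out // no compression
   }
 {code}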



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-566) Add last modified time to the TopicMetadataRequest

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-566:

Fix Version/s: (was: 0.8.2)
   0.9.0

 Add last modified time to the TopicMetadataRequest
 --

 Key: KAFKA-566
 URL: https://issues.apache.org/jira/browse/KAFKA-566
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jay Kreps
 Fix For: 0.9.0


 To support KAFKA-560 it would be nice to have a last modified time in the 
 TopicMetadataRequest. This would be the timestamp of the last append to the 
 log as taken from stat on the final log segment.
 Implementation would involve
 1. Adding a new field to TopicMetadataRequest
 2. Adding a method Log.lastModified: Long to get the last modified time from 
 a log
 This timestamp would, of course, be subject to error in the event that the 
 file was touched without modification, but I think that is actually okay 
 since it provides a manual way to avoid gc'ing a topic that you  know you 
 will want.
 It is debatable whether this should go in 0.8. It would be nice to add the 
 field to the metadata request, at least, as that change should be easy and 
 would avoid needing to bump the version in the future.
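 A minimal sketch of the second step (assuming, for illustration, that the 
 log's segment files live in a single directory and end in .log; the real Log 
 class would consult its own segment list instead):
 {code}
 import java.io.File

 // Last-modified time of a log, taken from the newest segment file's mtime.
 def lastModified(logDir: File): Long =
   Option(logDir.listFiles).getOrElse(Array.empty[File])
     .filter(_.getName.endsWith(".log"))
     .map(_.lastModified)
     .reduceOption(_ max _)
     .getOrElse(0L)
 {code}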



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-95) Create Jenkins readable test output

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-95:
---
Fix Version/s: (was: 0.8.2)
   0.9.0

 Create Jenkins readable test output
 ---

 Key: KAFKA-95
 URL: https://issues.apache.org/jira/browse/KAFKA-95
 Project: Kafka
  Issue Type: Improvement
  Components: packaging
Reporter: Chris Burroughs
 Fix For: 0.9.0


 Jenkinds likes XML.  See 
 http://henkelmann.eu/2010/11/14/sbt_hudson_with_test_integration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-493) High CPU usage on inactive server

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122100#comment-14122100
 ] 

Guozhang Wang commented on KAFKA-493:
-

Moving the fix of 2) to 0.9

 High CPU usage on inactive server
 -

 Key: KAFKA-493
 URL: https://issues.apache.org/jira/browse/KAFKA-493
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0
Reporter: Jay Kreps
 Fix For: 0.8.2

 Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, 
 Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
 backtraces.txt, stacktrace.txt


  I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
  usage is fairly high (13% of a 
  core). Is that to be expected? I did look at the stack, but didn't see 
  anything obvious. A background 
  task?
  I wanted to mention how I am getting into this state. I've set up two 
  machines with the latest 0.8 
  code base and am using a replication factor of 2. On starting the brokers 
  there is no idle CPU activity. 
  Then I run a test that essentially does 10k publish operations followed by 
  immediate consume operations 
  (I was measuring latency). Once this has run the kafka nodes seem to 
  consistently be consuming CPU 
  essentially forever.
 hprof results:
 THREAD START (obj=53ae, id = 24, name=RMI TCP Accept-0, 
 group=system)
 THREAD START (obj=53ae, id = 25, name=RMI TCP Accept-, 
 group=system)
 THREAD START (obj=53ae, id = 26, name=RMI TCP Accept-0, 
 group=system)
 THREAD START (obj=53ae, id = 21, name=main, group=main)
 THREAD START (obj=53ae, id = 27, name=Thread-2, group=main)
 THREAD START (obj=53ae, id = 28, name=Thread-3, group=main)
 THREAD START (obj=53ae, id = 29, name=kafka-processor-9092-0, 
 group=main)
 THREAD START (obj=53ae, id = 200010, name=kafka-processor-9092-1, 
 group=main)
 THREAD START (obj=53ae, id = 200011, name=kafka-acceptor, group=main)
 THREAD START (obj=574b, id = 200012, 
 name=ZkClient-EventThread-20-localhost:2181, group=main)
 THREAD START (obj=576e, id = 200014, name=main-SendThread(), 
 group=main)
 THREAD START (obj=576d, id = 200013, name=main-EventThread, 
 group=main)
 THREAD START (obj=53ae, id = 200015, name=metrics-meter-tick-thread-1, 
 group=main)
 THREAD START (obj=53ae, id = 200016, name=metrics-meter-tick-thread-2, 
 group=main)
 THREAD START (obj=53ae, id = 200017, name=request-expiration-task, 
 group=main)
 THREAD START (obj=53ae, id = 200018, name=request-expiration-task, 
 group=main)
 THREAD START (obj=53ae, id = 200019, name=kafka-request-handler-0, 
 group=main)
 THREAD START (obj=53ae, id = 200020, name=kafka-request-handler-1, 
 group=main)
 THREAD START (obj=53ae, id = 200021, name=Thread-6, group=main)
 THREAD START (obj=53ae, id = 200022, name=Thread-7, group=main)
 THREAD START (obj=5899, id = 200023, name=ReplicaFetcherThread-0-2 on 
 broker 1, , group=main)
 THREAD START (obj=5899, id = 200024, name=ReplicaFetcherThread-0-3 on 
 broker 1, , group=main)
 THREAD START (obj=5899, id = 200025, name=ReplicaFetcherThread-0-0 on 
 broker 1, , group=main)
 THREAD START (obj=5899, id = 200026, name=ReplicaFetcherThread-0-1 on 
 broker 1, , group=main)
 THREAD START (obj=53ae, id = 200028, name=SIGINT handler, 
 group=system)
 THREAD START (obj=53ae, id = 200029, name=Thread-5, group=main)
 THREAD START (obj=574b, id = 200030, name=Thread-1, group=main)
 THREAD START (obj=574b, id = 200031, name=Thread-0, group=main)
 THREAD END (id = 200031)
 THREAD END (id = 200029)
 THREAD END (id = 200020)
 THREAD END (id = 200019)
 THREAD END (id = 28)
 THREAD END (id = 200021)
 THREAD END (id = 27)
 THREAD END (id = 200022)
 THREAD END (id = 200018)
 THREAD END (id = 200017)
 THREAD END (id = 200012)
 THREAD END (id = 200013)
 THREAD END (id = 200014)
 THREAD END (id = 200025)
 THREAD END (id = 200023)
 THREAD END (id = 200026)
 THREAD END (id = 200024)
 THREAD END (id = 200011)
 THREAD END (id = 29)
 THREAD END (id = 200010)
 THREAD END (id = 200030)
 THREAD END (id = 200028)
 TRACE 301281:
 sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
 line)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 
 sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
 sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
 
 java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
   

[jira] [Updated] (KAFKA-740) Improve crash-safety of log segment swap

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-740:

Fix Version/s: (was: 0.8.2)
   0.9.0

 Improve crash-safety of log segment swap
 

 Key: KAFKA-740
 URL: https://issues.apache.org/jira/browse/KAFKA-740
 Project: Kafka
  Issue Type: Bug
  Components: log
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.9.0


 Currently Log.replaceSegments has a bug that can cause a swap containing 
 multiple segments to partially complete. This would lead to duplicate data in 
 the log.
 The proposed fix is to use a name like offset1_and_offset2.swap for a segment 
 meant to replace segments with base offsets offset1 and offset2.
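 To illustrate the proposed naming scheme only (not the actual patch):
 {code}
 // Builds a swap-file name that records every base offset being replaced,
 // e.g. swapFileName(Seq(0L, 1048576L)) == "0_and_1048576.swap"
 def swapFileName(baseOffsets: Seq[Long]): String =
   baseOffsets.sorted.mkString("_and_") + ".swap"
 {code}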



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-740) Improve crash-safety of log segment swap

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122101#comment-14122101
 ] 

Guozhang Wang commented on KAFKA-740:
-

Moving to 0.9.

 Improve crash-safety of log segment swap
 

 Key: KAFKA-740
 URL: https://issues.apache.org/jira/browse/KAFKA-740
 Project: Kafka
  Issue Type: Bug
  Components: log
Reporter: Jay Kreps
Assignee: Jay Kreps
 Fix For: 0.9.0


 Currently Log.replaceSegments has a bug that can cause a swap containing 
 multiple segments to partially complete. This would lead to duplicate data in 
 the log.
 The proposed fix is to use a name like offset1_and_offset2.swap for a segment 
 meant to replace segments with base offsets offset1 and offset2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-747) RequestChannel re-design

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-747:

Fix Version/s: (was: 0.8.2)
   0.9.0

 RequestChannel re-design
 

 Key: KAFKA-747
 URL: https://issues.apache.org/jira/browse/KAFKA-747
 Project: Kafka
  Issue Type: New Feature
  Components: network
Reporter: Jay Kreps
Assignee: Neha Narkhede
 Fix For: 0.9.0


 We have had some discussion around how to handle queuing requests. There are 
 two competing concerns:
 1. We need to maintain request order on a per-socket basis.
 2. We want to be able to balance load flexibly over a pool of threads so 
 that if one thread blocks on I/O, request processing continues.
 Two Approaches We Have Considered
 1. Have a global queue of unprocessed requests. All I/O threads read requests 
 off this global queue and process them. To avoid re-ordering have the network 
 layer only read one request at a time.
 2. Have a queue per I/O thread and have the network threads statically map 
 sockets to I/O thread request queues.
 Problems With These Approaches
 In the first case you are not able to get any per-producer parallelism. That 
 is you can't read the next request while the current one is being handled. 
 This seems like it would not be a big deal, but preliminary benchmarks show 
 that it might be. 
 In the second case there are two problems. The first is that when an I/O 
 thread gets blocked all request processing for sockets attached to that I/O 
 thread will grind to a halt. If you have 10,000 connections, and  10 I/O 
 threads, then each blockage will stop 1,000 producers. If there is one topic 
 that has long synchronous flush times enabled (or is experiencing fsync 
 locking) this will cause big latency blips for all producers using that I/O 
 thread. The next problem is around backpressure and memory management. Say we 
 use BlockingQueues to feed the I/O threads. And say that one I/O thread 
 stalls. Its request queue will fill up and it will then block ALL network 
 threads, since they will block on inserting into that queue, even though the 
 other I/O threads are unused and have empty queues.
 A Proposed Better Solution
 The problem with the first solution is that we are not pipelining requests. 
 The problem with the second approach is that we are too constrained in moving 
 work from one I/O thread to another.
 Instead we should have a single request queue-like structure, but internally 
 enforce the condition that requests are not re-ordered.
 Here are the details. We retain RequestChannel but refactor its internals. 
 Internally we replace the blocking queue with a linked list. We also keep an 
 in-flight-keys array with one entry per I/O thread. When removing a work item 
 from the list we can't just take the first thing. Instead we need to walk the 
 list and look for something with a request key not in the in-flight-keys 
 array. When a response is sent, we remove that key from the in-flight array.
 This guarantees that requests for a socket with key K are ordered, but that 
 processing for K can only block requests made by K.
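 A simplified sketch of that ordering rule (this uses a set of in-flight 
 socket keys rather than the per-I/O-thread array described above, and the 
 class and method names are made up for illustration):
 {code}
 import scala.collection.mutable

 // Single pending list plus the set of socket keys currently being processed.
 // A request is only handed out if no earlier request for its key is in
 // flight, so requests per socket stay ordered while threads stay busy on
 // requests for other keys.
 class OrderedRequestChannel[K, R] {
   private val pending = mutable.Queue.empty[(K, R)]
   private val inFlightKeys = mutable.Set.empty[K]

   def send(key: K, request: R): Unit = synchronized {
     pending.enqueue((key, request))
   }

   // Take the first pending request whose key is not already in flight, if any.
   def poll(): Option[(K, R)] = synchronized {
     pending.dequeueFirst { case (k, _) => !inFlightKeys.contains(k) }.map { kr =>
       inFlightKeys += kr._1
       kr
     }
   }

   // Called once the response for this key has been sent back to the socket.
   def complete(key: K): Unit = synchronized {
     inFlightKeys -= key
   }
 }
 {code}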



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-493) High CPU usage on inactive server

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-493:

Fix Version/s: (was: 0.8.2)
   0.9.0

 High CPU usage on inactive server
 -

 Key: KAFKA-493
 URL: https://issues.apache.org/jira/browse/KAFKA-493
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0
Reporter: Jay Kreps
 Fix For: 0.9.0

 Attachments: Kafka-sampling1.zip, Kafka-sampling2.zip, 
 Kafka-sampling3.zip, Kafka-trace1.zip, Kafka-trace2.zip, Kafka-trace3.zip, 
 backtraces.txt, stacktrace.txt


  I've been playing with the 0.8 branch of Kafka and noticed that idle CPU 
  usage is fairly high (13% of a 
  core). Is that to be expected? I did look at the stack, but didn't see 
  anything obvious. A background 
  task?
  I wanted to mention how I am getting into this state. I've set up two 
  machines with the latest 0.8 
  code base and am using a replication factor of 2. On starting the brokers 
  there is no idle CPU activity. 
  Then I run a test that essentially does 10k publish operations followed by 
  immediate consume operations 
  (I was measuring latency). Once this has run the kafka nodes seem to 
  consistently be consuming CPU 
  essentially forever.
 hprof results:
 THREAD START (obj=53ae, id = 24, name=RMI TCP Accept-0, 
 group=system)
 THREAD START (obj=53ae, id = 25, name=RMI TCP Accept-, 
 group=system)
 THREAD START (obj=53ae, id = 26, name=RMI TCP Accept-0, 
 group=system)
 THREAD START (obj=53ae, id = 21, name=main, group=main)
 THREAD START (obj=53ae, id = 27, name=Thread-2, group=main)
 THREAD START (obj=53ae, id = 28, name=Thread-3, group=main)
 THREAD START (obj=53ae, id = 29, name=kafka-processor-9092-0, 
 group=main)
 THREAD START (obj=53ae, id = 200010, name=kafka-processor-9092-1, 
 group=main)
 THREAD START (obj=53ae, id = 200011, name=kafka-acceptor, group=main)
 THREAD START (obj=574b, id = 200012, 
 name=ZkClient-EventThread-20-localhost:2181, group=main)
 THREAD START (obj=576e, id = 200014, name=main-SendThread(), 
 group=main)
 THREAD START (obj=576d, id = 200013, name=main-EventThread, 
 group=main)
 THREAD START (obj=53ae, id = 200015, name=metrics-meter-tick-thread-1, 
 group=main)
 THREAD START (obj=53ae, id = 200016, name=metrics-meter-tick-thread-2, 
 group=main)
 THREAD START (obj=53ae, id = 200017, name=request-expiration-task, 
 group=main)
 THREAD START (obj=53ae, id = 200018, name=request-expiration-task, 
 group=main)
 THREAD START (obj=53ae, id = 200019, name=kafka-request-handler-0, 
 group=main)
 THREAD START (obj=53ae, id = 200020, name=kafka-request-handler-1, 
 group=main)
 THREAD START (obj=53ae, id = 200021, name=Thread-6, group=main)
 THREAD START (obj=53ae, id = 200022, name=Thread-7, group=main)
 THREAD START (obj=5899, id = 200023, name=ReplicaFetcherThread-0-2 on 
 broker 1, , group=main)
 THREAD START (obj=5899, id = 200024, name=ReplicaFetcherThread-0-3 on 
 broker 1, , group=main)
 THREAD START (obj=5899, id = 200025, name=ReplicaFetcherThread-0-0 on 
 broker 1, , group=main)
 THREAD START (obj=5899, id = 200026, name=ReplicaFetcherThread-0-1 on 
 broker 1, , group=main)
 THREAD START (obj=53ae, id = 200028, name=SIGINT handler, 
 group=system)
 THREAD START (obj=53ae, id = 200029, name=Thread-5, group=main)
 THREAD START (obj=574b, id = 200030, name=Thread-1, group=main)
 THREAD START (obj=574b, id = 200031, name=Thread-0, group=main)
 THREAD END (id = 200031)
 THREAD END (id = 200029)
 THREAD END (id = 200020)
 THREAD END (id = 200019)
 THREAD END (id = 28)
 THREAD END (id = 200021)
 THREAD END (id = 27)
 THREAD END (id = 200022)
 THREAD END (id = 200018)
 THREAD END (id = 200017)
 THREAD END (id = 200012)
 THREAD END (id = 200013)
 THREAD END (id = 200014)
 THREAD END (id = 200025)
 THREAD END (id = 200023)
 THREAD END (id = 200026)
 THREAD END (id = 200024)
 THREAD END (id = 200011)
 THREAD END (id = 29)
 THREAD END (id = 200010)
 THREAD END (id = 200030)
 THREAD END (id = 200028)
 TRACE 301281:
 sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown 
 line)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 
 sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:218)
 sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
 
 java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
 

[jira] [Commented] (KAFKA-747) RequestChannel re-design

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122102#comment-14122102
 ] 

Guozhang Wang commented on KAFKA-747:
-

Moving to 0.9

 RequestChannel re-design
 

 Key: KAFKA-747
 URL: https://issues.apache.org/jira/browse/KAFKA-747
 Project: Kafka
  Issue Type: New Feature
  Components: network
Reporter: Jay Kreps
Assignee: Neha Narkhede
 Fix For: 0.9.0


 We have had some discussion around how to handle queuing requests. There are 
 two competing concerns:
 1. We need to maintain request order on a per-socket basis.
 2. We want to be able to balance load flexibly over a pool of threads so 
 that if one thread blocks on I/O, request processing continues.
 Two Approaches We Have Considered
 1. Have a global queue of unprocessed requests. All I/O threads read requests 
 off this global queue and process them. To avoid re-ordering have the network 
 layer only read one request at a time.
 2. Have a queue per I/O thread and have the network threads statically map 
 sockets to I/O thread request queues.
 Problems With These Approaches
 In the first case you are not able to get any per-producer parallelism. That 
 is you can't read the next request while the current one is being handled. 
 This seems like it would not be a big deal, but preliminary benchmarks show 
 that it might be. 
 In the second case there are two problems. The first is that when an I/O 
 thread gets blocked all request processing for sockets attached to that I/O 
 thread will grind to a halt. If you have 10,000 connections, and  10 I/O 
 threads, then each blockage will stop 1,000 producers. If there is one topic 
 that has long synchronous flush times enabled (or is experiencing fsync 
 locking) this will cause big latency blips for all producers using that I/O 
 thread. The next problem is around backpressure and memory management. Say we 
 use BlockingQueues to feed the I/O threads. And say that one I/O thread 
 stalls. Its request queue will fill up and it will then block ALL network 
 threads, since they will block on inserting into that queue, even though the 
 other I/O threads are unused and have empty queues.
 A Proposed Better Solution
 The problem with the first solution is that we are not pipelining requests. 
 The problem with the second approach is that we are too constrained in moving 
 work from one I/O thread to another.
 Instead we should have a single request queue-like structure, but internally 
 enforce the condition that requests are not re-ordered.
 Here are the details. We retain RequestChannel but refactor its internals. 
 Internally we replace the blocking queue with a linked list. We also keep an 
 in-flight-keys array with one entry per I/O thread. When removing a work item 
 from the list we can't just take the first thing. Instead we need to walk the 
 list and look for something with a request key not in the in-flight-keys 
 array. When a response is sent, we remove that key from the in-flight array.
 This guarantees that requests for a socket with key K are ordered, but that 
 processing for K can only block requests made by K.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-661) Prevent a shutting down broker from re-entering the ISR

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-661:

Fix Version/s: (was: 0.8.2)
   0.9.0

 Prevent a shutting down broker from re-entering the ISR
 ---

 Key: KAFKA-661
 URL: https://issues.apache.org/jira/browse/KAFKA-661
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.0, 0.8.1
Reporter: Joel Koshy
 Fix For: 0.9.0


 There is a timing issue in controlled shutdown that affects low-volume 
 topics. The leader that is being shut down receives a leaderAndIsrRequest 
 informing it that it is no longer the leader, and thus starts up a follower 
 which starts issuing fetch requests to the new leader. We then shrink the 
 ISR and send a StopReplicaRequest to the shutting-down broker. However, the 
 new leader, upon receiving the fetch request, expands the ISR again.
 This does not really have a critical impact: it can cause producers to that 
 topic to time out, but there are probably very few or no produce requests 
 coming in, since it primarily affects low-volume topics. The shutdown logic 
 itself seems to be working correctly, in that the leader has been 
 successfully moved.
 One possible approach would be to use the callback feature in the 
 ControllerBrokerRequestBatch and wait until the StopReplicaRequest has been 
 processed by the shutting down broker before shrinking the ISR; and there are 
 probably other ways as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1324) Debian packaging

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1324:
-
Fix Version/s: (was: 0.8.2)
   0.9.0

 Debian packaging
 

 Key: KAFKA-1324
 URL: https://issues.apache.org/jira/browse/KAFKA-1324
 Project: Kafka
  Issue Type: Improvement
  Components: packaging
 Environment: linux
Reporter: David Stendardi
Priority: Minor
  Labels: deb, debian, fpm, packaging
 Fix For: 0.9.0

 Attachments: packaging.patch


 The following patch adds a task releaseDeb to the gradle build:
 ./gradlew releaseDeb
 This task should create a Debian package in core/build/distributions using 
 fpm:
 https://github.com/jordansissel/fpm.
 We decided to use fpm so that other package types would be easy to provide 
 in further iterations (e.g. rpm).
 *Some implementation details*:
 - We split releaseTarGz into two tasks: distDir and releaseTarGz.
 - We tried to use gradle built-in variables (project.name etc.)
 - By default the service will not start automatically, so the user is free 
 to set up the service with custom configuration.
 Notes:
  * FPM is required and should be in the path.
  * FPM does not yet allow declaring /etc/default/kafka as a conffile (see: 
 https://github.com/jordansissel/fpm/issues/668)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1324) Debian packaging

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122104#comment-14122104
 ] 

Guozhang Wang commented on KAFKA-1324:
--

Moving to 0.9 for now.

 Debian packaging
 

 Key: KAFKA-1324
 URL: https://issues.apache.org/jira/browse/KAFKA-1324
 Project: Kafka
  Issue Type: Improvement
  Components: packaging
 Environment: linux
Reporter: David Stendardi
Priority: Minor
  Labels: deb, debian, fpm, packaging
 Fix For: 0.9.0

 Attachments: packaging.patch


 The following patch adds a task releaseDeb to the gradle build:
 ./gradlew releaseDeb
 This task should create a Debian package in core/build/distributions using 
 fpm:
 https://github.com/jordansissel/fpm.
 We decided to use fpm so that other package types would be easy to provide 
 in further iterations (e.g. rpm).
 *Some implementation details*:
 - We split releaseTarGz into two tasks: distDir and releaseTarGz.
 - We tried to use gradle built-in variables (project.name etc.)
 - By default the service will not start automatically, so the user is free 
 to set up the service with custom configuration.
 Notes:
  * FPM is required and should be in the path.
  * FPM does not yet allow declaring /etc/default/kafka as a conffile (see: 
 https://github.com/jordansissel/fpm/issues/668)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-334) Some tests fail when building on a Windows box

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-334:

Fix Version/s: (was: 0.8.2)
   0.9.0

 Some tests fail when building on a Windows box
 --

 Key: KAFKA-334
 URL: https://issues.apache.org/jira/browse/KAFKA-334
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.7
 Environment: Windows 7 - reproduces under command shell, cygwin, and 
 MINGW32 (Git Bash)
Reporter: Roman Garcia
Priority: Minor
  Labels: build-failure, test-fail
 Fix For: 0.9.0


 Trying to create a ZIP distro from sources failed.
 On Win7. On cygwin, command shell and git bash.
 Tried with incubator-src download from ASF download page, as well as fresh 
 checkout from latest trunk (r1329547).
 Once I tried the same on a Linux box, everything was working ok.
 svn co http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka-0.7.0
 ./sbt update (OK)
 ./sbt package (OK)
 ./sbt release-zip (FAIL)
 Tests failing:
 [error] Error running kafka.integration.LazyInitProducerTest: Test FAILED
 [error] Error running kafka.zk.ZKLoadBalanceTest: Test FAILED
 [error] Error running kafka.javaapi.producer.ProducerTest: Test FAILED
 [error] Error running kafka.producer.ProducerTest: Test FAILED
 [error] Error running test: One or more subtasks failed
 [error] Error running doc: Scaladoc generation failed
 Stacks:
 [error] Test Failed: testZKSendWithDeadBroker
 junit.framework.AssertionFailedError: Message set should have another message
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.javaapi.producer.ProducerTest.testZKSendWithDeadBroker(ProducerTest.scala:448)
 [error] Test Failed: testZKSendToNewTopic
 junit.framework.AssertionFailedError: Message set should have 1 message
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.javaapi.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:416)
 [error] Test Failed: testLoadBalance(kafka.zk.ZKLoadBalanceTest)
 junit.framework.AssertionFailedError: expected:5 but was:0
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.failNotEquals(Assert.java:277)
 at junit.framework.Assert.assertEquals(Assert.java:64)
 at junit.framework.Assert.assertEquals(Assert.java:195)
 at junit.framework.Assert.assertEquals(Assert.java:201)
 at 
 kafka.zk.ZKLoadBalanceTest.checkSetEqual(ZKLoadBalanceTest.scala:121)
 at 
 kafka.zk.ZKLoadBalanceTest.testLoadBalance(ZKLoadBalanceTest.scala:89)
 [error] Test Failed: testPartitionedSendToNewTopic
 java.lang.AssertionError:
   Unexpected method call send(test-topic1, 0, 
 ByteBufferMessageSet(MessageAndOffset(message(magic = 1, attributes = 0, crc 
 = 2326977762, payload = java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]),15), )):
 close(): expected: 1, actual: 0
 at 
 org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45)
 at 
 org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73)
 at 
 org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92)
 at 
 kafka.producer.SyncProducer$$EnhancerByCGLIB$$4385e618.send(generated)
 at 
 kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:114)
 at 
 kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100)
 at 
 kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100)
 at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
 at kafka.producer.ProducerPool.send(ProducerPool.scala:100)
 at kafka.producer.Producer.zkSend(Producer.scala:137)
 at kafka.producer.Producer.send(Producer.scala:99)
 at 
 kafka.producer.ProducerTest.testPartitionedSendToNewTopic(ProducerTest.scala:576)
 [error] Test Failed: testZKSendToNewTopic
 junit.framework.AssertionFailedError: Message set should have 1 message
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:429)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-334) Some tests fail when building on a Windows box

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122105#comment-14122105
 ] 

Guozhang Wang commented on KAFKA-334:
-

Moving the fix to post 0.8.2 for now.

 Some tests fail when building on a Windows box
 --

 Key: KAFKA-334
 URL: https://issues.apache.org/jira/browse/KAFKA-334
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.7
 Environment: Windows 7 - reproduces under command shell, cygwin, and 
 MINGW32 (Git Bash)
Reporter: Roman Garcia
Priority: Minor
  Labels: build-failure, test-fail
 Fix For: 0.9.0


 Trying to create a ZIP distro from sources failed.
 On Win7. On cygwin, command shell and git bash.
 Tried with incubator-src download from ASF download page, as well as fresh 
 checkout from latest trunk (r1329547).
 Once I tried the same on a Linux box, everything was working ok.
 svn co http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka-0.7.0
 ./sbt update (OK)
 ./sbt package (OK)
 ./sbt release-zip (FAIL)
 Tests failing:
 [error] Error running kafka.integration.LazyInitProducerTest: Test FAILED
 [error] Error running kafka.zk.ZKLoadBalanceTest: Test FAILED
 [error] Error running kafka.javaapi.producer.ProducerTest: Test FAILED
 [error] Error running kafka.producer.ProducerTest: Test FAILED
 [error] Error running test: One or more subtasks failed
 [error] Error running doc: Scaladoc generation failed
 Stacks:
 [error] Test Failed: testZKSendWithDeadBroker
 junit.framework.AssertionFailedError: Message set should have another message
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.javaapi.producer.ProducerTest.testZKSendWithDeadBroker(ProducerTest.scala:448)
 [error] Test Failed: testZKSendToNewTopic
 junit.framework.AssertionFailedError: Message set should have 1 message
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.javaapi.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:416)
 [error] Test Failed: testLoadBalance(kafka.zk.ZKLoadBalanceTest)
 junit.framework.AssertionFailedError: expected:5 but was:0
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.failNotEquals(Assert.java:277)
 at junit.framework.Assert.assertEquals(Assert.java:64)
 at junit.framework.Assert.assertEquals(Assert.java:195)
 at junit.framework.Assert.assertEquals(Assert.java:201)
 at 
 kafka.zk.ZKLoadBalanceTest.checkSetEqual(ZKLoadBalanceTest.scala:121)
 at 
 kafka.zk.ZKLoadBalanceTest.testLoadBalance(ZKLoadBalanceTest.scala:89)
 [error] Test Failed: testPartitionedSendToNewTopic
 java.lang.AssertionError:
   Unexpected method call send(test-topic1, 0, 
 ByteBufferMessageSet(MessageAndOffset(message(magic = 1, attributes = 0, crc 
 = 2326977762, payload = java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]),15), )):
 close(): expected: 1, actual: 0
 at 
 org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45)
 at 
 org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73)
 at 
 org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92)
 at 
 kafka.producer.SyncProducer$$EnhancerByCGLIB$$4385e618.send(generated)
 at 
 kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:114)
 at 
 kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100)
 at 
 kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100)
 at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
 at kafka.producer.ProducerPool.send(ProducerPool.scala:100)
 at kafka.producer.Producer.zkSend(Producer.scala:137)
 at kafka.producer.Producer.send(Producer.scala:99)
 at 
 kafka.producer.ProducerTest.testPartitionedSendToNewTopic(ProducerTest.scala:576)
 [error] Test Failed: testZKSendToNewTopic
 junit.framework.AssertionFailedError: Message set should have 1 message
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.producer.ProducerTest.testZKSendToNewTopic(ProducerTest.scala:429)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-313) Add JSON output and looping options to ConsumerOffsetChecker

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122107#comment-14122107
 ] 

Guozhang Wang commented on KAFKA-313:
-

[~jjkoshy] Do we still need this feature today?

 Add JSON output and looping options to ConsumerOffsetChecker
 

 Key: KAFKA-313
 URL: https://issues.apache.org/jira/browse/KAFKA-313
 Project: Kafka
  Issue Type: Improvement
Reporter: Dave DeMaagd
Priority: Minor
  Labels: patch
 Fix For: 0.8.2

 Attachments: KAFKA-313-2012032200.diff


 Adds:
 * '--loop N' - causes the program to loop forever, sleeping for up to N 
 seconds between loops (loop time minus collection time, unless that's less 
 than 0, at which point it will just run again immediately)
 * '--asjson' - display as a JSON string instead of the more human readable 
 output format.
 Neither of the above depends on the other (you can loop with the 
 human-readable output, or do a single-shot execution with JSON output). 
 Existing behavior/output is maintained if neither of the above is used. 
 Diff attached.
 Impacted files:
 core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala
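 A tiny sketch of the '--loop N' timing described above (illustrative only; 
 collect stands in for whatever the checker gathers on each pass):
 {code}
 // Sleep for the loop period minus the time the collection took,
 // or run again immediately if the collection ran longer than the period.
 def runLoop(loopSeconds: Int)(collect: () => Unit): Unit =
   while (true) {
     val start = System.currentTimeMillis()
     collect()
     val elapsed = System.currentTimeMillis() - start
     val sleepMs = loopSeconds * 1000L - elapsed
     if (sleepMs > 0) Thread.sleep(sleepMs)
   }
 {code}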



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1313) Support adding replicas to existing topic partitions

2014-09-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1313.
--
   Resolution: Fixed
Fix Version/s: (was: 0.9.0)
   0.8.2

 Support adding replicas to existing topic partitions
 

 Key: KAFKA-1313
 URL: https://issues.apache.org/jira/browse/KAFKA-1313
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.8.0
Reporter: Marc Labbe
Priority: Critical
 Fix For: 0.8.2


 There is currently no easy way to add replicas to an existing topic's 
 partitions.
 For example, topic create-test has been created with ReplicationFactor=1: 
 Topic:create-test  PartitionCount:3ReplicationFactor:1 Configs:
 Topic: create-test Partition: 0Leader: 1   Replicas: 1 Isr: 1
 Topic: create-test Partition: 1Leader: 2   Replicas: 2 Isr: 2
 Topic: create-test Partition: 2Leader: 3   Replicas: 3 Isr: 3
 I would like to increase it to ReplicationFactor=2 (or more) so it shows up 
 like this instead.
 Topic:create-test  PartitionCount:3ReplicationFactor:2 Configs:
 Topic: create-test Partition: 0Leader: 1   Replicas: 1,2 Isr: 1,2
 Topic: create-test Partition: 1Leader: 2   Replicas: 2,3 Isr: 2,3
 Topic: create-test Partition: 2Leader: 3   Replicas: 3,1 Isr: 3,1
 Use cases for this:
 - adding brokers and thus increasing fault tolerance
 - fixing human errors for topics created with wrong values
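 For illustration, a reassignment plan that would raise create-test from one 
 to two replicas per partition could look like the following (shown here as a 
 Scala string; this JSON is the format the partition reassignment tool 
 consumes, and the exact replica placement is just an example):
 {code}
 val reassignmentPlan: String =
   """{"version":1,"partitions":[
     |  {"topic":"create-test","partition":0,"replicas":[1,2]},
     |  {"topic":"create-test","partition":1,"replicas":[2,3]},
     |  {"topic":"create-test","partition":2,"replicas":[3,1]}
     |]}""".stripMargin
 {code}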



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1313) Support adding replicas to existing topic partitions

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122115#comment-14122115
 ] 

Guozhang Wang commented on KAFKA-1313:
--

My bad. Closing now.

 Support adding replicas to existing topic partitions
 

 Key: KAFKA-1313
 URL: https://issues.apache.org/jira/browse/KAFKA-1313
 Project: Kafka
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.8.0
Reporter: Marc Labbe
Priority: Critical
 Fix For: 0.8.2


 There is currently no easy way to add replicas to an existing topic's 
 partitions.
 For example, topic create-test has been created with ReplicationFactor=1: 
 Topic:create-test  PartitionCount:3ReplicationFactor:1 Configs:
 Topic: create-test Partition: 0Leader: 1   Replicas: 1 Isr: 1
 Topic: create-test Partition: 1Leader: 2   Replicas: 2 Isr: 2
 Topic: create-test Partition: 2Leader: 3   Replicas: 3 Isr: 3
 I would like to increase it to ReplicationFactor=2 (or more) so it shows up 
 like this instead.
 Topic:create-test  PartitionCount:3ReplicationFactor:2 Configs:
 Topic: create-test Partition: 0Leader: 1   Replicas: 1,2 Isr: 1,2
 Topic: create-test Partition: 1Leader: 2   Replicas: 2,3 Isr: 2,3
 Topic: create-test Partition: 2Leader: 3   Replicas: 3,1 Isr: 3,1
 Use cases for this:
 - adding brokers and thus increasing fault tolerance
 - fixing human errors for topics created with wrong values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1558) AdminUtils.deleteTopic does not work

2014-09-04 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122134#comment-14122134
 ] 

Sriharsha Chintalapani commented on KAFKA-1558:
---

[~guozhang] I am waiting for some logs that show the supposed issue. I can put in 
some time this week to get this done if someone can give me reproduction steps or 
logs. Thanks

 AdminUtils.deleteTopic does not work
 

 Key: KAFKA-1558
 URL: https://issues.apache.org/jira/browse/KAFKA-1558
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Henning Schmiedehausen
Assignee: Sriharsha Chintalapani
 Fix For: 0.8.2


 the AdminUtils.deleteTopic method is implemented as
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.createPersistentPath(zkClient, ZkUtils.getDeleteTopicPath(topic))
 }
 {code}
 but the DeleteTopicCommand actually does
 {code}
 zkClient = new ZkClient(zkConnect, 3, 3, ZKStringSerializer)
 zkClient.deleteRecursive(ZkUtils.getTopicPath(topic))
 {code}
 so I guess that the 'createPersistentPath' above should actually be
 {code}
 def deleteTopic(zkClient: ZkClient, topic: String) {
   ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath(topic))
 }
 {code}
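 As a stop-gap, a caller can mirror what DeleteTopicCommand does today and remove 
 the topic's metadata sub-tree directly. A rough sketch (topic name and ZooKeeper 
 settings are placeholders; note that deleting the znodes does not clean up log 
 data on brokers that are still running):
 {code}
 import org.I0Itec.zkclient.ZkClient
 import kafka.utils.{ZKStringSerializer, ZkUtils}

 // Illustrative workaround only: remove the topic's metadata from ZooKeeper,
 // as DeleteTopicCommand does, instead of merely creating the
 // /admin/delete_topic/<topic> marker that AdminUtils.deleteTopic writes.
 val zkClient = new ZkClient("localhost:2181", 30000, 30000, ZKStringSerializer)
 try {
   ZkUtils.deletePathRecursive(zkClient, ZkUtils.getTopicPath("my-topic"))
 } finally {
   zkClient.close()
 }
 {code}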



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1482) Transient test failures for kafka.admin.DeleteTopicTest

2014-09-04 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122136#comment-14122136
 ] 

Sriharsha Chintalapani commented on KAFKA-1482:
---

[~guozhang] Are you working on this JIRA, or can I give it a shot? Thanks

 Transient test failures for kafka.admin.DeleteTopicTest
 ---

 Key: KAFKA-1482
 URL: https://issues.apache.org/jira/browse/KAFKA-1482
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Jun Rao
  Labels: newbie
 Fix For: 0.8.2


 A couple of test cases have timing related transient test failures:
 kafka.admin.DeleteTopicTest  testPartitionReassignmentDuringDeleteTopic 
 FAILED
 junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test 
 path not deleted even after a replica is restarted
 at junit.framework.Assert.fail(Assert.java:47)
 at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578)
 at 
 kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333)
 at 
 kafka.admin.DeleteTopicTest.testPartitionReassignmentDuringDeleteTopic(DeleteTopicTest.scala:197)
 kafka.admin.DeleteTopicTest  testDeleteTopicDuringAddPartition FAILED
 junit.framework.AssertionFailedError: Replica logs not deleted after 
 delete topic is complete
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:338)
 at 
 kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216)
 kafka.admin.DeleteTopicTest  testRequestHandlingDuringDeleteTopic FAILED
 org.scalatest.junit.JUnitTestFailedError: fails with exception
 at 
 org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:102)
 at 
 org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142)
 at org.scalatest.Assertions$class.fail(Assertions.scala:664)
 at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142)
 at 
 kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:123)
 Caused by:
 org.scalatest.junit.JUnitTestFailedError: Test should fail because 
 the topic is being deleted
 at 
 org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
 at 
 org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142)
 at org.scalatest.Assertions$class.fail(Assertions.scala:644)
 at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142)
 at 
 kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:120)
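 All three failures boil down to an assertion firing before the asynchronous delete 
 has finished. As a rough illustration of the pattern involved (not the actual 
 TestUtils code; names and timeouts are made up), the check needs to poll against a 
 deadline rather than assert once:
 {code}
 // Hypothetical deadline-based poll: re-check an asynchronous condition
 // until it holds or the timeout elapses, instead of asserting immediately.
 def waitUntil(condition: () => Boolean, timeoutMs: Long = 5000L, pauseMs: Long = 100L): Boolean = {
   val deadline = System.currentTimeMillis() + timeoutMs
   while (System.currentTimeMillis() < deadline) {
     if (condition()) return true
     Thread.sleep(pauseMs)
   }
   condition() // one final check at the deadline
 }

 // e.g. waitUntil(() => !zkClient.exists(ZkUtils.getDeleteTopicPath("test")))
 {code}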



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka

2014-09-04 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122243#comment-14122243
 ] 

Joel Koshy commented on KAFKA-1510:
---

[~nmarasoi] Your patch looks good to me - however, it does not apply cleanly to 
the latest trunk. Would you mind rebasing? If you don't have time, I can take 
care of it as well.

 Force offset commits when migrating consumer offsets from zookeeper to kafka
 

 Key: KAFKA-1510
 URL: https://issues.apache.org/jira/browse/KAFKA-1510
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Joel Koshy
  Labels: newbie
 Fix For: 0.8.2

 Attachments: 
 Patch_to_push_unfiltered_offsets_to_both_Kafka_and_potentially_Zookeeper_when_Kafka_is_con.patch,
  Unfiltered_to_kafka,_Incremental_to_Zookeeper.patch


 When migrating consumer offsets from ZooKeeper to kafka, we have to turn on 
 dual-commit (i.e., the consumers will commit offsets to both zookeeper and 
 kafka) in addition to setting offsets.storage to kafka. However, when we 
 commit offsets we only commit offsets if they have changed (since the last 
 commit). For low-volume topics or for topics that receive data in bursts 
 offsets may not move for a long period of time. Therefore we may want to 
 force the commit (even if offsets have not changed) when migrating (i.e., 
 when dual-commit is enabled) - we can add a minimum interval threshold (say 
 force commit after every 10 auto-commits) as well as on rebalance and 
 shutdown.
 Also, I think it is safe to switch the default for offsets.storage from 
 zookeeper to kafka and set the default to dual-commit (for people who have 
 not migrated yet). We have deployed this to the largest consumers at linkedin 
 and have not seen any issues so far (except for the migration caveat that 
 this jira will resolve).
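 As a sketch of the migration configuration being described (the property names are 
 the 0.8.2 consumer settings; connection and group values are placeholders):
 {code}
 import java.util.Properties
 import kafka.consumer.{Consumer, ConsumerConfig}

 val props = new Properties()
 props.put("zookeeper.connect", "localhost:2181")  // placeholder
 props.put("group.id", "my-group")                 // placeholder
 // During migration: store offsets in Kafka while still committing to ZooKeeper,
 // so consumers that have not switched yet keep seeing up-to-date offsets.
 props.put("offsets.storage", "kafka")
 props.put("dual.commit.enabled", "true")
 // This JIRA is about forcing commits during this phase so Kafka-side offsets
 // get populated even for partitions whose offsets rarely move.
 val connector = Consumer.create(new ConsumerConfig(props))
 {code}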



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1482) Transient test failures for kafka.admin.DeleteTopicTest

2014-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122420#comment-14122420
 ] 

Guozhang Wang commented on KAFKA-1482:
--

I am not working on that JIRA; not sure if Jun is looking into it, though.

 Transient test failures for kafka.admin.DeleteTopicTest
 ---

 Key: KAFKA-1482
 URL: https://issues.apache.org/jira/browse/KAFKA-1482
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Jun Rao
  Labels: newbie
 Fix For: 0.8.2


 A couple of test cases have timing related transient test failures:
 kafka.admin.DeleteTopicTest  testPartitionReassignmentDuringDeleteTopic 
 FAILED
 junit.framework.AssertionFailedError: Admin path /admin/delete_topic/test 
 path not deleted even after a replica is restarted
 at junit.framework.Assert.fail(Assert.java:47)
 at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:578)
 at 
 kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:333)
 at 
 kafka.admin.DeleteTopicTest.testPartitionReassignmentDuringDeleteTopic(DeleteTopicTest.scala:197)
 kafka.admin.DeleteTopicTest  testDeleteTopicDuringAddPartition FAILED
 junit.framework.AssertionFailedError: Replica logs not deleted after 
 delete topic is complete
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at 
 kafka.admin.DeleteTopicTest.verifyTopicDeletion(DeleteTopicTest.scala:338)
 at 
 kafka.admin.DeleteTopicTest.testDeleteTopicDuringAddPartition(DeleteTopicTest.scala:216)
 kafka.admin.DeleteTopicTest  testRequestHandlingDuringDeleteTopic FAILED
 org.scalatest.junit.JUnitTestFailedError: fails with exception
 at 
 org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:102)
 at 
 org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142)
 at org.scalatest.Assertions$class.fail(Assertions.scala:664)
 at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142)
 at 
 kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:123)
 Caused by:
 org.scalatest.junit.JUnitTestFailedError: Test should fail because 
 the topic is being deleted
 at 
 org.scalatest.junit.AssertionsForJUnit$class.newAssertionFailedException(AssertionsForJUnit.scala:101)
 at 
 org.scalatest.junit.JUnit3Suite.newAssertionFailedException(JUnit3Suite.scala:142)
 at org.scalatest.Assertions$class.fail(Assertions.scala:644)
 at org.scalatest.junit.JUnit3Suite.fail(JUnit3Suite.scala:142)
 at 
 kafka.admin.DeleteTopicTest.testRequestHandlingDuringDeleteTopic(DeleteTopicTest.scala:120)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka

2014-09-04 Thread nicu marasoiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nicu marasoiu updated KAFKA-1510:
-
Attachment: (was: Unfiltered_to_kafka,_Incremental_to_Zookeeper.patch)

 Force offset commits when migrating consumer offsets from zookeeper to kafka
 

 Key: KAFKA-1510
 URL: https://issues.apache.org/jira/browse/KAFKA-1510
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Joel Koshy
  Labels: newbie
 Fix For: 0.8.2

 Attachments: 
 Patch_to_push_unfiltered_offsets_to_both_Kafka_and_potentially_Zookeeper_when_Kafka_is_con.patch


 When migrating consumer offsets from ZooKeeper to kafka, we have to turn on 
 dual-commit (i.e., the consumers will commit offsets to both zookeeper and 
 kafka) in addition to setting offsets.storage to kafka. However, when we 
 commit offsets we only commit offsets if they have changed (since the last 
 commit). For low-volume topics or for topics that receive data in bursts 
 offsets may not move for a long period of time. Therefore we may want to 
 force the commit (even if offsets have not changed) when migrating (i.e., 
 when dual-commit is enabled) - we can add a minimum interval threshold (say 
 force commit after every 10 auto-commits) as well as on rebalance and 
 shutdown.
 Also, I think it is safe to switch the default for offsets.storage from 
 zookeeper to kafka and set the default to dual-commit (for people who have 
 not migrated yet). We have deployed this to the largest consumers at linkedin 
 and have not seen any issues so far (except for the migration caveat that 
 this jira will resolve).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka

2014-09-04 Thread nicu marasoiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nicu marasoiu updated KAFKA-1510:
-
Attachment: Unfiltered_offsets_commit_to_kafka_rebased.patch

Attached rebased patch

 Force offset commits when migrating consumer offsets from zookeeper to kafka
 

 Key: KAFKA-1510
 URL: https://issues.apache.org/jira/browse/KAFKA-1510
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Joel Koshy
Assignee: Joel Koshy
  Labels: newbie
 Fix For: 0.8.2

 Attachments: 
 Patch_to_push_unfiltered_offsets_to_both_Kafka_and_potentially_Zookeeper_when_Kafka_is_con.patch,
  Unfiltered_offsets_commit_to_kafka_rebased.patch


 When migrating consumer offsets from ZooKeeper to kafka, we have to turn on 
 dual-commit (i.e., the consumers will commit offsets to both zookeeper and 
 kafka) in addition to setting offsets.storage to kafka. However, when we 
 commit offsets we only commit offsets if they have changed (since the last 
 commit). For low-volume topics or for topics that receive data in bursts 
 offsets may not move for a long period of time. Therefore we may want to 
 force the commit (even if offsets have not changed) when migrating (i.e., 
 when dual-commit is enabled) - we can add a minimum interval threshold (say 
 force commit after every 10 auto-commits) as well as on rebalance and 
 shutdown.
 Also, I think it is safe to switch the default for offsets.storage from 
 zookeeper to kafka and set the default to dual-commit (for people who have 
 not migrated yet). We have deployed this to the largest consumers at linkedin 
 and have not seen any issues so far (except for the migration caveat that 
 this jira will resolve).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1282) Disconnect idle socket connection in Selector

2014-09-04 Thread nicu marasoiu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14122456#comment-14122456
 ] 

nicu marasoiu commented on KAFKA-1282:
--

Hi, I am not completely sure I fully understood your solution in point 2: 

Do you mean closing at most one connection per iteration? That is fine; the worst-case 
scenario is closing 100K old connections over 10 hours, one per select.

As for storing the time-to-close in a local variable: accessing the oldest entry on 
every iteration is already O(1) and very cheap, so I would skip that optimization.
 Disconnect idle socket connection in Selector
 -

 Key: KAFKA-1282
 URL: https://issues.apache.org/jira/browse/KAFKA-1282
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.8.2
Reporter: Jun Rao
Assignee: Neha Narkhede
  Labels: newbie++
 Fix For: 0.9.0

 Attachments: 
 KAFKA-1282_Disconnect_idle_socket_connection_in_Selector.patch, 
 idleDisconnect.patch


 To reduce the number of socket connections, it would be useful for the new producer 
 to close socket connections that are idle. We can introduce a new producer 
 config for the idle time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

