[jira] [Commented] (KAFKA-1408) Kafka broker can not stop itself normally after problems with connection to ZK

2015-02-09 Thread Dmitry Bugaychenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313765#comment-14313765
 ] 

Dmitry Bugaychenko commented on KAFKA-1408:
---

When topic deletion is disabled the problem is not reproducible. I have not 
tried enabling deletion again yet. And yes, I think it's a duplicate of KAFKA-1317

> Kafka broker can not stop itself normally after problems with connection to ZK
> ---
>
> Key: KAFKA-1408
> URL: https://issues.apache.org/jira/browse/KAFKA-1408
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>
> After getting into an inconsistent state due to a short network failure, the 
> broker cannot stop itself. The last message in the log is:
> {code}
> INFO   | jvm 1| 2014/04/21 08:53:07 | [2014-04-21 09:53:06,999] INFO 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> INFO   | jvm 1| 2014/04/21 08:53:07 | [2014-04-21 09:53:06,999] INFO 
> [kafka-log-cleaner-thread-0], Shutdown completed (kafka.log.LogCleaner)
> {code}
> There is also a preceding error:
> {code}
> INFO   | jvm 1| 2014/04/21 08:52:55 | [2014-04-21 09:52:55,015] WARN 
> Controller doesn't exist (kafka.utils.Utils$)
> INFO   | jvm 1| 2014/04/21 08:52:55 | kafka.common.KafkaException: 
> Controller doesn't exist
> INFO   | jvm 1| 2014/04/21 08:52:55 |   at 
> kafka.utils.ZkUtils$.getController(ZkUtils.scala:70)
> INFO   | jvm 1| 2014/04/21 08:52:55 |   at 
> kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:148)
> INFO   | jvm 1| 2014/04/21 08:52:55 |   at 
> kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:220)
> {code}
> Here is part of a jstack dump (it looks like there is a deadlock between 
> delete-topics-thread and ZkClient-EventThread):
> {code}
> IWrapper-Connection id=10 state=WAITING
> - waiting on <0x15d6aa44> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - locked <0x15d6aa44> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>  owned by ZkClient-EventThread-37-devlnx2:2181 id=37
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at kafka.utils.Utils$.inLock(Utils.scala:536)
> at kafka.controller.KafkaController.shutdown(KafkaController.scala:641)
> at 
> kafka.server.KafkaServer$$anonfun$shutdown$8.apply$mcV$sp(KafkaServer.scala:233)
> at kafka.utils.Utils$.swallow(Utils.scala:167)
> at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
> at kafka.utils.Logging$class.swallow(Logging.scala:94)
> at kafka.utils.Utils$.swallow(Utils.scala:46)
> at kafka.server.KafkaServer.shutdown(KafkaServer.scala:233)
> at odkl.databus.server.Main.stop(Main.java:184)
> at 
> org.tanukisoftware.wrapper.WrapperManager.stopInner(WrapperManager.java:1982)
> at 
> org.tanukisoftware.wrapper.WrapperManager.handleSocket(WrapperManager.java:2391)
> at org.tanukisoftware.wrapper.WrapperManager.run(WrapperManager.java:2696)
> at java.lang.Thread.run(Thread.java:744)
> ZkClient-EventThread-37-devlnx2:2181 id=37 state=WAITING
> - waiting on <0x3d5f9878> (a java.util.concurrent.CountDownLatch$Sync)
> - locked <0x3d5f9878> (a java.util.concurrent.CountDownLatch$Sync)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
> at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> at 
> kafka.controller.TopicDeletionManager.shutdown(TopicDeletionManager.scala:93)
> at 
> kafka.controller.KafkaControlle
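The jstack above shows the shape of the hang clearly: one thread blocks on a ReentrantLock while the lock's holder waits on a CountDownLatch that only the blocked side would count down. A minimal, self-contained Java sketch of the same lock/latch cycle (thread and variable names are illustrative, not Kafka's):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockLatchDeadlock {
    static final ReentrantLock controllerLock = new ReentrantLock();
    static final CountDownLatch threadStopped = new CountDownLatch(1);

    /** Reproduces the hang; returns true if the shutdown thread is stuck. */
    static boolean demo() throws InterruptedException {
        // "ZkClient-EventThread" role: holds the controller lock while waiting
        // on a latch -- like the event thread in the jstack waiting for the
        // delete-topics thread to shut down.
        Thread eventThread = new Thread(() -> {
            controllerLock.lock();
            try {
                threadStopped.await();          // never counted down below
            } catch (InterruptedException ignored) {
            } finally {
                controllerLock.unlock();
            }
        }, "zk-event-thread");
        eventThread.setDaemon(true);            // don't keep the JVM alive

        // Shutdown path: must take the same lock before it can reach the code
        // that would count the latch down -> circular wait, i.e. deadlock.
        Thread shutdownThread = new Thread(() -> {
            controllerLock.lock();              // blocks: event thread holds it
            try {
                threadStopped.countDown();
            } finally {
                controllerLock.unlock();
            }
        }, "shutdown-thread");
        shutdownThread.setDaemon(true);

        eventThread.start();
        Thread.sleep(100);                      // let the event thread take the lock
        shutdownThread.start();
        shutdownThread.join(500);               // would hang forever without a bound
        return shutdownThread.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdown thread stuck: " + demo());
        // prints "shutdown thread stuck: true" -- the same hang as the jstack
    }
}
```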

Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-09 Thread Bhavesh Mistry
HI Jay,

Imagine you have a flaky network connection to the brokers: will flush() block
if one of the brokers is not available?  Basically, how would you address the
failure mode where the io thread is not able to drain records, or is busy with
pending requests?  Also, is the flush() method meant only to flush the
in-memory queue, or to flush to the broker over the network?

A timeout helps by pushing the caller to decide what to do on failure, e.g.
re-enqueue the records, drop the entire batch, or handle a message that is too
big and crosses the max.message.size limit, etc.

Also, according to the javadoc for the API, "The method will block until all
previously sent records have completed sending (either successfully or with
an error)".  Does this bypass the rules set by block.on.buffer.full or
batch.size when under load?

That was my intention, and I am sorry I mixed up the close() method here
without knowing that this is only for bulk sends.


Thanks,

Bhavesh

On Mon, Feb 9, 2015 at 8:17 PM, Jay Kreps  wrote:

> Yeah I second the problem Guozhang flags with giving flush a timeout. In
> general failover in Kafka is a bounded thing unless you have brought your
> Kafka cluster down entirely so I think depending on that bound implicitly
> is okay.
>
> It is possible to make flush() be instead
>   boolean tryFlush(long timeout, TimeUnit unit);
>
> But I am somewhat skeptical that people will use this correctly. I.e
> consider the mirror maker code snippet I gave above, how would one actually
> recover in this case other than retrying (which the client already does
> automatically)? After all if you are okay losing data then you don't need
> to bother calling flush at all, you can just let the messages be sent
> asynchronously.
>
> I think close() is actually different because you may well want to shutdown
> immediately and just throw away unsent events.
>
> -Jay
>
> On Mon, Feb 9, 2015 at 2:44 PM, Guozhang Wang  wrote:
>
> > The proposal looks good to me, will need some time to review the
> > implementation RB later.
> >
> > Bhavesh, I am wondering how you will use a flush() with a timeout since
> > such a call does not actually provide any flushing guarantees?
> >
> > As for close(), there is a separate JIRA for this:
> >
> > KAFKA-1660 
> >
> > Guozhang
> >
> >
> > On Mon, Feb 9, 2015 at 2:29 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com
> > >
> > wrote:
> >
> > > Hi Jay,
> > >
> > > How about adding a timeout to each method call: flush(timeout, TimeUnit)
> > > and close(timeout, TimeUnit)?  We had a runaway io thread issue, and the
> > > caller thread should not be blocked forever in these methods.
> > >
> > >
> > > Thanks,
> > >
> > > Bhavesh
> > >
> > > On Sun, Feb 8, 2015 at 12:18 PM, Jay Kreps 
> wrote:
> > >
> > > > Well actually in the case of linger.ms = 0 the send is still
> > > asynchronous
> > > > so calling flush() blocks until all the previously sent records have
> > > > completed. It doesn't speed anything up in that case, though, since
> > they
> > > > are already available to send.
> > > >
> > > > -Jay
> > > >
> > > > On Sun, Feb 8, 2015 at 10:36 AM, Gwen Shapira  >
> > > > wrote:
> > > >
> > > > > Looks good to me.
> > > > >
> > > > > I like the idea of not blocking additional sends but not
> guaranteeing
> > > > that
> > > > > flush() will deliver them.
> > > > >
> > > > > I assume that with linger.ms = 0, flush will just be a noop (since
> > the
> > > > > queue will be empty). Is that correct?
> > > > >
> > > > > Gwen
> > > > >
> > > > > On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps 
> > > wrote:
> > > > >
> > > > > > Following up on our previous thread on making batch send a little
> > > > easier,
> > > > > > here is a concrete proposal to add a flush() method to the
> > producer:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > > > >
> > > > > > A proposed implementation is here:
> > > > > > https://issues.apache.org/jira/browse/KAFKA-1865
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: New consumer client

2015-02-09 Thread Bhavesh Mistry
Hi Jay,

1) Sorry to get back to you so late.  It is a CRC check error on any consumer
thread, regardless of the server.  What happens is that I have to catch this
exception and skip the message now.  There is no option to re-fetch the
message.  Is there any way to add behavior to the Java consumer to re-fetch
the offset that failed the CRC check?
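For context, Kafka messages carry a CRC32 checksum that both the broker and the consumer recompute and compare against the stored value; a mismatch is what surfaces as the CRC check error above. A minimal stdlib sketch of that kind of check (the layout is simplified, not Kafka's actual message format):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcCheck {
    // Compute the CRC32 of a payload, as would be stored alongside the message.
    static long crcOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "hello kafka".getBytes(StandardCharsets.UTF_8);
        long storedCrc = crcOf(payload);      // written at produce time

        // Consumer side: recompute and compare. A mismatch means the bytes were
        // corrupted somewhere between producer and consumer (disk, network, ...).
        System.out.println("clean message passes: " + (crcOf(payload) == storedCrc));

        payload[0] ^= 0x01;                   // flip one bit: simulated corruption
        System.out.println("corrupt message passes: " + (crcOf(payload) == storedCrc));
        // prints "true" then "false"
    }
}
```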


2)  Secondly, can you please add default behavior to automatically set
'fetch.message.max.bytes' to the broker's message.max.bytes?  This will ensure
smooth configuration for both the simple and high-level consumers, and will
take the burden of configuring this property away from the Kafka user.  We had
a lag issue due to this misconfiguration and dropped messages on the Camus side
(Camus has a different setting for its simple consumer).  It would be great to
auto-configure this if the user did not supply it in the configuration.

Let me know if you agree with #2.
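A hedged sketch of the mismatch described in #2: if the broker's message.max.bytes exceeds the consumer's fetch.message.max.bytes, an oversized message can never fit in a fetch response, so the consumer stalls on it and lag grows. The values and the check below are illustrative, not Kafka's actual validation logic:

```java
import java.util.Properties;

public class FetchSizeCheck {
    // True when the consumer's fetch size cannot cover the largest message the
    // broker will accept -- the misconfiguration described above.
    static boolean mayStall(int fetchMessageMaxBytes, int brokerMessageMaxBytes) {
        return fetchMessageMaxBytes < brokerMessageMaxBytes;
    }

    public static void main(String[] args) {
        // Illustrative values only. Broker side (server.properties):
        int brokerMessageMaxBytes = 2 * 1024 * 1024;   // message.max.bytes = 2 MB

        // Consumer side: a fetch size smaller than the broker's limit.
        Properties consumerProps = new Properties();
        consumerProps.put("fetch.message.max.bytes", String.valueOf(1024 * 1024));
        int fetchSize = Integer.parseInt(consumerProps.getProperty("fetch.message.max.bytes"));

        // An oversized message never fits in a fetch response, so the consumer
        // gets stuck on it and lag grows behind that offset.
        if (mayStall(fetchSize, brokerMessageMaxBytes)) {
            System.out.println("WARNING: fetch.message.max.bytes (" + fetchSize
                    + ") < broker message.max.bytes (" + brokerMessageMaxBytes + ")");
        }
    }
}
```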

Thanks,

Bhavesh

On Mon, Jan 12, 2015 at 9:25 AM, Jay Kreps  wrote:

> Hey Bhavesh,
>
> This seems like a serious issue and not one anyone else has reported. I
> don't know what you mean by corrupt message, are you saying the CRC check
> fails? If so, that check is done both by the broker (prior to appending to
> the log) and the consumer so that implies either a bug in the broker or
> else disk corruption on the server.
>
> I do have an option to disable the CRC check in the consumer, though
> depending on the nature of the corruption that can just lead to more
> serious errors (depending on what is corrupted).
>
> -jay
>
> On Sun, Jan 11, 2015 at 11:00 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com
> > wrote:
>
> > Hi Jay,
> >
> > One of the occasional pain points of the existing consumer code is
> > CORRUPT_MESSAGE.  Right now it is hard to pinpoint the cause of a
> > CORRUPT_MESSAGE, especially when it happens on the Mirror Maker side.  Is
> > there any proposal to auto-skip corrupted messages and to have reporting
> > visibility into CRC errors (metrics, traceability to find the corruption,
> > per topic, etc.)?  I am not sure if this is the correct email thread to
> > address this; if not, please let me know.
> >
> > Will provide feedback about new consumer api and changes.
> > Thanks,
> >
> > Bhavesh
> >
> > On Sun, Jan 11, 2015 at 7:57 PM, Jay Kreps  wrote:
> >
> > > I uploaded an updated version of the new consumer client (
> > > https://issues.apache.org/jira/browse/KAFKA-1760). This is now almost
> > > feature complete, and has pretty reasonable testing and metrics. I
> think
> > it
> > > is ready for review and could be checked in once 0.8.2 is out.
> > >
> > > For those who haven't been following this is meant to be a new consumer
> > > client, like the new producer in 0.8.2, and intended to replace the
> > > existing "high level" and "simple" scala consumers.
> > >
> > > This still needs the server-side implementation of the partition
> > assignment
> > > and group management to be fully functional. I have just stubbed this
> out
> > > in the server to allow the implementation and testing of the server but
> > > actual usage will require it. However the client that exists now is
> > > actually a fully functional replacement for the "simple consumer" that
> is
> > > vastly easier to use correctly as it internally does all the discovery
> > and
> > > failover.
> > >
> > > It would be great if people could take a look at this code, and
> > > particularly at the public apis which have several small changes from
> the
> > > original proposal.
> > >
> > > Summary
> > >
> > > What's there:
> > > 1. Simple consumer functionality
> > > 2. Offset commit and fetch
> > > 3. Ability to change position with seek
> > > 4. Ability to commit all or just some offsets
> > > 5. Controller discovery, failure detection, heartbeat, and fail-over
> > > 6. Controller partition assignment
> > > 7. Logging
> > > 8. Metrics
> > > 9. Integration tests including tests that simulate random broker
> failures
> > > 10. Integration into the consumer performance test
> > >
> > > Limitations:
> > > 1. There could be some lingering bugs in the group management support,
> it
> > > is hard to test fully with just the stub support on the server,
> so
> > > we'll need to get the server working to do better I think.
> > > 2. I haven't implemented wild-card subscriptions yet.
> > > 3. No integration with console consumer yet
> > >
> > > Performance
> > >
> > > I did some performance comparison with the old consumer over localhost
> on
> > > my laptop. Usually localhost isn't good for testing but in this case it
> > is
> > > good because it has near infinite bandwidth so it does a good job at
> > > catching inefficiencies that would be hidden with a slower network.
> These
> > > numbers probably aren't representative of what you would get over a
> real
> > > network, but help bring out the relative efficiencies.
> > > Here are the results:
> > > - Old high-level consumer: 213 MB/sec
> > > - New consumer: 225 MB/sec
> > > - Old simple consumer: 242 MB/sec
> > >
> > > It may be hard to get this client u

Re: Kafka New(Java) Producer Connection reset by peer error and LB

2015-02-09 Thread Bhavesh Mistry
HI Kafka Team,

Please confirm whether you would like to open a JIRA issue to track this.

Thanks,

Bhavesh

On Mon, Feb 9, 2015 at 12:39 PM, Bhavesh Mistry 
wrote:

> Hi Kafka Team,
>
> We are getting this "connection reset by peer" a couple of minutes after
> start-up of the producer, due to infrastructure deployment strategies we have
> adopted from LinkedIn.
>
> We use the LB hostname and port as the seed server, and all producers are
> getting the following exception because of the TCP idle connection timeout set
> on the LB (2 minutes, while Kafka's TCP connection idle timeout is set to 10
> minutes).  This seems to be a minor bug: the client should close the TCP
> connection immediately after discovering that the seed server is not part of
> the brokers list.
>
>
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> at
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:60)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:178)
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:662)
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> at
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:60)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:178)
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:662)
>
>
> Thanks,
>
> Bhavesh
>
>


[jira] [Created] (KAFKA-1939) Add csv reporter in ProducerPerformance for the new producer

2015-02-09 Thread xinyisu (JIRA)
xinyisu created KAFKA-1939:
--

 Summary: Add csv reporter in ProducerPerformance for the new 
producer
 Key: KAFKA-1939
 URL: https://issues.apache.org/jira/browse/KAFKA-1939
 Project: Kafka
  Issue Type: New Feature
  Components: producer 
Affects Versions: 0.8.2.0
Reporter: xinyisu
Assignee: Jun Rao


Currently, the new producer only supports a JMX reporter for its metrics. It 
does not output a CSV report as the old producer does.

We propose to add a CSV reporter in ProducerPerformance for the new producer 
using the new metrics API.
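As a rough illustration of what such a reporter might emit, here is a generic sketch of turning a metrics snapshot into CSV rows. This is not the actual ProducerPerformance or metrics-API code; the metric names and values are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvReporterSketch {
    // Hypothetical stand-in for a metrics snapshot; a real implementation
    // would plug into the new producer's metrics API instead.
    static String toCsvLine(long timestampMs, Map<String, Double> metrics) {
        StringBuilder sb = new StringBuilder(String.valueOf(timestampMs));
        for (Double v : metrics.values()) {
            sb.append(',').append(v);        // one column per metric, in order
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        metrics.put("record-send-rate", 1523.4);
        metrics.put("request-latency-avg", 2.7);

        // Header row, then one data row per sampling interval.
        System.out.println("timestamp," + String.join(",", metrics.keySet()));
        System.out.println(toCsvLine(1423468800000L, metrics));
        // prints:
        // timestamp,record-send-rate,request-latency-avg
        // 1423468800000,1523.4,2.7
    }
}
```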



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: KafkaPreCommit #1

2015-02-09 Thread Apache Jenkins Server
See 

--
[...truncated 1963 lines...]
at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:186)
at 
kafka.utils.TestUtils$.verifyNonDaemonThreadsStatus(TestUtils.scala:707)
at 
kafka.server.ServerGenerateBrokerIdTest.testUserConfigAndGeneratedBrokerId(ServerGenerateBrokerIdTest.scala:72)

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps FAILED
java.lang.NullPointerException
at 
kafka.utils.TestUtils$$anonfun$verifyNonDaemonThreadsStatus$2.apply(TestUtils.scala:707)
at 
kafka.utils.TestUtils$$anonfun$verifyNonDaemonThreadsStatus$2.apply(TestUtils.scala:707)
at 
scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:115)
at 
scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
scala.collection.TraversableOnce$class.count(TraversableOnce.scala:114)
at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:186)
at 
kafka.utils.TestUtils$.verifyNonDaemonThreadsStatus(TestUtils.scala:707)
at 
kafka.server.ServerGenerateBrokerIdTest.testMultipleLogDirsMetaProps(ServerGenerateBrokerIdTest.scala:95)

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps FAILED
java.lang.NullPointerException
at 
kafka.utils.TestUtils$$anonfun$verifyNonDaemonThreadsStatus$2.apply(TestUtils.scala:707)
at 
kafka.utils.TestUtils$$anonfun$verifyNonDaemonThreadsStatus$2.apply(TestUtils.scala:707)
at 
scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:115)
at 
scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
scala.collection.TraversableOnce$class.count(TraversableOnce.scala:114)
at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:186)
at 
kafka.utils.TestUtils$.verifyNonDaemonThreadsStatus(TestUtils.scala:707)
at 
kafka.server.ServerGenerateBrokerIdTest.testConsistentBrokerIdFromUserConfigAndMetaProps(ServerGenerateBrokerIdTest.scala:112)

kafka.server.ServerShutdownTest > testCleanShutdown FAILED
junit.framework.AssertionFailedError: expected:<0> but was:<4>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at 
kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:154)
at 
kafka.server.ServerShutdownTest.testCleanShutdown(ServerShutdownTest.scala:101)

kafka.server.ServerShutdownTest > testCleanShutdownWithDeleteTopicEnabled FAILED
junit.framework.AssertionFailedError: expected:<0> but was:<4>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at 
kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:154)
at 
kafka.server.ServerShutdownTest.testCleanShutdownWithDeleteTopicEnabled(ServerShutdownTest.scala:114)

kafka.server.ServerShutdownTest > testCleanShutdownAfterFailedStartup FAILED
junit.framework.AssertionFailedError: expected:<0> but was:<4>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at 
kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:154)
at 
kafka.server.ServerShutdownTest.testCleanShutdownAfterFailedStartup(ServerShutdownTest.scala:142)

kafka.server.ServerShutdownTest > testConsecutiveShutdown PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeHoursProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeMinutesProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeMsProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeNoConfigProvided PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeBothMinutesAndHoursProvided 
PASSED

kafka.server.KafkaConfigTest > t

[jira] [Commented] (KAFKA-1856) Add PreCommit Patch Testing

2015-02-09 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313546#comment-14313546
 ] 

Ashish Kumar Singh commented on KAFKA-1856:
---

Sounds good [~charmalloc]. Thanks!

> Add PreCommit Patch Testing
> ---
>
> Key: KAFKA-1856
> URL: https://issues.apache.org/jira/browse/KAFKA-1856
> Project: Kafka
>  Issue Type: Task
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: KAFKA-1845.result.txt, KAFKA-1856.patch, 
> KAFKA-1856_2015-01-18_21:43:56.patch, KAFKA-1856_2015-02-04_14:57:05.patch, 
> KAFKA-1856_2015-02-04_15:44:47.patch
>
>
> h1. Kafka PreCommit Patch Testing - *Don't wait for it to break*
> h2. Motivation
> *With great power comes great responsibility* - Uncle Ben. As the Kafka user 
> list is growing, a mechanism to ensure the quality of the product is required. 
> Quality becomes hard to measure and maintain in an open source project with a 
> wide community of contributors. Luckily, Kafka is not the first open source 
> project and can benefit from the learnings of prior projects.
> PreCommit tests are the tests that are run for each patch that gets attached 
> to an open JIRA. Based on the test results, the test execution framework (a 
> test bot) +1s or -1s the patch. Having PreCommit tests takes the load off 
> committers having to look at or test each patch.
> h2. Tests in Kafka
> h3. Unit and Integration Tests
> [Unit and Integration 
> tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Unit+and+Integration+Tests]
>  are cardinal to help contributors to avoid breaking existing functionalities 
> while adding new functionalities or fixing older ones. These tests, at least 
> the ones relevant to the changes, must be run by contributors before 
> attaching a patch to a JIRA.
> h3. System Tests
> [System 
> tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+System+Tests] 
> are much wider tests that, unlike unit tests, focus on end-to-end scenarios 
> and not some specific method or class.
> h2. Apache PreCommit tests
> Apache provides a mechanism to automatically build a project and run a series 
> of tests whenever a patch is uploaded to a JIRA. Based on test execution, the 
> test framework will comment with a +1 or -1 on the JIRA.
> You can read more about the framework here:
> http://wiki.apache.org/general/PreCommitBuilds
> h2. Plan
> # Create a test-patch.py script (similar to the one used in Flume, Sqoop and 
> other projects) that will take a JIRA as a parameter, apply the patch on the 
> appropriate branch, build the project, run tests and report results. This 
> script should be committed into the Kafka code-base. To begin with, this will 
> only run unit tests. We can add code sanity checks, system_tests, etc in the 
> future.
> # Create a jenkins job for running the test (as described in 
> http://wiki.apache.org/general/PreCommitBuilds) and validate that it works 
> manually. This must be done by a committer with Jenkins access.
> # Ask someone with access to https://builds.apache.org/job/PreCommit-Admin/ 
> to add Kafka to the list of projects PreCommit-Admin triggers.





Re: [DISCUSS] KIPs

2015-02-09 Thread Joe Stein
I did https://cwiki.apache.org/confluence/display/KAFKA/drafts

~ Joestein

On Mon, Feb 9, 2015 at 11:01 PM, Jay Kreps  wrote:

> Yeah no pressure. I think you added a holding area for incomplete KIPs,
> right? I think that is a good idea. We definitely need a place to stash
> these while they are getting built out...
>
> -Jay
>
> On Mon, Feb 9, 2015 at 6:10 PM, Joe Stein  wrote:
>
> > I like the round-up, looks good, thanks Joel.
> >
> > I should be able to get KIP-5 and KIP-6 to have more detail in the coming
> > days.
> >
> > ~ Joestein
> >
> > On Mon, Feb 9, 2015 at 9:01 PM, Joel Koshy  wrote:
> >
> > > I'm looking through a couple of the KIP threads today and had the same
> > > issue; and thought it would be useful to do a status round-up of KIPs.
> > > We could incorporate status in the title itself (so we can quickly see
> > > it in the child-page list) but I just added a table to the top-level
> > > wiki. Hopefully that captures the current state accurately so I know
> > > which threads to follow-up on.
> > >
> > > Thanks,
> > >
> > > Joel
> > >
> > > On Fri, Feb 06, 2015 at 12:47:46PM -0800, Jay Kreps wrote:
> > > > A problem I am having is actually understanding which KIPs are
> intended
> > > to
> > > > be complete proposals and which are works in progress. Joe you seem
> to
> > > have
> > > > a bunch of these. Can you move them elsewhere until they are really
> > fully
> > > > done and ready for review and discussion?
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Feb 6, 2015 at 12:09 PM, Jay Kreps 
> > wrote:
> > > >
> > > > > I think we are focused on making committing new changes easier, but
> > > what
> > > > > we have seen is actually that isn't the bulk of the work
> (especially
> > > with
> > > > > this kind of "public interface" change where it generally has a big
> > > user
> > > > > impact). I think we actually really need the core committers and
> any
> > > other
> > > > > interested parties to stop and fully read each KIP and think about
> > it.
> > > If
> > > > > we don't have time to do that we usually just end up spending a lot
> > > more
> > > > time after the fact trying to rework things later when it is a lot
> > > harder.
> > > > > So I really think we should have every active committer read,
> > comment,
> > > and
> > > > > vote on each KIP. I think this may require a little bit of work to
> > > > > co-ordinate/bug people but will end up being worth it because each
> > > person
> > > > > on the project will have a holistic picture of what is going on.
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Thu, Feb 5, 2015 at 11:24 PM, Joel Koshy 
> > > wrote:
> > > > >
> > > > >> Just wanted to add a few more comments on this: KIPs were
> suggested
> > as
> > > > >> a process to help reach early consensus on a major change or not
> so
> > > > >> major (but tricky or backward incompatible) change in order to
> > reduce
> > > > >> the likelihood of multiple iterations and complete rewrites during
> > > > >> code reviews (which is time-intensive for both the contributor and
> > > > >> reviewers); as well as to reduce the likelihood of surprises (say,
> > if
> > > > >> a patch inadvertently changes a public API).  So KIPs are intended
> > to
> > > > >> speed up development since a clear path is charted out and there
> is
> > > > >> prior consensus on whether a feature and its design/implementation
> > > > >> make sense or not.
> > > > >>
> > > > >> Obviously this breaks down if KIPs are not being actively
> discussed
> > -
> > > > >> again I think we can do much better here. I think we ended up
> with a
> > > > >> backlog because as soon as the KIP wiki was started, a number of
> > > > >> pre-existing jiras and discussions were moved there - all within a
> > few
> > > > >> days. Now that there are quite a few outstanding KIPs I think we
> > just
> > > > >> need to methodically work through those - preferably a couple at a
> > > > >> time. I looked through the list and I think we should be able to
> > > > >> resolve all of them relatively quickly if everyone is on board
> with
> > > > >> this.
> > > > >>
> > > > >> > > It's probably more helpful for contributors if it's "lazy" as in
> > "no
> > > > >> > > strong objections" .
> > > > >>
> > > > >> Gwen also suggested this and this also sounds ok to me as I wrote
> > > > >> earlier - what do others think? This is important especially if
> > > > >> majority in the community think if this less restrictive policy
> > would
> > > > >> spur and not hinder development - I'm not sure that it does. I
> > > > >> completely agree that KIPs fail to a large degree as far as the
> > > > >> original motivation goes if they require a lazy majority but the
> > > > >> DISCUSS threads are stalled. IOW regardless of that discussion, I
> > > > >> think we should rejuvenate some of those threads especially now
> that
> > > > >> 0.8.2 is out of the way.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Joel
> > > > >>
> > > > >> On Thu, Feb 05, 2015 at 08:56

Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-09 Thread Jay Kreps
Yeah I second the problem Guozhang flags with giving flush a timeout. In
general failover in Kafka is a bounded thing unless you have brought your
Kafka cluster down entirely so I think depending on that bound implicitly
is okay.

It is possible to make flush() be instead
  boolean tryFlush(long timeout, TimeUnit unit);

But I am somewhat skeptical that people will use this correctly. I.e
consider the mirror maker code snippet I gave above, how would one actually
recover in this case other than retrying (which the client already does
automatically)? After all if you are okay losing data then you don't need
to bother calling flush at all, you can just let the messages be sent
asynchronously.

I think close() is actually different because you may well want to shutdown
immediately and just throw away unsent events.

-Jay
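For what it's worth, a caller who really wants a bound can already get tryFlush-like semantics around a blocking flush() without an API change, by running the call on a separate thread and bounding the wait. A sketch under that assumption (`slowFlush` is a hypothetical stand-in for producer.flush(); note the timeout only bounds the caller's wait, it does not cancel the in-flight flush):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedFlush {
    // Run a blocking action, but wait at most `timeout` for it to finish.
    // On timeout the action keeps running in the background: this bounds the
    // caller's wait, it does not cancel the underlying work.
    static boolean tryWithTimeout(Runnable blockingAction, long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-flush");
            t.setDaemon(true);               // don't keep the JVM alive for it
            return t;
        });
        try {
            Future<?> f = executor.submit(blockingAction);
            f.get(timeout, unit);
            return true;                     // completed within the bound
        } catch (TimeoutException e) {
            return false;                    // still in flight; caller decides what to do
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for producer.flush(): blocks for one second.
        Runnable slowFlush = () -> {
            try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
        };
        System.out.println("flushed in time: "
                + tryWithTimeout(slowFlush, 200, TimeUnit.MILLISECONDS));
        // prints "flushed in time: false" -- the flush outlived the bound
    }
}
```

As Jay notes, the open question is what the caller would actually do with the `false` result other than retry, which the client already does internally.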

On Mon, Feb 9, 2015 at 2:44 PM, Guozhang Wang  wrote:

> The proposal looks good to me, will need some time to review the
> implementation RB later.
>
> Bhavesh, I am wondering how you will use a flush() with a timeout since
> such a call does not actually provide any flushing guarantees?
>
> As for close(), there is a separate JIRA for this:
>
> KAFKA-1660 
>
> Guozhang
>
>
> On Mon, Feb 9, 2015 at 2:29 PM, Bhavesh Mistry  >
> wrote:
>
> > Hi Jay,
> >
> > How about adding a timeout to each method call: flush(timeout, TimeUnit)
> > and close(timeout, TimeUnit)?  We had a runaway io thread issue, and the
> > caller thread should not be blocked forever in these methods.
> >
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Sun, Feb 8, 2015 at 12:18 PM, Jay Kreps  wrote:
> >
> > > Well actually in the case of linger.ms = 0 the send is still
> > asynchronous
> > > so calling flush() blocks until all the previously sent records have
> > > completed. It doesn't speed anything up in that case, though, since
> they
> > > are already available to send.
> > >
> > > -Jay
> > >
> > > On Sun, Feb 8, 2015 at 10:36 AM, Gwen Shapira 
> > > wrote:
> > >
> > > > Looks good to me.
> > > >
> > > > I like the idea of not blocking additional sends but not guaranteeing
> > > that
> > > > flush() will deliver them.
> > > >
> > > > I assume that with linger.ms = 0, flush will just be a noop (since
> the
> > > > queue will be empty). Is that correct?
> > > >
> > > > Gwen
> > > >
> > > > On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps 
> > wrote:
> > > >
> > > > > Following up on our previous thread on making batch send a little
> > > easier,
> > > > > here is a concrete proposal to add a flush() method to the
> producer:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > > >
> > > > > A proposed implementation is here:
> > > > > https://issues.apache.org/jira/browse/KAFKA-1865
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > -Jay
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Commented] (KAFKA-1856) Add PreCommit Patch Testing

2015-02-09 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313509#comment-14313509
 ] 

Joe Stein commented on KAFKA-1856:
--

[~singhashish] I created a new Jenkins build 
https://builds.apache.org/job/KafkaPreCommit/ and will do some more testing on 
your patch, but it lgtm. 

Tomorrow(ish), if I can get it all hooked up, I will commit it or let you know 
if there are any questions/issues.

> Add PreCommit Patch Testing
> ---
>
> Key: KAFKA-1856
> URL: https://issues.apache.org/jira/browse/KAFKA-1856
> Project: Kafka
>  Issue Type: Task
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: KAFKA-1845.result.txt, KAFKA-1856.patch, 
> KAFKA-1856_2015-01-18_21:43:56.patch, KAFKA-1856_2015-02-04_14:57:05.patch, 
> KAFKA-1856_2015-02-04_15:44:47.patch
>
>
> h1. Kafka PreCommit Patch Testing - *Don't wait for it to break*
> h2. Motivation
> *With great power comes great responsibility* - Uncle Ben. As the Kafka user 
> list is growing, a mechanism to ensure the quality of the product is required. 
> Quality becomes hard to measure and maintain in an open source project with a 
> wide community of contributors. Luckily, Kafka is not the first open source 
> project and can benefit from the learnings of prior projects.
> PreCommit tests are the tests that are run for each patch that gets attached 
> to an open JIRA. Based on the test results, the test execution framework (a 
> test bot) gives the patch a +1 or -1. Having PreCommit tests takes load off 
> committers, who would otherwise have to look at or test each patch.
> h2. Tests in Kafka
> h3. Unit and Integration Tests
> [Unit and Integration 
> tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Unit+and+Integration+Tests]
>  are essential to help contributors avoid breaking existing functionality 
> while adding new features or fixing older ones. These tests, at least 
> the ones relevant to the changes, must be run by contributors before 
> attaching a patch to a JIRA.
> h3. System Tests
> [System 
> tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+System+Tests] 
> are much wider tests that, unlike unit tests, focus on end-to-end scenarios 
> and not some specific method or class.
> h2. Apache PreCommit tests
> Apache provides a mechanism to automatically build a project and run a series 
> of tests whenever a patch is uploaded to a JIRA. Based on test execution, the 
> test framework will comment with a +1 or -1 on the JIRA.
> You can read more about the framework here:
> http://wiki.apache.org/general/PreCommitBuilds
> h2. Plan
> # Create a test-patch.py script (similar to the one used in Flume, Sqoop and 
> other projects) that will take a JIRA as a parameter, apply the patch on the 
> appropriate branch, build the project, run tests and report results. This 
> script should be committed into the Kafka code-base. To begin with, this will 
> only run unit tests. We can add code sanity checks, system tests, etc. in the 
> future.
> # Create a jenkins job for running the test (as described in 
> http://wiki.apache.org/general/PreCommitBuilds) and validate that it works 
> manually. This must be done by a committer with Jenkins access.
> # Ask someone with access to https://builds.apache.org/job/PreCommit-Admin/ 
> to add Kafka to the list of projects PreCommit-Admin triggers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-09 Thread Jay Kreps
Hey Joel,

The use case would be for something like mirror maker. You want to do
something like the following:

while (true) {
  val recs = consumer.poll(time)
  for (rec <- recs)
    producer.send(rec)   // asynchronous: records are only handed to the producer
  producer.flush()       // blocks until everything sent above has completed
  consumer.commit()      // offsets are committed only after the flush
}

If you replace flush() with just calling get() on the records, the problem
is that the get call will block for linger.ms plus the time to send. But at
the time you call flush you are actually done sending new stuff and you
want that stuff to get sent; lingering around in case of new writes is
silly. But in the absence of flush there is no way to say that. As you say,
you only pay that penalty on one of the get() calls, but if linger.ms
is high (say 60 seconds) that will be a huge penalty.

-Jay
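
A runnable sketch of the poll/send/flush/commit loop above. The stub consumer
and producer types below are assumptions for illustration, not the Kafka client
APIs; they only model the relevant behavior (send buffers asynchronously, flush
blocks until delivery).

```java
import java.util.ArrayList;
import java.util.List;

public class MirrorLoopSketch {

    // Stub producer: send() only buffers; flush() "delivers" everything buffered.
    static class StubProducer {
        final List<String> buffered = new ArrayList<>();
        final List<String> delivered = new ArrayList<>();
        void send(String rec) { buffered.add(rec); }      // asynchronous hand-off
        void flush() {                                    // blocks until all sends complete
            delivered.addAll(buffered);
            buffered.clear();
        }
    }

    static long committedOffset = 0;

    // Commit offsets only after flush() confirms all prior sends completed,
    // so no record can be lost between consuming and producing.
    static void mirrorBatch(StubProducer producer, List<String> recs) {
        for (String rec : recs) {
            producer.send(rec);
        }
        producer.flush();               // everything sent above is delivered here
        committedOffset += recs.size(); // safe: nothing remains buffered
    }

    public static void main(String[] args) {
        StubProducer p = new StubProducer();
        mirrorBatch(p, List.of("a", "b", "c"));
        System.out.println(p.delivered.size() + " delivered, offset " + committedOffset);
    }
}
```

The design point is that flush() amortizes the linger.ms wait over the whole
batch, instead of paying it per record via get().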

On Mon, Feb 9, 2015 at 6:23 PM, Joel Koshy  wrote:

> - WRT the motivation: "if you set linger.ms > 0 to encourage batching
>   of messages, which is likely a good idea for this kind of use case,
>   then the second for loop will block for a ms" -> however, in
>   practice this will really only be for the first couple of calls
>   right? Since the subsequent calls would return immediately since in
>   all likelihood those subsequent messages would have gone out on the
>   previous message's batch.
> - I think Bhavesh's suggestion on the timeout makes sense for
>   consistency (with other blocking-style calls) if nothing else.
> - Does it make sense to fold in the API changes for KAFKA-1660 and
>   KAFKA-1669 and do all at once?
>
> Thanks,
>
> Joel
>
>
> On Mon, Feb 09, 2015 at 02:44:06PM -0800, Guozhang Wang wrote:
> > The proposal looks good to me, will need some time to review the
> > implementation RB later.
> >
> > Bhavesh, I am wondering how you will use a flush() with a timeout since
> > such a call does not actually provide any flushing guarantees?
> >
> > As for close(), there is a separate JIRA for this:
> >
> > KAFKA-1660 
> >
> > Guozhang
> >
> >
> > On Mon, Feb 9, 2015 at 2:29 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com>
> > wrote:
> >
> > > Hi Jay,
> > >
> > > How about adding a timeout for each of these methods: flush(timeout,
> > > TimeUnit) and close(timeout, TimeUnit)? We had a runaway I/O thread issue,
> > > and the caller thread should not be blocked forever in these methods.
> > >
> > >
> > > Thanks,
> > >
> > > Bhavesh
> > >
> > > On Sun, Feb 8, 2015 at 12:18 PM, Jay Kreps 
> wrote:
> > >
> > > > Well actually in the case of linger.ms = 0 the send is still
> > > asynchronous
> > > > so calling flush() blocks until all the previously sent records have
> > > > completed. It doesn't speed anything up in that case, though, since
> they
> > > > are already available to send.
> > > >
> > > > -Jay
> > > >
> > > > On Sun, Feb 8, 2015 at 10:36 AM, Gwen Shapira  >
> > > > wrote:
> > > >
> > > > > Looks good to me.
> > > > >
> > > > > I like the idea of not blocking additional sends but not
> guaranteeing
> > > > that
> > > > > flush() will deliver them.
> > > > >
> > > > > I assume that with linger.ms = 0, flush will just be a noop
> (since the
> > > > > queue will be empty). Is that correct?
> > > > >
> > > > > Gwen
> > > > >
> > > > > On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps 
> > > wrote:
> > > > >
> > > > > > Following up on our previous thread on making batch send a little
> > > > easier,
> > > > > > here is a concrete proposal to add a flush() method to the
> producer:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > > > >
> > > > > > A proposed implementation is here:
> > > > > > https://issues.apache.org/jira/browse/KAFKA-1865
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
>
>


Re: [DISCUSS] KIPs

2015-02-09 Thread Jay Kreps
Yeah no pressure. I think you added a holding area for incomplete KIPs,
right? I think that is a good idea. We definitely need a place to stash
these while they are getting built out...

-Jay

On Mon, Feb 9, 2015 at 6:10 PM, Joe Stein  wrote:

> I like the round-up, looks good, thanks Joel.
>
> I should be able to get KIP-5 and KIP-6 to have more detail in the coming
> days.
>
> ~ Joestein
>
> On Mon, Feb 9, 2015 at 9:01 PM, Joel Koshy  wrote:
>
> > I'm looking through a couple of the KIP threads today and had the same
> > issue; and thought it would be useful to do a status round-up of KIPs.
> > We could incorporate status in the title itself (so we can quickly see
> > it in the child-page list) but I just added a table to the top-level
> > wiki. Hopefully that captures the current state accurately so I know
> > which threads to follow-up on.
> >
> > Thanks,
> >
> > Joel
> >
> > On Fri, Feb 06, 2015 at 12:47:46PM -0800, Jay Kreps wrote:
> > > A problem I am having is actually understanding which KIPs are intended
> > to
> > > be complete proposals and which are works in progress. Joe you seem to
> > have
> > > a bunch of these. Can you move them elsewhere until they are really
> fully
> > > done and ready for review and discussion?
> > >
> > > -Jay
> > >
> > > On Fri, Feb 6, 2015 at 12:09 PM, Jay Kreps 
> wrote:
> > >
> > > > I think we are focused on making committing new changes easier, but
> > what
> > > > we have seen is actually that isn't the bulk of the work (especially
> > with
> > > > this kind of "public interface" change where it generally has a big
> > user
> > > > impact). I think we actually really need the core committers and any
> > other
> > > > interested parties to stop and fully read each KIP and think about
> it.
> > If
> > > > we don't have time to do that we usually just end up spending a lot
> > more
> > > > time after the fact trying to rework things later when it is a lot
> > harder.
> > > > So I really think we should have every active committer read,
> comment,
> > and
> > > > vote on each KIP. I think this may require a little bit of work to
> > > > co-ordinate/bug people but will end up being worth it because each
> > person
> > > > on the project will have a holistic picture of what is going on.
> > > >
> > > > -Jay
> > > >
> > > > On Thu, Feb 5, 2015 at 11:24 PM, Joel Koshy 
> > wrote:
> > > >
> > > >> Just wanted to add a few more comments on this: KIPs were suggested
> as
> > > >> a process to help reach early consensus on a major change or not so
> > > >> major (but tricky or backward incompatible) change in order to
> reduce
> > > >> the likelihood of multiple iterations and complete rewrites during
> > > >> code reviews (which is time-intensive for both the contributor and
> > > >> reviewers); as well as to reduce the likelihood of surprises (say,
> if
> > > >> a patch inadvertently changes a public API).  So KIPs are intended
> to
> > > >> speed up development since a clear path is charted out and there is
> > > >> prior consensus on whether a feature and its design/implementation
> > > >> make sense or not.
> > > >>
> > > >> Obviously this breaks down if KIPs are not being actively discussed
> -
> > > >> again I think we can do much better here. I think we ended up with a
> > > >> backlog because as soon as the KIP wiki was started, a number of
> > > >> pre-existing jiras and discussions were moved there - all within a
> few
> > > >> days. Now that there are quite a few outstanding KIPs I think we
> just
> > > >> need to methodically work through those - preferably a couple at a
> > > >> time. I looked through the list and I think we should be able to
> > > >> resolve all of them relatively quickly if everyone is on board with
> > > >> this.
> > > >>
> > > >> > > Its probably more helpful for contributors if its "lazy" as in
> "no
> > > >> > > strong objections" .
> > > >>
> > > >> Gwen also suggested this and this also sounds ok to me as I wrote
> > > >> earlier - what do others think? This is important especially if
> > > >> majority in the community think if this less restrictive policy
> would
> > > >> spur and not hinder development - I'm not sure that it does. I
> > > >> completely agree that KIPs fail to a large degree as far as the
> > > >> original motivation goes if they require a lazy majority but the
> > > >> DISCUSS threads are stalled. IOW regardless of that discussion, I
> > > >> think we should rejuvenate some of those threads especially now that
> > > >> 0.8.2 is out of the way.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Joel
> > > >>
> > > >> On Thu, Feb 05, 2015 at 08:56:13PM -0800, Joel Koshy wrote:
> > > >> > I'm just thinking aloud - I don't know what a good number would
> be,
> > and
> > > >> it
> > > >> > is just one possibility to streamline how KIPs are processed. It
> > largely
> > > >> > depends on how complex the proposals are. What would be concerning
> > is if
> > > >> > there are 10 different threads all dealing with large KIPs and n

Re: Cannot stop Kafka server if zookeeper is shutdown first

2015-02-09 Thread Jaikiran Pai

Hi,

I don't have enough context to know if replacing ZkClient is important 
right now. However, I did take a look at the code to see how extensively 
ZkClient gets used, and I agree with Gwen that replacing it is a bigger 
task that will need further testing too, to ensure the replacement doesn't 
have issues of its own that affect us.


IMO, using the newer version of ZkClient, which the ZkClient team is 
willing to release, might be a good idea for resolving the immediate 
issues at hand. So I think we should run certain tests against the current 
dev/snapshot version of ZkClient to verify that the JIRAs we expect to be 
resolved are indeed resolved, and then ask the ZkClient team to do a 
release. If this sounds good and someone can point me to the exact JIRAs 
that need to be verified, I can look into this. Let me know.


-Jaikiran

On Thursday 05 February 2015 02:51 AM, Gwen Shapira wrote:

Hi,

KAFKA-1155 is likely a Zookeeper issue and not the specific client.
I believe the rest are already fixed in ZKClient, and it's a matter of asking
them to release, rebasing our code, and making sure the issues are resolved (or
that we use the features ZKClient added to resolve them).

I'm a fan of Curator, but it's not exactly a drop-in replacement for
ZKClient (the APIs are slightly different, if we even decide to just use
the APIs and not the recipes). I suspect that replacing ZKClient with
Curator is a large project. Perhaps too large to resolve 3 issues that are
already resolved in ZKClient.

What are the benefits you guys see in the replacement?

Gwen


On Tue, Feb 3, 2015 at 10:42 PM, Guozhang Wang  wrote:


Now may be a good time.

We could verify if Curator has fixed the known issues we have seen so far,
an incomplete list would be:

KAFKA-1082 
KAFKA-1155 
KAFKA-1907 
KAFKA-992 



Guozhang

On Tue, Feb 3, 2015 at 10:21 PM, Ashish Singh  wrote:


+1 on using curator.

On Tue, Feb 3, 2015 at 10:09 PM, Manikumar Reddy 
wrote:


I think we should consider moving to Apache Curator (KAFKA-873).
Curator is now more mature and an Apache top-level project.


On Wed, Feb 4, 2015 at 11:29 AM, Harsha  wrote:


Any reason not to go with Apache Curator (http://curator.apache.org/)?

-Harsha
On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:

I am also +1 on Neha's suggestion that "At some point, if we find ourselves
fiddling too much with ZkClient, it wouldn't hurt to write our own little
zookeeper client wrapper", since we have accumulated a bunch of issues with
zkClient that take a long time to be resolved, if ever, so we ended up with
some hacky ways of handling zkClient errors.

Guozhang

On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:


Yes, that's the plan :)

-Jaikiran

On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:


So I think the current plan is:
1. Add timeout in zkclient
2. Ask zkclient to release a new version (we need it for a few other things too)
3. Rebase on the new zkclient
4. Fix this jira and the few others that were waiting for the new zkclient

Does that make sense?

Gwen

On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:


I just heard back from Stefan, who manages the ZkClient repo, and he seems to
be open to having these changes be part of the ZkClient project. I'll be
creating a pull request for that project to have it reviewed and merged.
Although I haven't heard of exact release plans, Stefan's reply did indicate
that the project could be released after this change is merged.

-Jaikiran

On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:


Thanks for pointing to that repo!

I just had a look at it, and it appears that the project isn't very active
(going by the lack of activity). The latest contribution is from Gwen, and
that was around 3 months back. I haven't found release plans for the
project, or a place to ask about them (filing an issue doesn't seem the
right way to ask this question). So I'll get in touch with the repo owner
and see what his plans for the project are.

-Jaikiran

On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:


I did!

Thanks for clarifying :)

The client that is part of Zookeeper itself actually does support timeouts.

On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com> wrote:


Hi Jaikiran,

I think Gwen was talking about contributing to the ZkClient project:
https://github.com/sgroschupf/zkclient

Guozhang


On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
jai.forums2...@gmail.com>
wrote:

Hi Gwen,

Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
replacement.

As for contributing to Zookeeper, yes, that indeed is on my mind, but I
haven't yet had a chance to really look deeper into Zookeeper or get

Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-09 Thread Joel Koshy
- WRT the motivation: "if you set linger.ms > 0 to encourage batching
  of messages, which is likely a good idea for this kind of use case,
  then the second for loop will block for a ms" -> however, in
  practice this will really only be for the first couple of calls
  right? Since the subsequent calls would return immediately since in
  all likelihood those subsequent messages would have gone out on the
  previous message's batch.
- I think Bhavesh's suggestion on the timeout makes sense for
  consistency (with other blocking-style calls) if nothing else.
- Does it make sense to fold in the API changes for KAFKA-1660 and
  KAFKA-1669 and do all at once?

Thanks,

Joel


On Mon, Feb 09, 2015 at 02:44:06PM -0800, Guozhang Wang wrote:
> The proposal looks good to me, will need some time to review the
> implementation RB later.
> 
> Bhavesh, I am wondering how you will use a flush() with a timeout since
> such a call does not actually provide any flushing guarantees?
> 
> As for close(), there is a separate JIRA for this:
> 
> KAFKA-1660 
> 
> Guozhang
> 
> 
> On Mon, Feb 9, 2015 at 2:29 PM, Bhavesh Mistry 
> wrote:
> 
> > Hi Jay,
> >
> > How about adding a timeout for each of these methods: flush(timeout, TimeUnit)
> > and close(timeout, TimeUnit)? We had a runaway I/O thread issue, and the
> > caller thread should not be blocked forever in these methods.
> >
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Sun, Feb 8, 2015 at 12:18 PM, Jay Kreps  wrote:
> >
> > > Well actually in the case of linger.ms = 0 the send is still
> > asynchronous
> > > so calling flush() blocks until all the previously sent records have
> > > completed. It doesn't speed anything up in that case, though, since they
> > > are already available to send.
> > >
> > > -Jay
> > >
> > > On Sun, Feb 8, 2015 at 10:36 AM, Gwen Shapira 
> > > wrote:
> > >
> > > > Looks good to me.
> > > >
> > > > I like the idea of not blocking additional sends but not guaranteeing
> > > that
> > > > flush() will deliver them.
> > > >
> > > > I assume that with linger.ms = 0, flush will just be a noop (since the
> > > > queue will be empty). Is that correct?
> > > >
> > > > Gwen
> > > >
> > > > On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps 
> > wrote:
> > > >
> > > > > Following up on our previous thread on making batch send a little
> > > easier,
> > > > > here is a concrete proposal to add a flush() method to the producer:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > > >
> > > > > A proposed implementation is here:
> > > > > https://issues.apache.org/jira/browse/KAFKA-1865
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > -Jay
> > > > >
> > > >
> > >
> >
> 
> 
> 
> -- 
> -- Guozhang



Re: [DISCUSS] KIPs

2015-02-09 Thread Joe Stein
I like the round-up, looks good, thanks Joel.

I should be able to get KIP-5 and KIP-6 to have more detail in the coming
days.

~ Joestein

On Mon, Feb 9, 2015 at 9:01 PM, Joel Koshy  wrote:

> I'm looking through a couple of the KIP threads today and had the same
> issue; and thought it would be useful to do a status round-up of KIPs.
> We could incorporate status in the title itself (so we can quickly see
> it in the child-page list) but I just added a table to the top-level
> wiki. Hopefully that captures the current state accurately so I know
> which threads to follow-up on.
>
> Thanks,
>
> Joel
>
> On Fri, Feb 06, 2015 at 12:47:46PM -0800, Jay Kreps wrote:
> > A problem I am having is actually understanding which KIPs are intended
> to
> > be complete proposals and which are works in progress. Joe you seem to
> have
> > a bunch of these. Can you move them elsewhere until they are really fully
> > done and ready for review and discussion?
> >
> > -Jay
> >
> > On Fri, Feb 6, 2015 at 12:09 PM, Jay Kreps  wrote:
> >
> > > I think we are focused on making committing new changes easier, but
> what
> > > we have seen is actually that isn't the bulk of the work (especially
> with
> > > this kind of "public interface" change where it generally has a big
> user
> > > impact). I think we actually really need the core committers and any
> other
> > > interested parties to stop and fully read each KIP and think about it.
> If
> > > we don't have time to do that we usually just end up spending a lot
> more
> > > time after the fact trying to rework things later when it is a lot
> harder.
> > > So I really think we should have every active committer read, comment,
> and
> > > vote on each KIP. I think this may require a little bit of work to
> > > co-ordinate/bug people but will end up being worth it because each
> person
> > > on the project will have a holistic picture of what is going on.
> > >
> > > -Jay
> > >
> > > On Thu, Feb 5, 2015 at 11:24 PM, Joel Koshy 
> wrote:
> > >
> > >> Just wanted to add a few more comments on this: KIPs were suggested as
> > >> a process to help reach early consensus on a major change or not so
> > >> major (but tricky or backward incompatible) change in order to reduce
> > >> the likelihood of multiple iterations and complete rewrites during
> > >> code reviews (which is time-intensive for both the contributor and
> > >> reviewers); as well as to reduce the likelihood of surprises (say, if
> > >> a patch inadvertently changes a public API).  So KIPs are intended to
> > >> speed up development since a clear path is charted out and there is
> > >> prior consensus on whether a feature and its design/implementation
> > >> make sense or not.
> > >>
> > >> Obviously this breaks down if KIPs are not being actively discussed -
> > >> again I think we can do much better here. I think we ended up with a
> > >> backlog because as soon as the KIP wiki was started, a number of
> > >> pre-existing jiras and discussions were moved there - all within a few
> > >> days. Now that there are quite a few outstanding KIPs I think we just
> > >> need to methodically work through those - preferably a couple at a
> > >> time. I looked through the list and I think we should be able to
> > >> resolve all of them relatively quickly if everyone is on board with
> > >> this.
> > >>
> > >> > > Its probably more helpful for contributors if its "lazy" as in "no
> > >> > > strong objections" .
> > >>
> > >> Gwen also suggested this and this also sounds ok to me as I wrote
> > >> earlier - what do others think? This is important especially if
> > >> majority in the community think if this less restrictive policy would
> > >> spur and not hinder development - I'm not sure that it does. I
> > >> completely agree that KIPs fail to a large degree as far as the
> > >> original motivation goes if they require a lazy majority but the
> > >> DISCUSS threads are stalled. IOW regardless of that discussion, I
> > >> think we should rejuvenate some of those threads especially now that
> > >> 0.8.2 is out of the way.
> > >>
> > >> Thanks,
> > >>
> > >> Joel
> > >>
> > >> On Thu, Feb 05, 2015 at 08:56:13PM -0800, Joel Koshy wrote:
> > >> > I'm just thinking aloud - I don't know what a good number would be,
> and
> > >> it
> > >> > is just one possibility to streamline how KIPs are processed. It
> largely
> > >> > depends on how complex the proposals are. What would be concerning
> is if
> > >> > there are 10 different threads all dealing with large KIPs and no
> one
> > >> has
> > >> > the time to give due diligence to each one and all those threads
> grind
> > >> to a
> > >> > halt due to confusion, incomplete context and misunderstandings.
> > >> >
> > >> > On Thursday, February 5, 2015, Harsha  wrote:
> > >> >
> > >> > > Joel,
> > >> > >Having only 2 or 3 KIPS under active discussion is
> concerning.
> > >> > >This will slow down development process as well.
> > >> > > Having a turn-around time for a KIP is a

Re: [DISCUSS] KIPs

2015-02-09 Thread Joel Koshy
I'm looking through a couple of the KIP threads today and had the same
issue; and thought it would be useful to do a status round-up of KIPs.
We could incorporate status in the title itself (so we can quickly see
it in the child-page list) but I just added a table to the top-level
wiki. Hopefully that captures the current state accurately so I know
which threads to follow-up on.

Thanks,

Joel

On Fri, Feb 06, 2015 at 12:47:46PM -0800, Jay Kreps wrote:
> A problem I am having is actually understanding which KIPs are intended to
> be complete proposals and which are works in progress. Joe you seem to have
> a bunch of these. Can you move them elsewhere until they are really fully
> done and ready for review and discussion?
> 
> -Jay
> 
> On Fri, Feb 6, 2015 at 12:09 PM, Jay Kreps  wrote:
> 
> > I think we are focused on making committing new changes easier, but what
> > we have seen is actually that isn't the bulk of the work (especially with
> > this kind of "public interface" change where it generally has a big user
> > impact). I think we actually really need the core committers and any other
> > interested parties to stop and fully read each KIP and think about it. If
> > we don't have time to do that we usually just end up spending a lot more
> > time after the fact trying to rework things later when it is a lot harder.
> > So I really think we should have every active committer read, comment, and
> > vote on each KIP. I think this may require a little bit of work to
> > co-ordinate/bug people but will end up being worth it because each person
> > on the project will have a holistic picture of what is going on.
> >
> > -Jay
> >
> > On Thu, Feb 5, 2015 at 11:24 PM, Joel Koshy  wrote:
> >
> >> Just wanted to add a few more comments on this: KIPs were suggested as
> >> a process to help reach early consensus on a major change or not so
> >> major (but tricky or backward incompatible) change in order to reduce
> >> the likelihood of multiple iterations and complete rewrites during
> >> code reviews (which is time-intensive for both the contributor and
> >> reviewers); as well as to reduce the likelihood of surprises (say, if
> >> a patch inadvertently changes a public API).  So KIPs are intended to
> >> speed up development since a clear path is charted out and there is
> >> prior consensus on whether a feature and its design/implementation
> >> make sense or not.
> >>
> >> Obviously this breaks down if KIPs are not being actively discussed -
> >> again I think we can do much better here. I think we ended up with a
> >> backlog because as soon as the KIP wiki was started, a number of
> >> pre-existing jiras and discussions were moved there - all within a few
> >> days. Now that there are quite a few outstanding KIPs I think we just
> >> need to methodically work through those - preferably a couple at a
> >> time. I looked through the list and I think we should be able to
> >> resolve all of them relatively quickly if everyone is on board with
> >> this.
> >>
> >> > > Its probably more helpful for contributors if its "lazy" as in "no
> >> > > strong objections" .
> >>
> >> Gwen also suggested this and this also sounds ok to me as I wrote
> >> earlier - what do others think? This is important especially if
> >> majority in the community think if this less restrictive policy would
> >> spur and not hinder development - I'm not sure that it does. I
> >> completely agree that KIPs fail to a large degree as far as the
> >> original motivation goes if they require a lazy majority but the
> >> DISCUSS threads are stalled. IOW regardless of that discussion, I
> >> think we should rejuvenate some of those threads especially now that
> >> 0.8.2 is out of the way.
> >>
> >> Thanks,
> >>
> >> Joel
> >>
> >> On Thu, Feb 05, 2015 at 08:56:13PM -0800, Joel Koshy wrote:
> >> > I'm just thinking aloud - I don't know what a good number would be, and
> >> it
> >> > is just one possibility to streamline how KIPs are processed. It largely
> >> > depends on how complex the proposals are. What would be concerning is if
> >> > there are 10 different threads all dealing with large KIPs and no one
> >> has
> >> > the time to give due diligence to each one and all those threads grind
> >> to a
> >> > halt due to confusion, incomplete context and misunderstandings.
> >> >
> >> > On Thursday, February 5, 2015, Harsha  wrote:
> >> >
> >> > > Joel,
> >> > >Having only 2 or 3 KIPS under active discussion is concerning.
> >> > >This will slow down development process as well.
> >> > > Having a turn-around time for a KIP is a good idea, but what will
> >> > > happen if it doesn't receive the required votes within that time frame?
> >> > > Its probably more helpful for contributors if its "lazy" as in "no
> >> > > strong objections" .
> >> > > Just to make sure this is only for KIPs not for regular bug fixes
> >> right?
> >> > > Thanks,
> >> > > Harsha
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Feb 5, 2015

Re: [DISCUSSION] KIP-2: Refactor Brokers to Allow Multiple Endpoints

2015-02-09 Thread Joel Koshy
For (1) - +1 especially since the existing clients will keep working.
For (2) - I'm less clear on the proposal. Can you incorporate it into
the KIP and/or linked wiki?

Also, on the KIP itself, can you clarify what the TRACE protocol is?
The RB has a brief comment ("plan is to add instrumentation in the
future") but I'm not sure what that means.

(I already started looking at an earlier version of your RB couple of
days ago, but did not finish. I'll look at your latest RB.)

Thanks,

Joel

On Wed, Jan 28, 2015 at 10:28:39AM -0800, Gwen Shapira wrote:
> Bumping :)
> 
> If there are no objections, I'd like to go with the following:
> 
> 1. Do not support javaapi (SimpleConsumer) with dependency on versions
> higher than 0.8.2. Existing clients will keep working.
> 2. The configuration parameter for upgrades will be
> inter.broker.protocol.version={0.8.2.0, 0.8.3.0, 0.9.0.0...}
> 
> Gwen
> 
> On Mon, Jan 26, 2015 at 8:02 PM, Gwen Shapira  wrote:
> > Hi Kafka Devs,
> >
> > While reviewing the patch for KAFKA-1809, we came across two questions
> > that we are interested in hearing the community out on.
> >
> > 1. This patch changes the Broker class and adds a new class
> > BrokerEndPoint that behaves like the previous broker.
> >
> > While technically kafka.cluster.Broker is not part of the public API,
> > it is returned by javaapi, used with the SimpleConsumer.
> >
> > Getting replicas from PartitionMetadata will now return BrokerEndPoint
> > instead of Broker. All method calls remain the same, but since we
> > return a new type, we break the API.
> >
> > Note that this breakage does not prevent upgrades - existing
> > SimpleConsumers will continue working (because we are
> > wire-compatible).
> > The only thing that won't work is building SimpleConsumers with
> > dependency on Kafka versions higher than 0.8.2. Arguably, we don't
> > want anyone to do it anyway :)
> >
> > So:
> > Do we state that the highest release on which SimpleConsumers can
> > depend is 0.8.2? Or shall we keep Broker as is and create an
> > UberBroker which will contain multiple brokers as its endpoints?
> >
> > 2.
> > The KIP suggests "use.new.wire.protocol" configuration to decide which
> > protocols the brokers will use to talk to each other. The problem is
> > that after the next upgrade, the wire protocol is no longer new, so
> > we'll have to reset it to false for the following upgrade, then change
> > to true again... and upgrading more than a single version will be
> > impossible.
> > Bad idea :)
> >
> > As an alternative, we can have a property for each version and set one
> > of them to true. Or (simple, I think) have "wire.protocol.version"
> > property and accept version numbers (0.8.2, 0.8.3, 0.9) as values.
> >
> > Please share your thoughts :)
> >
> > Gwen
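Gwen's second proposal amounts to ordering dotted version strings so each broker can gate its inter-broker wire format on a configured value. A minimal sketch of that comparison follows; the class name and methods are illustrative assumptions, not Kafka's actual implementation:

```java
// Hypothetical helper for an inter.broker.protocol.version-style config:
// parse dotted version strings and compare them numerically, so that
// "0.8.3.0" >= "0.8.3" but "0.8.3.0" < "0.9.0". Names are illustrative.
public class ProtocolVersion implements Comparable<ProtocolVersion> {
    private final int[] parts;

    public ProtocolVersion(String version) {
        String[] tokens = version.split("\\.");
        parts = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++)
            parts[i] = Integer.parseInt(tokens[i]);
    }

    @Override
    public int compareTo(ProtocolVersion other) {
        int n = Math.max(parts.length, other.parts.length);
        for (int i = 0; i < n; i++) {
            int a = i < parts.length ? parts[i] : 0;              // pad short versions with 0
            int b = i < other.parts.length ? other.parts[i] : 0;
            if (a != b) return Integer.compare(a, b);
        }
        return 0;
    }

    /** True if this configured version is at least the version that introduced a feature. */
    public boolean atLeast(String introducedIn) {
        return compareTo(new ProtocolVersion(introducedIn)) >= 0;
    }

    public static void main(String[] args) {
        // Rolling upgrade: keep the old value while bouncing brokers, then raise it.
        ProtocolVersion configured = new ProtocolVersion("0.8.3.0");
        System.out.println(configured.atLeast("0.8.3")); // true
        System.out.println(configured.atLeast("0.9.0")); // false
    }
}
```

Unlike a boolean use.new.wire.protocol flag, a value like this stays meaningful across multiple consecutive upgrades, which is exactly the problem described above.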



Build failed in Jenkins: Kafka-trunk #389

2015-02-09 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-1333; Add the consumer coordinator to server; reviewed by Onur 
Karaman and Jay Kreps

[wangguoz] KAFKA-1333 follow-up; Add missing files for the coordinator folder

--
[...truncated 1789 lines...]
kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.log.CleanerTest > testCleanSegments PASSED

kafka.log.CleanerTest > testCleaningWithDeletes PASSED

kafka.log.CleanerTest > testCleanSegmentsWithAbort PASSED

kafka.log.CleanerTest > testSegmentGrouping PASSED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailingSlash PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
PASSED

kafka.log.LogConfigTest > testFromPropsDefaults PASSED

kafka.log.LogConfigTest > testFromPropsEmpty PASSED

kafka.log.LogConfigTest > testFromPropsToProps PASSED

kafka.log.LogConfigTest > testFromPropsInvalid PASSED

kafka.log.OffsetIndexTest > truncate PASSED

kafka.log.OffsetIndexTest > randomLookupTest PASSED

kafka.log.OffsetIndexTest > lookupExtremeCases PASSED

kafka.log.OffsetIndexTest > appendTooMany PASSED

kafka.log.OffsetIndexTest > appendOutOfOrder PASSED

kafka.log.OffsetIndexTest > testReopen PASSED

kafka.log.FileMessageSetTest > testWrittenEqualsRead PASSED

kafka.log.FileMessageSetTest > testIteratorIsConsistent PASSED

kafka.log.FileMessageSetTest > testSizeInBytes PASSED

kafka.log.FileMessageSetTest > testWriteTo PASSED

kafka.log.FileMessageSetTest > testFileSize PASSED

kafka.log.FileMessageSetTest > testIterationOverPartialAndTruncation PASSED

kafka.log.FileMessageSetTest > testIterationDoesntChangePosition PASSED

kafka.log.FileMessageSetTest > testRead PASSED

kafka.log.FileMessageSetTest > testSearch PASSED

kafka.log.FileMessageSetTest > testIteratorWithLimits PASSED

kafka.log.FileMessageSetTest > testTruncate PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest PASSED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log

[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-02-09 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313301#comment-14313301
 ] 

Aditya Auradkar commented on KAFKA-1546:


[~nmarasoi] are you actively working on this? I was planning on picking it up.

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: nicu marasoiu
>  Labels: newbie++
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, while for the high-volume topic it will produce false positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.
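The translation hinted at above — a single time budget converted into a per-partition message lag using observed throughput — could be sketched as follows. The config name replica.lag.max.ms comes from the ticket; everything else is an illustrative assumption, not the eventual design:

```java
// Hedged sketch of the idea in the ticket (not the final KAFKA-1546 design):
// convert a time-based setting (replica.lag.max.ms) into a per-partition
// allowed message lag using the partition's observed produce rate.
public class ReplicaLagEstimator {
    /**
     * @param lagMaxMs          configured maximum tolerated lag, in milliseconds
     * @param messagesPerSecond smoothed produce throughput observed for this partition
     * @return how many messages behind a follower may be before it is deemed lagging
     */
    public static long allowedMessageLag(long lagMaxMs, double messagesPerSecond) {
        // Never drop below 1, so an idle partition is not flagged the
        // moment a single message arrives.
        return Math.max(1L, (long) (messagesPerSecond * lagMaxMs / 1000.0));
    }

    public static void main(String[] args) {
        // High-volume topic: 10000 msg/s with a 10s budget tolerates 100000 messages.
        System.out.println(allowedMessageLag(10_000, 10_000.0));
        // Low-volume topic: 0.5 msg/s with the same budget tolerates only 5 messages.
        System.out.println(allowedMessageLag(10_000, 0.5));
    }
}
```

With the same 10-second budget, a high-volume partition tolerates a proportionally larger message lag while a low-volume partition is checked against a much smaller one, addressing both failure modes in the description.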



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1938) [doc] Quick start example should reference appropriate Kafka version

2015-02-09 Thread Stevo Slavic (JIRA)
Stevo Slavic created KAFKA-1938:
---

 Summary: [doc] Quick start example should reference appropriate 
Kafka version
 Key: KAFKA-1938
 URL: https://issues.apache.org/jira/browse/KAFKA-1938
 Project: Kafka
  Issue Type: Improvement
  Components: website
Affects Versions: 0.8.2
Reporter: Stevo Slavic
Priority: Trivial


Kafka 0.8.2.0 documentation, quick start example on 
https://kafka.apache.org/documentation.html#quickstart in step 1 links and 
instructs reader to download Kafka 0.8.1.1.





[jira] [Updated] (KAFKA-1937) Mirror maker needs to clear the unacked offset map after rebalance.

2015-02-09 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1937:

Attachment: KAFKA-1937.patch

> Mirror maker needs to clear the unacked offset map after rebalance.
> ---
>
> Key: KAFKA-1937
> URL: https://issues.apache.org/jira/browse/KAFKA-1937
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1937.patch
>
>
> Offset map needs to be cleared during rebalance to avoid committing offsets to 
> partitions that are not owned by the consumer.





[jira] [Commented] (KAFKA-1937) Mirror maker needs to clear the unacked offset map after rebalance.

2015-02-09 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313195#comment-14313195
 ] 

Jiangjie Qin commented on KAFKA-1937:
-

Created reviewboard https://reviews.apache.org/r/30810/diff/
 against branch origin/trunk

> Mirror maker needs to clear the unacked offset map after rebalance.
> ---
>
> Key: KAFKA-1937
> URL: https://issues.apache.org/jira/browse/KAFKA-1937
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1937.patch
>
>
> Offset map needs to be cleared during rebalance to avoid committing offsets to 
> partitions that are not owned by the consumer.





[jira] [Updated] (KAFKA-1937) Mirror maker needs to clear the unacked offset map after rebalance.

2015-02-09 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1937:

Status: Patch Available  (was: Open)

> Mirror maker needs to clear the unacked offset map after rebalance.
> ---
>
> Key: KAFKA-1937
> URL: https://issues.apache.org/jira/browse/KAFKA-1937
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1937.patch
>
>
> Offset map needs to be cleared during rebalance to avoid committing offsets to 
> partitions that are not owned by the consumer.





Review Request 30810: Patch for KAFKA-1937

2015-02-09 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30810/
---

Review request for kafka.


Bugs: KAFKA-1937
https://issues.apache.org/jira/browse/KAFKA-1937


Repository: kafka


Description
---

Fix for KAFKA-1937


Diffs
-

  core/src/main/scala/kafka/tools/MirrorMaker.scala 
5374280dc97dc8e01e9b3ba61fd036dc13ae48cb 

Diff: https://reviews.apache.org/r/30810/diff/


Testing
---


Thanks,

Jiangjie Qin



Re: Review Request 30809: Patch for KAFKA-1888

2015-02-09 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

(Updated Feb. 9, 2015, 11:53 p.m.)


Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description (updated)
---

Essentially this test does the following:
a) Start a java process with 3 threads
   Producer - producing continuously
   Consumer - consuming from latest
   Bootstrap consumer - started after a pause to bootstrap from beginning.
   
   It uses sequentially increasing numbers and timestamps to make sure we are 
not receiving out-of-order messages and to do real-time validation. 
   
b) A script which wraps this and takes two directories containing the Kafka 
version-specific jars:
kafka_2.10-0.8.3-SNAPSHOT-test.jar
kafka_2.10-0.8.3-SNAPSHOT.jar

The first argument is the directory containing the older version of the jars.
The second argument is the directory containing the newer version of the jars.

The reason for choosing directories is that there are two jars in each 
directory:


Diffs
-

  build.gradle c3e6bb839ad65c512c9db4695d2bb49b82c80da5 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
  system_test/broker_upgrade/bin/test.sh PRE-CREATION 
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing (updated)
---

Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh  


Thanks,

Abhishek Nigam
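The real-time validation step the description mentions — sequentially increasing numbers plus timestamps — can be sketched as a small checker. The class and method names are made up for illustration; this is not the test's actual code:

```java
// Consumer-side check for the upgrade test: sequence numbers must increase
// by exactly one and timestamps must never move backwards.
public class SequenceValidator {
    private long lastSeq = -1;
    private long lastTs = Long.MIN_VALUE;

    /** Returns false on a gap, duplicate, or out-of-order timestamp. */
    public boolean accept(long seq, long timestampMs) {
        if (seq != lastSeq + 1 || timestampMs < lastTs) return false;
        lastSeq = seq;
        lastTs = timestampMs;
        return true;
    }

    public static void main(String[] args) {
        SequenceValidator v = new SequenceValidator();
        System.out.println(v.accept(0, 100) && v.accept(1, 100) && v.accept(2, 150)); // true
        System.out.println(v.accept(4, 200)); // false: message 3 was skipped
    }
}
```

Each of the three consumer threads would run its own instance, so the bootstrap consumer can validate from the beginning while the latest-offset consumer validates the live stream.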



[jira] [Created] (KAFKA-1937) Mirror maker needs to clear the unacked offset map after rebalance.

2015-02-09 Thread Jiangjie Qin (JIRA)
Jiangjie Qin created KAFKA-1937:
---

 Summary: Mirror maker needs to clear the unacked offset map after 
rebalance.
 Key: KAFKA-1937
 URL: https://issues.apache.org/jira/browse/KAFKA-1937
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin


Offset map needs to be cleared during rebalance to avoid committing offsets to 
partitions that are not owned by the consumer.
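The fix can be sketched as follows; this is an illustrative model of the behavior described in this ticket, not MirrorMaker's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: track the highest acked offset per partition, and
// clear the map when a rebalance revokes partitions so the committer never
// commits offsets for partitions this consumer no longer owns.
public class UnackedOffsets {
    private final Map<String, Long> unacked = new ConcurrentHashMap<>();

    public void ack(String topicPartition, long offset) {
        unacked.merge(topicPartition, offset, Math::max); // keep the largest offset
    }

    /** Called from the rebalance listener before partitions are reassigned. */
    public void onRebalance() {
        unacked.clear();
    }

    public Map<String, Long> committable() {
        return unacked;
    }

    public static void main(String[] args) {
        UnackedOffsets offsets = new UnackedOffsets();
        offsets.ack("t1-0", 42L);
        offsets.onRebalance();   // partition t1-0 may now belong to another consumer
        System.out.println(offsets.committable().isEmpty()); // true: nothing stale to commit
    }
}
```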





Review Request 30809: Patch for KAFKA-1888

2015-02-09 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description
---

Cleaning up the scripts; forgot to add the build file change pulling in the Guava library

Fixing build.gradle


Diffs
-

  build.gradle c3e6bb839ad65c512c9db4695d2bb49b82c80da5 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
  system_test/broker_upgrade/bin/test.sh PRE-CREATION 
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing
---


Thanks,

Abhishek Nigam



[jira] [Commented] (KAFKA-1333) Add consumer co-ordinator module to the server

2015-02-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313130#comment-14313130
 ] 

Guozhang Wang commented on KAFKA-1333:
--

Pushed to trunk; leaving this ticket open for any follow-up items.

> Add consumer co-ordinator module to the server
> --
>
> Key: KAFKA-1333
> URL: https://issues.apache.org/jira/browse/KAFKA-1333
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0
>Reporter: Neha Narkhede
>Assignee: Guozhang Wang
> Attachments: KAFKA-1333_2015-01-31_17:40:51.patch, 
> KAFKA-1333_2015-02-06_15:02:48.patch
>
>
> Scope of this JIRA is to just add a consumer co-ordinator module that does the 
> following:
> 1) coordinator start-up, metadata initialization
> 2) simple join group handling (just updating metadata, no failure detection / 
> rebalancing): this should be sufficient for consumers doing self offset / 
> partition management.
> Offset manager will still run side-by-side with the coordinator in this JIRA, 
> and we will merge it in KAFKA-1740.





Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-09 Thread Guozhang Wang
The proposal looks good to me, will need some time to review the
implementation RB later.

Bhavesh, I am wondering how you will use a flush() with a timeout since
such a call does not actually provide any flushing guarantees?

As for close(), there is a separate JIRA for this:

KAFKA-1660 

Guozhang


On Mon, Feb 9, 2015 at 2:29 PM, Bhavesh Mistry 
wrote:

> Hi Jay,
>
> How about adding a timeout to each method call, flush(timeout, TimeUnit) and
> close(timeout, TimeUnit)? We had a runaway I/O thread issue, and the caller
> thread should not be blocked forever in these methods.
>
>
> Thanks,
>
> Bhavesh
>
> On Sun, Feb 8, 2015 at 12:18 PM, Jay Kreps  wrote:
>
> > Well actually in the case of linger.ms = 0 the send is still
> asynchronous
> > so calling flush() blocks until all the previously sent records have
> > completed. It doesn't speed anything up in that case, though, since they
> > are already available to send.
> >
> > -Jay
> >
> > On Sun, Feb 8, 2015 at 10:36 AM, Gwen Shapira 
> > wrote:
> >
> > > Looks good to me.
> > >
> > > I like the idea of not blocking additional sends but not guaranteeing
> > that
> > > flush() will deliver them.
> > >
> > > I assume that with linger.ms = 0, flush will just be a noop (since the
> > > queue will be empty). Is that correct?
> > >
> > > Gwen
> > >
> > > On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps 
> wrote:
> > >
> > > > Following up on our previous thread on making batch send a little
> > easier,
> > > > here is a concrete proposal to add a flush() method to the producer:
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > >
> > > > A proposed implementation is here:
> > > > https://issues.apache.org/jira/browse/KAFKA-1865
> > > >
> > > > Thoughts?
> > > >
> > > > -Jay
> > > >
> > >
> >
>



-- 
-- Guozhang
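As a toy model of the semantics under discussion (NOT the producer's real internals): flush() blocks until everything sent before the call has completed, later sends are unaffected, and a timed variant along the lines Bhavesh suggests can give up early — but, per Guozhang's point, a timeout means it returns without any flushing guarantee. All names here are assumptions for illustration:

```java
// Toy model of flush() semantics: flush() waits only for records sent before
// the call; flush(timeoutMs) may return false with records still in flight.
public class FlushSketch {
    private long sent = 0;      // records handed to the (simulated) I/O thread
    private long completed = 0; // records whose send has completed

    public synchronized void onSend() { sent++; }

    public synchronized void onComplete() { completed++; notifyAll(); }

    /** Blocks until all records sent before this call have completed. */
    public synchronized void flush() throws InterruptedException {
        long target = sent; // snapshot: later sends are deliberately excluded
        while (completed < target) wait();
    }

    /** Timed variant like the flush(timeout, TimeUnit) suggested in the thread. */
    public synchronized boolean flush(long timeoutMs) throws InterruptedException {
        long target = sent;
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (completed < target) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return false; // gave up: no flushing guarantee
            wait(remaining);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        FlushSketch producer = new FlushSketch();
        for (int i = 0; i < 3; i++) producer.onSend();
        Thread ioThread = new Thread(() -> {
            for (int i = 0; i < 3; i++) producer.onComplete();
        });
        ioThread.start();
        producer.flush(); // returns once the 3 outstanding records complete
        ioThread.join();
        System.out.println("flushed");
    }
}
```

A caller worried about a runaway I/O thread could use the timed variant, but a false return says nothing about what was actually delivered.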


Re: Review Request 29831: Patch for KAFKA-1476

2015-02-09 Thread Onur Karaman


> On Feb. 7, 2015, 3:33 p.m., Neha Narkhede wrote:
> > Thanks for attaching the latest run of the tool. I observed that we should 
> > get rid of the ZooKeeper error message: 
> > Could not fetch offset from zookeeper for group g1 partition [t1,0] due to 
> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> > NoNode for /consumers/g1/offsets/t1/0.
> > 
> > It is somewhat unreadable and not very useful for the user. The best way to 
> > handle this in a user-facing tool is to explain the cause rather than use 
> > the error message directly.
> > 
> > Other than that, the patch looks good. Once you fix the above problem, I 
> > will check it in.

Updated the rb with a hopefully more readable output and attached another 
sample run.

I take it that the known issue regarding deleted topics and describing groups I 
mentioned earlier should get addressed in a later patch?


- Onur


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29831/#review71555
---


On Feb. 9, 2015, 10:37 p.m., Onur Karaman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29831/
> ---
> 
> (Updated Feb. 9, 2015, 10:37 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1476
> https://issues.apache.org/jira/browse/KAFKA-1476
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Merged in work for KAFKA-1476 and sub-task KAFKA-1826
> 
> 
> Diffs
> -
> 
>   bin/kafka-consumer-groups.sh PRE-CREATION 
>   core/src/main/scala/kafka/admin/AdminUtils.scala 
> 28b12c7b89a56c113b665fbde1b95f873f8624a3 
>   core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala PRE-CREATION 
>   core/src/main/scala/kafka/utils/ZkUtils.scala 
> c14bd455b6642f5e6eb254670bef9f57ae41d6cb 
>   core/src/test/scala/unit/kafka/admin/DeleteConsumerGroupTest.scala 
> PRE-CREATION 
>   core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 
> 33c27678bf8ae8feebcbcdaa4b90a1963157b4a5 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 54755e8dd3f23ced313067566cd4ea867f8a496e 
> 
> Diff: https://reviews.apache.org/r/29831/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Onur Karaman
> 
>



[jira] [Updated] (KAFKA-1476) Get a list of consumer groups

2015-02-09 Thread Onur Karaman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Onur Karaman updated KAFKA-1476:

Attachment: sample-kafka-consumer-groups-sh-output-2-9-2015.txt

> Get a list of consumer groups
> -
>
> Key: KAFKA-1476
> URL: https://issues.apache.org/jira/browse/KAFKA-1476
> Project: Kafka
>  Issue Type: Wish
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Ryan Williams
>Assignee: Balaji Seshadri
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, 
> KAFKA-1476-RENAME.patch, KAFKA-1476-REVIEW-COMMENTS.patch, KAFKA-1476.patch, 
> KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, 
> KAFKA-1476_2014-11-10_11:58:26.patch, KAFKA-1476_2014-11-10_12:04:01.patch, 
> KAFKA-1476_2014-11-10_12:06:35.patch, KAFKA-1476_2014-12-05_12:00:12.patch, 
> KAFKA-1476_2015-01-12_16:22:26.patch, KAFKA-1476_2015-01-12_16:31:20.patch, 
> KAFKA-1476_2015-01-13_10:36:18.patch, KAFKA-1476_2015-01-15_14:30:04.patch, 
> KAFKA-1476_2015-01-22_02:32:52.patch, KAFKA-1476_2015-01-30_11:09:59.patch, 
> KAFKA-1476_2015-02-04_15:41:50.patch, KAFKA-1476_2015-02-04_18:03:15.patch, 
> KAFKA-1476_2015-02-05_03:01:09.patch, KAFKA-1476_2015-02-09_14:37:30.patch, 
> sample-kafka-consumer-groups-sh-output-1-23-2015.txt, 
> sample-kafka-consumer-groups-sh-output-2-5-2015.txt, 
> sample-kafka-consumer-groups-sh-output-2-9-2015.txt, 
> sample-kafka-consumer-groups-sh-output.txt
>
>
> It would be useful to have a way to get a list of consumer groups currently 
> active via some tool/script that ships with kafka. This would be helpful so 
> that the system tools can be explored more easily.
> For example, when running the ConsumerOffsetChecker, it requires a group 
> option
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
> ?
> But, when just getting started with kafka, using the console producer and 
> consumer, it is not clear what value to use for the group option.  If a list 
> of consumer groups could be listed, then it would be clear what value to use.
> Background:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e





[jira] [Updated] (KAFKA-1476) Get a list of consumer groups

2015-02-09 Thread Onur Karaman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Onur Karaman updated KAFKA-1476:

Attachment: KAFKA-1476_2015-02-09_14:37:30.patch

> Get a list of consumer groups
> -
>
> Key: KAFKA-1476
> URL: https://issues.apache.org/jira/browse/KAFKA-1476
> Project: Kafka
>  Issue Type: Wish
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Ryan Williams
>Assignee: Balaji Seshadri
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, 
> KAFKA-1476-RENAME.patch, KAFKA-1476-REVIEW-COMMENTS.patch, KAFKA-1476.patch, 
> KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, 
> KAFKA-1476_2014-11-10_11:58:26.patch, KAFKA-1476_2014-11-10_12:04:01.patch, 
> KAFKA-1476_2014-11-10_12:06:35.patch, KAFKA-1476_2014-12-05_12:00:12.patch, 
> KAFKA-1476_2015-01-12_16:22:26.patch, KAFKA-1476_2015-01-12_16:31:20.patch, 
> KAFKA-1476_2015-01-13_10:36:18.patch, KAFKA-1476_2015-01-15_14:30:04.patch, 
> KAFKA-1476_2015-01-22_02:32:52.patch, KAFKA-1476_2015-01-30_11:09:59.patch, 
> KAFKA-1476_2015-02-04_15:41:50.patch, KAFKA-1476_2015-02-04_18:03:15.patch, 
> KAFKA-1476_2015-02-05_03:01:09.patch, KAFKA-1476_2015-02-09_14:37:30.patch, 
> sample-kafka-consumer-groups-sh-output-1-23-2015.txt, 
> sample-kafka-consumer-groups-sh-output-2-5-2015.txt, 
> sample-kafka-consumer-groups-sh-output.txt
>
>
> It would be useful to have a way to get a list of consumer groups currently 
> active via some tool/script that ships with kafka. This would be helpful so 
> that the system tools can be explored more easily.
> For example, when running the ConsumerOffsetChecker, it requires a group 
> option
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
> ?
> But, when just getting started with kafka, using the console producer and 
> consumer, it is not clear what value to use for the group option.  If a list 
> of consumer groups could be listed, then it would be clear what value to use.
> Background:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e





[jira] [Commented] (KAFKA-1476) Get a list of consumer groups

2015-02-09 Thread Onur Karaman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313035#comment-14313035
 ] 

Onur Karaman commented on KAFKA-1476:
-

Updated reviewboard https://reviews.apache.org/r/29831/diff/
 against branch origin/trunk

> Get a list of consumer groups
> -
>
> Key: KAFKA-1476
> URL: https://issues.apache.org/jira/browse/KAFKA-1476
> Project: Kafka
>  Issue Type: Wish
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Ryan Williams
>Assignee: Balaji Seshadri
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: ConsumerCommand.scala, KAFKA-1476-LIST-GROUPS.patch, 
> KAFKA-1476-RENAME.patch, KAFKA-1476-REVIEW-COMMENTS.patch, KAFKA-1476.patch, 
> KAFKA-1476.patch, KAFKA-1476.patch, KAFKA-1476.patch, 
> KAFKA-1476_2014-11-10_11:58:26.patch, KAFKA-1476_2014-11-10_12:04:01.patch, 
> KAFKA-1476_2014-11-10_12:06:35.patch, KAFKA-1476_2014-12-05_12:00:12.patch, 
> KAFKA-1476_2015-01-12_16:22:26.patch, KAFKA-1476_2015-01-12_16:31:20.patch, 
> KAFKA-1476_2015-01-13_10:36:18.patch, KAFKA-1476_2015-01-15_14:30:04.patch, 
> KAFKA-1476_2015-01-22_02:32:52.patch, KAFKA-1476_2015-01-30_11:09:59.patch, 
> KAFKA-1476_2015-02-04_15:41:50.patch, KAFKA-1476_2015-02-04_18:03:15.patch, 
> KAFKA-1476_2015-02-05_03:01:09.patch, KAFKA-1476_2015-02-09_14:37:30.patch, 
> sample-kafka-consumer-groups-sh-output-1-23-2015.txt, 
> sample-kafka-consumer-groups-sh-output-2-5-2015.txt, 
> sample-kafka-consumer-groups-sh-output.txt
>
>
> It would be useful to have a way to get a list of consumer groups currently 
> active via some tool/script that ships with kafka. This would be helpful so 
> that the system tools can be explored more easily.
> For example, when running the ConsumerOffsetChecker, it requires a group 
> option
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --topic test --group 
> ?
> But, when just getting started with kafka, using the console producer and 
> consumer, it is not clear what value to use for the group option.  If a list 
> of consumer groups could be listed, then it would be clear what value to use.
> Background:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201405.mbox/%3cCAOq_b1w=slze5jrnakxvak0gu9ctdkpazak1g4dygvqzbsg...@mail.gmail.com%3e





Re: Review Request 29831: Patch for KAFKA-1476

2015-02-09 Thread Onur Karaman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29831/
---

(Updated Feb. 9, 2015, 10:37 p.m.)


Review request for kafka.


Bugs: KAFKA-1476
https://issues.apache.org/jira/browse/KAFKA-1476


Repository: kafka


Description
---

Merged in work for KAFKA-1476 and sub-task KAFKA-1826


Diffs (updated)
-

  bin/kafka-consumer-groups.sh PRE-CREATION 
  core/src/main/scala/kafka/admin/AdminUtils.scala 
28b12c7b89a56c113b665fbde1b95f873f8624a3 
  core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala PRE-CREATION 
  core/src/main/scala/kafka/utils/ZkUtils.scala 
c14bd455b6642f5e6eb254670bef9f57ae41d6cb 
  core/src/test/scala/unit/kafka/admin/DeleteConsumerGroupTest.scala 
PRE-CREATION 
  core/src/test/scala/unit/kafka/admin/DeleteTopicTest.scala 
33c27678bf8ae8feebcbcdaa4b90a1963157b4a5 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
54755e8dd3f23ced313067566cd4ea867f8a496e 

Diff: https://reviews.apache.org/r/29831/diff/


Testing
---


Thanks,

Onur Karaman



Re: [DISCUSS] KIP-8 Add a flush method to the new Java producer

2015-02-09 Thread Bhavesh Mistry
Hi Jay,

How about adding a timeout to each method call, flush(timeout, TimeUnit) and
close(timeout, TimeUnit)? We had a runaway I/O thread issue, and the caller
thread should not be blocked forever in these methods.


Thanks,

Bhavesh

On Sun, Feb 8, 2015 at 12:18 PM, Jay Kreps  wrote:

> Well actually in the case of linger.ms = 0 the send is still asynchronous
> so calling flush() blocks until all the previously sent records have
> completed. It doesn't speed anything up in that case, though, since they
> are already available to send.
>
> -Jay
>
> On Sun, Feb 8, 2015 at 10:36 AM, Gwen Shapira 
> wrote:
>
> > Looks good to me.
> >
> > I like the idea of not blocking additional sends but not guaranteeing
> that
> > flush() will deliver them.
> >
> > I assume that with linger.ms = 0, flush will just be a noop (since the
> > queue will be empty). Is that correct?
> >
> > Gwen
> >
> > On Sun, Feb 8, 2015 at 10:25 AM, Jay Kreps  wrote:
> >
> > > Following up on our previous thread on making batch send a little
> easier,
> > > here is a concrete proposal to add a flush() method to the producer:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > >
> > > A proposed implementation is here:
> > > https://issues.apache.org/jira/browse/KAFKA-1865
> > >
> > > Thoughts?
> > >
> > > -Jay
> > >
> >
>


[jira] [Commented] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-02-09 Thread Tong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313017#comment-14313017
 ] 

Tong Li commented on KAFKA-1927:


@Jay, @Gwen, thanks so much for both of your comments. I will start with KAFKA-1926; 
once that one is done, I will move on to this one or other ones. Thanks.

> Replace requests in kafka.api with requests in 
> org.apache.kafka.common.requests
> ---
>
> Key: KAFKA-1927
> URL: https://issues.apache.org/jira/browse/KAFKA-1927
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> The common package introduced a better way of defining requests using a new 
> protocol definition DSL and also includes wrapper objects for these.
> We should switch KafkaApis over to use these request definitions and consider 
> the scala classes deprecated (we probably need to retain some of them for a 
> while for the scala clients).
> This will be a big improvement because
> 1. We will have each request now defined in only one place (Protocol.java)
> 2. We will have built-in support for multi-version requests
> 3. We will have much better error messages (no more cryptic underflow errors)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1926) Replace kafka.utils.Utils with o.a.k.common.utils.Utils

2015-02-09 Thread Tong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313014#comment-14313014
 ] 

Tong Li commented on KAFKA-1926:


Jay, awesome comment, really appreciated. I will start with this one.

> Replace kafka.utils.Utils with o.a.k.common.utils.Utils
> ---
>
> Key: KAFKA-1926
> URL: https://issues.apache.org/jira/browse/KAFKA-1926
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>  Labels: newbie
>
> There is currently a lot of duplication between the Utils class in common and 
> the one in core.
> Our plan has been to deprecate duplicate code in the server and replace it 
> with the new common code.
> As such we should evaluate each method in the scala Utils and do one of the 
> following:
> 1. Migrate it to o.a.k.common.utils.Utils if it is a sensible general purpose 
> utility in active use that is not Kafka-specific. If we migrate it we should 
> really think about the API and make sure there is some test coverage. A few 
> things in there are kind of funky and we shouldn't just blindly copy them 
> over.
> 2. Create a new class ServerUtils or ScalaUtils in kafka.utils that will hold 
> any utilities that really need to make use of Scala features to be convenient.
> 3. Delete it if it is not used, or has a bad api.





Re: Review Request 29893: Patch for KAFKA-1856

2015-02-09 Thread Ashish Singh


> On Feb. 8, 2015, 4:07 p.m., Joe Stein wrote:
> > dev-utils/test-patch.py, line 147
> > 
> >
> > What is this structure that comes back? When it sorts, does it put the 
> > latest time first and use that patch? I am running this right now so I don't 
> > have much feedback yet, but I would like to understand more about which 
> > patches we can flag for it, or do we always go with the latest one? If we 
> > ever have a patch that has to go into 2 branches then we just have 2 JIRAs, 
> > which is ok by me.

*matches* is basically a list of the patches attached to a JIRA, ordered by the 
dates they were attached. jira_get_attachment returns the latest patch attached 
to the JIRA. I have tested this with a few JIRAs, like KAFKA-1634, and it works 
fine.


- Ashish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29893/#review71574
---


On Feb. 4, 2015, 11:44 p.m., Ashish Singh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29893/
> ---
> 
> (Updated Feb. 4, 2015, 11:44 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1856
> https://issues.apache.org/jira/browse/KAFKA-1856
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1856: Add PreCommit Patch Testing
> 
> 
> Diffs
> -
> 
>   dev-utils/test-patch.py PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/29893/diff/
> 
> 
> Testing
> ---
> 
> Tested on KAFKA-1664, 
> https://issues.apache.org/jira/browse/KAFKA-1664?focusedCommentId=14277439&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14277439
> 
> How to run:
> python dev-utils/test-patch.py --defect KAFKA-1664 --username  
> --password  --run-tests --post-results
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>



Re: Review Request 29893: Patch for KAFKA-1856

2015-02-09 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29893/#review71700
---



dev-utils/test-patch.py


*matches* is basically a list of the patches attached to a JIRA, ordered by the 
dates they were attached. jira_get_attachment returns the latest patch attached 
to the JIRA. I have tested this with a few JIRAs, like KAFKA-1634, and it works 
fine.


- Ashish Singh


On Feb. 4, 2015, 11:44 p.m., Ashish Singh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29893/
> ---
> 
> (Updated Feb. 4, 2015, 11:44 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1856
> https://issues.apache.org/jira/browse/KAFKA-1856
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1856: Add PreCommit Patch Testing
> 
> 
> Diffs
> -
> 
>   dev-utils/test-patch.py PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/29893/diff/
> 
> 
> Testing
> ---
> 
> Tested on KAFKA-1664, 
> https://issues.apache.org/jira/browse/KAFKA-1664?focusedCommentId=14277439&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14277439
> 
> How to run:
> python dev-utils/test-patch.py --defect KAFKA-1664 --username  
> --password  --run-tests --post-results
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>



Re: [KIP-DISCUSSION] Mirror Maker Enhancement

2015-02-09 Thread Jiangjie Qin
I just updated the KIP page and incorporated Jay's and Neha's suggestions. As
a brief summary of where we are:

Consensus reached:
Have N independent mirror maker threads, each of which has its own consumer
but shares a producer. The mirror maker threads will be responsible for
decompression, compression, and offset commits. No data channel or separate
offset commit thread is needed. The consumer rebalance callback will be used
to avoid duplicates on rebalance.

Still under discussion:
Whether a message handler is needed.

My arguments for adding a message handler are:
1. It is more efficient to do something once for all the clients in the
pipeline than to let each client do the same thing many times. And there are
already concrete use cases for the message handler.
2. It is not a big, complicated add-on to mirror maker.
3. Without a message handler, customers who need such handling have to
re-implement all the logic of mirror maker themselves just to add it to the
pipeline.

Any thoughts?

Thanks.

―Jiangjie (Becket) Qin

On 2/8/15, 6:35 PM, "Jiangjie Qin"  wrote:

>Hi Jay, thanks a lot for the comments.
>I think this solution is better. We probably don’t need the data channel
>anymore. It can be replaced with a list of producers if we need more sender
>threads.
>I’ll update the KIP page.
>
>The reasoning about the message handler is mainly efficiency. I’m
>thinking that if something can be done in the pipeline for all the clients,
>such as filtering/reformatting, it is probably better to do it in the
>pipeline than to ask 100 clients to do the same thing 100 times.
>
>―Jiangjie (Becket) Qin
>
>
>On 2/8/15, 4:59 PM, "Jay Kreps"  wrote:
>
>>Yeah, I second Neha's comments. The current mm code has taken something
>>pretty simple and made it pretty scary with callbacks and wait/notify
>>stuff. Do we believe this works? I can't tell by looking at it, which is
>>kind of bad for something important like this. I don't mean this as
>>criticism, I know the history: we added in-memory queues to help with other
>>performance problems without thinking about correctness, then we added
>>stuff to work around the in-memory queues so as not to lose data, and so on.
>>
>>Can we instead do the opposite exercise and start with the basics of what
>>mm should do and think about what deficiencies prevent this approach from
>>working? Then let's make sure the currently in-flight work will remove
>>these deficiencies. After all, mm is kind of the prototypical kafka use
>>case, so if we can't make our clients do this, probably no one else can.
>>
>>I think mm should just be N independent threads each of which has their
>>own
>>consumer but share a producer and each of which looks like this:
>>
>>while (true) {
>>  val recs = consumer.poll(Long.MaxValue)
>>  for (rec <- recs)
>>    producer.send(rec, logErrorCallback)
>>  if (System.currentTimeMillis - lastCommit > commitInterval) {
>>    producer.flush()
>>    consumer.commit()
>>    lastCommit = System.currentTimeMillis
>>  }
>>}
>>
>>This will depend on setting the retry count in the producer to something
>>high with a largish backoff so that a failed send attempt doesn't drop
>>data.
>>
>>We will need to use the callback to force a flush and offset commit on
>>rebalance.
>>
>>This approach may have a few more TCP connections due to using multiple
>>consumers but I think it is a lot easier to reason about and the total
>>number of mm instances is always going to be small.
>>
>>Let's talk about where this simple approach falls short, I think that
>>will
>>help us understand your motivations for additional elements.
>>
>>Another advantage of this is that it is so simple I don't think we really
>>even need to bother making mm extensible, because writing your own code that
>>does custom processing or transformation is just ten lines, and no plug-in
>>system is going to make it simpler.
>>
>>-Jay
>>
>>
>>On Sun, Feb 8, 2015 at 2:40 PM, Neha Narkhede  wrote:
>>
>>> Few comments -
>>>
>>> 1. Why do we need the message handler? Do you have concrete use cases
>>>in
>>> mind? If not, we should consider adding it in the future when/if we do
>>>have
>>> use cases for it. The purpose of the mirror maker is a simple tool for
>>> setting up Kafka cluster replicas. I don't see why we need to include a
>>> message handler for doing stream transformations or filtering. You can
>>> always write a simple process for doing that once the data is copied as
>>>is
>>> in the target cluster
>>> 2. Why keep both designs? We should prefer the simpler design unless it
>>>is
>>> not feasible due to the performance issue that we previously had. Did
>>>you
>>> get a chance to run some tests to see if that is really still a problem
>>>or
>>> not? It will be easier to think about the design and also make the KIP
>>> complete if we make a call on the design first.
>>> 3. Can you explain the need for keeping a list of unacked offsets per
>>> partition? Consider adding a section on retries and ho
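The simple loop Jay proposes in this thread -- poll, send each record, and once the commit interval elapses flush the producer before committing consumer offsets -- can be sketched as a testable skeleton. The Consumer/Producer interfaces below are minimal hypothetical stand-ins, not the real Kafka client API.

```java
import java.util.List;

// Skeleton of the mirror-maker loop: flush() before commit() guarantees
// that committed offsets only cover records the producer has delivered.
public class MirrorLoopSketch {
    // Hypothetical stand-ins for the real consumer/producer clients.
    public interface Consumer { List<String> poll(); void commit(); }
    public interface Producer { void send(String rec); void flush(); }

    public static long lastCommit = 0;

    // One iteration of the loop; returns true if offsets were committed.
    public static boolean step(Consumer c, Producer p, long nowMs, long commitIntervalMs) {
        for (String rec : c.poll())
            p.send(rec);          // async send; the real client would attach an error-logging callback
        if (nowMs - lastCommit > commitIntervalMs) {
            p.flush();            // block until all previous sends have completed...
            c.commit();           // ...then commit, so committed offsets are never ahead of delivery
            lastCommit = nowMs;
            return true;
        }
        return false;
    }
}
```

As Jay notes, this relies on a high producer retry count with a largish backoff so a failed send attempt does not drop data before the flush/commit pair runs.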

Kafka New(Java) Producer Connection reset by peer error and LB

2015-02-09 Thread Bhavesh Mistry
Hi Kafka Team,

We are getting this "connection reset by peer" error a couple of minutes after
start-up of the producer, due to infrastructure deployment strategies we have
adopted from LinkedIn.

We have the LB hostname and port as the seed server, and all producers are
getting the following exception because of the TCP idle connection timeout set
on the LB (which is 2 minutes, while the Kafka TCP connection idle timeout is
set to 10 minutes). This seems to be a minor bug: the client should close the
TCP connection immediately after discovering that the seed server is not part
of the broker list.


java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:60)
at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:178)
at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:662)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:60)
at org.apache.kafka.common.network.Selector.poll(Selector.java:242)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:178)
at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
at java.lang.Thread.run(Thread.java:662)


Thanks,

Bhavesh


[jira] [Commented] (KAFKA-1758) corrupt recovery file prevents startup

2015-02-09 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312603#comment-14312603
 ] 

Manikumar Reddy commented on KAFKA-1758:


Attaching a patch which handles NumberFormatException while reading the recovery 
checkpoint file. We still fail for other IOExceptions. On NumberFormatException 
we set the last recovery point to zero.
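The fallback described above -- treating an unparseable recovery point as 0 rather than failing startup -- can be sketched as follows. The class and method names are illustrative, not Kafka's actual OffsetCheckpoint code.

```java
// Sketch: parse a recovery-point value, falling back to 0 on a corrupt
// entry (e.g. the all-zero-bytes file seen after the crash in this JIRA).
public class CheckpointSketch {
    public static long readRecoveryPoint(String raw) {
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // Corrupt entry: recover the full log from offset 0 instead
            // of refusing to start the broker.
            return 0L;
        }
    }

    public static void main(String[] args) {
        System.out.println(readRecoveryPoint("42"));           // prints 42
        System.out.println(readRecoveryPoint("\u0000\u0000")); // corrupt, prints 0
    }
}
```

Other IOExceptions while reading the file would still propagate and fail startup, as the patch intends.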

> corrupt recovery file prevents startup
> --
>
> Key: KAFKA-1758
> URL: https://issues.apache.org/jira/browse/KAFKA-1758
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Jason Rosenberg
>Assignee: Manikumar Reddy
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1758.patch
>
>
> Hi,
> We recently had a kafka node go down suddenly. When it came back up, it 
> apparently had a corrupt recovery file, and refused to startup:
> {code}
> 2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up 
> KafkaServer
> java.lang.NumberFormatException: For input string: 
> "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:481)
> at java.lang.Integer.parseInt(Integer.java:527)
> at 
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
> at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
> at 
> kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at 
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
> at kafka.log.LogManager.loadLogs(LogManager.scala:105)
> at kafka.log.LogManager.<init>(LogManager.scala:57)
> at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
> {code}
> And the app is under a monitor (so it was repeatedly restarting and failing 
> with this error for several minutes before we got to it)…
> We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it 
> then restarted cleanly (but of course re-synced all it’s data from replicas, 
> so we had no data loss).
> Anyway, I’m wondering if that’s the expected behavior? Or should it not 
> declare it corrupt and then proceed automatically to an unclean restart?
> Should this NumberFormatException be handled a bit more gracefully?
> We saved the corrupt file if it’s worth inspecting (although I doubt it will 
> be useful!)….
> The corrupt files appeared to be all zeroes.





[jira] [Commented] (KAFKA-1926) Replace kafka.utils.Utils with o.a.k.common.utils.Utils

2015-02-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312598#comment-14312598
 ] 

Jay Kreps commented on KAFKA-1926:
--

Hey [~tongli], this is a great bug to start with. Basically there is a bunch of 
code in core/src/main/kafka/utils/Utils.scala that duplicates 
clients/src/main/org/apache/kafka/common/utils/Utils.java. We want to do an 
intelligent dedupe.

I think each scala utility should either:
a. Be determined to be unused or not really very general purpose, and hence be 
deleted or moved to the class where it is used, or else
b. Get converted to Java and moved to Utils.java, or
c. If it is server-specific or very scala-dependent, be moved to a new class 
called ScalaUtils (which may just provide a convenient wrapper for the java 
version if it needs to be in both places).

> Replace kafka.utils.Utils with o.a.k.common.utils.Utils
> ---
>
> Key: KAFKA-1926
> URL: https://issues.apache.org/jira/browse/KAFKA-1926
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>  Labels: newbie
>


[jira] [Commented] (KAFKA-1856) Add PreCommit Patch Testing

2015-02-09 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312596#comment-14312596
 ] 

Ashish Kumar Singh commented on KAFKA-1856:
---

[~gwenshap] and [~charmalloc], thanks for trying this out. The checkstyle is 
also working fine; thanks to [~jkreps] for adding checkstyle to the project.

[~charmalloc], could you help with adding a Jenkins job to automate this? As 
[~gwenshap] suggested, it will be a good idea to drop a mail to the dev list 
once we do that. I do not think we need a KIP for this, but if we do, let me 
know.

> Add PreCommit Patch Testing
> ---
>
> Key: KAFKA-1856
> URL: https://issues.apache.org/jira/browse/KAFKA-1856
> Project: Kafka
>  Issue Type: Task
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: KAFKA-1845.result.txt, KAFKA-1856.patch, 
> KAFKA-1856_2015-01-18_21:43:56.patch, KAFKA-1856_2015-02-04_14:57:05.patch, 
> KAFKA-1856_2015-02-04_15:44:47.patch
>
>
> h1. Kafka PreCommit Patch Testing - *Don't wait for it to break*
> h2. Motivation
> *With great power comes great responsibility* - Uncle Ben. As the Kafka user 
> list is growing, a mechanism to ensure the quality of the product is required. 
> Quality becomes hard to measure and maintain in an open source project because 
> of its wide community of contributors. Luckily, Kafka is not the first open 
> source project and can benefit from the learnings of prior projects.
> PreCommit tests are the tests that are run for each patch that gets attached 
> to an open JIRA. Based on the test results, the test execution framework (a 
> test bot) gives the patch a +1 or -1. Having PreCommit tests takes the load 
> off committers having to look at or test each patch.
> h2. Tests in Kafka
> h3. Unit and Integration Tests
> [Unit and Integration 
> tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Unit+and+Integration+Tests]
>  are cardinal in helping contributors avoid breaking existing functionality 
> while adding new functionality or fixing older code. These tests, at least 
> the ones relevant to the changes, must be run by contributors before 
> attaching a patch to a JIRA.
> h3. System Tests
> [System 
> tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+System+Tests] 
> are much wider tests that, unlike unit tests, focus on end-to-end scenarios 
> and not some specific method or class.
> h2. Apache PreCommit tests
> Apache provides a mechanism to automatically build a project and run a series 
> of tests whenever a patch is uploaded to a JIRA. Based on test execution, the 
> test framework will comment with a +1 or -1 on the JIRA.
> You can read more about the framework here:
> http://wiki.apache.org/general/PreCommitBuilds
> h2. Plan
> # Create a test-patch.py script (similar to the one used in Flume, Sqoop and 
> other projects) that will take a jira as a parameter, apply on the 
> appropriate branch, build the project, run tests and report results. This 
> script should be committed into the Kafka code-base. To begin with, this will 
> only run unit tests. We can add code sanity checks, system_tests, etc in the 
> future.
> # Create a jenkins job for running the test (as described in 
> http://wiki.apache.org/general/PreCommitBuilds) and validate that it works 
> manually. This must be done by a committer with Jenkins access.
> # Ask someone with access to https://builds.apache.org/job/PreCommit-Admin/ 
> to add Kafka to the list of projects PreCommit-Admin triggers.





[jira] [Commented] (KAFKA-1758) corrupt recovery file prevents startup

2015-02-09 Thread Manikumar Reddy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312592#comment-14312592
 ] 

Manikumar Reddy commented on KAFKA-1758:


Created reviewboard https://reviews.apache.org/r/30801/diff/
 against branch origin/trunk

> corrupt recovery file prevents startup
> --
>
> Key: KAFKA-1758
> URL: https://issues.apache.org/jira/browse/KAFKA-1758
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Jason Rosenberg
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1758.patch
>
>


[jira] [Updated] (KAFKA-1758) corrupt recovery file prevents startup

2015-02-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1758:
---
Assignee: Manikumar Reddy
  Status: Patch Available  (was: Open)

> corrupt recovery file prevents startup
> --
>
> Key: KAFKA-1758
> URL: https://issues.apache.org/jira/browse/KAFKA-1758
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Jason Rosenberg
>Assignee: Manikumar Reddy
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1758.patch
>
>


[jira] [Updated] (KAFKA-1758) corrupt recovery file prevents startup

2015-02-09 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1758:
---
Attachment: KAFKA-1758.patch

> corrupt recovery file prevents startup
> --
>
> Key: KAFKA-1758
> URL: https://issues.apache.org/jira/browse/KAFKA-1758
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Jason Rosenberg
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1758.patch
>
>


[jira] [Created] (KAFKA-1936) Track offset commit requests separately from produce requests

2015-02-09 Thread Aditya Auradkar (JIRA)
Aditya Auradkar created KAFKA-1936:
--

 Summary: Track offset commit requests separately from produce 
requests
 Key: KAFKA-1936
 URL: https://issues.apache.org/jira/browse/KAFKA-1936
 Project: Kafka
  Issue Type: Bug
Reporter: Aditya Auradkar
Assignee: Aditya Auradkar


In ReplicaManager, failed and total produce requests are updated from 
appendToLocalLog. Since offset commit requests also follow the same path, they 
are counted along with produce requests. Add a metric and count them separately.
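A minimal sketch of the split proposed above -- counting offset-commit appends separately from client produce appends instead of folding both into one meter -- might look like the following. The names and the boolean request-type flag are hypothetical, not ReplicaManager's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Two separate counters keyed by whether the append to the local log came
// from an offset commit request or a regular produce request.
public class AppendMetricsSketch {
    public static final AtomicLong produceTotal = new AtomicLong();
    public static final AtomicLong offsetCommitTotal = new AtomicLong();

    public static void recordAppend(boolean isOffsetCommit) {
        if (isOffsetCommit)
            offsetCommitTotal.incrementAndGet();  // append to the offsets topic
        else
            produceTotal.incrementAndGet();       // append from a client produce request
    }
}
```

The same split would apply to the failed-request counters, so offset commit failures stop inflating the failed produce rate discussed in the review below.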





Review Request 30801: Patch for KAFKA-1758

2015-02-09 Thread Manikumar Reddy O

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30801/
---

Review request for kafka.


Bugs: KAFKA-1758
https://issues.apache.org/jira/browse/KAFKA-1758


Repository: kafka


Description
---

Handled NumberFormatException while reading the recovery-point-offset-checkpoint 
file.


Diffs
-

  core/src/main/scala/kafka/log/LogManager.scala 
47d250af62c1aa53d11204a332d0684fb4217c8d 

Diff: https://reviews.apache.org/r/30801/diff/


Testing
---


Thanks,

Manikumar Reddy O



Re: Review Request 30570: Patch for KAFKA-1914

2015-02-09 Thread Aditya Auradkar


> On Feb. 9, 2015, 6:29 p.m., Joel Koshy wrote:
> > core/src/main/scala/kafka/server/ReplicaManager.scala, lines 296-303
> > 
> >
> > I agree with Guozhang that the offset commits need to be disambiguated.
> > 
> > How about we do the following:
> > * Aditya, you can go ahead with this change. i.e., record the total 
> > rate in ReplicaManager
> > * File a separate jira to address the discrepancies. E.g., right now 
> > offset commit failures due to append failures are counted in failed produce 
> > rate.

Thanks Joel! I've filed this ticket 
https://issues.apache.org/jira/browse/KAFKA-1936.


- Aditya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30570/#review71660
---


On Feb. 3, 2015, 7:13 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30570/
> ---
> 
> (Updated Feb. 3, 2015, 7:13 p.m.)
> 
> 
> Review request for kafka and Joel Koshy.
> 
> 
> Bugs: KAFKA-1914
> https://issues.apache.org/jira/browse/KAFKA-1914
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing KAFKA-1914. Adding metrics to count the total number of produce and
> fetch requests
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/KafkaRequestHandler.scala 
> e4053fbe8ef78bf8bc39cb3f8ea4c21032613a16 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> fb948b9ab28c516e81dab14dcbe211dcd99842b6 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> ccf5e2e36260b2484181b81d1b06e81de972674b 
> 
> Diff: https://reviews.apache.org/r/30570/diff/
> 
> 
> Testing
> ---
> 
> I've added asserts to the SimpleFetchTest to count the number of fetch 
> requests. I'm going to file an additional jira to add unit tests for all the 
> BrokerTopicMetrics updated via ReplicaManager
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



[jira] [Resolved] (KAFKA-1661) Move MockConsumer and MockProducer from src/main to src/test

2015-02-09 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps resolved KAFKA-1661.
--
Resolution: Fixed

Hey guys, I'd like to close this. The mock is in the public part of the main 
jar intentionally. We really don't want people depending on anything in our 
test code--we have enough challenges keeping our public classes stable without 
people intertangling themselves with our unit tests.

It is possible that someone will ship something to production that uses 
MockProducer, but I don't think a separate jar will really do much to prevent 
that.

If anyone has strong objections, reopen and we can discuss further.

> Move MockConsumer and MockProducer from src/main to src/test
> 
>
> Key: KAFKA-1661
> URL: https://issues.apache.org/jira/browse/KAFKA-1661
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer, producer 
>Affects Versions: 0.8.1.1
> Environment: N/A
>Reporter: Andras Hatvani
>Priority: Trivial
>  Labels: newbie, test
> Fix For: 0.8.3
>
>
> The MockConsumer and MockProducer are currently in src/main although they 
> belong in src/test.





[jira] [Commented] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-02-09 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312581#comment-14312581
 ] 

Jay Kreps commented on KAFKA-1927:
--

Hey [~tongli] Gwen is right that this will be a pretty big chunk of code to 
move around. By all means go for it if you want to! Usually, though, it is easier 
to get started with something fairly trivial just to get the mechanics of the 
code base, review process, etc. down. Generally the best starting place is any 
bug marked with the newbie tag 
https://issues.apache.org/jira/browse/KAFKA-1926?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20newbie%20AND%20status%20%3D%20Open

> Replace requests in kafka.api with requests in 
> org.apache.kafka.common.requests
> ---
>
> Key: KAFKA-1927
> URL: https://issues.apache.org/jira/browse/KAFKA-1927
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> The common package introduced a better way of defining requests using a new 
> protocol definition DSL and also includes wrapper objects for these.
> We should switch KafkaApis over to use these request definitions and consider 
> the scala classes deprecated (we probably need to retain some of them for a 
> while for the scala clients).
> This will be a big improvement because
> 1. We will have each request now defined in only one place (Protocol.java)
> 2. We will have built-in support for multi-version requests
> 3. We will have much better error messages (no more cryptic underflow errors)





[jira] [Commented] (KAFKA-313) Add JSON/CSV output and looping options to ConsumerOffsetChecker

2015-02-09 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312578#comment-14312578
 ] 

Ashish Kumar Singh commented on KAFKA-313:
--

Sounds good to me.

> Add JSON/CSV output and looping options to ConsumerOffsetChecker
> 
>
> Key: KAFKA-313
> URL: https://issues.apache.org/jira/browse/KAFKA-313
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Dave DeMaagd
>Assignee: Ashish Kumar Singh
>Priority: Minor
>  Labels: newbie, patch
> Fix For: 0.8.3
>
> Attachments: KAFKA-313-2012032200.diff, KAFKA-313.1.patch, 
> KAFKA-313.patch
>
>
> Adds:
> * '--loop N' - causes the program to loop forever, sleeping for up to N 
> seconds between loops (loop time minus collection time, unless that's less 
> than 0, at which point it will just run again immediately)
> * '--asjson' - display as a JSON string instead of the more human readable 
> output format.
> Neither option depends on the other (you can loop with the human-readable 
> output, or do a single-shot execution with JSON output).  Existing 
> behavior/output is maintained if neither option is used.  Diff attached.
> Impacted files:
> core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala
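The '--loop N' pacing rule described above (sleep for the loop interval minus the collection time, floored at zero) can be sketched as follows; the class and method names are illustrative, not the tool's actual code.

```java
// Sketch of the '--loop N' pacing rule: sleep for the loop interval minus
// the time the last collection took; if the collection overran the
// interval, run again immediately.
public class LoopTimer {
    public static long sleepMillis(long loopIntervalMs, long collectionMs) {
        return Math.max(0L, loopIntervalMs - collectionMs);
    }

    public static void main(String[] args) {
        System.out.println(sleepMillis(10_000, 2_500));  // 7500
        System.out.println(sleepMillis(10_000, 12_000)); // 0: run again immediately
    }
}
```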





Re: Review Request 30570: Patch for KAFKA-1914

2015-02-09 Thread Joel Koshy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30570/#review71660
---



core/src/main/scala/kafka/server/ReplicaManager.scala


I agree with Guozhang that the offset commits need to be disambiguated.

How about we do the following:
* Aditya, you can go ahead with this change. i.e., record the total rate in 
ReplicaManager
* File a separate jira to address the discrepancies. E.g., right now offset 
commit failures due to append failures are counted in failed produce rate.


- Joel Koshy


On Feb. 3, 2015, 7:13 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30570/
> ---
> 
> (Updated Feb. 3, 2015, 7:13 p.m.)
> 
> 
> Review request for kafka and Joel Koshy.
> 
> 
> Bugs: KAFKA-1914
> https://issues.apache.org/jira/browse/KAFKA-1914
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing KAFKA-1914. Adding metrics to count the total number of produce and
> fetch requests
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/KafkaRequestHandler.scala 
> e4053fbe8ef78bf8bc39cb3f8ea4c21032613a16 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> fb948b9ab28c516e81dab14dcbe211dcd99842b6 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> ccf5e2e36260b2484181b81d1b06e81de972674b 
> 
> Diff: https://reviews.apache.org/r/30570/diff/
> 
> 
> Testing
> ---
> 
> I've added asserts to the SimpleFetchTest to count the number of fetch 
> requests. I'm going to file an additional jira to add unit tests for all the 
> BrokerTopicMetrics updated via ReplicaManager
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



Re: org.apache.common migration

2015-02-09 Thread Jay Kreps
Hey Jun,

I think the existing scala clients should just remain as they are. There is
no point updating them, and as you say it would be quite fragile. The
conversion to the new requests would just be for the server usage.

-Jay

On Mon, Feb 9, 2015 at 9:48 AM, Jun Rao  wrote:

> We need to be a bit careful when doing 2b. Currently, our public apis
> include SimpleConsumer, which unfortunately exposes our RPC
> requests/responses. Doing 2b would mean api changes to SimpleConsumer. So,
> if we want to do 2b before 3, we would need to agree on making such api
> changes. Otherwise, 2b will need to be done after 3.
>
> Thanks,
>
> Jun
>
> On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps  wrote:
>
> > Hey all,
> >
> > Someone asked about why there is code duplication between org.apache.common
> > and core. The answer seemed like it might be useful to others, so including
> > it here:
> >
> > Originally Kafka was more of a proof of concept and we didn't separate the
> > clients from the server. LinkedIn was much smaller and it wasn't open
> > source, and keeping those separate always adds a lot of overhead. So we
> > ended up with just one big jar.
> >
> > Next thing we know the kafka jar is embedded everywhere. Lots of fallout
> > from that:
> > - It has to be really sensitive to dependencies
> > - Scala causes all kinds of pain for users. Ironically it causes the most
> > pain for people using scala because of compatibility. I think the single
> > biggest Kafka complaint was the scala clients and resulting scary
> > exceptions, lack of javadoc, etc.
> > - Many of the client interfaces weren't well thought out as permanent
> > long-term commitments.
> > - We knew we had to rewrite both clients due to technical deficiencies
> > anyway. The clients really needed to move to non-blocking I/O, which is
> > basically a rewrite on its own.
> >
> > So how to go about that?
> >
> > Well, we felt we needed to maintain the old client interfaces for a good
> > period of time. Any kind of breaking cut-over was kind of a non-starter.
> > But a major refactoring in place was really hard since so many classes were
> > public and so little attention had been paid to the difference between
> > public and private classes.
> >
> > Naturally, since the client and server do the inverse of each other, there is
> > a ton of shared logic. So we thought we needed to break it up into three
> > independent chunks:
> > 1. common - shared helper code used by both clients and server
> > 2. clients - the producer, consumer, and eventually admin java interfaces.
> > This depends on common.
> > 3. server - the server (and legacy clients). This is currently called core.
> > This will depend on common and clients (because sometimes the server needs
> > to make client requests)
> >
> > Common and clients were left as a single jar and just logically separate so
> > that people wouldn't have to deal with two jars (and hence the possibility
> > of getting different versions of each).
> >
> > The dependency is actually a little counter-intuitive to people--they
> > usually think of the client as depending on the server since the client
> > calls the server. But in terms of code dependencies it is the other way--if
> > you depend on the client you obviously don't want to drag in the server.
> >
> > So to get all this done we decided to just go big and do a rewrite of the
> > clients in Java. A result of this is that any shared code would have to
> > move to Java (so the clients don't pull in Scala). We felt this was
> > probably a good thing in its own right as it gave a chance to improve a few
> > of these utility libraries like config parsing, etc.
> >
> > So the plan was and is:
> > 1. Rewrite producer, release and roll out
> > 2a. Rewrite consumer, release and roll out
> > 2b. Migrate server from scala code to org.apache.common classes
> > 3. Deprecate scala clients
> >
> > (2a) is in flight now, and that means (2b) is totally up for grabs. Of
> > these, the request conversion is definitely the most pressing since having
> > those defined twice duplicates a ton of work. We will have to be
> > hyper-conscientious during the conversion about making the shared code in
> > common really solve the problem well and conveniently on the server as well
> > (so we don't end up just shoe-horning it in). My hope is that we can treat
> > this common code really well--it isn't as permanent as the public classes
> > but ends up heavily used, so we should take good care of it. Most of the shared
> > code is private, so we can refactor the stuff in common to meet the needs of
> > the server if we find mismatches or missing functionality. I tried to keep
> > in mind the eventual server usage while writing it, but I doubt it will be
> > as trivial as just deleting the old and adding the new.
> >
> > In terms of the simplicity:
> > - Converting exceptions should be trivial
> > - Converting utils is straight-forward but we should evaluate the
> > 

Re: Review Request 30570: Patch for KAFKA-1914

2015-02-09 Thread Aditya Auradkar


> On Feb. 9, 2015, 5:47 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/server/ReplicaManager.scala, lines 296-303
> > 
> >
> > appendToLocalLog can be called for both produce and commit-offset 
> > requests. And in general I think we might better record the request-level 
> > metrics at the KafkaApis layer instead of the ReplicaManager layer. There is 
> > some previous work on isolating the request-level information at the KafkaApis 
> > layer.
> 
> Aditya Auradkar wrote:
> All the BrokerTopicMetrics reporting happens below KafkaApis i.e. inside 
> the ReplicaManager and Log layer. Since failedProduceRequestRate is being 
> counted in appendToLocalLog, I felt it was more consistent to report 
> totalProduceRequestRate also in the same place. If we do want to report these 
> new stats in KafkaApis, then we should probably change the existing metrics 
> reporting as well: bytesInRate, bytesOutRate, failedProduceRequestRate, 
> failedFetchRequestRate, etc. In general it does seem nicer to track these 
> metrics in KafkaApis and I can certainly work on this but IMO it should be 
> tracked separately from this issue. Thoughts?
> 
> Regarding offset-commit, aren't the new commit-offset requests simply 
> produce requests to a special topic? In that case, it doesn't seem 
> appropriate to do this here.

Typo: 
"Regarding offset-commit, aren't the new commit-offset requests simply produce 
requests to a special topic? In that case, it doesn't seem inappropriate to do 
this here."


- Aditya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30570/#review71646
---


On Feb. 3, 2015, 7:13 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30570/
> ---
> 
> (Updated Feb. 3, 2015, 7:13 p.m.)
> 
> 
> Review request for kafka and Joel Koshy.
> 
> 
> Bugs: KAFKA-1914
> https://issues.apache.org/jira/browse/KAFKA-1914
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing KAFKA-1914. Adding metrics to count the total number of produce and
> fetch requests
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/KafkaRequestHandler.scala 
> e4053fbe8ef78bf8bc39cb3f8ea4c21032613a16 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> fb948b9ab28c516e81dab14dcbe211dcd99842b6 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> ccf5e2e36260b2484181b81d1b06e81de972674b 
> 
> Diff: https://reviews.apache.org/r/30570/diff/
> 
> 
> Testing
> ---
> 
> I've added asserts to the SimpleFetchTest to count the number of fetch 
> requests. I'm going to file an additional jira to add unit tests for all the 
> BrokerTopicMetrics updated via ReplicaManager
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



Re: Review Request 30570: Patch for KAFKA-1914

2015-02-09 Thread Aditya Auradkar


> On Feb. 9, 2015, 5:47 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/server/ReplicaManager.scala, lines 296-303
> > 
> >
> > appendToLocalLog can be called for both produce and commit-offset 
> > requests. And in general I think we might better record the request-level 
> > metrics at the KafkaApis layer instead of the ReplicaManager layer. There is 
> > some previous work on isolating the request-level information at the KafkaApis 
> > layer.

All the BrokerTopicMetrics reporting happens below KafkaApis i.e. inside the 
ReplicaManager and Log layer. Since failedProduceRequestRate is being counted 
in appendToLocalLog, I felt it was more consistent to report 
totalProduceRequestRate also in the same place. If we do want to report these 
new stats in KafkaApis, then we should probably change the existing metrics 
reporting as well: bytesInRate, bytesOutRate, failedProduceRequestRate, 
failedFetchRequestRate, etc. In general it does seem nicer to track these 
metrics in KafkaApis and I can certainly work on this but IMO it should be 
tracked separately from this issue. Thoughts?

Regarding offset-commit, aren't the new commit-offset requests simply produce 
requests to a special topic? In that case, it doesn't seem appropriate to do 
this here.


- Aditya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30570/#review71646
---


On Feb. 3, 2015, 7:13 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30570/
> ---
> 
> (Updated Feb. 3, 2015, 7:13 p.m.)
> 
> 
> Review request for kafka and Joel Koshy.
> 
> 
> Bugs: KAFKA-1914
> https://issues.apache.org/jira/browse/KAFKA-1914
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing KAFKA-1914. Adding metrics to count the total number of produce and
> fetch requests
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/KafkaRequestHandler.scala 
> e4053fbe8ef78bf8bc39cb3f8ea4c21032613a16 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> fb948b9ab28c516e81dab14dcbe211dcd99842b6 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> ccf5e2e36260b2484181b81d1b06e81de972674b 
> 
> Diff: https://reviews.apache.org/r/30570/diff/
> 
> 
> Testing
> ---
> 
> I've added asserts to the SimpleFetchTest to count the number of fetch 
> requests. I'm going to file an additional jira to add unit tests for all the 
> BrokerTopicMetrics updated via ReplicaManager
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



[jira] [Commented] (KAFKA-1925) Coordinator Node Id set to INT_MAX breaks coordinator metadata updates

2015-02-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312538#comment-14312538
 ] 

Guozhang Wang commented on KAFKA-1925:
--

Pushed to trunk.

> Coordinator Node Id set to INT_MAX breaks coordinator metadata updates
> --
>
> Key: KAFKA-1925
> URL: https://issues.apache.org/jira/browse/KAFKA-1925
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Critical
> Attachments: KAFKA-1925.v1.patch
>
>
> KafkaConsumer used INT_MAX to mimic a new socket for coordinator (details can 
> be found in KAFKA-1760). However, this behavior breaks the coordinator as the 
> underlying NetworkClient only used the node id to determine when to initiate 
> a new connection:
> {code}
> if (connectionStates.canConnect(node.id(), now))
> // if we are interested in sending to a node and we don't have a 
> connection to it, initiate one
> initiateConnect(node, now);
> {code}
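A toy model of the quoted check (not the actual NetworkClient code): because connection state is keyed by node id alone, a synthetic coordinator id such as Integer.MAX_VALUE gets bookkeeping that is disjoint from the real broker's entry — which is what the INT_MAX trick relies on to force a second socket, and also why updates addressed to the real node id never reach the coordinator's entry.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: connection state keyed by node id, the way the quoted
// snippet keys it. A synthetic coordinator id (Integer.MAX_VALUE) is a
// distinct key from the broker's real id even when both refer to the same
// host, so their states never interact.
public class NodeConnectionStates {
    private final Set<Integer> connected = new HashSet<>();

    public boolean canConnect(int nodeId) { return !connected.contains(nodeId); }
    public void initiateConnect(int nodeId) { connected.add(nodeId); }

    public static void main(String[] args) {
        NodeConnectionStates states = new NodeConnectionStates();
        states.initiateConnect(1); // real broker id 1 is connected
        System.out.println(states.canConnect(1));                 // false
        System.out.println(states.canConnect(Integer.MAX_VALUE)); // true: separate entry, new socket
    }
}
```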





[jira] [Updated] (KAFKA-1925) Coordinator Node Id set to INT_MAX breaks coordinator metadata updates

2015-02-09 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-1925:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Coordinator Node Id set to INT_MAX breaks coordinator metadata updates
> --
>
> Key: KAFKA-1925
> URL: https://issues.apache.org/jira/browse/KAFKA-1925
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Critical
> Attachments: KAFKA-1925.v1.patch
>
>
> KafkaConsumer used INT_MAX to mimic a new socket for coordinator (details can 
> be found in KAFKA-1760). However, this behavior breaks the coordinator as the 
> underlying NetworkClient only used the node id to determine when to initiate 
> a new connection:
> {code}
> if (connectionStates.canConnect(node.id(), now))
> // if we are interested in sending to a node and we don't have a 
> connection to it, initiate one
> initiateConnect(node, now);
> {code}





[jira] [Created] (KAFKA-1935) Consumer should use a separate socket for Coordinator connection

2015-02-09 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-1935:


 Summary: Consumer should use a separate socket for Coordinator 
connection
 Key: KAFKA-1935
 URL: https://issues.apache.org/jira/browse/KAFKA-1935
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
 Fix For: 0.9.0


KAFKA-1925 is just a quick fix for this issue; we need to let the consumer 
create separate sockets to the same server for the coordinator and broker 
roles.





Re: org.apache.common migration

2015-02-09 Thread Jun Rao
We need to be a bit careful when doing 2b. Currently, our public apis
include SimpleConsumer, which unfortunately exposes our RPC
requests/responses. Doing 2b would mean api changes to SimpleConsumer. So,
if we want to do 2b before 3, we would need to agree on making such api
changes. Otherwise, 2b will need to be done after 3.

Thanks,

Jun

On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps  wrote:

> Hey all,
>
> Someone asked about why there is code duplication between org.apache.common
> and core. The answer seemed like it might be useful to others, so including
> it here:
>
> Originally Kafka was more of a proof of concept and we didn't separate the
> clients from the server. LinkedIn was much smaller and it wasn't open
> source, and keeping those separate always adds a lot of overhead. So we
> ended up with just one big jar.
>
> Next thing we know the kafka jar is embedded everywhere. Lots of fallout
> from that:
> - It has to be really sensitive to dependencies
> - Scala causes all kinds of pain for users. Ironically it causes the most
> pain for people using scala because of compatibility. I think the single
> biggest Kafka complaint was the scala clients and resulting scary
> exceptions, lack of javadoc, etc.
> - Many of the client interfaces weren't well thought out as permanent
> long-term commitments.
> - We knew we had to rewrite both clients due to technical deficiencies
> anyway. The clients really needed to move to non-blocking I/O, which is
> basically a rewrite on its own.
>
> So how to go about that?
>
> Well we felt we needed to maintain the old client interfaces for a good
> period of time. Any kind of breaking cut-over was kind of a non-starter.
> But a major refactoring in place was really hard since so many classes were
> public and so little attention had been paid to the difference between
> public and private classes.
>
> Naturally since the client and server do the inverse of each other there is
> a ton of shared logic. So we thought we needed to break it up into three
> independent chunks:
> 1. common - shared helper code used by both clients and server
> 2. clients - the producer, consumer, and eventually admin java interfaces.
> This depends on common.
> 3. server - the server (and legacy clients). This is currently called core.
> This will depend on common and clients (because sometimes the server needs
> to make client requests)
>
> Common and clients were left as a single jar and just logically separate so
> that people wouldn't have to deal with two jars (and hence the possibility
> of getting different versions of each).
>
> The dependency is actually a little counter-intuitive to people--they
> usually think of the client as depending on the server since the client
> calls the server. But in terms of code dependencies it is the other way--if
> you depend on the client you obviously don't want to drag in the server.
>
> So to get all this done we decided to just go big and do a rewrite of the
> clients in Java. A result of this is that any shared code would have to
> move to Java (so the clients don't pull in Scala). We felt this was
> probably a good thing in its own right as it gave a chance to improve a few
> of these utility libraries like config parsing, etc.
>
> So the plan was and is:
> 1. Rewrite producer, release and roll out
> 2a. Rewrite consumer, release and roll out
> 2b. Migrate server from scala code to org.apache.common classes
> 3. Deprecate scala clients
>
> (2a) is in flight now, and that means (2b) is totally up for grabs. Of
> these the request conversion is definitely the most pressing since having
> those defined twice duplicates a ton of work. We will have to be
> hyper-conscientious during the conversion about making the shared code in
> common really solve the problem well and conveniently on the server as well
> (so we don't end up just shoe-horning it in). My hope is that we can treat
> this common code really well--it isn't as permanent as the public classes
> but ends up heavily used, so we should take good care of it. Most of the shared
> code is private so we can refactor the stuff in common to meet the needs of
> the server if we find mismatches or missing functionality. I tried to keep
> in mind the eventual server usage while writing it, but I doubt it will be
> as trivial as just deleting the old and adding the new.
>
> In terms of the simplicity:
> - Converting exceptions should be trivial
> - Converting utils is straight-forward but we should evaluate the
> individual utilities and see if they actually make sense, have tests, are
> used, etc.
> - Converting the requests may not be too complex but touches a huge hunk of
> code and may require some effort to decouple the network layer.
> - Converting the network code will be delicate and may require some changes
> in org.apache.common.network to meet the server's needs
>
> This is all a lot of work, but if we stick to it at the end we will have
> really nice clients and a nice modular code base.

Re: Review Request 30570: Patch for KAFKA-1914

2015-02-09 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30570/#review71646
---



core/src/main/scala/kafka/server/ReplicaManager.scala


appendToLocalLog can be called for both produce and commit-offset requests. 
And in general I think we might better record the request-level metrics at the 
KafkaApis layer instead of the ReplicaManager layer. There is some previous 
work on isolating the request-level information at the KafkaApis layer.


- Guozhang Wang


On Feb. 3, 2015, 7:13 p.m., Aditya Auradkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30570/
> ---
> 
> (Updated Feb. 3, 2015, 7:13 p.m.)
> 
> 
> Review request for kafka and Joel Koshy.
> 
> 
> Bugs: KAFKA-1914
> https://issues.apache.org/jira/browse/KAFKA-1914
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing KAFKA-1914. Adding metrics to count the total number of produce and
> fetch requests
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/KafkaRequestHandler.scala 
> e4053fbe8ef78bf8bc39cb3f8ea4c21032613a16 
>   core/src/main/scala/kafka/server/ReplicaManager.scala 
> fb948b9ab28c516e81dab14dcbe211dcd99842b6 
>   core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
> ccf5e2e36260b2484181b81d1b06e81de972674b 
> 
> Diff: https://reviews.apache.org/r/30570/diff/
> 
> 
> Testing
> ---
> 
> I've added asserts to the SimpleFetchTest to count the number of fetch 
> requests. I'm going to file an additional jira to add unit tests for all the 
> BrokerTopicMetrics updated via ReplicaManager
> 
> 
> Thanks,
> 
> Aditya Auradkar
> 
>



Re: Review Request 30777: Patch for KAFKA-1919

2015-02-09 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30777/#review71643
---

Ship it!


Ship It!

- Guozhang Wang


On Feb. 9, 2015, 1:01 a.m., Jay Kreps wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30777/
> ---
> 
> (Updated Feb. 9, 2015, 1:01 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1919
> https://issues.apache.org/jira/browse/KAFKA-1919
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1919: Always update the metadata, when a metadata response is received 
> to ensure we back off.
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/clients/Metadata.java 
> b8cdd145bfcc6633763b25fc9812c49627c8df92 
>   clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
> fef90a03ed04d2ded1971f4a7b69b730494aacf8 
> 
> Diff: https://reviews.apache.org/r/30777/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jay Kreps
> 
>



Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

2015-02-09 Thread Guozhang Wang
I feel the benefits of lowering the development bar for new clients are
not worth the complexity we would need to introduce on the server side. Today
the clients just need one more request type (metadata request) to send
produce / fetch requests to the right brokers, whereas a re-routing mechanism
would result in complicated between-broker communication patterns that
potentially impact Kafka performance and make debugging / troubleshooting
much harder.

An alternative way to ease the development of clients is to use a proxy
in front of the kafka servers, like the REST proxy we have built before,
which we use primarily for non-java clients but which can also be treated as
handling cluster metadata discovery for clients. Compared to the
re-routing idea, the proxy also introduces two hops, but its layered
architecture is simpler.

Guozhang


On Sun, Feb 8, 2015 at 8:00 AM, Jay Kreps  wrote:

> Hey Jiangjie,
>
> Re-routing support doesn't force clients to use it. Java and all existing
> clients would work as they do now, with requests intelligently routed by the
> client, but this would lower the bar for new clients. That said, I agree the
> case for re-routing admin commands is much stronger than for data.
>
> The idea of separating admin/metadata traffic from data would definitely solve
> some problems, but it would also add a lot of complexity--new ports, thread
> pools, etc. This is an interesting idea to think over but I'm not sure if
> it's worth it. Probably a separate effort in any case.
>
> -jay
>
> On Friday, February 6, 2015, Jiangjie Qin 
> wrote:
>
> > I'm a little bit concerned about the request routing among brokers.
> > Typically we have a dominant percentage of produce and fetch
> > requests/responses. Routing them from one broker to another seems
> > unwanted.
> > Also I think we generally have two types of requests/responses: data
> > related and admin related. It is typically a good practice to separate the
> > data plane from the control plane. That suggests we should have a separate
> > admin port to serve those admin requests and probably have different
> > authentication/authorization from the data port.
> >
> > Jiangjie (Becket) Qin
> >
> > On 2/6/15, 11:18 AM, "Joe Stein"  wrote:
> >
> > >I updated the installation and sample usage for the existing patches on
> > >the
> > >KIP site
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and
> > >+centralized+administrative+operations
> > >
> > >There are still a few pending items here.
> > >
> > >1) There was already some discussion about using the broker that is the
> > >controller here https://issues.apache.org/jira/browse/KAFKA-1772 and we
> > >should elaborate on that more in the thread, or agree we are ok with the
> > >admin client asking for the controller to talk to and then just sending
> > >that broker the admin tasks.
> > >
> > >2) I like this idea https://issues.apache.org/jira/browse/KAFKA-1912 but
> > >we can refactor after KAFKA-1694 is committed, no? I know folks just want
> > >to talk to the broker that is the controller. It may even become useful
> > >to have the controller run on a broker that isn't even a topic broker
> > >anymore (a small can of worms I am opening here, but it elaborates on
> > >Guozhang's hot spot point).
> > >
> > >3) Any more feedback?
> > >
> > >- Joe Stein
> > >
> > >On Fri, Jan 23, 2015 at 3:15 PM, Guozhang Wang 
> > wrote:
> > >
> > >> A centralized admin operation protocol would be very useful.
> > >>
> > >> One more general comment here is that the controller was originally
> > >> designed to only talk to other brokers through the ControllerChannel,
> > >> while the broker instance which carries the current controller is
> > >> agnostic of its existence and uses KafkaApis to handle general Kafka
> > >> requests. Having all admin requests redirected to the controller
> > >> instance will force the broker to be aware of its carried controller
> > >> and to access its internal data for handling these requests. Plus,
> > >> with the number of clients out of Kafka's control, this may easily
> > >> cause the controller to become a hot spot in terms of request load.
> > >>
> > >>
> > >> On Thu, Jan 22, 2015 at 10:09 PM, Joe Stein 
> > >>wrote:
> > >>
> > >> > inline
> > >> >
> > >> > On Thu, Jan 22, 2015 at 11:59 PM, Jay Kreps 
> > >>wrote:
> > >> >
> > >> > > Hey Joe,
> > >> > >
> > >> > > This is great. A few comments on KIP-4
> > >> > >
> > >> > > 1. This is much needed functionality, but there are a lot of these
> > >> > > APIs, so let's really think these protocols through. We really
> > >> > > want to end up with a set of well thought-out, orthogonal APIs. For
> > >> > > this reason I think it is really important to think through the end
> > >> > > state even if that includes APIs we won't implement in the first
> > >> > > phase.
> > >> > >
> > >> >
> > >> > ok
> > >> >
> > >> >
> > >> > >
> > >> > > 2. Let's please please please wait until we have switched the
> ser
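The admin/data plane separation raised earlier in this thread can be roughly illustrated as a broker binding two listeners, so that control traffic never queues behind produce/fetch traffic. This is only a sketch of the suggestion, not anything in Kafka; the class and field names are hypothetical:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch: separate listeners for the data plane and the
// control (admin) plane, so each can get its own thread pool and its own
// authentication/authorization policy.
final class DualPlaneServer implements AutoCloseable {
    final ServerSocket dataPort;  // produce / fetch requests
    final ServerSocket adminPort; // create/delete topic, config changes

    DualPlaneServer(int data, int admin) throws IOException {
        dataPort = new ServerSocket(data);
        adminPort = new ServerSocket(admin);
    }

    @Override
    public void close() throws IOException {
        dataPort.close();
        adminPort.close();
    }
}
```

Each listener would hand its connections to an independent request-handler pool, which is what keeps a flood of data requests from starving admin operations (and vice versa).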

[jira] [Commented] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-02-09 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312466#comment-14312466
 ] 

Gwen Shapira commented on KAFKA-1927:
-

Hey, didn't mean to make it sound like you are a new developer :) I meant that 
it's a bit of a large refactoring for someone who is new to Kafka.
Glad to have you on board, whether you choose to pick this one up or any other 
issue.

I suspect that it's not very urgent, or [~jkreps] would have uploaded a patch 
already ;)


> Replace requests in kafka.api with requests in 
> org.apache.kafka.common.requests
> ---
>
> Key: KAFKA-1927
> URL: https://issues.apache.org/jira/browse/KAFKA-1927
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> The common package introduced a better way of defining requests using a new 
> protocol definition DSL and also includes wrapper objects for these.
> We should switch KafkaApis over to use these request definitions and consider 
> the scala classes deprecated (we probably need to retain some of them for a 
> while for the scala clients).
> This will be a big improvement because
> 1. We will have each request now defined in only one place (Protocol.java)
> 2. We will have built-in support for multi-version requests
> 3. We will have much better error messages (no more cryptic underflow errors)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Description: 
This patch adds finer locking when appending to log. It breaks
global append lock into 2 sequential and 1 parallel phase.

Basic idea is to allow every thread to "reserve" offsets in non
overlapping ranges, then do compression in parallel and then
"commit" write to log in the same order offsets where reserved.

Results on a server with 16 cores CPU available:
gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)

Kafka was configured to run 16  IO threads, data was pushed using 32 netcat 
instances pushing in parallel batches of 200 msg 6.2 kb each (3264 MB in total)
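The three-phase scheme described in the patch (reserve under a short lock, compress in parallel, commit in reservation order) can be sketched as follows. This is an illustrative model only, not the actual patch; the class and method names are invented:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the reserve / compress-in-parallel / commit-in-order scheme.
final class OffsetTicketLog {
    private final AtomicLong nextOffset = new AtomicLong(0); // phase 1 state
    private long committedUpTo = 0;                          // phase 3 state
    private final StringBuilder log = new StringBuilder();   // stand-in for the real log

    // Phase 1 (cheap, effectively sequential): reserve a non-overlapping range.
    long reserve(int messageCount) {
        return nextOffset.getAndAdd(messageCount);
    }

    // Phase 2 happens outside any lock: the caller compresses its own batch.

    // Phase 3: commit batches in the exact order their offsets were reserved.
    synchronized void commit(long firstOffset, int messageCount, String payload)
            throws InterruptedException {
        while (committedUpTo != firstOffset) {
            wait(); // an earlier reservation has not committed yet
        }
        log.append(payload);
        committedUpTo = firstOffset + messageCount;
        notifyAll(); // wake the thread holding the next reserved range
    }

    synchronized String contents() { return log.toString(); }
}
```

A producer thread would call `reserve`, compress its message set with no lock held, and then `commit`; the expensive compression runs concurrently while the log still sees a single, ordered stream of appends.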

  was:
This patch adds finer locking when appending to log. It breaks
global append lock into 2 sequential and 1 parallel phase.

Basic idea is to allow every thread to "reserve" offsets in non
overlapping ranges, then do compression in parallel and then
"commit" write to log in the same order offsets where reserved.

On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following performance 
boost:

LZ4: 7.2 sec -> 4.2 sec
Gzip: 62.3 sec -> 26.9 sec

Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in total, 
82180 messages in total)



> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> Results on a server with 16 cores CPU available:
> gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
> LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)
> Kafka was configured to run 16  IO threads, data was pushed using 32 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (3264 MB in 
> total)





[jira] [Commented] (KAFKA-1737) Document required ZkSerializer for ZkClient used with AdminUtils

2015-02-09 Thread Vivek Madani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312350#comment-14312350
 ] 

Vivek Madani commented on KAFKA-1737:
-

I'll take care of this and submit a new patch - most likely tomorrow

> Document required ZkSerializer for ZkClient used with AdminUtils
> 
>
> Key: KAFKA-1737
> URL: https://issues.apache.org/jira/browse/KAFKA-1737
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.8.1.1
>Reporter: Stevo Slavic
>Assignee: Vivek Madani
>Priority: Minor
> Attachments: KAFKA-1737.patch
>
>
> {{ZkClient}} instances passed to {{AdminUtils}} calls must have 
> {{kafka.utils.ZKStringSerializer}} set as {{ZkSerializer}}. Otherwise 
> commands executed via {{AdminUtils}} may not be seen/recognizable to broker, 
> producer or consumer. E.g. producer (with auto topic creation turned off) 
> will not be able to send messages to a topic created via {{AdminUtils}}, it 
> will throw {{UnknownTopicOrPartitionException}}.
> Please consider at least documenting this requirement in {{AdminUtils}} 
> scaladoc.
> For more info see [related discussion on Kafka user mailing 
> list|http://mail-archives.apache.org/mod_mbox/kafka-users/201410.mbox/%3CCAAUywg-oihNiXuQRYeS%3D8Z3ymsmEHo6ghLs%3Dru4nbm%2BdHVz6TA%40mail.gmail.com%3E].





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312349#comment-14312349
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Got results on a beefier machine with 16 CPU cores available, Kafka 
configured with 16 IO threads and 32 netcat clients pushing messages:

gzip: 564.0 sec -> 45.2 sec (12.4x speedup)
LZ4: 56.7 sec -> 9.9 sec (5.7x speedup)


> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 4.2 sec
> Gzip: 62.3 sec -> 26.9 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





[jira] [Commented] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-02-09 Thread Tong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312280#comment-14312280
 ] 

Tong Li commented on KAFKA-1927:


Gwen Shapira, indeed. I was doing OpenStack development for quite some time. 
Recently IBM wants to be involved in Kafka; that is why I am here. I'd like to 
give it a shot, though reading the link and learning the process will probably 
be slow. How urgently is this patch needed? If it is very urgent, it is 
probably better suited for a more experienced Kafka developer. Thanks a lot 
for your quick response though.

> Replace requests in kafka.api with requests in 
> org.apache.kafka.common.requests
> ---
>
> Key: KAFKA-1927
> URL: https://issues.apache.org/jira/browse/KAFKA-1927
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> The common package introduced a better way of defining requests using a new 
> protocol definition DSL and also includes wrapper objects for these.
> We should switch KafkaApis over to use these request definitions and consider 
> the scala classes deprecated (we probably need to retain some of them for a 
> while for the scala clients).
> This will be a big improvement because
> 1. We will have each request now defined in only one place (Protocol.java)
> 2. We will have built-in support for multi-version requests
> 3. We will have much better error messages (no more cryptic underflow errors)





[jira] [Commented] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-02-09 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312263#comment-14312263
 ] 

Gwen Shapira commented on KAFKA-1927:
-

I'm not sure I'd take it as my first patch (in fact, I find the scope here a 
bit intimidating).

But if you are feeling brave, I'll be happy to help out :)

If you look at core/src/main/scala/kafka/api you'll see a gazillion 
request/response objects. 
Many (but not all) of those are documented in: 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol

Those are all duplicated in:
clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java

The old requests/responses are used all over the "core" module.

The goal here is to remove them and use the ones in o.a.k.common.protocol in 
the "core" module.

The APIs themselves should not change; unit tests and system tests should 
pass, as well as upgrade tests (i.e. old clients with new brokers, and a mix 
of old and new brokers).
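To illustrate what "each request defined in only one place" buys, here is a toy, self-contained protocol-definition DSL in the same spirit. This is not the real o.a.k.common.protocol API; all names and types here are invented for the example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy protocol-definition DSL: a request schema is declared once, and both
// serialization and deserialization are derived from that single declaration.
final class MiniSchema {
    enum Type { INT32, STRING }

    private final LinkedHashMap<String, Type> fields = new LinkedHashMap<>();

    MiniSchema field(String name, Type type) {
        fields.put(name, type);
        return this;
    }

    ByteBuffer write(Map<String, Object> values) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        for (Map.Entry<String, Type> f : fields.entrySet()) {
            if (f.getValue() == Type.INT32) {
                buf.putInt((Integer) values.get(f.getKey()));
            } else { // STRING: length-prefixed UTF-8
                byte[] b = ((String) values.get(f.getKey())).getBytes(StandardCharsets.UTF_8);
                buf.putShort((short) b.length).put(b);
            }
        }
        buf.flip();
        return buf;
    }

    Map<String, Object> read(ByteBuffer buf) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Type> f : fields.entrySet()) {
            if (f.getValue() == Type.INT32) {
                out.put(f.getKey(), buf.getInt());
            } else {
                byte[] b = new byte[buf.getShort()];
                buf.get(b);
                out.put(f.getKey(), new String(b, StandardCharsets.UTF_8));
            }
        }
        return out;
    }
}
```

In this style, a versioned request is just another schema value, and a malformed buffer can fail with the name of the field being parsed instead of a cryptic underflow error.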

> Replace requests in kafka.api with requests in 
> org.apache.kafka.common.requests
> ---
>
> Key: KAFKA-1927
> URL: https://issues.apache.org/jira/browse/KAFKA-1927
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> The common package introduced a better way of defining requests using a new 
> protocol definition DSL and also includes wrapper objects for these.
> We should switch KafkaApis over to use these request definitions and consider 
> the scala classes deprecated (we probably need to retain some of them for a 
> while for the scala clients).
> This will be a big improvement because
> 1. We will have each request now defined in only one place (Protocol.java)
> 2. We will have built-in support for multi-version requests
> 3. We will have much better error messages (no more cryptic underflow errors)





[jira] [Commented] (KAFKA-1926) Replace kafka.utils.Utils with o.a.k.common.utils.Utils

2015-02-09 Thread Tong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312229#comment-14312229
 ] 

Tong Li commented on KAFKA-1926:


Jay, if you can provide a bit more direction, I would like to start working on 
these things.

> Replace kafka.utils.Utils with o.a.k.common.utils.Utils
> ---
>
> Key: KAFKA-1926
> URL: https://issues.apache.org/jira/browse/KAFKA-1926
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>  Labels: newbie
>
> There is currently a lot of duplication between the Utils class in common and 
> the one in core.
> Our plan has been to deprecate duplicate code in the server and replace it 
> with the new common code.
> As such we should evaluate each method in the scala Utils and do one of the 
> following:
> 1. Migrate it to o.a.k.common.utils.Utils if it is a sensible general purpose 
> utility in active use that is not Kafka-specific. If we migrate it we should 
> really think about the API and make sure there is some test coverage. A few 
> things in there are kind of funky and we shouldn't just blindly copy them 
> over.
> 2. Create a new class ServerUtils or ScalaUtils in kafka.utils that will hold 
> any utilities that really need to make use of Scala features to be convenient.
> 3. Delete it if it is not used, or has a bad api.





[jira] [Commented] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-02-09 Thread Tong Li (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312224#comment-14312224
 ] 

Tong Li commented on KAFKA-1927:


I wonder if Jay can provide a bit more information so that I can start looking 
at this and work on providing a patch?

> Replace requests in kafka.api with requests in 
> org.apache.kafka.common.requests
> ---
>
> Key: KAFKA-1927
> URL: https://issues.apache.org/jira/browse/KAFKA-1927
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> The common package introduced a better way of defining requests using a new 
> protocol definition DSL and also includes wrapper objects for these.
> We should switch KafkaApis over to use these request definitions and consider 
> the scala classes deprecated (we probably need to retain some of them for a 
> while for the scala clients).
> This will be a big improvement because
> 1. We will have each request now defined in only one place (Protocol.java)
> 2. We will have built-in support for multi-version requests
> 3. We will have much better error messages (no more cryptic underflow errors)





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312181#comment-14312181
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Updated patch; now it correctly calculates offset ranges. In the next 
iteration I'll clean up the spaghetti code in Log.append so it is clearer 
what is happening.

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 4.2 sec
> Gzip: 62.3 sec -> 26.9 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: KAFKA-1933_2015-02-09_12:27:06.patch

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 4.2 sec
> Gzip: 62.3 sec -> 26.9 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





Re: Review Request 30775: Fine-grained locking in log.append

2015-02-09 Thread Maxim Ivanov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30775/
---

(Updated Feb. 9, 2015, 12:27 p.m.)


Review request for kafka.


Bugs: KAFKA-1933
https://issues.apache.org/jira/browse/KAFKA-1933


Repository: kafka


Description (updated)
---

This patch adds finer-grained locking when appending to the log. It breaks the
global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to "reserve" offsets in
non-overlapping ranges, then do compression in parallel, and then
"commit" the write to the log in the same order the offsets were reserved.

Fix offset range calculation


Forgot to commit modified ByteBufferMessageSet


Diffs (updated)
-

  core/src/main/scala/kafka/log/Log.scala 
ec192155bec7b643025f8044b0b6565c7b9977d1 
  core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 
788c7864bc881b935975ab4a4e877b690e65f1f1 

Diff: https://reviews.apache.org/r/30775/diff/


Testing
---


Thanks,

Maxim Ivanov



[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312171#comment-14312171
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Updated reviewboard https://reviews.apache.org/r/30775/diff/
 against branch origin/0.8.2

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:27:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 4.2 sec
> Gzip: 62.3 sec -> 26.9 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: (was: KAFKA-1933_2015-02-09_12:15:06.patch)

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 4.2 sec
> Gzip: 62.3 sec -> 26.9 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Description: 
This patch adds finer locking when appending to log. It breaks
global append lock into 2 sequential and 1 parallel phase.

Basic idea is to allow every thread to "reserve" offsets in non
overlapping ranges, then do compression in parallel and then
"commit" write to log in the same order offsets where reserved.

On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following performance 
boost:

LZ4: 7.2 sec -> 4.2 sec
Gzip: 62.3 sec -> 26.9 sec

Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in total, 
82180 messages in total)


  was:
This patch adds finer locking when appending to log. It breaks
global append lock into 2 sequential and 1 parallel phase.

Basic idea is to allow every thread to "reserve" offsets in non
overlapping ranges, then do compression in parallel and then
"commit" write to log in the same order offsets where reserved.

On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following performance 
boost:

LZ4: 7.2 sec -> 3.9 sec
Gzip: 62.3 sec -> 24.8 sec

Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in total, 
82180 messages in total)



> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:15:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 4.2 sec
> Gzip: 62.3 sec -> 26.9 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





[jira] [Updated] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Ivanov updated KAFKA-1933:

Attachment: KAFKA-1933_2015-02-09_12:15:06.patch

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:15:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 3.9 sec
> Gzip: 62.3 sec -> 24.8 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312160#comment-14312160
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Updated reviewboard https://reviews.apache.org/r/30775/diff/
 against branch origin/0.8.2

> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch, KAFKA-1933_2015-02-09_12:15:06.patch
>
>
> This patch adds finer locking when appending to log. It breaks
> global append lock into 2 sequential and 1 parallel phase.
> Basic idea is to allow every thread to "reserve" offsets in non
> overlapping ranges, then do compression in parallel and then
> "commit" write to log in the same order offsets where reserved.
> On my Core i3 M370 @2.4Ghz (2 cores + HT) it resulted in following 
> performance boost:
> LZ4: 7.2 sec -> 3.9 sec
> Gzip: 62.3 sec -> 24.8 sec
> Kafka was configured to run 4 IO threads, data was pushed using 5 netcat 
> instances pushing in parallel batches of 200 msg 6.2 kb each (510 MB in 
> total, 82180 messages in total)





Re: Review Request 30775: Fine-grained locking in log.append

2015-02-09 Thread Maxim Ivanov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30775/
---

(Updated Feb. 9, 2015, 12:15 p.m.)


Review request for kafka.


Bugs: KAFKA-1933
https://issues.apache.org/jira/browse/KAFKA-1933


Repository: kafka


Description (updated)
---

This patch adds finer-grained locking when appending to the log. It breaks the
global append lock into 2 sequential phases and 1 parallel phase.

The basic idea is to allow every thread to "reserve" offsets in
non-overlapping ranges, then do compression in parallel, and then
"commit" the write to the log in the same order the offsets were reserved.

Fix offset range calculation


Diffs (updated)
-

  core/src/main/scala/kafka/log/Log.scala 
ec192155bec7b643025f8044b0b6565c7b9977d1 

Diff: https://reviews.apache.org/r/30775/diff/


Testing
---


Thanks,

Maxim Ivanov



[jira] [Comment Edited] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311977#comment-14311977
 ] 

Maxim Ivanov edited comment on KAFKA-1933 at 2/9/15 10:38 AM:
--

Hi, thank for looking into this. 

Today I realised that my assumptions were wrong, and without checking them I 
proceeded with writing a patch. Whole idea is based on the fact, that number of 
messages is known in advance and I used "shallowCount" to get this number for 
some reason. I don't know what got into my head, because shallow count turned 
out to be always 1. So the whole offset allocation logic is wrong here. I need 
to redo it to do decompression before reservin correct range of offsets. I'll 
redo it in near future and resubmit version 2.

As for your concerns, the locking scheme isn't very sophisticated, with a bit 
of (much needed) cleanup in log.append it will be easy to follow.

1. it will be present in one way or another, because we have to synchronize 
access in 2 phases and have a constraint that second synhronization should be 
done in the same order as first one. I am not familiar with JVM/Scala 
concurrency primitives, I couldn't find anything in java.utils.concurrent which 
can help me achieving that out of the box. If you prefer I can abstract it into 
separate class, then from log.Log point of view it will be sequence of actions: 
1) register in the queue and obtain the ticket 2) wait in the queue presenting 
its ticket, where ticket is pair of semaphores, but that would be such a thin 
shim, that I decided just to do it all in place. If you have other ideas how to 
synchronize and resynchornize again, keeping the order, I'd be happy utilize 
your way of doing it if it makes merging patch easier.

2. Non-compressed case as well as "assignOffset = false" mode should be 
refactored into separate code paths. My thinking was that once idea got 
approval, I'll do it. So at the end there should be no impact for 
non-compressed case

3. There must be more to it than that. A raw gzip compressor is capable of 
processing ~40 MB/sec, while Kafka with a single topic, a single partition, no 
replication, and 5 netcat clients pushing prerecorded messages into it (i.e. 
infinite pipelining) manages 8.18 MB/sec, so the overhead of the Kafka system 
as a whole is massive, especially given that network handling and parsing are 
done in separate threads. Parallelizing compression seemed to bring the most 
value for the time spent on the patch, and it wouldn't prevent any other 
optimisations from taking place.

4. That was my thinking the moment I discovered why our Kafka servers were 
choking on CPU, not network or disks, during massive pushes from Hadoop. Kafka 
had a chance to implement that when the protocol changed in 0.8, but now it 
would be very intrusive, and I certainly would not propose it as my first 
patch :) The changes to Log.append are self-contained, minimally intrusive, 
and very local, and, most importantly, they relieve our immediate problem 
without requiring a migration to a new log format or a change to the client 
protocol. When there is another breaking change, please do so, but seeing how 
the Kafka 0.7 -> 0.8 migration is going, you won't make many friends.




[jira] [Commented] (KAFKA-1933) Fine-grained locking in log append

2015-02-09 Thread Maxim Ivanov (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311977#comment-14311977
 ] 

Maxim Ivanov commented on KAFKA-1933:
-

Hi, thanks for looking into this.

Today I realised that my assumptions were wrong, and I proceeded to write the 
patch without checking them. The whole idea is based on the number of messages 
being known in advance, and I used "shallowCount" to get that number. I don't 
know what got into my head, because the shallow count turned out to always be 
1, so the whole offset allocation logic is wrong here. I need to redo it to do 
decompression in the critical path, just to reserve the correct range of 
offsets; that in turn will make the performance improvement of this patch 
smaller, but still substantial. I'll redo it in the near future and resubmit 
version 2.

As for your concerns: the locking scheme isn't very sophisticated, and with a 
bit of (much needed) cleanup in Log.append it will be easy to follow.

1. It will be present in one way or another, because we have to synchronize 
access in two phases, with the constraint that the second synchronization must 
happen in the same order as the first. I am not familiar with JVM/Scala 
concurrency primitives, and I couldn't find anything in java.util.concurrent 
that achieves this out of the box. If you prefer, I can abstract it into a 
separate class; from log.Log's point of view it would then be a sequence of 
actions: 1) register in the queue and obtain a ticket, 2) wait in the queue 
presenting that ticket, where a ticket is a pair of semaphores. But that would 
be such a thin shim that I decided to just do it all in place. If you have 
other ideas on how to synchronize, and then resynchronize in the same order, 
I'd be happy to use your approach if it makes merging the patch easier.

2. The non-compressed case, as well as the "assignOffset = false" mode, should 
be refactored into separate code paths. My thinking was that once the idea got 
approval, I'd do it. So in the end there should be no impact on the 
non-compressed case.

3. There must be more to it than that. A raw gzip compressor is capable of 
processing ~40 MB/sec, while Kafka with a single topic, a single partition, no 
replication, and 5 netcat clients pushing prerecorded messages into it (i.e. 
infinite pipelining) manages 8.18 MB/sec, so the overhead of the Kafka system 
as a whole is massive, especially given that network handling and parsing are 
done in separate threads. Parallelizing compression seemed to bring the most 
value for the time spent on the patch, and it wouldn't prevent any other 
optimisations from taking place.

4. That was my thinking the moment I discovered why our Kafka servers were 
choking on CPU, not network or disks, during massive pushes from Hadoop. Kafka 
had a chance to implement that when the protocol changed in 0.8, but now it 
would be very intrusive, and I certainly would not propose it as my first 
patch :) The changes to Log.append are self-contained, minimally intrusive, 
and very local, and, most importantly, they relieve our immediate problem 
without requiring a migration to a new log format or a change to the client 
protocol. When there is another breaking change, please do so, but seeing how 
the Kafka 0.7 -> 0.8 migration is going, you won't make many friends.


> Fine-grained locking in log append
> --
>
> Key: KAFKA-1933
> URL: https://issues.apache.org/jira/browse/KAFKA-1933
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Maxim Ivanov
>Assignee: Jay Kreps
>Priority: Minor
> Fix For: 0.8.2
>
> Attachments: KAFKA-1933.patch
>
>
> This patch adds finer-grained locking when appending to the log. It breaks
> the global append lock into 2 sequential phases and 1 parallel phase.
> The basic idea is to allow every thread to "reserve" offsets in
> non-overlapping ranges, then do compression in parallel, and then
> "commit" the write to the log in the same order the offsets were reserved.
> On my Core i3 M370 @2.4GHz (2 cores + HT) it resulted in the following
> performance boost:
> LZ4: 7.2 sec -> 3.9 sec
> Gzip: 62.3 sec -> 24.8 sec
> Kafka was configured to run 4 I/O threads; data was pushed using 5 netcat
> instances pushing parallel batches of 200 msgs of 6.2 KB each (510 MB in
> total, 82180 messages in total)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)