Re: [DISCUSS] KIP-32 - Add CreateTime and LogAppendTime to Kafka message

2015-10-20 Thread Kartik Paramasivam
Joel or Becket will probably respond in more detail, but here are my 2c.

From the standpoint of LinkedIn, the suggested proposal works: in essence,
max.append.delay can be used to turn "creationTime" into "logAppendTime".
This does mean that at LinkedIn we won't be able to use "creationTime",
but that might also be fine because we use the timestamp that is set
inside the Avro payload anyway.

Keeping LI aside though, it looks like there are two distinct possible
goals.
1. The broker will retain messages for x days after a message shows up at
the broker. This behavior would be super deterministic and would never
change based on the contents of the message or anything else.

2. The client is in "partial" control of how long a message stays on the
broker, based on the creationTime stamped by the client.

Although (2) could be a feature in some scenarios, in many others it can be
pretty unintuitive and be perceived as an anti-feature. For example, say a
mobile client buffered some messages because the device was offline (maybe
on a plane), and then sent them after, say, 23 hours. The messages show up
in a Kafka topic with 24-hour retention, and now they get deleted within
1 hour.
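
To make the arithmetic concrete, here is a minimal sketch (plain Java, not
Kafka code) of the effective retention in that scenario:

    import java.util.concurrent.TimeUnit;

    public class CreateTimeRetentionExample {
        public static void main(String[] args) {
            long retentionMs = TimeUnit.HOURS.toMillis(24); // topic retention
            long bufferedMs = TimeUnit.HOURS.toMillis(23);  // device offline

            long now = System.currentTimeMillis();
            long createTime = now - bufferedMs;        // stamped by the client
            long expiresAt = createTime + retentionMs; // create-time expiry

            // survives only 1 hour after it finally reaches the broker
            System.out.println(TimeUnit.MILLISECONDS.toHours(expiresAt - now));
        }
    }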

Kartik


On Tue, Oct 13, 2015 at 12:32 AM, Jay Kreps  wrote:

> Here's my basic take:
> - I agree it would be nice to have a notion of time baked in if it were
> done right
> - All the proposals so far seem pretty complex--I think they might make
> things worse rather than better overall
> - I think adding 2x8 byte timestamps to the message is probably a
> non-starter from a size perspective
> - Even if it isn't in the message, having two notions of time that control
> different things is a bit confusing
> - The mechanics of basing retention etc on log append time when that's not
> in the log seem complicated
>
> To that end here is a possible 4th option. Let me know what you think.
>
> The basic idea is that the message creation time is closest to what the
> user actually cares about but is dangerous if set wrong. So rather than
> substitute another notion of time, let's try to ensure the correctness of
> message creation time by preventing arbitrarily bad message creation times.
>
> First, let's see if we can agree that log append time is not something
> anyone really cares about but rather an implementation detail. The
> timestamp that matters to the user is when the message occurred (the
> creation time). The log append time is basically just an approximation to
> this, on the assumption that message creation and message receipt on the
> server occur pretty close together.
>
> But as these values diverge the issue starts to become apparent. Say you
> set the retention to one week and then mirror data from a topic containing
> two years of retention. Your intention is clearly to keep the last week,
> but because the mirroring is appending right now you will keep two years.
>
> The reason we like log append time is because we are (justifiably)
> concerned that in certain situations the creation time may not be
> trustworthy. This same problem exists on the servers but there are fewer
> servers and they just run the kafka code so it is less of an issue.
>
> There are two possible ways to handle this:
>
>1. Just tell people to add size-based retention. I think this is not
>entirely unreasonable; we're basically saying we retain data based on the
>timestamp you give us in the data. If you give us bad data we will retain
>it for a bad amount of time. If you want to ensure we don't retain "too
>much" data, define "too much" by setting a size-based retention setting.
>This is not entirely unreasonable but kind of suffers from a "one bad
>apple" problem in a very large environment.
>2. Prevent bad timestamps. In general we can't say a timestamp is bad.
>However, the definition we're implicitly using is that we think there are a
>set of topics/clusters where the creation timestamp should always be "very
>close" to the log append timestamp. This is true for data sources that have
>no buffering capability (which at LinkedIn is very common, but is more rare
>elsewhere). The solution in this case would be to allow a setting along the
>lines of max.append.delay which checks the creation timestamp against the
>server time to look for too large a divergence. The solution would either
>be to reject the message or to override it with the server time.
>
> So in LI's environment you would configure the clusters used for direct,
> unbuffered, message production (e.g. tracking and metrics local) to enforce
> a reasonably aggressive timestamp bound (say 10 mins), and all other
> clusters would just inherit these.
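>
> To sketch roughly what that check could look like (illustrative Java only;
> the names and the reject-vs-override policy switch are assumptions, not
> actual Kafka code):
>
>     public class MaxAppendDelayCheck {
>         enum Policy { REJECT, OVERRIDE_WITH_SERVER_TIME }
>
>         // Validate a client-supplied create time against the server clock.
>         static long validateCreateTime(long createTimeMs, long serverTimeMs,
>                                        long maxAppendDelayMs, Policy policy) {
>             if (Math.abs(serverTimeMs - createTimeMs) <= maxAppendDelayMs)
>                 return createTimeMs;  // timestamp is trusted as-is
>             if (policy == Policy.REJECT)
>                 throw new IllegalArgumentException("create time diverges too far");
>             return serverTimeMs;      // override with the server time
>         }
>     }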
>
> The downside of this approach is requiring the special configuration.
> However I think in 99% of environments this could be skipped entirely; it's
> only 

[GitHub] kafka pull request: KAFKA-2667: Fix transient error in KafkaBasedL...

2015-10-20 Thread ewencp
GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/333

KAFKA-2667: Fix transient error in KafkaBasedLogTest.

The test required a specific sequence of events for each Consumer.poll()
call, but the MockConsumer.waitForPollThen() method could not guarantee
that, resulting in race conditions. Add support for scheduling sequences of
events even when running in multi-threaded environments.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka kafka-2667-kafka-based-log-transient-error

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/333.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #333


commit 988530556d60f68766e02d26998f3b86090fabb0
Author: Ewen Cheslack-Postava 
Date:   2015-10-20T07:49:47Z

KAFKA-2667: Fix transient error in KafkaBasedLogTest.

The test required a specific sequence of events for each Consumer.poll()
call, but the MockConsumer.waitForPollThen() method could not guarantee
that, resulting in race conditions. Add support for scheduling sequences of
events even when running in multi-threaded environments.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-32 - Add CreateTime and LogAppendTime to Kafka message

2015-10-20 Thread Joel Koshy
I’m in favor of adding the create-time in the message (although some would
argue even that should really be an application-level header), but I don’t
think it should be mutable after it leaves the client, and I think we should
avoid having the server use that for any server-side indexing. The
max.append.delay config helps, but I wouldn’t be surprised if it ends up
becoming a very confusing configuration.

I also agree with being thrifty with headers and that having both
create-time and log-append-time in the header is overkill which is what the
third option addresses. I’m not fully convinced that the implementation
details of basing retention on log append time (i.e., the third option) are
terribly complicated. Can you describe your concerns with the earlier
approach? While it is true that using log-append-time to drive retention
won’t handle the bootstrap case, it is a clear approach in that it makes no
promises about that scenario - i.e., retention/rolling/offset lookup will
all be based on arrival time at the server and not message creation
time. There are use-cases which don’t care about log append time, but for
those use-cases the create time (if we add it) will be available in each
message. It’s just that retention/rolling will be driven off log append
time.

Becket has already brought out some scenarios where the usage of
create-time in combination with max.append.delay may be ambiguous and
unintuitive. Here are some others: say we use the create-time-driven index
for retention, a new segment gets created with time t1, and a message
arrives out of create-time order (t0 < t1). Then the second message will be
held hostage until t1 + retention, so retention is violated. I actually may
be completely unclear on how retention should work with create-time-driven
indexes and max.append.delay. Say we set max.append.delay to something high
(or infinity). Wouldn’t the user have to set retention appropriately as
well? Otherwise really old (by create-time) messages that are, say, from a
bootstrapping source would just get purged shortly after arrival. So if
max.append.delay is infinity it seems the right retention setting is also
infinity. Can you clarify how retention should work if driven off an index
that is built from create-time?
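
To make the hostage scenario concrete, here is a small sketch (hypothetical
Java, not a description of any existing index):

    public class OutOfOrderRetentionExample {
        public static void main(String[] args) {
            long retentionMs = 7L * 24 * 60 * 60 * 1000;  // one week
            long t1 = System.currentTimeMillis();          // segment created
            long t0 = t1 - 30L * 24 * 60 * 60 * 1000;      // late-arriving message

            // The segment only becomes eligible for deletion once its newest
            // create-time falls out of the retention window...
            long segmentExpiresAt = Math.max(t0, t1) + retentionMs;

            // ...so the t0 message outlives its own retention deadline.
            System.out.println(segmentExpiresAt > t0 + retentionMs); // true
        }
    }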

Also, wrt max.append.delay - allowing the server to override it would
render the create-time field pretty much untrustworthy, right? The most
intuitive policy I can think of for create time is to make it immutable
after it has been set at the sender. Otherwise we would need to add some
dirty flag or a generation field in order to distinguish between messages
with the true create-time and those without - but that seems to be a hack,
which suggests it should be immutable in the first place.

Thanks,

Joel

On Tue, Oct 20, 2015 at 12:02 AM, Kartik Paramasivam <
kparamasi...@linkedin.com.invalid> wrote:

> Joel or Becket will probably respond in more detail, but here are my 2c.
>
> From the standpoint of LinkedIn, the suggested proposal works: in essence,
> max.append.delay can be used to turn "creationTime" into "logAppendTime".
> This does mean that at LinkedIn we won't be able to use "creationTime",
> but that might also be fine because we use the timestamp that is set
> inside the Avro payload anyway.
>
> Keeping LI aside though, it looks like there are two distinct possible
> goals.
> 1. The broker will retain messages for x days after a message shows up at
> the broker. This behavior would be super deterministic and would never
> change based on the contents of the message or anything else.
>
> 2. The client is in "partial" control of how long a message stays on the
> broker, based on the creationTime stamped by the client.
>
> Although (2) could be a feature in some scenarios, in many others it can
> be pretty unintuitive and be perceived as an anti-feature. For example,
> say a mobile client buffered some messages because the device was offline
> (maybe on a plane), and then sent them after, say, 23 hours. The messages
> show up in a Kafka topic with 24-hour retention, and now they get deleted
> within 1 hour.
>
> Kartik
>
>
> On Tue, Oct 13, 2015 at 12:32 AM, Jay Kreps  wrote:
>
> > Here's my basic take:
> > - I agree it would be nice to have a notion of time baked in if it were
> > done right
> > - All the proposals so far seem pretty complex--I think they might make
> > things worse rather than better overall
> > - I think adding 2x8 byte timestamps to the message is probably a
> > non-starter from a size perspective
> > - Even if it isn't in the message, having two notions of time that
> > control different things is a bit confusing
> > - The mechanics of basing retention etc on log append time when that's
> > not in the log seem complicated
> >
> > To that end here is a possible 4th option. Let me know what you think.
> >
> > The basic idea is that the message creation time is 

[jira] [Updated] (KAFKA-2667) Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure

2015-10-20 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2667:
-
Component/s: copycat

> Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure
> 
>
> Key: KAFKA-2667
> URL: https://issues.apache.org/jira/browse/KAFKA-2667
> Project: Kafka
>  Issue Type: Sub-task
>  Components: copycat
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Seen in recent builds:
> {code}
> org.apache.kafka.copycat.util.KafkaBasedLogTest > testSendAndReadToEnd FAILED
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at 
> org.apache.kafka.copycat.util.KafkaBasedLogTest.testSendAndReadToEnd(KafkaBasedLogTest.java:335)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-1686; Implement SASL/Kerberos

2015-10-20 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/334

KAFKA-1686; Implement SASL/Kerberos

This PR implements SASL/Kerberos which was originally submitted by 
@harshach as https://github.com/apache/kafka/pull/191.

I've been submitting PRs to Harsha's branch with fixes and improvements, and 
he has integrated all but the most recent one. I'm creating this PR so that 
Jenkins can run the tests on the branch (they pass locally).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka KAFKA-1686-V1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #334


commit 82737e5bb71f67271d90c059dede74935f8a5e56
Author: Sriharsha Chintalapani 
Date:   2015-08-31T23:07:15Z

KAFKA-1686. Implement SASL/Kerberos.

commit a3417d7f2c558c0082799b117a3c62c706ad519d
Author: Sriharsha Chintalapani 
Date:   2015-09-03T03:31:34Z

KAFKA-1686. Implement SASL/Kerberos.

commit 8f718ce6b03a9c86712dc8f960af2b739b8ed510
Author: Sriharsha Chintalapani 
Date:   2015-09-03T04:10:40Z

KAFKA-1686. Implement SASL/Kerberos.

commit aa928952305a31c5b6e2bac705d350f94c9f7501
Author: Sriharsha Chintalapani 
Date:   2015-09-03T13:48:47Z

Added licesense.

commit f178107b516af414162634fc7253cedd2a6a3bf5
Author: Sriharsha Chintalapani 
Date:   2015-09-03T13:57:57Z

KAFKA-1686. Implement SASL/Kerberos.

commit 71b6fdbc841cffd5279eb2044c4da69acc172626
Author: Sriharsha Chintalapani 
Date:   2015-10-03T23:09:23Z

Merge remote-tracking branch 'refs/remotes/origin/trunk' into KAFKA-1686-V1

commit 9d260c67472296d752f74bc04eefb1e95b6b9746
Author: Sriharsha Chintalapani 
Date:   2015-10-04T18:36:52Z

KAFKA-1686. Fixes after the merge.

commit 5723dd2a392a307cfd6484c1f3f7c32cc8891940
Author: Sriharsha Chintalapani 
Date:   2015-10-09T06:43:51Z

KAFKA-1686. Addressing comments.

commit 8cf30d0b3a0aefa08cb9d86d59f0f16d810d7481
Author: Ismael Juma 
Date:   2015-10-09T07:36:19Z

Merge remote-tracking branch 'apache/trunk' into KAFKA-1686-V1

* apache/trunk:
  KAFKA-2596: reject commits from unknown groups with positive generations
  MINOR: typing ProcessorDef
  KAFKA-2477: Fix a race condition between log append and fetch that causes 
OffsetOutOfRangeException.
  KAFKA-2428: Add sanity check in KafkaConsumer for the timeouts
  Kafka-2587:  Only notification handler will update the cache and all 
verifications will use waitUntilTrue.
  KAFKA-2419; Garbage collect unused sensors
  KAFKA-2534: Fixes and unit tests for SSLTransportLayer buffer overflow
  KAFKA-2476: Add Decimal, Date, and Timestamp logical types.
  KAFKA-2474: Add caching of JSON schema conversions to JsonConverter
  KAFKA-2482: Allow sink tasks to get their current assignment, as well as 
pause and resume topic partitions.
  KAFKA-2573: Mirror maker system test hangs and eventually fails
  KAFKA-2599: Fix Metadata.getClusterForCurrentTopics throws NPE
  TRIVIAL: remove TODO in KafkaConsumer after KAFKA-2120
  HOTFIX: Persistent store in ProcessorStateManagerTest
  KAFKA-2604; Remove `completeAll` and improve timeout passed to 
`Selector.poll` from `NetworkClient.poll`
  KAFKA-2601; ConsoleProducer tool shows stacktrace on invalid command 
parameters

commit 2596c4a668f7095f4cfce36b34504c50f4603631
Author: Ismael Juma 
Date:   2015-10-09T12:21:05Z

Remove unused code, fix formatting and minor javadoc tweaks

commit 2919bc3ae474b3e27ca5cb0c75e4cff0fee9ca93
Author: Ismael Juma 
Date:   2015-10-09T12:23:17Z

Fix bad merge in `TestUtils`

commit 9ed1a2635d97c290e42b723ce8db2bf60c1c6440
Author: Ismael Juma 
Date:   2015-10-09T12:23:46Z

Remove -XX:-MaxFDLimit from `gradle.properties`

commit 2d2fcecb7bda62519d36d4f71a955cf55c8bbd2a
Author: Ismael Juma 
Date:   2015-10-09T12:36:06Z

Support `SSLSASL` in `ChannelBuilders`, reduce duplication in `TestUtils` 
and clean-up `SaslTestHarness`

commit 6a13667232c2946ed92fdebcb467f27d6adf075f
Author: Harsha 
Date:   2015-10-09T14:16:30Z

Merge pull request #1 from ijuma/KAFKA-1686-V1

Merge trunk and a few improvements and fixes

commit 32ab6f468505edf10be686905019c4d202663f72
Author: Sriharsha Chintalapani 
Date:   2015-10-09T22:21:26Z

KAFKA-1686. Added SaslConsumerTest, fixed a bug in SecurityProtocol.

commit 58064b46a7ddbb7d2293e33c7b66c35f76043588

[jira] [Commented] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964807#comment-14964807
 ] 

ASF GitHub Bot commented on KAFKA-1686:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/334

KAFKA-1686; Implement SASL/Kerberos

This PR implements SASL/Kerberos which was originally submitted by 
@harshach as https://github.com/apache/kafka/pull/191.

I've been submitting PRs to Harsha's branch with fixes and improvements, and 
he has integrated all but the most recent one. I'm creating this PR so that 
Jenkins can run the tests on the branch (they pass locally).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka KAFKA-1686-V1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/334.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #334


commit 82737e5bb71f67271d90c059dede74935f8a5e56
Author: Sriharsha Chintalapani 
Date:   2015-08-31T23:07:15Z

KAFKA-1686. Implement SASL/Kerberos.

commit a3417d7f2c558c0082799b117a3c62c706ad519d
Author: Sriharsha Chintalapani 
Date:   2015-09-03T03:31:34Z

KAFKA-1686. Implement SASL/Kerberos.

commit 8f718ce6b03a9c86712dc8f960af2b739b8ed510
Author: Sriharsha Chintalapani 
Date:   2015-09-03T04:10:40Z

KAFKA-1686. Implement SASL/Kerberos.

commit aa928952305a31c5b6e2bac705d350f94c9f7501
Author: Sriharsha Chintalapani 
Date:   2015-09-03T13:48:47Z

Added licesense.

commit f178107b516af414162634fc7253cedd2a6a3bf5
Author: Sriharsha Chintalapani 
Date:   2015-09-03T13:57:57Z

KAFKA-1686. Implement SASL/Kerberos.

commit 71b6fdbc841cffd5279eb2044c4da69acc172626
Author: Sriharsha Chintalapani 
Date:   2015-10-03T23:09:23Z

Merge remote-tracking branch 'refs/remotes/origin/trunk' into KAFKA-1686-V1

commit 9d260c67472296d752f74bc04eefb1e95b6b9746
Author: Sriharsha Chintalapani 
Date:   2015-10-04T18:36:52Z

KAFKA-1686. Fixes after the merge.

commit 5723dd2a392a307cfd6484c1f3f7c32cc8891940
Author: Sriharsha Chintalapani 
Date:   2015-10-09T06:43:51Z

KAFKA-1686. Addressing comments.

commit 8cf30d0b3a0aefa08cb9d86d59f0f16d810d7481
Author: Ismael Juma 
Date:   2015-10-09T07:36:19Z

Merge remote-tracking branch 'apache/trunk' into KAFKA-1686-V1

* apache/trunk:
  KAFKA-2596: reject commits from unknown groups with positive generations
  MINOR: typing ProcessorDef
  KAFKA-2477: Fix a race condition between log append and fetch that causes 
OffsetOutOfRangeException.
  KAFKA-2428: Add sanity check in KafkaConsumer for the timeouts
  Kafka-2587:  Only notification handler will update the cache and all 
verifications will use waitUntilTrue.
  KAFKA-2419; Garbage collect unused sensors
  KAFKA-2534: Fixes and unit tests for SSLTransportLayer buffer overflow
  KAFKA-2476: Add Decimal, Date, and Timestamp logical types.
  KAFKA-2474: Add caching of JSON schema conversions to JsonConverter
  KAFKA-2482: Allow sink tasks to get their current assignment, as well as 
pause and resume topic partitions.
  KAFKA-2573: Mirror maker system test hangs and eventually fails
  KAFKA-2599: Fix Metadata.getClusterForCurrentTopics throws NPE
  TRIVIAL: remove TODO in KafkaConsumer after KAFKA-2120
  HOTFIX: Persistent store in ProcessorStateManagerTest
  KAFKA-2604; Remove `completeAll` and improve timeout passed to 
`Selector.poll` from `NetworkClient.poll`
  KAFKA-2601; ConsoleProducer tool shows stacktrace on invalid command 
parameters

commit 2596c4a668f7095f4cfce36b34504c50f4603631
Author: Ismael Juma 
Date:   2015-10-09T12:21:05Z

Remove unused code, fix formatting and minor javadoc tweaks

commit 2919bc3ae474b3e27ca5cb0c75e4cff0fee9ca93
Author: Ismael Juma 
Date:   2015-10-09T12:23:17Z

Fix bad merge in `TestUtils`

commit 9ed1a2635d97c290e42b723ce8db2bf60c1c6440
Author: Ismael Juma 
Date:   2015-10-09T12:23:46Z

Remove -XX:-MaxFDLimit from `gradle.properties`

commit 2d2fcecb7bda62519d36d4f71a955cf55c8bbd2a
Author: Ismael Juma 
Date:   2015-10-09T12:36:06Z

Support `SSLSASL` in `ChannelBuilders`, reduce duplication in `TestUtils` 
and clean-up `SaslTestHarness`

commit 6a13667232c2946ed92fdebcb467f27d6adf075f
Author: Harsha 
Date:   2015-10-09T14:16:30Z

Merge pull request #1 from ijuma/KAFKA-1686-V1

Merge trunk and a few improvements and fixes

commit 

Jenkins build is back to normal : kafka_system_tests #113

2015-10-20 Thread ewen
See 



[jira] [Commented] (KAFKA-2472) Fix kafka ssl configs to not throw warnings

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965052#comment-14965052
 ] 

Ismael Juma commented on KAFKA-2472:


I did a quick spike:

* Introduced a `Config` interface with the `get*` methods
* Changed `Configurable.configure` to take the `Config` interface
* Introduced `SimpleConfig` that just wraps a `Map` for tests and 
code that doesn't use a `ConfigDef`
* Adapted all the code so that it compiles
* Ran the tests

It looks promising. There were some test failures; I investigated one and it 
was due to a genuine bug in trunk (there is no `define` for 
KafkaConfig.SSLEndpointIdentificationAlgorithmProp in the broker).
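
For concreteness, a rough sketch of the shape described above (illustrative 
only; the spike's actual code may differ):

{code}
import java.util.Map;

interface Config {
    String getString(String name);
    Integer getInt(String name);
    // ... the other get* methods
}

// Wraps a plain Map for tests and code that doesn't use a ConfigDef.
class SimpleConfig implements Config {
    private final Map<String, ?> values;
    SimpleConfig(Map<String, ?> values) { this.values = values; }
    public String getString(String name) { return (String) values.get(name); }
    public Integer getInt(String name) { return (Integer) values.get(name); }
}

interface Configurable {
    void configure(Config config); // previously took Map<String, ?>
}
{code}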

Thoughts?


> Fix kafka ssl configs to not throw warnings
> ---
>
> Key: KAFKA-2472
> URL: https://issues.apache.org/jira/browse/KAFKA-2472
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> This is a follow-up fix on kafka-1690.
> [2015-08-25 18:20:48,236] WARN The configuration ssl.truststore.password = 
> striker was supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)
> [2015-08-25 18:20:48,236] WARN The configuration security.protocol = SSL was 
> supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2628) KafkaOffsetBackingStoreTest.testGetSet transient test failure

2015-10-20 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-2628.
--
   Resolution: Fixed
Fix Version/s: 0.9.0.0

Closing because this test has completely changed as of KAFKA-2372: much of 
this code has been refactored into KafkaBasedLog, and 
KafkaOffsetBackingStoreTest can be implemented more simply with plain mocks. 
Further, it appears to be a duplicate of KAFKA-2667, which now has a patch, 
and I haven't seen this failure since KAFKA-2372.

> KafkaOffsetBackingStoreTest.testGetSet transient test failure
> -
>
> Key: KAFKA-2628
> URL: https://issues.apache.org/jira/browse/KAFKA-2628
> Project: Kafka
>  Issue Type: Bug
>  Components: copycat
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.9.0.0
>
>
> {quote}
> org.apache.kafka.copycat.storage.KafkaOffsetBackingStoreTest > testGetSet 
> FAILED
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at 
> org.apache.kafka.copycat.storage.KafkaOffsetBackingStoreTest.testGetSet(KafkaOffsetBackingStoreTest.java:308)
> {quote}
> Haven't noticed this on Apache's Jenkins yet, but have seen it on 
> Confluent's. May be due to limited resources under some conditions, although 
> the timeout is already quite generous at 10s.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2667) Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure

2015-10-20 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2667:
-
 Reviewer: Guozhang Wang
Fix Version/s: 0.9.0.0
   Status: Patch Available  (was: Open)

> Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure
> 
>
> Key: KAFKA-2667
> URL: https://issues.apache.org/jira/browse/KAFKA-2667
> Project: Kafka
>  Issue Type: Sub-task
>  Components: copycat
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.9.0.0
>
>
> Seen in recent builds:
> {code}
> org.apache.kafka.copycat.util.KafkaBasedLogTest > testSendAndReadToEnd FAILED
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at 
> org.apache.kafka.copycat.util.KafkaBasedLogTest.testSendAndReadToEnd(KafkaBasedLogTest.java:335)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2472) Fix kafka ssl configs to not throw warnings

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964788#comment-14964788
 ] 

Ismael Juma commented on KAFKA-2472:


[~junrao], the reason why we have the warnings is that we are passing a 
`Map<String, ?>` to the various `configure` methods and we do that by calling 
`AbstractConfig.values()`. This means that the usage of the parameters is never 
recorded and we also have to cast instead of using the nicer `get{String, Int, 
...}` methods. We also have `KafkaConfig.channelConfigs`, where we must 
remember to add the relevant configs (which is error-prone).

Is there a reason why the `configure` methods can't accept an `AbstractConfig` 
instead of a `Map<String, ?>`?

> Fix kafka ssl configs to not throw warnings
> ---
>
> Key: KAFKA-2472
> URL: https://issues.apache.org/jira/browse/KAFKA-2472
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> This is a follow-up fix on kafka-1690.
> [2015-08-25 18:20:48,236] WARN The configuration ssl.truststore.password = 
> striker was supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)
> [2015-08-25 18:20:48,236] WARN The configuration security.protocol = SSL was 
> supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2667) Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure

2015-10-20 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava reassigned KAFKA-2667:


Assignee: Ewen Cheslack-Postava

> Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure
> 
>
> Key: KAFKA-2667
> URL: https://issues.apache.org/jira/browse/KAFKA-2667
> Project: Kafka
>  Issue Type: Sub-task
>  Components: copycat
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Seen in recent builds:
> {code}
> org.apache.kafka.copycat.util.KafkaBasedLogTest > testSendAndReadToEnd FAILED
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at 
> org.apache.kafka.copycat.util.KafkaBasedLogTest.testSendAndReadToEnd(KafkaBasedLogTest.java:335)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2667) Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964751#comment-14964751
 ] 

ASF GitHub Bot commented on KAFKA-2667:
---

GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/333

KAFKA-2667: Fix transient error in KafkaBasedLogTest.

The test required a specific sequence of events for each Consumer.poll()
call, but the MockConsumer.waitForPollThen() method could not guarantee
that, resulting in race conditions. Add support for scheduling sequences of
events even when running in multi-threaded environments.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka kafka-2667-kafka-based-log-transient-error

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/333.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #333


commit 988530556d60f68766e02d26998f3b86090fabb0
Author: Ewen Cheslack-Postava 
Date:   2015-10-20T07:49:47Z

KAFKA-2667: Fix transient error in KafkaBasedLogTest.

The test required a specific sequence of events for each Consumer.poll()
call, but the MockConsumer.waitForPollThen() method could not guarantee
that, resulting in race conditions. Add support for scheduling sequences of
events even when running in multi-threaded environments.




> Copycat KafkaBasedLogTest.testSendAndReadToEnd transient failure
> 
>
> Key: KAFKA-2667
> URL: https://issues.apache.org/jira/browse/KAFKA-2667
> Project: Kafka
>  Issue Type: Sub-task
>  Components: copycat
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Seen in recent builds:
> {code}
> org.apache.kafka.copycat.util.KafkaBasedLogTest > testSendAndReadToEnd FAILED
> java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at 
> org.apache.kafka.copycat.util.KafkaBasedLogTest.testSendAndReadToEnd(KafkaBasedLogTest.java:335)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2472) Fix kafka ssl configs to not throw warnings

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964788#comment-14964788
 ] 

Ismael Juma edited comment on KAFKA-2472 at 10/20/15 8:27 AM:
--

[~junrao], the reason why we have the warnings is that we are passing a 
`Map<String, ?>` to the various `configure` methods and we do that by calling 
`AbstractConfig.values()`. This means that the usage of the parameters is never 
recorded and we also have to cast instead of using the nicer getString, getInt, 
etc. methods. We also have `KafkaConfig.channelConfigs`, where we must 
remember to add the relevant configs (which is error-prone).

Is there a reason why the `configure` methods can't accept an `AbstractConfig` 
instead of a `Map<String, ?>`?


was (Author: ijuma):
[~junrao], the reason why we have the warnings is that we are passing a 
`Map<String, ?>` to the various `configure` methods and we do that by calling 
`AbstractConfig.values()`. This means that the usage of the parameters is never 
recorded and we also have to cast instead of using the nicer `get{String, Int, 
...}` methods. We also have `KafkaConfig.channelConfigs`, where we must 
remember to add the relevant configs (which is error-prone).

Is there a reason why the `configure` methods can't accept an `AbstractConfig` 
instead of a `Map<String, ?>`?

> Fix kafka ssl configs to not throw warnings
> ---
>
> Key: KAFKA-2472
> URL: https://issues.apache.org/jira/browse/KAFKA-2472
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> This is a follow-up fix on kafka-1690.
> [2015-08-25 18:20:48,236] WARN The configuration ssl.truststore.password = 
> striker was supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)
> [2015-08-25 18:20:48,236] WARN The configuration security.protocol = SSL was 
> supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2617) Move protocol field default values to Protocol

2015-10-20 Thread Jakub Nowak (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakub Nowak reassigned KAFKA-2617:
--

Assignee: Jakub Nowak

> Move protocol field default values to Protocol
> --
>
> Key: KAFKA-2617
> URL: https://issues.apache.org/jira/browse/KAFKA-2617
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Jakub Nowak
>  Labels: newbie
> Fix For: 0.9.0.0
>
>
> Right now the default values are scattered in the Request / Response classes, 
> and some duplicates already exists like JoinGroupRequest.UNKNOWN_CONSUMER_ID 
> and OffsetCommitRequest.DEFAULT_CONSUMER_ID. We would like to move all 
> default values into org.apache.kafka.common.protocol.Protocol since 
> org.apache.kafka.common.requests depends on org.apache.kafka.common.protocol 
> anyways.
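> A minimal sketch of the intended consolidation (the Protocol layout shown 
> here is illustrative only):
> {code}
> public final class Protocol {
>     // defaults gathered in one place instead of duplicated across the
>     // request/response classes; both JoinGroupRequest.UNKNOWN_CONSUMER_ID
>     // and OffsetCommitRequest.DEFAULT_CONSUMER_ID would reference this.
>     public static final String DEFAULT_CONSUMER_ID = "";
> }
> {code}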



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Abhishek Nigam
Hi,
Can we please discuss this KIP. The background for this is that it allows
us to pin the controller to a broker. This is useful in a few scenarios:
a) If we want to do a rolling bounce, we can reduce the number of controller
moves down to 1.
b) We can pick a designated broker, reduce the number of partitions on it
through admin partition reassignment, and designate it as the controller.
c) We can dynamically move the controller if we see any problems on the
broker on which it is running.

Here is the wiki page
https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker

-Abhishek


[jira] [Comment Edited] (KAFKA-2672) SendFailedException when new consumer is run with SSL

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965331#comment-14965331
 ] 

Ismael Juma edited comment on KAFKA-2672 at 10/20/15 4:22 PM:
--

[~junrao], that is right. As far as I can see, it does not:

{code}
private boolean trySend(long now) {
    // send any requests that can be sent now
    boolean requestsSent = false;
    for (Map.Entry<Node, List<ClientRequest>> requestEntry: unsent.entrySet()) {
        Node node = requestEntry.getKey();
        Iterator<ClientRequest> iterator = requestEntry.getValue().iterator();
        while (iterator.hasNext()) {
            ClientRequest request = iterator.next();
            if (client.ready(node, now)) {
                client.send(request, now);
                iterator.remove();
                requestsSent = true;
            } else if (client.connectionFailed(node)) {
                RequestFutureCompletionHandler handler =
                        (RequestFutureCompletionHandler) request.callback();
                handler.onComplete(new ClientResponse(request, now, true, null));
                iterator.remove();
            }
        }
    }
    return requestsSent;
}
{code}

The log entry that Rajini quoted is triggered by an exception raised in 
`clearUnsentRequests`:

{code}
private void clearUnsentRequests(long now) {
    // clear all unsent requests and fail their corresponding futures
    for (Map.Entry<Node, List<ClientRequest>> requestEntry: unsent.entrySet()) {
        Iterator<ClientRequest> iterator = requestEntry.getValue().iterator();
        while (iterator.hasNext()) {
            ClientRequest request = iterator.next();
            RequestFutureCompletionHandler handler =
                    (RequestFutureCompletionHandler) request.callback();
            handler.raise(SendFailedException.INSTANCE);
            iterator.remove();
        }
    }
    unsent.clear();
}
{code}

In my original comment, I showed that `clearUnsentRequests` is called before 
`ConsumerNetworkClient.poll` returns. I haven't investigated in detail, but it 
sounds like the handshake takes longer to complete than `poll` and hence the 
log output. I was hoping that Jason would confirm or clarify this. :)


was (Author: ijuma):
[~junrao], that is right. As far as I can see, it does not:

{code}
private boolean trySend(long now) {
    // send any requests that can be sent now
    boolean requestsSent = false;
    for (Map.Entry<Node, List<ClientRequest>> requestEntry: unsent.entrySet()) {
        Node node = requestEntry.getKey();
        Iterator<ClientRequest> iterator = requestEntry.getValue().iterator();
        while (iterator.hasNext()) {
            ClientRequest request = iterator.next();
            if (client.ready(node, now)) {
                client.send(request, now);
                iterator.remove();
                requestsSent = true;
            } else if (client.connectionFailed(node)) {
                RequestFutureCompletionHandler handler =
                        (RequestFutureCompletionHandler) request.callback();
                handler.onComplete(new ClientResponse(request, now, true, null));
                iterator.remove();
            }
        }
    }
    return requestsSent;
}
{code}

The log entry that Rajini quoted happens in `clearUnsentRequests`:

{code}
private void clearUnsentRequests(long now) {
    // clear all unsent requests and fail their corresponding futures
    for (Map.Entry<Node, List<ClientRequest>> requestEntry: unsent.entrySet()) {
        Iterator<ClientRequest> iterator = requestEntry.getValue().iterator();
        while (iterator.hasNext()) {
            ClientRequest request = iterator.next();
            RequestFutureCompletionHandler handler =
                    (RequestFutureCompletionHandler) request.callback();
            handler.raise(SendFailedException.INSTANCE);
            iterator.remove();
        }
    }
    unsent.clear();
}
{code}

In my original comment, I showed that `clearUnsentRequests` is called before 
`ConsumerNetworkClient.poll` returns. I haven't investigated in detail, but it 
sounds like the handshake takes longer to complete than `poll` and hence the 
log output. I was hoping that Jason would confirm or clarify this. :)

> SendFailedException when new consumer is run with SSL
> -
>
> Key: KAFKA-2672
> URL: https://issues.apache.org/jira/browse/KAFKA-2672
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Neha Narkhede
> Fix For: 0.9.0.0
>
>
> When running new consumer with SSL, debug 

[jira] [Commented] (KAFKA-2472) Fix kafka ssl configs to not throw warnings

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965373#comment-14965373
 ] 

Ismael Juma commented on KAFKA-2472:


[~jkreps], good point regarding Partitioner and other public classes. That's a 
problem for the approach I suggested. Without default methods, it seems like we 
can't change `Configurable`.

We could introduce a new interface and use it in new classes and classes that 
are not public, but it's not great to have two interfaces for the same thing. 
An alternative is to pass a special Map implementation to `configure` that 
records usage. It still means that we can't use the nicer `Config` API, but 
that is no worse than what we have today.
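
For illustration, a minimal sketch of such a usage-recording Map 
(hypothetical, not an actual patch):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Remembers which config keys were read, so "supplied but isn't a known
// config" warnings can be limited to keys that were truly never used.
class RecordingMap<V> extends HashMap<String, V> {
    private final Set<String> used = new HashSet<>();

    RecordingMap(Map<String, ? extends V> values) { super(values); }

    @Override
    public V get(Object key) {
        if (key instanceof String)
            used.add((String) key);
        return super.get(key);
    }

    Set<String> unused() {
        Set<String> keys = new HashSet<>(keySet());
        keys.removeAll(used);
        return keys;
    }
}
{code}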

> Fix kafka ssl configs to not throw warnings
> ---
>
> Key: KAFKA-2472
> URL: https://issues.apache.org/jira/browse/KAFKA-2472
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> This is a follow-up fix on kafka-1690.
> [2015-08-25 18:20:48,236] WARN The configuration ssl.truststore.password = 
> striker was supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)
> [2015-08-25 18:20:48,236] WARN The configuration security.protocol = SSL was 
> supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965532#comment-14965532
 ] 

Jason Gustafson commented on KAFKA-2674:


[~onurkaraman] Maybe this issue can be addressed when we update the client-side 
code for the LeaveGroup request?

> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing behavior of the new consumer from the 
> planned release 0.9 and found an inconsistency in the calling of rebalance 
> callbacks.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified, to be 
> able to process any rebalance logic including offsets commit (e.g. if 
> auto-commit is disabled).
> There are commented logs of current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> Workaround is to call onPartitionsRevoked() explicitly and manually just 
> before calling consumer.close(), but it seems dirty and error-prone to me. It 
> can simply be forgotten by someone without such experience.
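> A sketch of that manual workaround (the surrounding application code is 
> assumed):
> {code}
> import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
>
> class ShutdownWorkaround {
>     // Fire the revoke callback by hand so the same offset-commit logic
>     // runs as on a normal rebalance, then close the consumer.
>     static void closeWithRevoke(KafkaConsumer<?, ?> consumer,
>                                 ConsumerRebalanceListener listener) {
>         listener.onPartitionsRevoked(consumer.assignment());
>         consumer.close();
>     }
> }
> {code}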



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2671) Enable starting Kafka server with a Properties object

2015-10-20 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965456#comment-14965456
 ] 

Ashish K Singh commented on KAFKA-2671:
---

[~gwenshap] and [~jkreps] as long as there is an easy way to embed Kafka, my 
purpose, and many other use cases as mentioned by Jay, will be served. I will 
update the PR to address that.

> Enable starting Kafka server with a Properties object
> -
>
> Key: KAFKA-2671
> URL: https://issues.apache.org/jira/browse/KAFKA-2671
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Ashish K Singh
>Assignee: Ashish K Singh
>
> Kafka, as of now, can only be started with a properties file and override 
> params. It would make life easier for management applications to be able to 
> start Kafka with a Properties object programmatically.
> The changes required to enable this are minimal, just a tad bit of 
> refactoring of kafka.Kafka. The changes must keep current behavior intact.
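> For illustration, startup from a Properties object could look roughly like 
> this (a sketch assuming the KafkaConfig.fromProps / KafkaServerStartable 
> entry points; the exact API is what this ticket would settle):
> {code}
> import java.util.Properties;
> import kafka.server.KafkaConfig;
> import kafka.server.KafkaServerStartable;
>
> public class EmbeddedKafkaExample {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("broker.id", "0");
>         props.put("zookeeper.connect", "localhost:2181");
>         props.put("log.dirs", "/tmp/kafka-logs");
>
>         // start the broker programmatically, no properties file needed
>         KafkaServerStartable server =
>                 new KafkaServerStartable(KafkaConfig.fromProps(props));
>         server.startup();
>     }
> }
> {code}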



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2664) Adding a new metric with several pre-existing metrics is very expensive

2015-10-20 Thread Aditya Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Auradkar reassigned KAFKA-2664:
--

Assignee: Aditya Auradkar  (was: Onur Karaman)

> Adding a new metric with several pre-existing metrics is very expensive
> ---
>
> Key: KAFKA-2664
> URL: https://issues.apache.org/jira/browse/KAFKA-2664
> Project: Kafka
>  Issue Type: Bug
>Reporter: Joel Koshy
>Assignee: Aditya Auradkar
> Fix For: 0.9.0.1
>
>
> I know the summary sounds expected, but we recently ran into a socket server 
> request queue backup that I suspect was caused by a combination of improperly 
> implemented applications that reconnect with a different (random) client-id 
> each time; and the fact that for quotas we now register a new quota 
> metric-set for each client-id.
> So here is what happened: a broker went down and a handful of other brokers 
> started seeing queue times go up significantly. This caused the request 
> queue to back up, which caused socket timeouts and a further deluge of 
> reconnects. The only way we could get out of this was to fire-wall the broker 
> and downgrade to a version without quotas (or I think it would have worked to 
> just restart the broker).
> My guess is that there were a ton of pre-existing client-id metrics. I don’t 
> know for sure but I’m basing that on the fact that there were several new 
> unique client-ids showing up in the public access logs and request local 
> times for fetches started going up inexplicably. (It would have been useful 
> to have a metric for the number of metrics.) So it turns out that in the 
> above scenario (with say 50k pre-existing client-ids), the avg local time for 
> fetch can go up to the order of 50-100ms (at least with tests on a linux box) 
> largely due to the time taken to create new metrics; and that’s because we 
> use a copy-on-write map underneath. If you have enough (say, hundreds) of 
> clients re-connecting at the same time with new client-id's, that can cause 
> the request queues to start backing up and the overall queuing system to 
> become unstable; and the line starts to spill out of the building.
> I think this is a fairly new scenario with quotas - i.e., I don't think the 
> past per-X metric (per-topic, for example) creation rate would ever come this 
> close.
> To be clear, the clients are clearly doing the wrong thing but I think the 
> broker can and should protect itself adequately against such rogue scenarios.
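> To illustrate why metric creation degrades this way, here is a minimal 
> copy-on-write sketch (not Kafka's actual Metrics code):
> {code}
> import java.util.Collections;
> import java.util.HashMap;
> import java.util.Map;
>
> // A copy-on-write put copies every existing entry, so registering metric
> // N costs O(N); with ~50k pre-existing client-id metric-sets, each new
> // client-id makes the request that triggers registration pay that copy.
> public class CopyOnWritePutExample {
>     private volatile Map<String, Object> metrics = Collections.emptyMap();
>
>     synchronized void put(String name, Object metric) {
>         Map<String, Object> copy = new HashMap<>(metrics); // full copy
>         copy.put(name, metric);
>         metrics = Collections.unmodifiableMap(copy);
>     }
> }
> {code}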



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Slow request log in Kafka

2015-10-20 Thread Aditya Auradkar
Fair points. Kafka doesn't really have slow queries. I was thinking about
this kind of log in response to a request-processing slowdown we had during
an internal release. In hindsight, it's unlikely a slow query log would have
helped there, since the slowdown affected requests from all entities (see
KAFKA-2664 for more).

I suppose one example of a "slow" query is a produce request with "ack=-1"
when the replicas aren't catching up. However, we do have other metrics to
catch this.
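
To make the idea concrete, here is a minimal sketch of such a log (the
threshold parameter and logger name are assumptions, not existing Kafka
code):

    import java.util.concurrent.TimeUnit;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Log any request whose handling time exceeds a configurable threshold.
    public class SlowRequestLog {
        private static final Logger LOG =
                LoggerFactory.getLogger("kafka.request.slow");
        private final long thresholdMs;

        public SlowRequestLog(long thresholdMs) {
            this.thresholdMs = thresholdMs;
        }

        public void maybeLog(String requestType, long startNanos) {
            long elapsedMs =
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            if (elapsedMs > thresholdMs)
                LOG.warn("{} took {} ms (threshold {} ms)",
                        requestType, elapsedMs, thresholdMs);
        }
    }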

Aditya

On Thu, Oct 15, 2015 at 12:43 AM, Ewen Cheslack-Postava 
wrote:

> Kafka doesn't have the same type of queries that RDBMS systems have. What
> "slow queries" would we be trying to capture info about?
>
> -Ewen
>
> On Wed, Oct 14, 2015 at 4:27 PM, Gwen Shapira  wrote:
>
> > I had some experience with the feature in MySQL.
> >
> > Its main good use is to identify queries that are obviously bad (full
> > scans on an OLTP system) and need optimization. You can't infer from it
> > anything about the system as a whole because it lacks context and
> > information about what the rest of the system was doing at the same time.
> >
> > I'd like to hear how you see yourself using it in Apache Kafka to better
> > understand its usefulness. Can you share some details about how you would
> > have used it in the recent issue you mentioned?
> >
> > What I see as helpful:
> > 1. Ability to enable/disable trace/debug level logging of request
> > handling for specific request types and clients without restarting the
> > broker (i.e. through JMX, protocol or ZK)
> > 2. Publish histograms of the existing request time metrics
> > 3. Capture detailed timing of a random sample of the requests and log it
> > (i.e. sample metrics rather than avgs). Note that clients that send more
> > requests and longer requests are more likely to get sampled. I've found
> > this super useful in the past.
> >
> > Gwen
> >
> > On Wed, Oct 14, 2015 at 3:39 PM, Aditya Auradkar <
> > aaurad...@linkedin.com.invalid> wrote:
> >
> > > Hey everyone,
> > >
> > > We were recently discussing a small logging improvement for Kafka.
> > > Basically, add a request log for queries that took longer than a
> > > certain configurable time to execute. This can be quite useful for
> > > debugging purposes; in fact it would have proven handy while
> > > investigating a recent issue during one of our deployments at LinkedIn.
> > >
> > > This is also supported in several other projects. For example, MySQL
> > > and Postgres both have slow request logs.
> > > https://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html
> > > https://wiki.postgresql.org/wiki/Logging_Difficult_Queries
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Aditya
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>


[jira] [Commented] (KAFKA-2672) SendFailedException when new consumer is run with SSL

2015-10-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965321#comment-14965321
 ] 

Jun Rao commented on KAFKA-2672:


[~ijuma], the consumer shouldn't send any requests until the SSL handshake 
completes, right?

> SendFailedException when new consumer is run with SSL
> -
>
> Key: KAFKA-2672
> URL: https://issues.apache.org/jira/browse/KAFKA-2672
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Neha Narkhede
> Fix For: 0.9.0.0
>
>
> When running new consumer with SSL, debug logs show these exceptions every 
> time:
> {quote}
> [2015-10-19 20:57:43,389] DEBUG Fetch failed 
> (org.apache.kafka.clients.consumer.internals.Fetcher)
> org.apache.kafka.clients.consumer.internals.SendFailedException 
> {quote}
> The exception occurs because send is queued before SSL handshake is 
> complete. I am not sure if the exception is harmless, but it will be good to 
> avoid the exception either way since it feels like an exception that exists 
> to handle edge cases like buffer overflow rather than something in a normal 
> code path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2675) SASL/Kerberos follow-up

2015-10-20 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-2675:
--

 Summary: SASL/Kerberos follow-up
 Key: KAFKA-2675
 URL: https://issues.apache.org/jira/browse/KAFKA-2675
 Project: Kafka
  Issue Type: Sub-task
Reporter: Ismael Juma
Assignee: Ismael Juma
 Fix For: 0.9.0.0


This is a follow-up to KAFKA-1686. 

1. Decide on `serviceName` configuration: do we want to keep it in two places?
2. auth.to.local config name is a bit opaque, is there a better one?
3. Implement or remove SASL_KAFKA_SERVER_REALM config
4. Consider making Login's thread a daemon thread
5. Write test that shows authentication failure due to invalid user
6. Write test that shows authentication failure due to wrong password
7. Write test that shows authentication failure due to ticket expiring



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2672) SendFailedException when new consumer is run with SSL

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965331#comment-14965331
 ] 

Ismael Juma commented on KAFKA-2672:


[~junrao], that is right. As far as I can see, it does not:

{code}
private boolean trySend(long now) {
    // send any requests that can be sent now
    boolean requestsSent = false;
    for (Map.Entry<Node, List<ClientRequest>> requestEntry: unsent.entrySet()) {
        Node node = requestEntry.getKey();
        Iterator<ClientRequest> iterator = requestEntry.getValue().iterator();
        while (iterator.hasNext()) {
            ClientRequest request = iterator.next();
            if (client.ready(node, now)) {
                client.send(request, now);
                iterator.remove();
                requestsSent = true;
            } else if (client.connectionFailed(node)) {
                RequestFutureCompletionHandler handler =
                        (RequestFutureCompletionHandler) request.callback();
                handler.onComplete(new ClientResponse(request, now, true, null));
                iterator.remove();
            }
        }
    }
    return requestsSent;
}
{code}

The log entry that Rajini quoted happens in `clearUnsentRequests`:

{code}
private void clearUnsentRequests(long now) {
    // clear all unsent requests and fail their corresponding futures
    for (Map.Entry<Node, List<ClientRequest>> requestEntry: unsent.entrySet()) {
        Iterator<ClientRequest> iterator = requestEntry.getValue().iterator();
        while (iterator.hasNext()) {
            ClientRequest request = iterator.next();
            RequestFutureCompletionHandler handler =
                    (RequestFutureCompletionHandler) request.callback();
            handler.raise(SendFailedException.INSTANCE);
            iterator.remove();
        }
    }
    unsent.clear();
}
{code}

In my original comment, I showed that `clearUnsentRequests` is called before 
`ConsumerNetworkClient.poll` returns. I haven't investigated in detail, but it 
sounds like the handshake takes longer to complete than `poll` and hence the 
log output. I was hoping that Jason would confirm or clarify this. :)

> SendFailedException when new consumer is run with SSL
> -
>
> Key: KAFKA-2672
> URL: https://issues.apache.org/jira/browse/KAFKA-2672
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Neha Narkhede
> Fix For: 0.9.0.0
>
>
> When running new consumer with SSL, debug logs show these exceptions every 
> time:
> {quote}
> [2015-10-19 20:57:43,389] DEBUG Fetch failed 
> (org.apache.kafka.clients.consumer.internals.Fetcher)
> org.apache.kafka.clients.consumer.internals.SendFailedException 
> {quote}
> The exception occurs  because send is queued before SSL handshake is 
> complete. I am not sure if the exception is harmless, but it will be good to 
> avoid the exception either way since it feels like an exception that exists 
> to handle edge cases like buffer overflow rather than something in a normal 
> code path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-36 - Rack aware replica assignment

2015-10-20 Thread Aditya Auradkar
Hey Allen,

1. If we choose fail-fast topic creation, we will have topic creation
failures while upgrading the cluster. I really doubt we want this behavior;
ideally, this should be invisible to clients of a cluster. Currently, each
broker is effectively its own rack, so we can probably use the rack
information whenever possible without making it a hard requirement (see the
sketch after these points). To extend Gwen's example, one badly configured
broker should not degrade topic creation for the entire cluster.

2. Upgrade scenario - Can you add a section on the upgrade piece to confirm
that old clients will not see errors? I believe ZookeeperConsumerConnector
reads the Broker objects from ZK. I wanted to confirm that this will not
cause any problems.

3. Could you elaborate your proposed changes to the UpdateMetadataRequest
in the "Public Interfaces" section? Personally, I find this format easy to
read in terms of wire protocol changes:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
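
For concreteness, this is roughly what the per-broker configuration in point 1 
could look like if rack becomes an optional broker property; the property name 
here is hypothetical, since the exact key was still under discussion:

{code}
# server.properties (hypothetical fragment; the final property name was
# still being discussed at this point)
broker.id=6
# Optional: if absent, the broker falls back to whichever behavior we choose
broker.rack=us-east-1a
{code}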

Aditya

On Fri, Oct 16, 2015 at 3:45 PM, Allen Wang  wrote:

> KIP is updated to include rack as an optional property for the broker. Please take
> a look and let me know if more details are needed.
>
> For the case where some brokers have rack and some do not, the current KIP
> uses the fail-fast behavior. If there are concerns, we can further discuss
> this in the email thread or next hangout.
>
>
>
> On Thu, Oct 15, 2015 at 10:42 AM, Allen Wang  wrote:
>
> > That's a good question. I can think of three actions if the rack
> > information is incomplete:
> >
> > 1. Treat the node without rack as if it is on its unique rack
> > 2. Disregard all rack information and fallback to current algorithm
> > 3. Fail-fast
> >
> > Now that I think about it, one and three make more sense. The reason for
> > fail-fast is that a user's mistake of not providing the rack may never be
> > found if we tolerate it; the assignment may not be rack aware as the user
> > expected, which creates debugging problems when things fail.
> >
> > What do you think? If not fail-fast, is there any way we can make the user
> > error stand out?
> >
> >
> > On Thu, Oct 15, 2015 at 10:17 AM, Gwen Shapira 
> wrote:
> >
> >> Thanks! Just to clarify, when some brokers have rack assignment and some
> >> don't, do we act like none of them have it? or like those without
> >> assignment are in their own rack?
> >>
> >> The first scenario is good when first setting up rack-awareness, but the
> >> second makes more sense for on-going maintenance (I can totally see
> >> someone adding a node and forgetting to set the rack property, we don't
> >> want this to change behavior for anything except the new node).
> >>
> >> What do you think?
> >>
> >> Gwen
> >>
> >> On Thu, Oct 15, 2015 at 10:13 AM, Allen Wang 
> >> wrote:
> >>
> >> > For scenario 1:
> >> >
> >> > - Add the rack information to the broker property file or dynamically set
> >> > it in the wrapper code to bootstrap the Kafka server. You would do that
> >> > for all brokers and restart the brokers one by one.
> >> >
> >> > In this scenario, the complete broker to rack mapping may not be available
> >> > until every broker is restarted. During that time we fall back to the
> >> > default replica assignment algorithm.
> >> >
> >> > For scenario 2:
> >> >
> >> > - Add the rack information to the broker property file or dynamically set
> >> > it in the wrapper code and start the broker.
> >> >
> >> >
> >> > On Wed, Oct 14, 2015 at 2:36 PM, Gwen Shapira 
> >> wrote:
> >> >
> >> > > Can you clarify the workflow for the following scenarios:
> >> > >
> >> > > 1. I currently have 6 brokers and want to add rack information for
> >> each
> >> > > 2. I'm adding a new broker and I want to specify which rack it
> >> belongs on
> >> > > while adding it.
> >> > >
> >> > > Thanks!
> >> > >
> >> > > On Tue, Oct 13, 2015 at 2:21 PM, Allen Wang 
> >> > wrote:
> >> > >
> >> > > > We discussed the KIP in the hangout today. The recommendation is to
> >> > > > make rack as a broker property in ZooKeeper. For users with existing
> >> > > > rack information stored somewhere, they would need to retrieve the
> >> > > > information at broker start up and dynamically set the rack property,
> >> > > > which can be implemented as a wrapper to bootstrap the broker. There
> >> > > > will be no interface or pluggable implementation to retrieve the rack
> >> > > > information.
> >> > > >
> >> > > > The assumption is that you always need to restart the broker to make a
> >> > > > change to the rack.
> >> > > >
> >> > > > Once the rack becomes a broker property, it will be possible to make
> >> > > > rack part of the meta data to help the consumer choose which in sync
> >> > > > replica
> 

[jira] [Commented] (KAFKA-2472) Fix kafka ssl configs to not throw warnings

2015-10-20 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965348#comment-14965348
 ] 

Jay Kreps commented on KAFKA-2472:
--

[~gwenshap] This should work with plugins too. The config helper code records 
which configs are requested, so even a config specific to a plugin, like a 
serializer, is still recorded when the serializer requests it.

[~ijuma] The only thing to be careful of is that, if we are changing the 
Configurable interface, it doesn't result in changing any public interfaces 
people have implemented, such as Partitioner.
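
To illustrate the mechanism (this is a standalone sketch of the pattern, not 
the actual AbstractConfig code):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Records which config keys are requested so that anything never requested
// can be reported as unused/unknown afterwards.
class RecordingConfig {
    private final Map<String, Object> originals;
    private final Set<String> used = new HashSet<>();

    RecordingConfig(Map<String, Object> originals) {
        this.originals = new HashMap<>(originals);
    }

    Object get(String key) {
        used.add(key); // recorded even when a plugin (e.g. a serializer) asks
        return originals.get(key);
    }

    void logUnused() {
        for (String key : originals.keySet())
            if (!used.contains(key))
                System.out.println("WARN The configuration " + key
                        + " was supplied but isn't a known config.");
    }
}
{code}

A config such as ssl.truststore.password then stops triggering the warning as 
soon as some component, core or plugin, actually requests it.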

> Fix kafka ssl configs to not throw warnings
> ---
>
> Key: KAFKA-2472
> URL: https://issues.apache.org/jira/browse/KAFKA-2472
> Project: Kafka
>  Issue Type: Bug
>Reporter: Sriharsha Chintalapani
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> This is a follow-up fix on kafka-1690.
> [2015-08-25 18:20:48,236] WARN The configuration ssl.truststore.password = 
> striker was supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)
> [2015-08-25 18:20:48,236] WARN The configuration security.protocol = SSL was 
> supplied but isn't a known config. 
> (org.apache.kafka.clients.producer.ProducerConfig)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2657; Kafka clients fail to start if one...

2015-10-20 Thread apakulov
GitHub user apakulov opened a pull request:

https://github.com/apache/kafka/pull/336

KAFKA-2657; Kafka clients fail to start if one of broker isn't resolved by 
DNS



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apakulov/kafka KAFKA-2657

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #336


commit d1a54ed3d9d13ce640a521ca21b2141e7c1c9ab1
Author: Alexander Pakulov 
Date:   2015-10-20T19:43:22Z

KAFKA-2657; Kafka clients fail to start if one of broker isn't resolved by 
DNS




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2657) Kafka clients fail to start if one of broker isn't resolved by DNS

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965646#comment-14965646
 ] 

ASF GitHub Bot commented on KAFKA-2657:
---

GitHub user apakulov opened a pull request:

https://github.com/apache/kafka/pull/336

KAFKA-2657; Kafka clients fail to start if one of broker isn't resolved by 
DNS



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apakulov/kafka KAFKA-2657

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/336.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #336


commit d1a54ed3d9d13ce640a521ca21b2141e7c1c9ab1
Author: Alexander Pakulov 
Date:   2015-10-20T19:43:22Z

KAFKA-2657; Kafka clients fail to start if one of broker isn't resolved by 
DNS




> Kafka clients fail to start if one of broker isn't resolved by DNS 
> ---
>
> Key: KAFKA-2657
> URL: https://issues.apache.org/jira/browse/KAFKA-2657
> Project: Kafka
>  Issue Type: Bug
>Reporter: Alexander Pakulov
>Priority: Minor
>
> During org.apache.kafka.clients.producer.KafkaProducer and 
> org.apache.kafka.clients.consumer.KafkaConsumer object creation, the 
> constructors invoke org.apache.kafka.common.utils.ClientUtils#parseAndValidateAddresses,
> which can throw an exception if one of the nodes hasn't been resolved by DNS. 
> As a result, the object isn't created and you aren't able to use the Kafka 
> clients.
> I personally think that Kafka should be able to operate with a cluster with a 
> quorum of its instances.
> {code:java}
> try {
>     InetSocketAddress address = new InetSocketAddress(host, port);
>     if (address.isUnresolved())
>         throw new ConfigException("DNS resolution failed for url in " +
>                 ProducerConfig.BOOTSTRAP_SERVERS_CONFIG + ": " + url);
>     addresses.add(address);
> } catch (NumberFormatException e) {
>     throw new ConfigException("Invalid port in " +
>             ProducerConfig.BOOTSTRAP_SERVERS_CONFIG + ": " + url);
> }
> {code}
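
A minimal sketch of the behavior the reporter is asking for, assuming we simply 
skip unresolved hosts and fail only when nothing resolves (illustrative, not 
the committed patch):

{code}
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.common.config.ConfigException;

static List<InetSocketAddress> parseAndValidateAddresses(List<String> urls) {
    List<InetSocketAddress> addresses = new ArrayList<>();
    for (String url : urls) {
        int colon = url.lastIndexOf(':');                 // simplified host:port parsing
        String host = url.substring(0, colon);
        int port = Integer.parseInt(url.substring(colon + 1));
        InetSocketAddress address = new InetSocketAddress(host, port);
        if (address.isUnresolved())
            System.err.println("WARN DNS resolution failed for bootstrap url "
                    + url + ", skipping it");
        else
            addresses.add(address);
    }
    if (addresses.isEmpty())
        throw new ConfigException("No resolvable bootstrap urls given");
    return addresses;
}
{code}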



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2676) AclCommandTest has wrong package name

2015-10-20 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-2676:
--

 Summary: AclCommandTest has wrong package name 
 Key: KAFKA-2676
 URL: https://issues.apache.org/jira/browse/KAFKA-2676
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.0
Reporter: Jun Rao


AclCommandTest has the package unit.kafka.security.auth. We should remove the 
unit part. ResourceTypeTest has the same issue.
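
In other words, the fix is just dropping the directory-derived prefix from the 
package declaration (sketch):

{code}
// AclCommandTest currently declares:
package unit.kafka.security.auth
// and should instead declare:
package kafka.security.auth
{code}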



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2645) Document potentially breaking changes in the release notes for 0.9.0

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965743#comment-14965743
 ] 

ASF GitHub Bot commented on KAFKA-2645:
---

GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/337

KAFKA-2645: Document potentially breaking changes in the release note…

…s for 0.9.0

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #337


commit bf4b435009583120b54369b454da49ab3fbaad8d
Author: Grant Henke 
Date:   2015-10-20T20:53:18Z

KAFKA-2645: Document potentially breaking changes in the release notes for 
0.9.0




> Document potentially breaking changes in the release notes for 0.9.0
> 
>
> Key: KAFKA-2645
> URL: https://issues.apache.org/jira/browse/KAFKA-2645
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
> Fix For: 0.9.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2677) Coordinator disconnects not propagated to new consumer

2015-10-20 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-2677:
--

 Summary: Coordinator disconnects not propagated to new consumer
 Key: KAFKA-2677
 URL: https://issues.apache.org/jira/browse/KAFKA-2677
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


Currently, disconnects by the coordinator are not always seen by the consumer. 
This can result in a long delay after the old coordinator has shut down or 
failed before the consumer knows that it needs to find the new coordinator. The 
NetworkClient makes socket disconnects available to users in two ways:

1. through a flag in the ClientResponse object for requests pending when the 
disconnect occurred, and 
2. through the connectionFailed() method. 

The first method clearly cannot be depended on since it only helps when a 
request is pending, which is relatively rare for the connection with the 
coordinator. Instead, we can probably use the second method with a little 
rework of ConsumerNetworkClient to check for failed connections immediately 
after returning from poll(). 
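
A sketch of that second option, assuming a coordinatorDead() helper that marks 
the coordinator unknown and a field holding the current coordinator node (both 
hypothetical names):

{code}
// After each poll, check the coordinator connection explicitly so that a
// disconnect is noticed even when no request was pending on it.
public void pollAndCheckCoordinator(long timeout, long now) {
    client.poll(timeout, now);
    Node coordinator = this.coordinator;          // hypothetical field
    if (coordinator != null && client.connectionFailed(coordinator))
        coordinatorDead();                        // hypothetical helper: forces rediscovery
}
{code}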



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2617) Move protocol field default values to Protocol

2015-10-20 Thread Jakub Nowak (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakub Nowak updated KAFKA-2617:
---
Status: Patch Available  (was: In Progress)

> Move protocol field default values to Protocol
> --
>
> Key: KAFKA-2617
> URL: https://issues.apache.org/jira/browse/KAFKA-2617
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Jakub Nowak
>  Labels: newbie
> Fix For: 0.9.0.0
>
>
> Right now the default values are scattered in the Request / Response classes, 
> and some duplicates already exist, like JoinGroupRequest.UNKNOWN_CONSUMER_ID 
> and OffsetCommitRequest.DEFAULT_CONSUMER_ID. We would like to move all 
> default values into org.apache.kafka.common.protocol.Protocol since 
> org.apache.kafka.common.requests depends on org.apache.kafka.common.protocol 
> anyways.
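
For illustration, the consolidation might look like this; the constant value is 
assumed, not taken from the source:

{code}
// in org.apache.kafka.common.protocol.Protocol (sketch)
public static final String DEFAULT_CONSUMER_ID = "";
// would replace both JoinGroupRequest.UNKNOWN_CONSUMER_ID
// and OffsetCommitRequest.DEFAULT_CONSUMER_ID
{code}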



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2672) SendFailedException when new consumer is run with SSL

2015-10-20 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-2672:
--

Assignee: Jason Gustafson  (was: Neha Narkhede)

> SendFailedException when new consumer is run with SSL
> -
>
> Key: KAFKA-2672
> URL: https://issues.apache.org/jira/browse/KAFKA-2672
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Jason Gustafson
> Fix For: 0.9.0.0
>
>
> When running new consumer with SSL, debug logs show these exceptions every 
> time:
> {quote}
> [2015-10-19 20:57:43,389] DEBUG Fetch failed 
> (org.apache.kafka.clients.consumer.internals.Fetcher)
> org.apache.kafka.clients.consumer.internals.SendFailedException 
> {quote}
> The exception occurs because the send is queued before the SSL handshake is 
> complete. I am not sure if the exception is harmless, but it would be good to 
> avoid it either way, since it feels like an exception that exists to handle 
> edge cases like buffer overflow rather than something in a normal code path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Neha Narkhede
Agree with Jay on staying away from pinning roles to brokers. This is
actually harder to operate and monitor.

Regarding the problems you mentioned:
1. Reducing the controller moves during a rolling bounce is useful, but really
something that should be handled by the tooling. The root cause is that
currently the controller move is expensive. I think we'd be better off
investing time and effort in thinning out the controller. Just moving to
the batch write APIs in ZooKeeper will make a huge difference (see the sketch
after this list).
2. I'm not sure I understood the motivation behind moving partitions out of
the controller broker. That seems like a proposal for a solution, but can
you describe the problems you saw that affected controller functionality?
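
As a concrete example of what the batch write buys (a sketch with illustrative 
paths, using ZooKeeper's multi API):

{code}
import java.util.Arrays;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

// Instead of one ZooKeeper round trip per partition state change, group them:
void writeStateBatch(ZooKeeper zk, byte[] state0, byte[] state1) throws Exception {
    zk.multi(Arrays.asList(
            Op.setData("/brokers/topics/t1/partitions/0/state", state0, -1),
            Op.setData("/brokers/topics/t1/partitions/1/state", state1, -1)));
}
{code}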

Regarding the location of the controller, it seems there are 2 things you
are suggesting:
1. Optimizing the strategy of picking a broker as the controller (e.g.
least loaded node)
2. Moving the controller if a broker soft fails.

I don't think #1 is worth the effort involved. The better way of addressing
it is to make the controller thinner and faster. #2 is interesting since
the problem is that while a broker fails, all state changes fail or are
queued up which globally impacts the cluster. There are 2 alternatives -
have a tool that allows you to move the controller or just kill the broker
so the controller moves. I prefer the latter since it is simple and also
because a misbehaving broker is better off shutdown anyway.

Having said that, it will be helpful to know details of the problems you
saw while operating the controller. I think understanding those will help
guide the solution better.

On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps  wrote:

> This seems like a step backwards--we really don't want people to manually
> manage the location of the controller and try to manually balance
> partitions off that broker.
>
> I think it might make sense to consider directly fixing the things you
> actually want to fix:
> 1. Too many controller moves--we could either just make this cheaper or
> make the controller location more deterministic e.g. having the election
> prefer the node with the smallest node id so there were fewer failovers in
> rolling bounces.
> 2. You seem to think having the controller on a normal node is a problem.
> Can you elaborate on what the negative consequences you've observed? Let's
> focus on fixing those.
>
> In general we've worked very hard to avoid having a bunch of dedicated
> roles for different nodes and I would be very very loath to see us move
> away from that philosophy. I have a fair amount of experience with both
> homogenous systems that have a single role and also systems with many
> differentiated roles and I really think that the differentiated approach
> causes more problems than it solves for most deployments due to the added
> complexity.
>
> I think we could also fix up this KIP a bit. For example it says there are
> no public interfaces involved but surely there are new admin commands to
> control the location? There are also some minor things like listing it as
> released in 0.8.3 that seem wrong.
>
> -Jay
>
> On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> ani...@linkedin.com.invalid> wrote:
>
> > Hi,
> > Can we please discuss this KIP. The background for this is that it allows
> > us to pin controller to a broker. This is useful in a couple of
> scenarios:
> > a) If we want to do a rolling bounce we can reduce the number of
> controller
> > moves down to 1.
> > b) Again pick a designated broker and reduce the number of partitions on
> it
> > through admin reassign partitions and designate it as a controller.
> > c) Dynamically move controller if we see any problems on the broker which
> > it is running.
> >
> > Here is the wiki page
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> >
> > -Abhishek
> >
>



-- 
Thanks,
Neha


[GitHub] kafka pull request: KAFKA-1686; Implement SASL/Kerberos

2015-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/334


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-2645: Document potentially breaking chan...

2015-10-20 Thread granthenke
GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/337

KAFKA-2645: Document potentially breaking changes in the release note…

…s for 0.9.0

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #337


commit bf4b435009583120b54369b454da49ab3fbaad8d
Author: Grant Henke 
Date:   2015-10-20T20:53:18Z

KAFKA-2645: Document potentially breaking changes in the release notes for 
0.9.0




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-1686.

Resolution: Fixed

Issue resolved by pull request 334
[https://github.com/apache/kafka/pull/334]

> Implement SASL/Kerberos
> ---
>
> Key: KAFKA-1686
> URL: https://issues.apache.org/jira/browse/KAFKA-1686
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.8.2.1
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> Implement SASL/Kerberos authentication.
> To do this we will need to introduce a new SASLRequest and SASLResponse pair 
> to the client protocol. This request and response will each have only a 
> single byte[] field and will be used to handle the SASL challenge/response 
> cycle. Doing this will initialize the SaslServer instance and associate it 
> with the session in a manner similar to KAFKA-1684.
> When using integrity or encryption mechanisms with SASL we will need to wrap 
> and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
> SSLEngine will need to also cover the SaslServer instance.
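
An illustrative server-side loop for the challenge/response cycle described 
above, using the standard javax.security.sasl API (transport plumbing omitted):

{code}
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

// Each round consumes one client token (the byte[] from the SASL request) and
// produces the next challenge (the byte[] for the SASL response).
byte[] handleSaslToken(SaslServer saslServer, byte[] clientToken) throws SaslException {
    byte[] challenge = saslServer.evaluateResponse(clientToken);
    if (saslServer.isComplete()) {
        // Authentication finished: associate saslServer with the session; if an
        // integrity/confidentiality QOP was negotiated, wrap()/unwrap() the
        // subsequent traffic, analogous to the SSLEngine case.
    }
    return challenge;
}
{code}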



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965776#comment-14965776
 ] 

ASF GitHub Bot commented on KAFKA-1686:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/334


> Implement SASL/Kerberos
> ---
>
> Key: KAFKA-1686
> URL: https://issues.apache.org/jira/browse/KAFKA-1686
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.8.2.1
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> Implement SASL/Kerberos authentication.
> To do this we will need to introduce a new SASLRequest and SASLResponse pair 
> to the client protocol. This request and response will each have only a 
> single byte[] field and will be used to handle the SASL challenge/response 
> cycle. Doing this will initialize the SaslServer instance and associate it 
> with the session in a manner similar to KAFKA-1684.
> When using integrity or encryption mechanisms with SASL we will need to wrap 
> and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
> SSLEngine will need to also cover the SaslServer instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965789#comment-14965789
 ] 

Jun Rao commented on KAFKA-1686:


[~sriharsha], thanks a lot for the patch. I committed the SASL patch using the 
PR (#334) from Ismael. Please take a look and see if you spot any issues. We 
have filed KAFKA-2675 to address any follow-up items.

> Implement SASL/Kerberos
> ---
>
> Key: KAFKA-1686
> URL: https://issues.apache.org/jira/browse/KAFKA-1686
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.8.2.1
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> Implement SASL/Kerberos authentication.
> To do this we will need to introduce a new SASLRequest and SASLResponse pair 
> to the client protocol. This request and response will each have only a 
> single byte[] field and will be used to handle the SASL challenge/response 
> cycle. Doing this will initialize the SaslServer instance and associate it 
> with the session in a manner similar to KAFKA-1684.
> When using integrity or encryption mechanisms with SASL we will need to wrap 
> and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
> SSLEngine will need to also cover the SaslServer instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965782#comment-14965782
 ] 

Jiangjie Qin commented on KAFKA-2674:
-

[~hachikuji] I am reviewing KAFKA-2464 and also noticed this. I am actually 
wondering whether the rebalance listener should be invoked when the consumer 
shuts down. It seems to me that when the consumer shuts down, it should simply 
commit offsets and then send a leave group request. It is not actually 
participating in the rebalance any more; only the rest of the members in the 
group are rebalancing.

> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9, and found an inconsistency in the calling of rebalance 
> callbacks.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified to be able 
> to process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> There are commented logs of current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> Workaround is to call onPartitionsRevoked() explicitly and manually just 
> before calling consumer.close(), but it seems dirty and error prone to me. It 
> can simply be forgotten by someone without such experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2617) Move protocol field default values to Protocol

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965809#comment-14965809
 ] 

ASF GitHub Bot commented on KAFKA-2617:
---

GitHub user Mszak opened a pull request:

https://github.com/apache/kafka/pull/338

KAFKA-2617: Move protocol field default values to Protocol.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Mszak/kafka kafka-2617-move-default-values

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/338.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #338


commit a99c5dff41aa548303c4f6dec586cb22e3b70a59
Author: Jakub Nowak 
Date:   2015-10-20T21:11:30Z

Move protocol field default values to Protocol.




> Move protocol field default values to Protocol
> --
>
> Key: KAFKA-2617
> URL: https://issues.apache.org/jira/browse/KAFKA-2617
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Jakub Nowak
>  Labels: newbie
> Fix For: 0.9.0.0
>
>
> Right now the default values are scattered in the Request / Response classes, 
> and some duplicates already exist, like JoinGroupRequest.UNKNOWN_CONSUMER_ID 
> and OffsetCommitRequest.DEFAULT_CONSUMER_ID. We would like to move all 
> default values into org.apache.kafka.common.protocol.Protocol since 
> org.apache.kafka.common.requests depends on org.apache.kafka.common.protocol 
> anyways.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2617: Move protocol field default values...

2015-10-20 Thread Mszak
GitHub user Mszak opened a pull request:

https://github.com/apache/kafka/pull/338

KAFKA-2617: Move protocol field default values to Protocol.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Mszak/kafka kafka-2617-move-default-values

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/338.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #338


commit a99c5dff41aa548303c4f6dec586cb22e3b70a59
Author: Jakub Nowak 
Date:   2015-10-20T21:11:30Z

Move protocol field default values to Protocol.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Work started] (KAFKA-2617) Move protocol field default values to Protocol

2015-10-20 Thread Jakub Nowak (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-2617 started by Jakub Nowak.
--
> Move protocol field default values to Protocol
> --
>
> Key: KAFKA-2617
> URL: https://issues.apache.org/jira/browse/KAFKA-2617
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Jakub Nowak
>  Labels: newbie
> Fix For: 0.9.0.0
>
>
> Right now the default values are scattered in the Request / Response classes, 
> and some duplicates already exist, like JoinGroupRequest.UNKNOWN_CONSUMER_ID 
> and OffsetCommitRequest.DEFAULT_CONSUMER_ID. We would like to move all 
> default values into org.apache.kafka.common.protocol.Protocol since 
> org.apache.kafka.common.requests depends on org.apache.kafka.common.protocol 
> anyways.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka KIP meeting Oct 20 at 11:00am PST

2015-10-20 Thread Jun Rao
The following are the notes from today's KIP discussion.

* KIP-38: No concerns with this KIP. Flavio will initiate the voting on
this.
* KIP-37: There are questions on how ACL, configurations, etc will work,
and whether we should support "move" or not. We will discuss the details
more in the mailing list.
* KIP-32/KIP-33: Jiangjie raised some concerns on the approach that Jay
proposed. Guozhang and Jay will follow up on the mailing list.

The video will be uploaded soon to
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
 .

Thanks,

Jun

On Mon, Oct 19, 2015 at 2:26 PM, Jun Rao  wrote:

> Hi, Everyone,
>
> We will have a Kafka KIP meeting tomorrow at 11:00am PST. If you plan to
> attend but haven't received an invite, please let me know. The following is
> the agenda.
>
> Agenda:
> 1. KIP-38: Zookeeper authentication
> 2. KIP-37: Add namespaces in Kafka
>
> Thanks,
>
> Jun
>


Re: [DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Jay Kreps
This seems like a step backwards--we really don't want people to manually
manage the location of the controller and try to manually balance
partitions off that broker.

I think it might make sense to consider directly fixing the things you
actually want to fix:
1. Too many controller moves--we could either just make this cheaper or
make the controller location more deterministic e.g. having the election
prefer the node with the smallest node id so there were fewer failovers in
rolling bounces.
2. You seem to think having the controller on a normal node is a problem.
Can you elaborate on what the negative consequences you've observed? Let's
focus on fixing those.

In general we've worked very hard to avoid having a bunch of dedicated
roles for different nodes and I would be very very loath to see us move
away from that philosophy. I have a fair amount of experience with both
homogenous systems that have a single role and also systems with many
differentiated roles and I really think that the differentiated approach
causes more problems than it solves for most deployments due to the added
complexity.

I think we could also fix up this KIP a bit. For example it says there are
no public interfaces involved but surely there are new admin commands to
control the location? There are also some minor things like listing it as
released in 0.8.3 that seem wrong.

-Jay

On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
ani...@linkedin.com.invalid> wrote:

> Hi,
> Can we please discuss this KIP. The background for this is that it allows
> us to pin the controller to a broker. This is useful in a couple of scenarios:
> a) If we want to do a rolling bounce we can reduce the number of controller
> moves down to 1.
> b) Again pick a designated broker and reduce the number of partitions on it
> through admin reassign partitions and designate it as a controller.
> c) Dynamically move the controller if we see any problems on the broker on
> which it is running.
>
> Here is the wiki page
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
>
> -Abhishek
>


[jira] [Commented] (KAFKA-2672) SendFailedException when new consumer is run with SSL

2015-10-20 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965675#comment-14965675
 ] 

Ismael Juma commented on KAFKA-2672:


Thanks Jason. If this is harmless and since it's only logged at debug level, I 
think it's probably OK to leave as is for 0.9.0.0.

> SendFailedException when new consumer is run with SSL
> -
>
> Key: KAFKA-2672
> URL: https://issues.apache.org/jira/browse/KAFKA-2672
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Rajini Sivaram
>Assignee: Jason Gustafson
> Fix For: 0.9.0.0
>
>
> When running new consumer with SSL, debug logs show these exceptions every 
> time:
> {quote}
> [2015-10-19 20:57:43,389] DEBUG Fetch failed 
> (org.apache.kafka.clients.consumer.internals.Fetcher)
> org.apache.kafka.clients.consumer.internals.SendFailedException 
> {quote}
> The exception occurs because the send is queued before the SSL handshake is 
> complete. I am not sure if the exception is harmless, but it would be good to 
> avoid it either way, since it feels like an exception that exists to handle 
> edge cases like buffer overflow rather than something in a normal code path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965803#comment-14965803
 ] 

Jason Gustafson commented on KAFKA-2674:


[~becket_qin] I think the only problem with that is that some users might be 
doing their own offset management, in which case they might not even be using 
Kafka to store offsets. Currently, we only commit on close if auto-commit is 
enabled. I guess we could also depend on the user to manually do commits prior 
to close, but it seems like they'd probably already have their commit logic in 
the revoke callback. What do you think?

> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9, and found an inconsistency in the calling of rebalance 
> callbacks.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified to be able 
> to process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> There are commented logs of current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> Workaround is to call onPartitionsRevoked() explicitly and manually just 
> before calling consumer.close(), but it seems dirty and error prone to me. It 
> can simply be forgotten by someone without such experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2618; Disable SSL renegotiation for 0.9....

2015-10-20 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/339

KAFKA-2618; Disable SSL renegotiation for 0.9.0.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-2618-disable-renegotiation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #339


commit 2a1df538dbcf134069ae8a0a0eaa6acd74e7fcbc
Author: Ismael Juma 
Date:   2015-10-19T15:21:21Z

Disable renegotiation for now




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965865#comment-14965865
 ] 

Jiangjie Qin edited comment on KAFKA-2674 at 10/20/15 10:15 PM:


I think it would be clearer if the rebalance callback is only called when a 
rebalance occurred, but not on consumer closure.

If auto commit is turned off, and users are committing offsets on their own, the 
following code looks clean to me when a user closes a consumer.
{code}
if (!failure)
  consumer.commitOffsetSync();
consumer.close();
{code}

And this is what we do in mirror maker. I prefer this because it is not complex 
and we are not re-purposing the rebalance listener for something else. I can 
imagine some logic that only makes sense for an actual rebalance, say holding a 
lock and releasing it when onPartitionAssigned() is called.

Thoughts?




was (Author: becket_qin):
I think it would be clearer if rebalance callback is only called when rebalance 
occurred, but not on consumer closure. 

If auto commit is turned off, and users are committing offset on there own, the 
following code looks clean to me when user close a consumer.
{code}
if (!failure)
  consumer.commitOffsetSync();
consumer.close();
{code}

And this is what we do in mirror maker. I prefer this because it does not 
complex and we are not re-purposing rebalance listener for something else. I 
can imagine some logic that only makes sense for actual rebalance, say holding 
a lock and release it when onPartitionAssigned() is called.

Thoughts?



> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9, and found an inconsistency in the calling of rebalance 
> callbacks.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified to be able 
> to process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> There are commented logs of current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> Workaround is to call 

[jira] [Commented] (KAFKA-2618) Disable SSL renegotiation for 0.9.0.0

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965893#comment-14965893
 ] 

ASF GitHub Bot commented on KAFKA-2618:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/339

KAFKA-2618; Disable SSL renegotiation for 0.9.0.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-2618-disable-renegotiation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #339


commit 2a1df538dbcf134069ae8a0a0eaa6acd74e7fcbc
Author: Ismael Juma 
Date:   2015-10-19T15:21:21Z

Disable renegotiation for now




> Disable SSL renegotiation for 0.9.0.0
> -
>
> Key: KAFKA-2618
> URL: https://issues.apache.org/jira/browse/KAFKA-2618
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> As discussed in KAFKA-2609, we don't have enough tests for SSL renegotiation 
> to be confident that it works well. In addition, neither the clients nor the 
> server makes use of renegotiation at this point.
> For 0.9.0.0, we should disable renegotiation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965865#comment-14965865
 ] 

Jiangjie Qin commented on KAFKA-2674:
-

I think it would be clearer if the rebalance callback is only called when a 
rebalance occurred, but not on consumer closure.

If auto commit is turned off, and users are committing offsets on their own, the 
following code looks clean to me when a user closes a consumer.
{code}
if (!failure)
  consumer.commitOffsetSync();
consumer.close();
{code}

And this is what we do in mirror maker. I prefer this because it is not complex 
and we are not re-purposing the rebalance listener for something else. I can 
imagine some logic that only makes sense for an actual rebalance, say holding 
a lock and releasing it when onPartitionAssigned() is called.

Thoughts?



> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9, and found an inconsistency in the calling of rebalance 
> callbacks.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified to be able 
> to process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> There are commented logs of current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> Workaround is to call onPartitionsRevoked() explicitly and manually just 
> before calling consumer.close(), but it seems dirty and error prone to me. It 
> can simply be forgotten by someone without such experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2618) Disable SSL renegotiation for 0.9.0.0

2015-10-20 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2618:
---
Reviewer: Jun Rao
  Status: Patch Available  (was: In Progress)

> Disable SSL renegotiation for 0.9.0.0
> -
>
> Key: KAFKA-2618
> URL: https://issues.apache.org/jira/browse/KAFKA-2618
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Ismael Juma
>Assignee: Ismael Juma
> Fix For: 0.9.0.0
>
>
> As discussed in KAFKA-2609, we don't have enough tests for SSL renegotiation 
> to be confident that it works well. In addition, neither the clients nor the 
> server makes use of renegotiation at this point.
> For 0.9.0.0, we should disable renegotiation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965913#comment-14965913
 ] 

Guozhang Wang commented on KAFKA-2674:
--

Today we are already calling ```commitOffsetSync``` upon consumer closing in 
the coordinator if auto commit is turned on. In addition, we also call 
```commitOffsetSync``` before we call ```onPartitionsRevoked``` during the 
rebalance if auto commit is turned on. So I think the question is really 
whether we should also call ```onPartitionsRevoked``` upon closing, after we 
call ```commitOffsetSync```.

I prefer adding the ```onPartitionsRevoked``` call since it may be used not 
only for manual offset management. 
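
A sketch of the close path under discussion, with assumed method and field 
names (hypothetical shape, not the actual coordinator code):

{code}
void close() {
    if (autoCommitEnabled)
        commitOffsetsSync();                       // already happens on close today
    // The addition being debated: fire the revocation callback on close too.
    listener.onPartitionsRevoked(assignedPartitions());
    sendLeaveGroupRequest();                       // hypothetical helper
}
{code}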

BTW there is a discrepancy between the old and new consumer in MirrorMaker: 
with the old consumer we use a rebalance listener that returns the global 
assignment in ```onPartitionsAssigned``` whereas in the new consumer it only 
returns its own assignment. We need to think about how it can be resolved.

> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9, and found an inconsistency in the calling of rebalance 
> callbacks.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified to be able 
> to process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> There are commented logs of current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> The workaround is to call onPartitionsRevoked() explicitly and manually just 
> before calling consumer.close(), but it seems dirty and error prone to me. It 
> can simply be forgotten by someone without such experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965975#comment-14965975
 ] 

Jun Rao commented on KAFKA-1686:


Also, [~sriharsha], could you write up a wiki page on using SASL like you did 
for SSL? Thanks,

> Implement SASL/Kerberos
> ---
>
> Key: KAFKA-1686
> URL: https://issues.apache.org/jira/browse/KAFKA-1686
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.8.2.1
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> Implement SASL/Kerberos authentication.
> To do this we will need to introduce a new SASLRequest and SASLResponse pair 
> to the client protocol. This request and response will each have only a 
> single byte[] field and will be used to handle the SASL challenge/response 
> cycle. Doing this will initialize the SaslServer instance and associate it 
> with the session in a manner similar to KAFKA-1684.
> When using integrity or encryption mechanisms with SASL we will need to wrap 
> and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
> SSLEngine will need to also cover the SaslServer instance.
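
For reference, a minimal sketch of the server-side challenge/response loop 
described above, using the standard javax.security.sasl API; the protocol 
framing and session handling are elided, and the mechanism, protocol and 
server name below are placeholders:

{noformat}
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

class SaslHandshakeSketch {
    SaslServer create(CallbackHandler handler, Map<String, ?> props)
            throws SaslException {
        return Sasl.createSaslServer("GSSAPI", "kafka", "broker.example.com",
                                     props, handler);
    }

    // Called once per SASLRequest: consume the client token and return the
    // next challenge for the SASLResponse; repeat until isComplete() is true.
    byte[] handleToken(SaslServer server, byte[] clientToken)
            throws SaslException {
        byte[] challenge = server.evaluateResponse(clientToken);
        if (server.isComplete()) {
            // authenticated; wrap()/unwrap() apply if the negotiated QOP
            // includes integrity or confidentiality
        }
        return challenge;
    }
}
{noformat}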



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2645) Document potentially breaking changes in the release notes for 0.9.0

2015-10-20 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2645:
-
Reviewer: Jun Rao

> Document potentially breaking changes in the release notes for 0.9.0
> 
>
> Key: KAFKA-2645
> URL: https://issues.apache.org/jira/browse/KAFKA-2645
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
> Fix For: 0.9.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2338: Warn on max.message.bytes change

2015-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/322


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966076#comment-14966076
 ] 

ASF GitHub Bot commented on KAFKA-1686:
---

Github user harshach closed the pull request at:

https://github.com/apache/kafka/pull/191


> Implement SASL/Kerberos
> ---
>
> Key: KAFKA-1686
> URL: https://issues.apache.org/jira/browse/KAFKA-1686
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.8.2.1
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> Implement SASL/Kerberos authentication.
> To do this we will need to introduce a new SASLRequest and SASLResponse pair 
> to the client protocol. This request and response will each have only a 
> single byte[] field and will be used to handle the SASL challenge/response 
> cycle. Doing this will initialize the SaslServer instance and associate it 
> with the session in a manner similar to KAFKA-1684.
> When using integrity or encryption mechanisms with SASL we will need to wrap 
> and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
> SSLEngine will need to also cover the SaslServer instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-1686: Implement SASL/Kerberos.

2015-10-20 Thread harshach
Github user harshach closed the pull request at:

https://github.com/apache/kafka/pull/191


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965935#comment-14965935
 ] 

Jason Gustafson commented on KAFKA-2674:


Since LeaveGroup will cause a group rebalance, it doesn't seem inconsistent to 
call the revocation callback prior to closing, but I don't have a strong 
preference either way as long as the documentation is clear. 

> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9 and found an inconsistency in how the rebalance callbacks 
> are called.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified so that it 
> can process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> Below are commented logs of the current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance (TestConsumer.java:42)
> 2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
> {noformat}
> The workaround is to call onPartitionsRevoked() explicitly and manually just 
> before calling consumer.close(), but it seems dirty and error prone to me. It 
> can simply be forgotten by someone without such experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #43

2015-10-20 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-1686; Implement SASL/Kerberos

--
[...truncated 4564 lines...]

org.apache.kafka.copycat.file.FileStreamSourceConnectorTest > 
testMultipleSourcesInvalid PASSED

org.apache.kafka.copycat.file.FileStreamSinkConnectorTest > testSinkTasks PASSED

org.apache.kafka.copycat.file.FileStreamSinkConnectorTest > testTaskClass PASSED

org.apache.kafka.copycat.file.FileStreamSinkTaskTest > testPutFlush PASSED

org.apache.kafka.copycat.file.FileStreamSourceTaskTest > testNormalLifecycle 
PASSED

org.apache.kafka.copycat.file.FileStreamSourceTaskTest > testMissingTopic PASSED
:copycat:json:checkstyleMain
:copycat:json:compileTestJavawarning: [options] bootstrap class path not set in 
conjunction with -source 1.7
Note: 

 uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:copycat:json:processTestResources UP-TO-DATE
:copycat:json:testClasses
:copycat:json:checkstyleTest
:copycat:json:test

org.apache.kafka.copycat.json.JsonConverterTest > longToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testCacheSchemaToJsonConversion PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
nullSchemaAndMapNonStringKeysToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > floatToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > booleanToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaAndMapToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > stringToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timestampToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testCopycatSchemaMetadataTranslation PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timestampToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > decimalToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToCopycatStringKeys PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToJsonNonStringKeys PASSED

org.apache.kafka.copycat.json.JsonConverterTest > longToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mismatchSchemaJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testCacheSchemaToCopycatConversion PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testJsonSchemaMetadataTranslation PASSED

org.apache.kafka.copycat.json.JsonConverterTest > bytesToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > shortToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > intToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > structToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > stringToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaAndArrayToJson 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > byteToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaPrimitiveToCopycat 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > byteToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > intToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > dateToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > noSchemaToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > noSchemaToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaAndPrimitiveToJson 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToJsonStringKeys PASSED

org.apache.kafka.copycat.json.JsonConverterTest > arrayToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timeToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > structToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > shortToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > dateToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > doubleToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timeToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > floatToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > decimalToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > arrayToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > booleanToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToCopycatNonStringKeys 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > bytesToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > doubleToCopycat PASSED
:copycat:runtime:checkstyleMain
:copycat:runtime:compileTestJavawarning: [options] bootstrap class path not set 
in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.
Note: 

[jira] [Commented] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966003#comment-14966003
 ] 

Jiangjie Qin commented on KAFKA-2674:
-

[~guozhang] The original reason we added the consumer rebalance listener was 
that a rebalance can be triggered without the user's awareness, and when a 
rebalance occurs, the user might want to do something. This is different from 
the consumer closure case. When users close the consumer, they know what they 
are doing, and they will likely do some cleanup and any necessary state 
checkpointing before closing the consumer. I am not sure how valuable it is to 
put pre-closure tasks into onPartitionsRevoked(); to me it is a little 
confusing. Also, the user might need to add a check to see whether the 
revocation is caused by close() or by an actual rebalance (for example, the 
lock grabbing I mentioned before).
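
A minimal sketch of such a check, assuming the application sets its own flag 
before calling close() (all names here are hypothetical):

{noformat}
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

class CloseAwareListener implements ConsumerRebalanceListener {
    final AtomicBoolean closing = new AtomicBoolean(false); // set before close()

    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        if (closing.get())
            return; // skip rebalance-only logic (e.g. lock handling) on shutdown
        // rebalance-only logic goes here
    }

    public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
}
{noformat}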

Good point about the Mirror Maker. The reason we have the global assignment 
for the mirror maker's rebalance listener is that we want to allow the mirror 
maker to do some administrative logic when a rebalance is triggered (e.g. when 
the rebalance is triggered because a new topic was created in the source 
cluster, we want to create the topic in the target cluster with the same 
number of partitions). In order to let the rebalance listener perform such 
actions, we need global knowledge of the topics so we know which topics 
changed. This global topic information is part of the global assignment 
(global topic info + owners).

With the new client-side assignment patch, at least the leader has global 
assignment knowledge, so I think for that use case we should be fine. Letting 
the leader do everything is not ideal if the administrative work is heavy, but 
it is still doable.



> ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer 
> close
> ---
>
> Key: KAFKA-2674
> URL: https://issues.apache.org/jira/browse/KAFKA-2674
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
>Reporter: Michal Turek
>Assignee: Neha Narkhede
>
> Hi, I'm investigating and testing the behavior of the new consumer from the 
> planned release 0.9 and found an inconsistency in how the rebalance callbacks 
> are called.
> I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
> during consumer close and application shutdown. Its JavaDoc contract says:
> - "This method will be called before a rebalance operation starts and after 
> the consumer stops fetching data."
> - "It is recommended that offsets should be committed in this callback to 
> either Kafka or a custom offset store to prevent duplicate data."
> I believe calling consumer.close() is the start of a rebalance operation, and 
> even the local consumer that is actually closing should be notified so that it 
> can process any rebalance logic, including offset commits (e.g. if auto-commit 
> is disabled).
> Below are commented logs of the current and expected behaviors.
> {noformat}
> // Application start
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
> (AppInfoParser.java:82)
> 2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
> [TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
> (AppInfoParser.java:83)
> // Consumer started (the first one in group), rebalance callbacks are called 
> including empty onPartitionsRevoked()
> 2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:100)
> // Another consumer joined the group, rebalancing
> 2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
> [TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
> rebalancing, try to re-join group. (Coordinator.java:714)
> 2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
> testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
> (TestConsumer.java:95)
> 2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
> [TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
> testB-4, testA-3] (TestConsumer.java:100)
> // Consumer started closing, there SHOULD be onPartitionsRevoked() callback 
> to commit offsets like during standard rebalance, but it is missing
> 2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
> Closing instance 

Re: [DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Abhishek Nigam
Hi Jay/Neha,
I just subscribed to the mailing list, so I read your response but did not
receive your email; I am adding the context to this email thread.

"

Agree with Jay on staying away from pinning roles to brokers. This is
actually harder to operate and monitor.

Regarding the problems you mentioned-
1. Reducing the controller moves during rolling bounce is useful but really
something that should be handled by the tooling. The root cause is that
currently the controller move is expensive. I think we'd be better off
investing time and effort in thinning out the controller. Just moving to
the batch write APIs in ZooKeeper will make a huge difference.
2. I'm not sure I understood the motivation behind moving partitions out of
the controller broker. That seems like a proposal for a solution, but can
you describe the problems you saw that affected controller functionality?

Regarding the location of the controller, it seems there are 2 things you
are suggesting:
1. Optimizing the strategy of picking a broker as the controller (e.g.
least loaded node)
2. Moving the controller if a broker soft fails.

I don't think #1 is worth the effort involved. The better way of addressing
it is to make the controller thinner and faster. #2 is interesting since
the problem is that while a broker fails, all state changes fail or are
queued up which globally impacts the cluster. There are 2 alternatives -
have a tool that allows you to move the controller or just kill the broker
so the controller moves. I prefer the latter since it is simple and also
because a misbehaving broker is better off shutdown anyway.

Having said that, it will be helpful to know details of the problems you
saw while operating the controller. I think understanding those will help
guide the solution better.

On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps  wrote:

> This seems like a step backwards--we really don't want people to manually
> manage the location of the controller and try to manually balance
> partitions off that broker.
>
> I think it might make sense to consider directly fixing the things you
> actually want to fix:
> 1. Too many controller moves--we could either just make this cheaper or
> make the controller location more deterministic e.g. having the election
> prefer the node with the smallest node id so there were fewer failovers in
> rolling bounces.
> 2. You seem to think having the controller on a normal node is a problem.
> Can you elaborate on what the negative consequences you've observed? Let's
> focus on fixing those.
>
> In general we've worked very hard to avoid having a bunch of dedicated
> roles for different nodes and I would be very very loath to see us move
> away from that philosophy. I have a fair amount of experience with both
> homogenous systems that have a single role and also systems with many
> differentiated roles and I really think that the differentiated approach
> causes more problems than it solves for most deployments due to the added
> complexity.
>
> I think we could also fix up this KIP a bit. For example it says there are
> no public interfaces involved but surely there are new admin commands to
> control the location? There are also some minor things like listing it as
> released in 0.8.3 that seem wrong.
>
> -Jay
>
> On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> ani...@linkedin.com.invalid> wrote:
>
> > Hi,
> > Can we please discuss this KIP. The background for this is that it allows
> > us to pin controller to a broker. This is useful in a couple of
> scenarios:
> > a) If we want to do a rolling bounce we can reduce the number of controller
> > moves down to 1.
> > b) Again pick a designated broker and reduce the number of partitions on it
> > through admin reassign partitions and designate it as a controller.
> > c) Dynamically move the controller if we see any problems on the broker on
> > which it is running.
> >
> > Here is the wiki page
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> >
> > -Abhishek
> >
>

"

I think based on the feedback we can limit the discussion to the rolling
upgrade scenario and how best to address it. I think the only scenario
I have heard of where we wanted to move the controller off a broker was
due to a bug that produced multiple controllers, which has since been fixed.

I will update the KIP to cover optimizing the placement of the controller
(pinning it to a preferred, potentially config-enabled broker id) if that
sounds reasonable.
Many of the ideas of the original KIP still apply in this limited scope.

-Abhishek


[GitHub] kafka pull request: KAFKA 2480 Add backoff timeout and support rew...

2015-10-20 Thread Ishiihara
GitHub user Ishiihara opened a pull request:

https://github.com/apache/kafka/pull/340

KAFKA 2480 Add backoff timeout and support rewinds



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ishiihara/kafka backoff

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/340.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #340


commit d7fa3cc3feffec8349ce3ec08d0b70a242de632e
Author: Liquan Pei 
Date:   2015-10-21T01:00:26Z

Add backoff timeout and support rewinds




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-1686) Implement SASL/Kerberos

2015-10-20 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966059#comment-14966059
 ] 

Sriharsha Chintalapani commented on KAFKA-1686:
---

[~junrao] working on it. I'll post it on the wiki.

> Implement SASL/Kerberos
> ---
>
> Key: KAFKA-1686
> URL: https://issues.apache.org/jira/browse/KAFKA-1686
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.8.2.1
>Reporter: Jay Kreps
>Assignee: Sriharsha Chintalapani
>Priority: Blocker
> Fix For: 0.9.0.0
>
>
> Implement SASL/Kerberos authentication.
> To do this we will need to introduce a new SASLRequest and SASLResponse pair 
> to the client protocol. This request and response will each have only a 
> single byte[] field and will be used to handle the SASL challenge/response 
> cycle. Doing this will initialize the SaslServer instance and associate it 
> with the session in a manner similar to KAFKA-1684.
> When using integrity or encryption mechanisms with SASL we will need to wrap 
> and unwrap bytes as in KAFKA-1684 so the same interface that covers the 
> SSLEngine will need to also cover the SaslServer instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2338) Warn users if they change max.message.bytes that they also need to update broker and consumer settings

2015-10-20 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-2338:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 322
[https://github.com/apache/kafka/pull/322]

> Warn users if they change max.message.bytes that they also need to update 
> broker and consumer settings
> --
>
> Key: KAFKA-2338
> URL: https://issues.apache.org/jira/browse/KAFKA-2338
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Ewen Cheslack-Postava
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2338.patch, KAFKA-2338_2015-07-18_00:37:31.patch, 
> KAFKA-2338_2015-07-21_13:21:19.patch, KAFKA-2338_2015-08-24_14:32:38.patch, 
> KAFKA-2338_2015-09-02_19:27:17.patch
>
>
> We already have KAFKA-1756 filed to more completely address this issue, but 
> it is waiting for some other major changes to configs to completely protect 
> users from this problem.
> This JIRA should address the low-hanging fruit to at least warn users of the 
> potential problems. Currently, the only warning is in our documentation.
> 1. Generate a warning in the kafka-topics.sh tool when they change this 
> setting on a topic to be larger than the default. This needs to be very 
> obvious in the output.
> 2. Currently, the broker's replica fetcher isn't logging any useful error 
> messages when replication can't succeed because a message size is too large. 
> Logging an error here would allow users that get into a bad state to find out 
> why it is happening more easily. (Consumers should already be logging a 
> useful error message.)
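
As a rough illustration of item 1, a sketch of the check the tool could make; 
the helper itself is hypothetical, while the default value and the related 
broker/consumer setting names are the standard ones:

{noformat}
class MaxMessageBytesCheck {
    // broker default for message.max.bytes (1000000 bytes + 12 bytes overhead)
    static final int DEFAULT_MAX_MESSAGE_BYTES = 1000012;

    static void warnIfRisky(String topic, Integer maxMessageBytesOverride) {
        if (maxMessageBytesOverride != null
                && maxMessageBytesOverride > DEFAULT_MAX_MESSAGE_BYTES) {
            System.err.printf(
                "WARNING: max.message.bytes=%d for topic %s exceeds the broker "
                + "default; ensure replica.fetch.max.bytes on all brokers and "
                + "fetch.message.max.bytes on consumers are at least as large.%n",
                maxMessageBytesOverride, topic);
        }
    }
}
{noformat}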



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2338) Warn users if they change max.message.bytes that they also need to update broker and consumer settings

2015-10-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966022#comment-14966022
 ] 

ASF GitHub Bot commented on KAFKA-2338:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/322


> Warn users if they change max.message.bytes that they also need to update 
> broker and consumer settings
> --
>
> Key: KAFKA-2338
> URL: https://issues.apache.org/jira/browse/KAFKA-2338
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Ewen Cheslack-Postava
>Assignee: Ben Stopford
>Priority: Critical
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2338.patch, KAFKA-2338_2015-07-18_00:37:31.patch, 
> KAFKA-2338_2015-07-21_13:21:19.patch, KAFKA-2338_2015-08-24_14:32:38.patch, 
> KAFKA-2338_2015-09-02_19:27:17.patch
>
>
> We already have KAFKA-1756 filed to more completely address this issue, but 
> it is waiting for some other major changes to configs to completely protect 
> users from this problem.
> This JIRA should address the low-hanging fruit to at least warn users of the 
> potential problems. Currently, the only warning is in our documentation.
> 1. Generate a warning in the kafka-topics.sh tool when they change this 
> setting on a topic to be larger than the default. This needs to be very 
> obvious in the output.
> 2. Currently, the broker's replica fetcher isn't logging any useful error 
> messages when replication can't succeed because a message size is too large. 
> Logging an error here would allow users that get into a bad state to find out 
> why it is happening more easily. (Consumers should already be logging a 
> useful error message.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2678) partition level lag metrics can be negative

2015-10-20 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-2678:
--

 Summary: partition level lag metrics can be negative
 Key: KAFKA-2678
 URL: https://issues.apache.org/jira/browse/KAFKA-2678
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.2.1
Reporter: Jun Rao


Currently, the per-partition lag metric can be negative, since the last 
committed offset can be smaller than the follower's offset. This is a bit 
confusing to end users. We should probably lower-bound it at 0.
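
The fix is essentially a clamp; a minimal sketch, with hypothetical names for 
the two offsets the broker compares:

{noformat}
class LagMetricSketch {
    // Lag goes negative today when the follower's end offset is ahead of the
    // last committed offset; lower-bounding at 0 avoids confusing end users.
    static long lag(long lastCommittedOffset, long followerEndOffset) {
        return Math.max(0L, lastCommittedOffset - followerEndOffset);
    }
}
{noformat}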



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2678) partition level lag metrics can be negative

2015-10-20 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-2678:

Assignee: Dong Lin

> partition level lag metrics can be negative
> ---
>
> Key: KAFKA-2678
> URL: https://issues.apache.org/jira/browse/KAFKA-2678
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Jun Rao
>Assignee: Dong Lin
>
> Currently, the per-partition lag metric can be negative, since the last 
> committed offset can be smaller than the follower's offset. This is a bit 
> confusing to end users. We should probably lower-bound it at 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #44

2015-10-20 Thread Apache Jenkins Server
See 

Changes:

[junrao] KAFKA-2338; Warn on max.message.bytes change

--
[...truncated 4569 lines...]

org.apache.kafka.copycat.file.FileStreamSourceConnectorTest > 
testMultipleSourcesInvalid PASSED

org.apache.kafka.copycat.file.FileStreamSinkConnectorTest > testSinkTasks PASSED

org.apache.kafka.copycat.file.FileStreamSinkConnectorTest > testTaskClass PASSED

org.apache.kafka.copycat.file.FileStreamSinkTaskTest > testPutFlush PASSED

org.apache.kafka.copycat.file.FileStreamSourceTaskTest > testNormalLifecycle 
PASSED

org.apache.kafka.copycat.file.FileStreamSourceTaskTest > testMissingTopic PASSED
:copycat:json:checkstyleMain
:copycat:json:compileTestJavawarning: [options] bootstrap class path not set in 
conjunction with -source 1.7
Note: 

 uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:copycat:json:processTestResources UP-TO-DATE
:copycat:json:testClasses
:copycat:json:checkstyleTest
:copycat:json:test

org.apache.kafka.copycat.json.JsonConverterTest > longToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testCacheSchemaToJsonConversion PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
nullSchemaAndMapNonStringKeysToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > floatToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > booleanToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaAndMapToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > stringToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timestampToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testCopycatSchemaMetadataTranslation PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timestampToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > decimalToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToCopycatStringKeys PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToJsonNonStringKeys PASSED

org.apache.kafka.copycat.json.JsonConverterTest > longToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mismatchSchemaJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testCacheSchemaToCopycatConversion PASSED

org.apache.kafka.copycat.json.JsonConverterTest > 
testJsonSchemaMetadataTranslation PASSED

org.apache.kafka.copycat.json.JsonConverterTest > bytesToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > shortToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > intToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > structToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > stringToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaAndArrayToJson 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > byteToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaPrimitiveToCopycat 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > byteToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > intToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > dateToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > noSchemaToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > noSchemaToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullSchemaAndPrimitiveToJson 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToJsonStringKeys PASSED

org.apache.kafka.copycat.json.JsonConverterTest > arrayToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > nullToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timeToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > structToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > shortToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > dateToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > doubleToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > timeToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > floatToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > decimalToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > arrayToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > booleanToJson PASSED

org.apache.kafka.copycat.json.JsonConverterTest > mapToCopycatNonStringKeys 
PASSED

org.apache.kafka.copycat.json.JsonConverterTest > bytesToCopycat PASSED

org.apache.kafka.copycat.json.JsonConverterTest > doubleToCopycat PASSED
:copycat:runtime:checkstyleMain
:copycat:runtime:compileTestJavawarning: [options] bootstrap class path not set 
in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.

[jira] [Created] (KAFKA-2679) Mirror Maker produces duplicated data

2015-10-20 Thread He Tianyi (JIRA)
He Tianyi created KAFKA-2679:


 Summary: Mirror Maker produces duplicated data
 Key: KAFKA-2679
 URL: https://issues.apache.org/jira/browse/KAFKA-2679
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.8.2.1
Reporter: He Tianyi


Observed that Mirror Maker produced duplicated messages.
Doing further investigation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Neha Narkhede
>
> I will update the KIP to cover optimizing the placement of the controller
> (pinning it to a preferred, potentially config-enabled broker id) if that
> sounds reasonable.


The point I (and I think Jay too) was making is that pinning a controller
to a broker through config is what we should stay away from. This should be
handled by whatever tool you may be using to bounce the cluster in a
rolling restart fashion (where it detects the current controller and
restarts it at the very end).
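
Such a tool can discover the current controller from the /controller znode in
ZooKeeper. A minimal sketch, assuming direct ZooKeeper access (production
tooling would use a real JSON parser and proper error handling):

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.ZooKeeper;

class ControllerLookup {
    // Reads the /controller znode, a small JSON blob such as
    // {"version":1,"brokerid":3,"timestamp":"..."}, and extracts brokerid so
    // a rolling-restart script can bounce that broker last.
    static int currentControllerId(String zkConnect) throws Exception {
        ZooKeeper zk = new ZooKeeper(zkConnect, 30000, event -> { });
        try {
            String json = new String(zk.getData("/controller", false, null),
                                     StandardCharsets.UTF_8);
            String key = "\"brokerid\":";
            int start = json.indexOf(key) + key.length();
            int end = start;
            while (end < json.length() && Character.isDigit(json.charAt(end)))
                end++;
            return Integer.parseInt(json.substring(start, end));
        } finally {
            zk.close();
        }
    }
}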


On Tue, Oct 20, 2015 at 5:35 PM, Abhishek Nigam  wrote:

> Hi Jay/Neha,
> I just subscribed to the mailing list, so I read your response but did not
> receive your email; I am adding the context to this email thread.
>
> "
>
> Agree with Jay on staying away from pinning roles to brokers. This is
> actually harder to operate and monitor.
>
> Regarding the problems you mentioned-
> 1. Reducing the controller moves during rolling bounce is useful but really
> something that should be handled by the tooling. The root cause is that
> currently the controller move is expensive. I think we'd be better off
> investing time and effort in thinning out the controller. Just moving to
> the batch write APIs in ZooKeeper will make a huge difference.
> 2. I'm not sure I understood the motivation behind moving partitions out of
> the controller broker. That seems like a proposal for a solution, but can
> you describe the problems you saw that affected controller functionality?
>
> Regarding the location of the controller, it seems there are 2 things you
> are suggesting:
> 1. Optimizing the strategy of picking a broker as the controller (e.g.
> least loaded node)
> 2. Moving the controller if a broker soft fails.
>
> I don't think #1 is worth the effort involved. The better way of addressing
> it is to make the controller thinner and faster. #2 is interesting since
> the problem is that while a broker fails, all state changes fail or are
> queued up which globally impacts the cluster. There are 2 alternatives -
> have a tool that allows you to move the controller or just kill the broker
> so the controller moves. I prefer the latter since it is simple and also
> because a misbehaving broker is better off shutdown anyway.
>
> Having said that, it will be helpful to know details of the problems you
> saw while operating the controller. I think understanding those will help
> guide the solution better.
>
> On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps  wrote:
>
> > This seems like a step backwards--we really don't want people to manually
> > manage the location of the controller and try to manually balance
> > partitions off that broker.
> >
> > I think it might make sense to consider directly fixing the things you
> > actually want to fix:
> > 1. Too many controller moves--we could either just make this cheaper or
> > make the controller location more deterministic e.g. having the election
> > prefer the node with the smallest node id so there were fewer failovers in
> > rolling bounces.
> > 2. You seem to think having the controller on a normal node is a problem.
> > Can you elaborate on what the negative consequences you've observed?
> Let's
> > focus on fixing those.
> >
> > In general we've worked very hard to avoid having a bunch of dedicated
> > roles for different nodes and I would be very very loath to see us move
> > away from that philosophy. I have a fair amount of experience with both
> > homogenous systems that have a single role and also systems with many
> > differentiated roles and I really think that the differentiated approach
> > causes more problems than it solves for most deployments due to the added
> > complexity.
> >
> > I think we could also fix up this KIP a bit. For example it says there
> are
> > no public interfaces involved but surely there are new admin commands to
> > control the location? There are also some minor things like listing it as
> > released in 0.8.3 that seem wrong.
> >
> > -Jay
> >
> > On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> > ani...@linkedin.com.invalid> wrote:
> >
> > > Hi,
> > > Can we please discuss this KIP. The background for this is that it
> allows
> > > us to pin controller to a broker. This is useful in a couple of
> > scenarios:
> > > a) If we want to do a rolling bounce we can reduce the number of controller
> > > moves down to 1.
> > > b) Again pick a designated broker and reduce the number of partitions on it
> > > through admin reassign partitions and designate it as a controller.
> > > c) Dynamically move the controller if we see any problems on the broker on
> > > which it is running.
> > >
> > > Here is the wiki page
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> > >
> > > -Abhishek
> > >
> >
>
> "
>
> I think based on the feedback we can limit the discussion to the rolling
> upgrade scenario and how best to address it. I think the only scenario
> 

Re: [DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Aditya Auradkar
Hi Abhishek -

Perhaps it would help if you explained the motivation behind your proposal.
I know there was a bunch of discussion on KAFKA-1778; can you summarize?
Currently, I'd agree with Neha and Jay that there isn't really a strong
reason to pin the controller to a given broker or restrict it to a set of
brokers.

For rolling upgrades, it should be simpler to bounce the existing
controller last.
As for choosing a relatively lightly loaded broker, I think we should
ideally eliminate such load imbalances by distributing partitions (and data
rate) as evenly as possible. If for some reason a broker cannot become the
controller (by virtue of load or something else), arguably that is a
separate problem that needs addressing.

Thanks,
Aditya

On Tue, Oct 20, 2015 at 9:27 PM, Neha Narkhede  wrote:

> >
> > I will update the KIP to cover optimizing the placement of the controller
> > (pinning it to a preferred, potentially config-enabled broker id) if that
> > sounds reasonable.
>
>
> The point I (and I think Jay too) was making is that pinning a controller
> to a broker through config is what we should stay away from. This should be
> handled by whatever tool you may be using to bounce the cluster in a
> rolling restart fashion (where it detects the current controller and
> restarts it at the very end).
>
>
> On Tue, Oct 20, 2015 at 5:35 PM, Abhishek Nigam
>  > wrote:
>
> > Hi Jay/Neha,
> > I just subscribed to the mailing list, so I read your response but did not
> > receive your email; I am adding the context to this email thread.
> >
> > "
> >
> > Agree with Jay on staying away from pinning roles to brokers. This is
> > actually harder to operate and monitor.
> >
> > Regarding the problems you mentioned-
> > 1. Reducing the controller moves during rolling bounce is useful but
> really
> > something that should be handled by the tooling. The root cause is that
> > currently the controller move is expensive. I think we'd be better off
> > investing time and effort in thinning out the controller. Just moving to
> > the batch write APIs in ZooKeeper will make a huge difference.
> > 2. I'm not sure I understood the motivation behind moving partitions out
> of
> > the controller broker. That seems like a proposal for a solution, but can
> > you describe the problems you saw that affected controller functionality?
> >
> > Regarding the location of the controller, it seems there are 2 things you
> > are suggesting:
> > 1. Optimizing the strategy of picking a broker as the controller (e.g.
> > least loaded node)
> > 2. Moving the controller if a broker soft fails.
> >
> > I don't think #1 is worth the effort involved. The better way of
> addressing
> > it is to make the controller thinner and faster. #2 is interesting since
> > the problem is that while a broker fails, all state changes fail or are
> > queued up which globally impacts the cluster. There are 2 alternatives -
> > have a tool that allows you to move the controller or just kill the
> broker
> > so the controller moves. I prefer the latter since it is simple and also
> > because a misbehaving broker is better off shutdown anyway.
> >
> > Having said that, it will be helpful to know details of the problems you
> > saw while operating the controller. I think understanding those will help
> > guide the solution better.
> >
> > On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps  wrote:
> >
> > > This seems like a step backwards--we really don't want people to
> manually
> > > manage the location of the controller and try to manually balance
> > > partitions off that broker.
> > >
> > > I think it might make sense to consider directly fixing the things you
> > > actually want to fix:
> > > 1. Too many controller moves--we could either just make this cheaper or
> > > make the controller location more deterministic e.g. having the election
> > > prefer the node with the smallest node id so there were fewer failovers in
> > > rolling bounces.
> > > 2. You seem to think having the controller on a normal node is a
> problem.
> > > Can you elaborate on what the negative consequences you've observed?
> > Let's
> > > focus on fixing those.
> > >
> > > In general we've worked very hard to avoid having a bunch of dedicated
> > > roles for different nodes and I would be very very loath to see us move
> > > away from that philosophy. I have a fair amount of experience with both
> > > homogenous systems that have a single role and also systems with many
> > > differentiated roles and I really think that the differentiated
> approach
> > > causes more problems than it solves for most deployments due to the
> added
> > > complexity.
> > >
> > > I think we could also fix up this KIP a bit. For example it says there
> > are
> > > no public interfaces involved but surely there are new admin commands
> to
> > > control the location? There are also some minor things like listing it
> as
> > > released in 0.8.3 that seem wrong.
> 

[GitHub] kafka pull request: Fix bash scripts to use `/usr/bin/env`.

2015-10-20 Thread aloiscochard
GitHub user aloiscochard opened a pull request:

https://github.com/apache/kafka/pull/335

Fix bash scripts to use `/usr/bin/env`.

Which makes them compatible with NixOS.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aloiscochard/kafka feature/scripts-compat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #335


commit 9091b2c9667b6446266206e7210e530060ebc8e5
Author: Alois Cochard 
Date:   2015-10-20T13:06:22Z

Fix bash scripts to use `/usr/bin/env`.

Which makes them compatible with NixOS.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-2674) ConsumerRebalanceListener.onPartitionsRevoked() is not called on consumer close

2015-10-20 Thread Michal Turek (JIRA)
Michal Turek created KAFKA-2674:
---

 Summary: ConsumerRebalanceListener.onPartitionsRevoked() is not 
called on consumer close
 Key: KAFKA-2674
 URL: https://issues.apache.org/jira/browse/KAFKA-2674
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.9.0.0
Reporter: Michal Turek
Assignee: Neha Narkhede


Hi, I'm investigating and testing the behavior of the new consumer from the 
planned release 0.9 and found an inconsistency in how the rebalance callbacks 
are called.

I noticed that ConsumerRebalanceListener.onPartitionsRevoked() is NOT called 
during consumer close and application shutdown. Its JavaDoc contract says:

- "This method will be called before a rebalance operation starts and after the 
consumer stops fetching data."
- "It is recommended that offsets should be committed in this callback to 
either Kafka or a custom offset store to prevent duplicate data."

I believe calling consumer.close() is the start of a rebalance operation, and 
even the local consumer that is actually closing should be notified so that it 
can process any rebalance logic, including offset commits (e.g. if auto-commit 
is disabled).

Below are commented logs of the current and expected behaviors.

{noformat}
// Application start
2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
[TestConsumer-worker-0]: Kafka version : 0.9.0.0-SNAPSHOT 
(AppInfoParser.java:82)
2015-10-20 15:14:02.208 INFO  o.a.k.common.utils.AppInfoParser
[TestConsumer-worker-0]: Kafka commitId : 241b9ab58dcbde0c 
(AppInfoParser.java:83)

// Consumer started (the first one in group), rebalance callbacks are called 
including empty onPartitionsRevoked()
2015-10-20 15:14:02.333 INFO  c.a.e.kafka.newapi.TestConsumer 
[TestConsumer-worker-0]: Rebalance callback, revoked: [] (TestConsumer.java:95)
2015-10-20 15:14:02.343 INFO  c.a.e.kafka.newapi.TestConsumer 
[TestConsumer-worker-0]: Rebalance callback, assigned: [testB-1, testA-0, 
testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
(TestConsumer.java:100)

// Another consumer joined the group, rebalancing
2015-10-20 15:14:17.345 INFO  o.a.k.c.c.internals.Coordinator 
[TestConsumer-worker-0]: Attempt to heart beat failed since the group is 
rebalancing, try to re-join group. (Coordinator.java:714)
2015-10-20 15:14:17.346 INFO  c.a.e.kafka.newapi.TestConsumer 
[TestConsumer-worker-0]: Rebalance callback, revoked: [testB-1, testA-0, 
testB-0, testB-3, testA-2, testB-2, testA-1, testA-4, testB-4, testA-3] 
(TestConsumer.java:95)
2015-10-20 15:14:17.349 INFO  c.a.e.kafka.newapi.TestConsumer 
[TestConsumer-worker-0]: Rebalance callback, assigned: [testB-3, testA-4, 
testB-4, testA-3] (TestConsumer.java:100)

// Consumer started closing, there SHOULD be onPartitionsRevoked() callback to 
commit offsets like during standard rebalance, but it is missing
2015-10-20 15:14:39.280 INFO  c.a.e.kafka.newapi.TestConsumer [main]: 
Closing instance (TestConsumer.java:42)
2015-10-20 15:14:40.264 INFO  c.a.e.kafka.newapi.TestConsumer 
[TestConsumer-worker-0]: Worker thread stopped (TestConsumer.java:89)
{noformat}

The workaround is to call onPartitionsRevoked() explicitly and manually just 
before calling consumer.close(), but it seems dirty and error prone to me. It 
can simply be forgotten by someone without such experience.
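
A minimal sketch of that workaround, assuming a typical poll loop (the class 
and field names here are the application's own, not a Kafka API):

{noformat}
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class ShutdownSketch {
    final AtomicBoolean running = new AtomicBoolean(true);

    void run(KafkaConsumer<byte[], byte[]> consumer,
             ConsumerRebalanceListener listener) {
        try {
            while (running.get())
                consumer.poll(100);
        } finally {
            // the manual, easy-to-forget step described above
            listener.onPartitionsRevoked(consumer.assignment());
            consumer.close();
        }
    }
}
{noformat}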



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)