Jenkins build is back to normal : kafka-trunk-jdk7 #1351

2016-06-06 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3797) Improve security of __consumer_offsets topic

2016-06-06 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317835#comment-15317835
 ] 

Vahid Hashemian commented on KAFKA-3797:


[~hachikuji] I can take this on if you don't mind.

> Improve security of __consumer_offsets topic
> 
>
> Key: KAFKA-3797
> URL: https://issues.apache.org/jira/browse/KAFKA-3797
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>
> By default, we allow clients to override committed offsets and group metadata 
> using the Produce API as long as they have Write access to the 
> __consumer_offsets topic. From one perspective, this is fine: administrators 
> can restrict access to this topic to trusted users. From another, it seems 
> less than ideal for Write permission on that topic to subsume Group-level 
> permissions for the full cluster. With this access, a user can cause all 
> kinds of mischief including making the group "lose" data by setting offsets 
> ahead of the actual position. This is probably not obvious to administrators 
> who grant access to topics using a wildcard and it increases the risk from 
> incorrectly applying topic patterns (if we ever add support for them). It 
> seems reasonable to consider safer default behavior:
> 1. A simple option to fix this would be to prevent wildcard topic rules from 
> applying to internal topics. To write to an internal topic, you need a 
> separate rule which explicitly grants authorization to that topic.
> 2. A more extreme and perhaps safer option might be to prevent all writes to 
> this topic (and potentially other internal topics) through the Produce API. 
> Do we have any use cases which actually require writing to 
> __consumer_offsets? The only potential case that comes to mind is replication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1185: KAFKA-3501: Console consumer process hangs on exit

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1185


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3501) Console consumer process hangs on SIGINT

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317832#comment-15317832
 ] 

ASF GitHub Bot commented on KAFKA-3501:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1185


> Console consumer process hangs on SIGINT
> 
>
> Key: KAFKA-3501
> URL: https://issues.apache.org/jira/browse/KAFKA-3501
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, core
>Affects Versions: 0.9.0.0
> Environment: Ubuntu 12.04.5 LTS
> OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
>Reporter: Sébastien Launay
>Assignee: Neha Narkhede
> Fix For: 0.10.1.0
>
> Attachments: jstack.txt
>
>
> Sometimes when running the `kafka-console-consumer` script inside a pipe and 
> trying to stop it with a `SIGINT` (`ctrl+c`), the process will not stop.
> {noformat}
> ubuntu@xxx:~$ kafka-console-consumer --zookeeper localhost --topic topic 
> --from-beginning | grep "pattern"
> record1
> ...
> recordN
> ^CUnable to write to standard out, closing consumer.
>  ^C
> # process is still running
> {noformat}
> When looking at the various threads running on the JVM, I noticed that one 
> user thread is waiting on a latch preventing the JVM from shutting down:
> {noformat}
> ...
> "Thread-6" prio=10 tid=0x7f258c016000 nid=0x289f waiting on condition 
> [0x7f259aee3000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xd6640c80> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
>   at kafka.tools.ConsoleConsumer$$anon$1.run(ConsoleConsumer.scala:101)
> ...
> "main" prio=10 tid=0x7f25c400e000 nid=0x2878 waiting for monitor entry 
> [0x7f25cbefc000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Shutdown.exit(Shutdown.java:212)
>   - waiting to lock <0xd6974308> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Runtime.exit(Runtime.java:109)
>   at java.lang.System.exit(System.java:962)
>   at kafka.tools.ConsoleConsumer$.checkErr(ConsoleConsumer.scala:149)
>   at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:136)
>   at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
>   at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
>   at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> I believe the standard output linked to the defunct grep process gets closed 
> and triggers a `System.exit(1)` that prevents the latch from being counted 
> down, and therefore the main thread hangs forever:
> {code:scala}
> def checkErr(formatter: MessageFormatter) {
>   if (System.out.checkError()) {
>     // This means no one is listening to our output stream any more, time to shutdown
>     System.err.println("Unable to write to standard out, closing consumer.")
>     formatter.close()
>     System.exit(1)
>   }
> }
> {code}
> This only happens when `System.out` is checked for errors after consuming a 
> message and before the consumer gets closed, definitely a race condition that 
> is most likely to happen when messages are consumed at high throughput.
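For readers unfamiliar with this failure mode, the standalone Java snippet below reproduces the same deadlock shape outside of Kafka. It is an illustration only, not the Kafka code or the eventual fix: a shutdown hook blocks on a latch that only the main thread can release, while the main thread is itself blocked inside `System.exit()` waiting for shutdown hooks to finish.

{code:java}
import java.util.concurrent.CountDownLatch;

// Minimal reproduction of the deadlock pattern described above (not Kafka code).
public class ShutdownLatchDeadlock {
    public static void main(String[] args) {
        final CountDownLatch done = new CountDownLatch(1);

        // Stand-in for the console consumer's shutdown hook, which awaits a latch.
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() {
                try {
                    done.await();      // waits forever: nothing will count this down
                } catch (InterruptedException ignored) { }
            }
        }));

        // Stand-in for checkErr() detecting a closed stdout and exiting directly.
        System.exit(1);                // blocks until shutdown hooks complete, hence the deadlock
        done.countDown();              // never reached
    }
}
{code}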



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3501) Console consumer process hangs on SIGINT

2016-06-06 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3501:
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1185
[https://github.com/apache/kafka/pull/1185]

> Console consumer process hangs on SIGINT
> 
>
> Key: KAFKA-3501
> URL: https://issues.apache.org/jira/browse/KAFKA-3501
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, core
>Affects Versions: 0.9.0.0
> Environment: Ubuntu 12.04.5 LTS
> OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
>Reporter: Sébastien Launay
>Assignee: Neha Narkhede
> Fix For: 0.10.1.0
>
> Attachments: jstack.txt
>
>
> Sometimes when running the `kafka-console-consumer` script inside a pipe and 
> trying to stop it with a `SIGINT` (`ctrl+c`), the process will not stop.
> {noformat}
> ubuntu@xxx:~$ kafka-console-consumer --zookeeper localhost --topic topic 
> --from-beginning | grep "pattern"
> record1
> ...
> recordN
> ^CUnable to write to standard out, closing consumer.
>  ^C
> # process is still running
> {noformat}
> When looking at the various threads running on the JVM, I noticed that one 
> user thread is waiting on a latch preventing the JVM from shutting down:
> {noformat}
> ...
> "Thread-6" prio=10 tid=0x7f258c016000 nid=0x289f waiting on condition 
> [0x7f259aee3000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xd6640c80> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
>   at kafka.tools.ConsoleConsumer$$anon$1.run(ConsoleConsumer.scala:101)
> ...
> "main" prio=10 tid=0x7f25c400e000 nid=0x2878 waiting for monitor entry 
> [0x7f25cbefc000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Shutdown.exit(Shutdown.java:212)
>   - waiting to lock <0xd6974308> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Runtime.exit(Runtime.java:109)
>   at java.lang.System.exit(System.java:962)
>   at kafka.tools.ConsoleConsumer$.checkErr(ConsoleConsumer.scala:149)
>   at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:136)
>   at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
>   at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
>   at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> I believe the standard output linked to the defunct grep process gets closed 
> and triggers a `System.exit(1)` that prevents the latch from being counted 
> down, and therefore the main thread hangs forever:
> {code:scala}
> def checkErr(formatter: MessageFormatter) {
>   if (System.out.checkError()) {
>     // This means no one is listening to our output stream any more, time to shutdown
>     System.err.println("Unable to write to standard out, closing consumer.")
>     formatter.close()
>     System.exit(1)
>   }
> }
> {code}
> This only happens when `System.out` is checked for errors after consuming a 
> message and before the consumer gets closed, definitely a race condition that 
> is most likely to happen when messages are consumed at high throughput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3797) Improve security of __consumer_offsets topic

2016-06-06 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-3797:
--

 Summary: Improve security of __consumer_offsets topic
 Key: KAFKA-3797
 URL: https://issues.apache.org/jira/browse/KAFKA-3797
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


By default, we allow clients to override committed offsets and group metadata 
using the Produce API as long as they have Write access to the 
__consumer_offsets topic. From one perspective, this is fine: administrators 
can restrict access to this topic to trusted users. From another, it seems less 
than ideal for Write permission on that topic to subsume Group-level 
permissions for the full cluster. With this access, a user can cause all kinds 
of mischief including making the group "lose" data by setting offsets ahead of 
the actual position. This is probably not obvious to administrators who grant 
access to topics using a wildcard and it increases the risk from incorrectly 
applying topic patterns (if we ever add support for them). It seems reasonable 
to consider safer default behavior:

1. A simple option to fix this would be to prevent wildcard topic rules from 
applying to internal topics. To write to an internal topic, you need a separate 
rule which explicitly grants authorization to that topic.
2. A more extreme and perhaps safer option might be to prevent all writes to 
this topic (and potentially other internal topics) through the Produce API. Do 
we have any use cases which actually require writing to __consumer_offsets? The 
only potential case that comes to mind is replication.
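As a rough illustration of option 1, the authorization check could simply skip wildcard grants whenever the requested topic is internal. The sketch below is hypothetical Java; the class and method names are made up for illustration and do not reflect Kafka's actual Authorizer interface.

{code:java}
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of option 1: wildcard topic ACLs never apply to internal topics.
public final class InternalTopicAclCheck {

    private static final Set<String> INTERNAL_TOPICS =
            Collections.singleton("__consumer_offsets");

    // Returns true if an ACL granted on aclResourceName should apply to the given topic.
    static boolean aclApplies(String aclResourceName, String topic) {
        if (aclResourceName.equals(topic)) {
            return true;                              // explicit grant always applies
        }
        if ("*".equals(aclResourceName)) {
            return !INTERNAL_TOPICS.contains(topic);  // wildcard grants skip internal topics
        }
        return false;
    }
}
{code}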



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3796) SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk

2016-06-06 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317764#comment-15317764
 ] 

Ismael Juma commented on KAFKA-3796:


Does this happen consistently on your computer?

> SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk
> --
>
> Key: KAFKA-3796
> URL: https://issues.apache.org/jira/browse/KAFKA-3796
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, security
>Affects Versions: 0.10.0.0
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>
> org.apache.kafka.common.network.SslTransportLayerTest > 
> testEndpointIdentificationDisabled FAILED
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
> at 
> org.apache.kafka.common.network.NioEchoServer.(NioEchoServer.java:48)
> at 
> org.apache.kafka.common.network.SslTransportLayerTest.testEndpointIdentificationDisabled(SslTransportLayerTest.java:120)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3781) Errors.exceptionName() can throw NPE

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317757#comment-15317757
 ] 

ASF GitHub Bot commented on KAFKA-3781:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1476


> Errors.exceptionName() can throw NPE
> 
>
> Key: KAFKA-3781
> URL: https://issues.apache.org/jira/browse/KAFKA-3781
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Grant Henke
>Assignee: Ismael Juma
>  Labels: newbie++
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> When calling Errors.exceptionName() on the NONE error an NPE is thrown.
> {noformat}
> public String exceptionName() {
>     return exception.getClass().getName();
> }
> {noformat}
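The NPE comes from {{Errors.NONE}}, whose exception field is null. A minimal null-safe variant could look like the sketch below; this is an illustration only and may differ from the change actually merged.

{noformat}
public String exceptionName() {
    // Errors.NONE carries no exception, so guard against the null field.
    return exception == null ? null : exception.getClass().getName();
}
{noformat}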



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1476: KAFKA-3781; Errors.exceptionName() can throw NPE

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1476


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3781) Errors.exceptionName() can throw NPE

2016-06-06 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3781:
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1476
[https://github.com/apache/kafka/pull/1476]

> Errors.exceptionName() can throw NPE
> 
>
> Key: KAFKA-3781
> URL: https://issues.apache.org/jira/browse/KAFKA-3781
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Grant Henke
>Assignee: Ismael Juma
>  Labels: newbie++
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> When calling Errors.exceptionName() on the NONE error an NPE is thrown.
> {noformat}
> public String exceptionName() {
>     return exception.getClass().getName();
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3443) Support regex topics in addSource() and stream()

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317753#comment-15317753
 ] 

ASF GitHub Bot commented on KAFKA-3443:
---

GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/1477

KAFKA-3443 [Kafka Stream] support for adding sources to KafkaStreams via 
Pattern

This PR is the follow-up to the closed PR #1410. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-3443_streams_support_for_regex_sources

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1477


commit 3002fbcf071bde4a543abcb154fd0de61b016ba1
Author: bbejeck 
Date:   2016-06-07T03:19:46Z

KAFKA-3443 support for adding sources to KafkaStreams via Pattern




> Support regex topics in addSource() and stream()
> 
>
> Key: KAFKA-3443
> URL: https://issues.apache.org/jira/browse/KAFKA-3443
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently Kafka Streams only supports specific topics when creating source 
> streams, while we can leverage the consumer's regex subscription to allow 
> regex topics as well.
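For context, the consumer-side regex subscription that this feature can build on is already part of the Java consumer API. A minimal sketch follows; the configuration values and topic pattern are placeholders.

{code:java}
import java.util.Collection;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Minimal example of the consumer's regex subscription (placeholder config values).
public class RegexSubscriptionExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "regex-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Subscribe to every topic matching the pattern; the rebalance listener is a no-op here.
        consumer.subscribe(Pattern.compile("events-.*"), new ConsumerRebalanceListener() {
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
        });
        consumer.close();
    }
}
{code}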



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1477: KAFKA-3443 [Kafka Stream] support for adding sourc...

2016-06-06 Thread bbejeck
GitHub user bbejeck opened a pull request:

https://github.com/apache/kafka/pull/1477

KAFKA-3443 [Kafka Stream] support for adding sources to KafkaStreams via 
Pattern

This PR is the follow-up to the closed PR #1410. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bbejeck/kafka 
KAFKA-3443_streams_support_for_regex_sources

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1477


commit 3002fbcf071bde4a543abcb154fd0de61b016ba1
Author: bbejeck 
Date:   2016-06-07T03:19:46Z

KAFKA-3443 support for adding sources to KafkaStreams via Pattern




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy

2016-06-06 Thread Jason Gustafson
Hi Vahid,

The only thing I added was the specification of the UserData field. The
rest comes from here:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol.
See the section on the JoinGroup request.
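For concreteness, here is a rough sketch of how the UserData layout quoted later in this thread (a version plus the current topic-partition assignment) could be serialized. It is plain Java, purely illustrative, and not part of the KIP itself.

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;

// Illustrative encoder for the sketched UserData field:
// Version (int16) followed by CurrentAssignment => [Topic [Partition]].
final class StickyUserData {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    static ByteBuffer encode(short version, Map<String, List<Integer>> currentAssignment) {
        int size = 2 + 4;                                         // version + topic count
        for (Map.Entry<String, List<Integer>> e : currentAssignment.entrySet()) {
            size += 2 + e.getKey().getBytes(UTF_8).length;        // topic string (length + bytes)
            size += 4 + 4 * e.getValue().size();                  // partition array (count + ints)
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort(version);
        buf.putInt(currentAssignment.size());
        for (Map.Entry<String, List<Integer>> e : currentAssignment.entrySet()) {
            byte[] topic = e.getKey().getBytes(UTF_8);
            buf.putShort((short) topic.length);
            buf.put(topic);
            buf.putInt(e.getValue().size());
            for (Integer partition : e.getValue()) {
                buf.putInt(partition);
            }
        }
        buf.flip();
        return buf;
    }
}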

Generally speaking, I think having fewer assignment strategies included
with Kafka is probably better. One of the advantages of the client-side
assignment approach is that there's no actual need to bundle them into the
release. Applications can use them by depending on a separate library. That
said, sticky assignment seems like a generally good idea and a common need,
so it may be helpful for a lot of users to make it easily available in the
release. If it also addresses the issues raised in KIP-49, then so much the
better.

As for whether we should include both, there I'm not too sure. Most users
probably wouldn't have a strong reason to choose the "fair" assignment over
the "sticky" assignment since they both seem to have the same properties in
terms of balancing the group's partitions. The overhead is a concern for
large groups with many topic subscriptions though, so if people think that
the "fair" approach brings a lot of benefit over round-robin, then it may
be worth including also.

-Jason

On Mon, Jun 6, 2016 at 5:17 PM, Vahid S Hashemian  wrote:

> Hi Jason,
>
> Thanks for reviewing the KIP.
> I will add the details you requested, but to summarize:
>
> Regarding the structure of the user data:
>
> Right now the user data will have the current assignments only which is a
> mapping of consumers to their assigned topic partitions. Is this mapping
> what you're also suggesting with CurrentAssignment field?
> I see how adding a version (as sticky assignor version) will be useful.
> Also how having a protocol name would be useful, perhaps for validation.
> But could you clarify the "Subscription" field and how you think it'll
> come into play?
>
>
> Regarding the algorithm:
>
> There could be similarities between how this KIP is implemented and how
> KIP-49 is handling the fairness. But since we had to take stickiness into
> consideration we started fresh and did not adopt from KIP-49.
> The Sticky assignor implementation is comprehensive and guarantees the
> fairest possible assignment with highest stickiness. I even have a unit
> test that randomly generates an assignment problem and verifies that a
> fair and sticky assignment is calculated.
> KIP-54 gives priority to fairness over stickiness (which makes the
> implementation more complex). We could have another strategy that gives
> priority to stickiness over fairness (which supposedly will have a better
> performance).
> The main distinction between KIP-54 and KIP-49 is that KIP-49 calculates
> the assignment without considering the previous assignments (fairness
> only); whereas for KIP-54 previous assignments play a big role (fairness
> and stickiness).
> I believe if there is a situation where the stickiness requirements do not
> exist it would make sense to use a fair-only assignment without the
> overhead of sticky assignment, as you mentioned.
> So, I could see three different strategies that could enrich assignment
> policy options.
> It would be great to have some feedback from the community about what is
> the best way to move forward with these two KIPs.
>
> In the meantime, I'll add some more details in the KIP about the approach
> for calculating assignments.
>
> Thanks again.
>
> Regards,
> --Vahid
>
>
>
>
> From:   Jason Gustafson 
> To: dev@kafka.apache.org
> Date:   06/06/2016 01:26 PM
> Subject:Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy
>
>
>
> Hi Vahid,
>
> Can you add some detail to the KIP on the structure of the user data? I'm
> guessing it would be something like this:
>
> ProtocolName => "sticky"
>
> ProtocolMetadata => Version Subscription UserData
>   Version => int16
>   Subscription => [Topic]
> Topic => string
>   UserData => CurrentAssignment
> CurrentAssignment => [Topic [Partition]]
>   Topic => string
>   Partition => int32
>
> It would also be helpful to include a little more detail on the algorithm.
> From what I can tell, it looks like you're adopting some of the strategies
> from KIP-49 to handle differing subscriptions better. If so, then I wonder
> if it makes sense to combine the two KIPs? Or do you think there would be
> an advantage to having the "fair" assignment strategy without the overhead
> of the sticky assignor?
>
> Thanks,
> Jason
>
>
>
> On Fri, Jun 3, 2016 at 11:33 AM, Guozhang Wang  wrote:
>
> > Sorry for being late on this thread.
> >
> > The assign() function is auto-triggered during the rebalance by one of
> the
> > consumers when it receives all subscription information collected from
> the
> > server-side coordinator.
> >
> > More details can be found here:
> >
> >
>
> 

[jira] [Updated] (KAFKA-3248) AdminClient Blocks Forever in send Method

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3248:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

This was fixed via KAFKA-3488.

> AdminClient Blocks Forever in send Method
> -
>
> Key: KAFKA-3248
> URL: https://issues.apache.org/jira/browse/KAFKA-3248
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.0
>Reporter: John Tylwalk
>Assignee: Warren Green
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> AdminClient will block forever when performing operations involving the 
> {{send()}} method, due to usage of 
> {{ConsumerNetworkClient.poll(RequestFuture)}} - which blocks indefinitely.
> Suggested fix is to use {{ConsumerNetworkClient.poll(RequestFuture, long 
> timeout)}} in {{AdminClient.send()}}
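To illustrate the bounded-wait pattern being suggested, here is a generic sketch in plain Java; it does not use the Kafka internals, and {{SimpleFuture}} is a stand-in for the client's request future.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Generic illustration of waiting with a deadline instead of blocking forever.
final class BoundedWait {
    interface SimpleFuture<T> {
        boolean isDone();
        T value();
    }

    static <T> T await(SimpleFuture<T> future, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!future.isDone()) {
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException("request did not complete within " + timeoutMs + " ms");
            }
            TimeUnit.MILLISECONDS.sleep(10);   // in the real client this would be a short poll() slice
        }
        return future.value();
    }
}
{code}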



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3248) AdminClient Blocks Forever in send Method

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3248:
---
Fix Version/s: (was: 0.10.0.1)
   0.10.0.0

> AdminClient Blocks Forever in send Method
> -
>
> Key: KAFKA-3248
> URL: https://issues.apache.org/jira/browse/KAFKA-3248
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.9.0.0
>Reporter: John Tylwalk
>Assignee: Warren Green
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> AdminClient will block forever when performing operations involving the 
> {{send()}} method, due to usage of 
> {{ConsumerNetworkClient.poll(RequestFuture)}} - which blocks indefinitely.
> Suggested fix is to use {{ConsumerNetworkClient.poll(RequestFuture, long 
> timeout)}} in {{AdminClient.send()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2720) Periodic purging groups in the coordinator

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2720:
---
Reviewer: Guozhang Wang

> Periodic purging groups in the coordinator
> --
>
> Key: KAFKA-2720
> URL: https://issues.apache.org/jira/browse/KAFKA-2720
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0
>
>
> Currently the coordinator removes the group (i.e. both removing it from the 
> cache and writing the tombstone message on its local replica without waiting 
> for ack) once it becomes an empty group.
> This can lead to a few issues, such as 1) group removal and creation churn 
> when a group with very few members is being rebalanced, and 2) if the local 
> write fails or is not propagated to other followers, the group can only be 
> removed again when a new coordinator is migrated and detects that the group 
> has no members already.
> We could instead piggy-back the group purging on the periodic offset 
> expiration, which would also remove any groups that no longer have 
> corresponding offsets in the cache.
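A hypothetical sketch of that piggy-backing follows; the method names are placeholders, not actual coordinator APIs.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run group purging as part of the existing periodic
// offset-expiration task. expireOffsets() and purgeEmptyGroupsWithoutOffsets()
// are placeholders, not actual coordinator methods.
final class GroupMetadataCleanup {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(long intervalMs) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                expireOffsets();                    // existing periodic work
                purgeEmptyGroupsWithoutOffsets();   // proposed additional step
            }
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    private void expireOffsets() { /* remove expired offsets from the cache */ }
    private void purgeEmptyGroupsWithoutOffsets() { /* drop empty groups with no remaining offsets */ }
}
{code}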



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2720) Periodic purging groups in the coordinator

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2720:
---
Status: Patch Available  (was: Open)

> Periodic purging groups in the coordinator
> --
>
> Key: KAFKA-2720
> URL: https://issues.apache.org/jira/browse/KAFKA-2720
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Jason Gustafson
> Fix For: 0.10.1.0
>
>
> Currently the coordinator removes the group (i.e. both removing it from the 
> cache and writing the tombstone message on its local replica without waiting 
> for ack) once it becomes an empty group.
> This can lead to a few issues, such as 1) group removal and creation churn 
> when a group with very few members is being rebalanced, and 2) if the local 
> write fails or is not propagated to other followers, the group can only be 
> removed again when a new coordinator is migrated and detects that the group 
> has no members already.
> We could instead piggy-back the group purging on the periodic offset 
> expiration, which would also remove any groups that no longer have 
> corresponding offsets in the cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1211) Hold the produce request with ack > 1 in purgatory until replicas' HW has larger than the produce offset

2016-06-06 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317726#comment-15317726
 ] 

Ismael Juma commented on KAFKA-1211:


[~guozhang], is this still an issue?

> Hold the produce request with ack > 1 in purgatory until replicas' HW has 
> larger than the produce offset
> 
>
> Key: KAFKA-1211
> URL: https://issues.apache.org/jira/browse/KAFKA-1211
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.10.1.0
>
>
> Today during leader failover we have a window of weakness when the 
> followers truncate their data before fetching from the new leader, i.e., the 
> number of in-sync replicas is just 1. If during this time the leader also 
> fails, then produce requests with ack > 1 that have already been responded to 
> will still be lost. To avoid this scenario we would prefer to hold the 
> produce request in purgatory until the replicas' HW is larger than the 
> produce offset, instead of just their end-of-log offsets.
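Conceptually, the purgatory completion check would move from log end offsets to high watermarks, along the lines of the illustrative sketch below; {{Replica}} is a stand-in type, not Kafka's.

{code:java}
import java.util.Collection;

// Illustrative completion check: only respond to the producer once every
// in-sync replica's high watermark has reached the required offset.
final class HighWatermarkCheck {
    static final class Replica {
        final long highWatermark;
        Replica(long highWatermark) { this.highWatermark = highWatermark; }
    }

    static boolean canComplete(Collection<Replica> inSyncReplicas, long requiredOffset) {
        for (Replica r : inSyncReplicas) {
            if (r.highWatermark < requiredOffset) {
                return false;   // keep the produce request in purgatory
            }
        }
        return true;            // safe to send the produce response
    }
}
{code}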



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3443) Support regex topics in addSource() and stream()

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317721#comment-15317721
 ] 

ASF GitHub Bot commented on KAFKA-3443:
---

Github user bbejeck closed the pull request at:

https://github.com/apache/kafka/pull/1410


> Support regex topics in addSource() and stream()
> 
>
> Key: KAFKA-3443
> URL: https://issues.apache.org/jira/browse/KAFKA-3443
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently Kafka Streams only supports specific topics when creating source 
> streams, while we can leverage the consumer's regex subscription to allow 
> regex topics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1410: KAFKA-3443 [Kafka Streams] added support for subsc...

2016-06-06 Thread bbejeck
Github user bbejeck closed the pull request at:

https://github.com/apache/kafka/pull/1410


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-0.10.0-jdk7 #121

2016-06-06 Thread Apache Jenkins Server
See 



[jira] [Updated] (KAFKA-570) Kafka should not need snappy jar at runtime

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-570:
--
Fix Version/s: (was: 0.10.1.0)
   0.9.0.0

> Kafka should not need snappy jar at runtime
> ---
>
> Key: KAFKA-570
> URL: https://issues.apache.org/jira/browse/KAFKA-570
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Swapnil Ghike
>  Labels: bugs
> Fix For: 0.9.0.0
>
>
> CompressionFactory imports the snappy jar in a pattern match. The purpose of 
> importing it this way seems to be to avoid the import unless snappy 
> compression is actually required. However, Kafka throws a 
> ClassNotFoundException if the snappy jar is removed from lib_managed at 
> runtime. This exception can easily be seen by producing some data with the 
> console producer.
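One common pattern for keeping such a dependency optional is to only touch the compression classes (for example via reflection) when that codec is actually requested. The sketch below is a hedged illustration of that pattern, not the approach taken in Kafka.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.lang.reflect.Constructor;

// Illustrative pattern for an optional compression dependency: the snappy
// classes are only loaded when snappy compression is requested, so the jar
// can be absent otherwise.
final class OptionalSnappy {
    static OutputStream wrap(OutputStream out, String codec) throws IOException {
        if (!"snappy".equals(codec)) {
            return out;                                  // no snappy classes touched at all
        }
        try {
            Class<?> clazz = Class.forName("org.xerial.snappy.SnappyOutputStream");
            Constructor<?> ctor = clazz.getConstructor(OutputStream.class);
            return (OutputStream) ctor.newInstance(out);
        } catch (ReflectiveOperationException e) {
            throw new IOException("snappy codec requested but snappy-java is not on the classpath", e);
        }
    }
}
{code}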



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-570) Kafka should not need snappy jar at runtime

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-570.
---
Resolution: Fixed

This is fixed in the new clients, closing.

> Kafka should not need snappy jar at runtime
> ---
>
> Key: KAFKA-570
> URL: https://issues.apache.org/jira/browse/KAFKA-570
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Swapnil Ghike
>  Labels: bugs
> Fix For: 0.9.0.0
>
>
> CompressionFactory imports the snappy jar in a pattern match. The purpose of 
> importing it this way seems to be to avoid the import unless snappy 
> compression is actually required. However, Kafka throws a 
> ClassNotFoundException if the snappy jar is removed from lib_managed at 
> runtime. This exception can easily be seen by producing some data with the 
> console producer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk8 #685

2016-06-06 Thread Apache Jenkins Server
See 



Build failed in Jenkins: kafka-trunk-jdk7 #1350

2016-06-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-724; Allow automatic socket.send.buffer from operating system in

[ismael] KAFKA-3783; Catch proper exception on path delete

--
[...truncated 3354 lines...]
kafka.tools.ConsoleProducerTest > testValidConfigsOldProducer PASSED

kafka.tools.ConsoleProducerTest > testInvalidConfigs STARTED

kafka.tools.ConsoleProducerTest > testInvalidConfigs PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer STARTED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer PASSED

kafka.tools.MirrorMakerTest > testDefaultMirrorMakerMessageHandler STARTED

kafka.tools.MirrorMakerTest > testDefaultMirrorMakerMessageHandler PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent 
STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic STARTED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.common.ConfigTest > testInvalidGroupIds STARTED

kafka.common.ConfigTest > testInvalidGroupIds PASSED

kafka.common.ConfigTest > testInvalidClientIds STARTED

kafka.common.ConfigTest > testInvalidClientIds PASSED

kafka.common.ZkNodeChangeNotificationListenerTest > testProcessNotification 
STARTED

kafka.common.ZkNodeChangeNotificationListenerTest > testProcessNotification 
PASSED

kafka.common.TopicTest > testInvalidTopicNames STARTED

kafka.common.TopicTest > testInvalidTopicNames PASSED

kafka.common.TopicTest > testTopicHasCollision STARTED

kafka.common.TopicTest > testTopicHasCollision PASSED

kafka.common.TopicTest > testTopicHasCollisionChars STARTED

kafka.common.TopicTest > testTopicHasCollisionChars PASSED

kafka.security.auth.ResourceTypeTest > testFromString STARTED

kafka.security.auth.ResourceTypeTest > testFromString PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache 

[jira] [Updated] (KAFKA-3781) Errors.exceptionName() can throw NPE

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3781:
---
Reviewer: Ewen Cheslack-Postava
  Status: Patch Available  (was: Open)

> Errors.exceptionName() can throw NPE
> 
>
> Key: KAFKA-3781
> URL: https://issues.apache.org/jira/browse/KAFKA-3781
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Grant Henke
>Assignee: Ismael Juma
>  Labels: newbie++
> Fix For: 0.10.0.1
>
>
> When calling Errors.exceptionName() on the NONE error an NPE is thrown.
> {noformat}
> public String exceptionName() {
>     return exception.getClass().getName();
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3781) Errors.exceptionName() can throw NPE

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317697#comment-15317697
 ] 

ASF GitHub Bot commented on KAFKA-3781:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1476

KAFKA-3781; Errors.exceptionName() can throw NPE



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-3781-exception-name-npe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1476


commit d2b09a1309037ab569c8c8b0720b0e8671d4f3a1
Author: Ismael Juma 
Date:   2016-06-07T02:33:45Z

KAFKA-3781; Errors.exceptionName() can throw NPE




> Errors.exceptionName() can throw NPE
> 
>
> Key: KAFKA-3781
> URL: https://issues.apache.org/jira/browse/KAFKA-3781
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Grant Henke
>Assignee: Ismael Juma
>  Labels: newbie++
> Fix For: 0.10.0.1
>
>
> When calling Errors.exceptionName() on the NONE error an NPE is thrown.
> {noformat}
> public String exceptionName() {
>     return exception.getClass().getName();
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1476: KAFKA-3781; Errors.exceptionName() can throw NPE

2016-06-06 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1476

KAFKA-3781; Errors.exceptionName() can throw NPE



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-3781-exception-name-npe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1476


commit d2b09a1309037ab569c8c8b0720b0e8671d4f3a1
Author: Ismael Juma 
Date:   2016-06-07T02:33:45Z

KAFKA-3781; Errors.exceptionName() can throw NPE




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (KAFKA-3781) Errors.exceptionName() can throw NPE

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-3781:
--

Assignee: Ismael Juma

> Errors.exceptionName() can throw NPE
> 
>
> Key: KAFKA-3781
> URL: https://issues.apache.org/jira/browse/KAFKA-3781
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Grant Henke
>Assignee: Ismael Juma
>  Labels: newbie++
> Fix For: 0.10.0.1
>
>
> When calling Errors.exceptionName() on the NONE error an NPE is thrown.
> {noformat}
> public String exceptionName() {
>     return exception.getClass().getName();
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1981) Make log compaction point configurable

2016-06-06 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317588#comment-15317588
 ] 

Eric Wasserman edited comment on KAFKA-1981 at 6/7/16 12:56 AM:


During the KIP-58 vote it was 
[suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccabtagwebxsrveok-unuptjtsdf+d+pq8fuaahql+u9bgaz3...@mail.gmail.com%3e)
 the name of the sole remaining property be changed from:

log.cleaner.min.compaction.lag.ms

to

log.cleaner.compaction.delay.ms

The feature makes a guarantee that the elapsed time between adding a message 
and its being subject to compaction is _at minimum_ _*x*_ number of 
milliseconds. This setting is specifying _*x*_.

In particular this guarantee does not really affect _when_ a compaction will or 
will not happen. It only controls which messages will be protected from 
compaction in the event one occurs.

New Oxford American Dictionary defines:

*Lag* n. (also time lag) a period of time between one event or phenomenon and 
another: there was a time lag between the commission of the crime and its 
reporting to the police.

*Delay* n. a period of time by which something is late or postponed: a two-hour 
delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When considering alternative phrasing we may want to consider that the other 
parameters (cumulative message size, or message count) may later be added back 
into this feature.




was (Author: ewasserman):
During the KIP-58 vote it was 
[suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccabtagwebxsrveok-unuptjtsdf+d+pq8fuaahql+u9bgaz3...@mail.gmail.com%3e)
 the name of the sole remaining property be changed from:

log.cleaner.min.compaction.lag.ms

to

log.cleaner.compaction.delay.ms

The feature makes a guarantee that the elapsed time between adding a message 
and its being subject to compaction is _at minimum_ _*x*_ number of 
milliseconds. This setting is specifying _*x*_.

In particular this guarantee does not really affect *when* a compaction will or 
will not happen. It only controls which messages will be protected from 
compaction in the event one occurs.

New Oxford American Dictionary defines:

**Lag** n. (also time lag) a period of time between one event or phenomenon and 
another: there was a time lag between the commission of the crime and its 
reporting to the police.

**Delay** n. a period of time by which something is late or postponed: a 
two-hour delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When considering alternative phrasing we may want to consider that the other 
parameters (cumulative message size, or message count) may later be added back 
into this feature.



> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> Other than this we don't really give you much control over when compaction 
> will happen. The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.
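A tiny illustration of the guarantee discussed in the comment above: a record only becomes eligible for compaction once the configured minimum lag has elapsed since it was appended. The names are placeholders, not the log cleaner's API.

{code:java}
// Illustrative eligibility check for a minimum compaction lag.
final class CompactionLag {
    static boolean eligibleForCompaction(long recordAppendTimeMs, long nowMs, long minCompactionLagMs) {
        // Records younger than the configured lag are protected from compaction.
        return nowMs - recordAppendTimeMs >= minCompactionLagMs;
    }
}
{code}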



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1981) Make log compaction point configurable

2016-06-06 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317588#comment-15317588
 ] 

Eric Wasserman edited comment on KAFKA-1981 at 6/7/16 12:56 AM:


During the KIP-58 vote it was 
[suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccabtagwebxsrveok-unuptjtsdf+d+pq8fuaahql+u9bgaz3...@mail.gmail.com%3e)
 the name of the sole remaining property be changed from:

log.cleaner.min.compaction.lag.ms

to

log.cleaner.compaction.delay.ms

The feature makes a guarantee that the elapsed time between adding a message 
and its being subject to compaction is _at minimum_ _*x*_ number of 
milliseconds. This setting is specifying _*x*_.

In particular this guarantee does not really affect *when* a compaction will or 
will not happen. It only controls which messages will be protected from 
compaction in the event one occurs.

New Oxford American Dictionary defines:

**Lag** n. (also time lag) a period of time between one event or phenomenon and 
another: there was a time lag between the commission of the crime and its 
reporting to the police.

**Delay** n. a period of time by which something is late or postponed: a 
two-hour delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When considering alternative phrasing we may want to consider that the other 
parameters (cumulative message size, or message count) may later be added back 
into this feature.




was (Author: ewasserman):
During the KIP-58 vote it was 
[suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccabtagwebxsrveok-unuptjtsdf+d+pq8fuaahql+u9bgaz3...@mail.gmail.com%3e)
 the name of the sole remaining property be changed from:

log.cleaner.min.compaction.lag.ms

to

log.cleaner.compaction.delay.ms

The feature makes a guarantee that the elapsed time between adding a message 
and its being subject to compaction is _at minimum_ _**x**_ number of 
milliseconds. This setting is specifying _**x**_.

In particular this guarantee does not really affect *when* a compaction will or 
will not happen. It only controls which messages will be protected from 
compaction in the event one occurs.

New Oxford American Dictionary defines:

**Lag** n. (also time lag) a period of time between one event or phenomenon and 
another: there was a time lag between the commission of the crime and its 
reporting to the police.

**Delay** n. a period of time by which something is late or postponed: a 
two-hour delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When considering alternative phrasing we may want to consider that the other 
parameters (cumulative message size, or message count) may later be added back 
into this feature.



> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> Other than this we don't really give you much control over when compaction 
> will happen. The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-06-06 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317588#comment-15317588
 ] 

Eric Wasserman commented on KAFKA-1981:
---

During the KIP-58 vote it was 
[suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccabtagwebxsrveok-unuptjtsdf+d+pq8fuaahql+u9bgaz3...@mail.gmail.com%3e)
 the name of the sole remaining property be changed from:

log.cleaner.min.compaction.lag.ms

to

log.cleaner.compaction.delay.ms

The feature makes a guarantee that the elapsed time between adding a message 
and its being subject to compaction is _at minimum_ _**x**_ number of 
milliseconds. This setting is specifying _**x**_.

In particular this guarantee does not really affect *when* a compaction will or 
will not happen. It only controls which messages will be protected from 
compaction in the event one occurs.

New Oxford American Dictionary defines:

**Lag** n. (also time lag) a period of time between one event or phenomenon and 
another: there was a time lag between the commission of the crime and its 
reporting to the police.

**Delay** n. a period of time by which something is late or postponed: a 
two-hour delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When considering alternative phrasing we may want to consider that the other 
parameters (cumulative message size, or message count) may later be added back 
into this feature.



> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> Other than this we don't really give you much control over when compaction 
> will happen. The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3783.

Resolution: Fixed

> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Assignee: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>
> When removing the last ACL for a given resource, the znode storing the ACLs 
> will get removed.
> The version number of the znode is used for optimistic locking in a loop to 
> provide atomic changes across brokers.
> Unfortunately, the exception thrown when the operation fails because of a 
> different version number is the wrong one 
> ({{KeeperException.BadVersionException}} instead of ZkClient's 
> {{ZkBadVersionException}}) and does not get caught, resulting in the 
> following stack trace:
> {noformat}
> org.I0Itec.zkclient.exception.ZkBadVersionException: 
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = 
> BadVersion for /kafka-acl/Topic/e6df8028-f268-408c-814e-d418e943b2fa
>   at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:51)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1000)
>   at org.I0Itec.zkclient.ZkClient.delete(ZkClient.java:1047)
>   at kafka.utils.ZkUtils.conditionalDeletePath(ZkUtils.scala:522)
>   at 
> kafka.security.auth.SimpleAclAuthorizer.kafka$security$auth$SimpleAclAuthorizer$$updateResourceAcls(SimpleAclAuthorizer.scala:282)
>   at 
> kafka.security.auth.SimpleAclAuthorizer$$anonfun$removeAcls$1.apply$mcZ$sp(SimpleAclAuthorizer.scala:187)
>   at 
> kafka.security.auth.SimpleAclAuthorizer$$anonfun$removeAcls$1.apply(SimpleAclAuthorizer.scala:187)
>   at 
> kafka.security.auth.SimpleAclAuthorizer$$anonfun$removeAcls$1.apply(SimpleAclAuthorizer.scala:187)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
>   at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
>   at 
> kafka.security.auth.SimpleAclAuthorizer.removeAcls(SimpleAclAuthorizer.scala:186)
>   ...
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
> KeeperErrorCode = BadVersion for 
> /kafka-acl/Topic/e6df8028-f268-408c-814e-d418e943b2fa
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
>   at org.I0Itec.zkclient.ZkConnection.delete(ZkConnection.java:109)
>   at org.I0Itec.zkclient.ZkClient$11.call(ZkClient.java:1051)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:990)
>   ... 18 more
> {noformat}
> I noticed this behaviour while working on another fix when running the 
> {{SimpleAclAuthorizerTest}} unit tests, but it can also happen in rare cases 
> when running the {{kafka-acls.sh}} command simultaneously on different 
> brokers.
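
A minimal sketch of the intended fix (not the actual ZkUtils code): ZkClient 
wraps ZooKeeper's {{KeeperException.BadVersionException}} in its own 
{{ZkBadVersionException}}, so that is the type the optimistic-locking caller 
has to catch.

{code}
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.exception.ZkBadVersionException;
import org.I0Itec.zkclient.exception.ZkNoNodeException;

final class ConditionalDeleteSketch {
    // Returns true if the znode is gone, false if the version check failed and
    // the caller should re-read the ACLs and retry the optimistic-locking loop.
    static boolean conditionalDelete(ZkClient zkClient, String path, int expectedVersion) {
        try {
            zkClient.delete(path, expectedVersion);
            return true;
        } catch (ZkBadVersionException e) {
            return false;  // someone else changed the znode concurrently
        } catch (ZkNoNodeException e) {
            return true;   // already deleted, nothing to do
        }
    }
}
{code}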



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-3783:


> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Assignee: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3783:
---
Status: Resolved  (was: Closed)

> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Assignee: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma closed KAFKA-3783.
--

> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Assignee: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3783:
---
Assignee: Sébastien Launay

> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Assignee: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317562#comment-15317562
 ] 

ASF GitHub Bot commented on KAFKA-3783:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1461


> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3783) Race condition on last ACL removal for a resource fails with a ZkBadVersionException

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3783.

   Resolution: Fixed
Fix Version/s: 0.10.0.1

Issue resolved by pull request 1461
[https://github.com/apache/kafka/pull/1461]

> Race condition on last ACL removal for a resource fails with a 
> ZkBadVersionException
> 
>
> Key: KAFKA-3783
> URL: https://issues.apache.org/jira/browse/KAFKA-3783
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Priority: Minor
> Fix For: 0.10.0.1
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1461: KAFKA-3783: Catch proper exception on path delete

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1461




[jira] [Updated] (KAFKA-3396) Unauthorized topics are returned to the user

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3396:
---
Status: Patch Available  (was: Open)

> Unauthorized topics are returned to the user
> 
>
> Key: KAFKA-3396
> URL: https://issues.apache.org/jira/browse/KAFKA-3396
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.10.0.0, 0.9.0.0
>Reporter: Grant Henke
>Assignee: Edoardo Comar
> Fix For: 0.10.0.1
>
>
> Kafka's clients and protocol expose unauthorized topics to the end user. 
> This is often considered a security hole. To some, the topic name itself is 
> sensitive information; those who do not consider the name sensitive still 
> see it as extra information that helps a user try to circumvent security. 
> Instead, if a user does not have access to a topic, 
> the servers should act as if the topic does not exist. 
> To solve this some of the changes could include:
>   - The broker should not return a TOPIC_AUTHORIZATION(29) error for 
> requests (metadata, produce, fetch, etc) that include a topic that the user 
> does not have DESCRIBE access to.
>   - A user should not receive a TopicAuthorizationException when they do 
> not have DESCRIBE access to a topic or the cluster.
>  - The client should not maintain and expose a list of unauthorized 
> topics in org.apache.kafka.common.Cluster. 
> Other changes may be required that are not listed here. Further analysis is 
> needed. 
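
A minimal sketch of the intended behaviour (the authorizer interface here is a 
hypothetical stand-in, not Kafka's actual Authorizer API): topics the principal 
cannot DESCRIBE are simply omitted, so responses behave as if they do not exist.

{code}
import java.util.*;

// Hypothetical stand-in for the broker-side authorizer; the real interface differs.
interface DescribeAuthorizer {
    boolean canDescribe(String principal, String topic);
}

final class TopicVisibility {
    // Unauthorized topics are silently dropped instead of producing a
    // TOPIC_AUTHORIZATION error that would leak the topic's existence.
    static Set<String> visibleTopics(Collection<String> requested,
                                     String principal,
                                     DescribeAuthorizer authorizer) {
        Set<String> visible = new LinkedHashSet<>();
        for (String topic : requested) {
            if (authorizer.canDescribe(principal, topic)) {
                visible.add(topic);
            }
        }
        return visible;
    }
}
{code}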



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-724) Allow automatic socket.send.buffer from operating system

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-724:
--
Resolution: Fixed
  Reviewer: Ismael Juma  (was: Jay Kreps)
Status: Resolved  (was: Patch Available)

> Allow automatic socket.send.buffer from operating system
> 
>
> Key: KAFKA-724
> URL: https://issues.apache.org/jira/browse/KAFKA-724
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Pablo Barrera
>Assignee: Rekha Joshi
>  Labels: newbie
> Fix For: 0.10.1.0
>
>
> To do this, don't call socket().setXXXBufferSize(). This can be 
> controlled by the configuration parameters: if socket.send.buffer (or one of 
> the other buffer settings) is set to -1, skip the socket().setXXXBufferSize() call.
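
A minimal sketch of that behaviour (an illustrative helper, not the actual 
SocketServer code):

{code}
import java.net.Socket;
import java.net.SocketException;

final class SocketBufferConfig {
    // A configured value of -1 means "leave the OS default alone", i.e. the
    // explicit setter calls are skipped and kernel auto-tuning stays in effect.
    static void applyBufferSizes(Socket socket, int sendBufferBytes, int receiveBufferBytes)
            throws SocketException {
        if (sendBufferBytes != -1) {
            socket.setSendBufferSize(sendBufferBytes);
        }
        if (receiveBufferBytes != -1) {
            socket.setReceiveBufferSize(receiveBufferBytes);
        }
    }
}
{code}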



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-724) Allow automatic socket.send.buffer from operating system

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317541#comment-15317541
 ] 

ASF GitHub Bot commented on KAFKA-724:
--

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1469


> Allow automatic socket.send.buffer from operating system
> 
>
> Key: KAFKA-724
> URL: https://issues.apache.org/jira/browse/KAFKA-724
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Pablo Barrera
>Assignee: Rekha Joshi
>  Labels: newbie
> Fix For: 0.10.1.0
>
>
> To do this, don't call socket().setXXXBufferSize(). This can be 
> controlled by the configuration parameters: if socket.send.buffer (or one of 
> the other buffer settings) is set to -1, skip the socket().setXXXBufferSize() call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1469: KAFKA-724; Allow automatic socket.send.buffer from...

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1469




Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy

2016-06-06 Thread Vahid S Hashemian
Hi Jason,

Thanks for reviewing the KIP.
I will add the details you requested, but to summarize:

Regarding the structure of the user data:

Right now the user data holds only the current assignments, which is a 
mapping of consumers to their assigned topic partitions. Is this mapping 
what you're also suggesting with the CurrentAssignment field?
I can see how adding a version (as the sticky assignor version) would be 
useful, and likewise a protocol name, perhaps for validation.
But could you clarify the "Subscription" field and how you think it'll 
come into play?


Regarding the algorithm:

There could be similarities between how this KIP is implemented and how 
KIP-49 handles fairness. But since we had to take stickiness into 
consideration we started fresh and did not adopt the KIP-49 approach.
The sticky assignor implementation is comprehensive and guarantees the 
fairest possible assignment with the highest stickiness. I even have a unit 
test that randomly generates an assignment problem and verifies that a 
fair and sticky assignment is calculated.
KIP-54 gives priority to fairness over stickiness (which makes the 
implementation more complex). We could have another strategy that gives 
priority to stickiness over fairness (which supposedly will have a better 
performance).
The main distinction between KIP-54 and KIP-49 is that KIP-49 calculates 
the assignment without considering the previous assignments (fairness 
only); whereas for KIP-54 previous assignments play a big role (fairness 
and stickiness).
If there is no stickiness requirement, I believe it would make sense to 
use a fair-only assignment without the overhead of the sticky assignment, 
as you mentioned.
So, I could see three different strategies that could enrich assignment 
policy options.
It would be great to have some feedback from the community about what is 
the best way to move forward with these two KIPs.
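
To make the trade-off concrete, below is a rough stickiness-first sketch (not 
the KIP-54 implementation, which prioritizes fairness over stickiness): keep 
each partition with its previous owner when that owner is still in the group, 
and distribute only the leftovers by load.

import java.util.*;

final class StickyAssignmentSketch {
    static Map<String, List<Integer>> assign(List<Integer> partitions,
                                             List<String> consumers,
                                             Map<Integer, String> previousOwner) {
        Map<String, List<Integer>> assignment = new HashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());

        List<Integer> unassigned = new ArrayList<>();
        for (Integer p : partitions) {
            String owner = previousOwner.get(p);
            if (owner != null && assignment.containsKey(owner)) {
                assignment.get(owner).add(p);   // sticky: keep the previous owner
            } else {
                unassigned.add(p);              // owner left the group, or a new partition
            }
        }
        // Simple fairness pass: hand each leftover partition to the least-loaded consumer.
        for (Integer p : unassigned) {
            String target = Collections.min(consumers,
                    Comparator.comparingInt((String c) -> assignment.get(c).size()));
            assignment.get(target).add(p);
        }
        return assignment;
    }
}

Note this sketch can stay unbalanced when the previous assignment was skewed, 
which is exactly why KIP-54 re-balances at the cost of some stickiness.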

In the meantime, I'll add some more details in the KIP about the approach 
for calculating assignments.

Thanks again.
 
Regards,
--Vahid
 



From:   Jason Gustafson 
To: dev@kafka.apache.org
Date:   06/06/2016 01:26 PM
Subject:Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy



Hi Vahid,

Can you add some detail to the KIP on the structure of the user data? I'm
guessing it would be something like this:

ProtocolName => "sticky"

ProtocolMetadata => Version Subscription UserData
  Version => int16
  Subscription => [Topic]
Topic => string
  UserData => CurrentAssignment
CurrentAssignment => [Topic [Partition]]
  Topic => string
  Partition => int32

It would also be helpful to include a little more detail on the algorithm.
From what I can tell, it looks like you're adopting some of the strategies
from KIP-49 to handle differing subscriptions better. If so, then I wonder
if it makes sense to combine the two KIPs? Or do you think there would be
an advantage to having the "fair" assignment strategy without the overhead
of the sticky assignor?

Thanks,
Jason



On Fri, Jun 3, 2016 at 11:33 AM, Guozhang Wang  wrote:

> Sorry for being late on this thread.
>
> The assign() function is auto-triggered during the rebalance by one of 
the
> consumers when it receives all subscription information collected from 
the
> server-side coordinator.
>
> More details can be found here:
>
> 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal#KafkaClient-sideAssignmentProposal-ConsumerEmbeddedProtocol

>
> As for Kafka Streams, the way it did "stickiness" is by 1) letting all
> consumers put their current assigned topic-partitions and server ids 
into
> the "metadata" field of the JoinGroupRequest, 2) when the selected 
consumer
> triggers assign() along with all the subscriptions as well as their
> metadata, it can parse the metadata to learn about the existing 
assignment
> map; and hence when making the new assignment it will try to assign
> partitions to its current owners "with best effort".
>
>
> Hope this helps.
>
>
> Guozhang
>
>
> On Thu, May 26, 2016 at 4:56 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi Guozhang,
> >
> > I was looking at the implementation of StreamsPartitionAssignor 
through
> > its unit tests and expected to find some tests that
> > - verify stickiness by making at least two calls to the assign() 
method
> > (so we check the second assign() call output preserves the assignments
> > coming from the first assign() call output); or
> > - start off by a preset assignment, call assign() after some 
subscription
> > change, and verify the previous assignment are preserved.
> > But none of the methods seem to do these. Did I overlook them, or
> > stickiness is being tested in some other fashion?
> >
> > Also, if there is a high-level write-up about how this assignor works
> > could you please point me to it? Thanks.
> >
> > Regards.
> > --Vahid
> >
> >
> >

[GitHub] kafka pull request #1475: MINOR: Add comment for round robin partitioner wit...

2016-06-06 Thread Ishiihara
GitHub user Ishiihara opened a pull request:

https://github.com/apache/kafka/pull/1475

MINOR: Add comment for round robin partitioner with different subscriptions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ishiihara/kafka roundrobin-comment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1475


commit 6be7c2380b2a4fb689dcfb42c6374fc4b9c528d4
Author: Liquan Pei 
Date:   2016-06-06T23:51:26Z

Add comment for round robin partitioner with different subscriptions






[jira] [Assigned] (KAFKA-3794) Add Stream / Table prefix in print functions

2016-06-06 Thread Bill Bejeck (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-3794:
--

Assignee: Bill Bejeck

> Add Stream / Table prefix in print functions
> 
>
> Key: KAFKA-3794
> URL: https://issues.apache.org/jira/browse/KAFKA-3794
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie, user-experience
>
> Currently the KTable/KStream.print() operator will print the key-value pair 
> as it was forwarded to this operator. However, if there are multiple 
> operators in the topologies with the same {{PrintStream}} (e.g. stdout), 
> their printed key-value pairs will be interleaved on that stream channel.
> Hence it is better to add a prefix for different KStream/KTable.print 
> operators. One proposal:
> 1) For KTable, it inherits a table name when created, and we can use that 
> name as the prefix as {{[table-name]: key, value}}.
> 2) For KStream, we can overload the function with an additional "name" 
> parameter that we use as the prefix; if it is not specified, then we can use 
> the parent processor node name, which has a pattern like 
> {{KSTREAM-JOIN-suffix_index}}.
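
A minimal sketch of the prefixing idea (class and method names are illustrative, 
not the Kafka Streams API):

{code}
import java.io.PrintStream;

final class PrefixedPrinter<K, V> {
    private final String name;
    private final PrintStream out;

    PrefixedPrinter(String name, PrintStream out) {
        this.name = name;
        this.out = out;
    }

    // Prefixing each record with the operator's name keeps output from several
    // print() operators sharing the same stream (e.g. stdout) distinguishable.
    void print(K key, V value) {
        out.println("[" + name + "]: " + key + ", " + value);
    }
}
{code}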



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3794) Add Stream / Table prefix in print functions

2016-06-06 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317475#comment-15317475
 ] 

Bill Bejeck commented on KAFKA-3794:


picking this one up, mea culpa

> Add Stream / Table prefix in print functions
> 
>
> Key: KAFKA-3794
> URL: https://issues.apache.org/jira/browse/KAFKA-3794
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie, user-experience
>
> Currently the KTable/KStream.print() operator will print the key-value pair 
> as it was forwarded to this operator. However, if there are multiple 
> operators in the topologies with the same {{PrintStream}} (e.g. stdout), 
> their printed key-value pairs will be interleaved on that stream channel.
> Hence it is better to add a prefix for different KStream/KTable.print 
> operators. One proposal:
> 1) For KTable, it inherits a table name when created, and we can use that 
> name as the prefix as {{[table-name]: key, value}}.
> 2) For KStream, we can overload the function with an additional "name" 
> parameter that we use as the prefix; if it is not specified, then we can use 
> the parent processor node name, which has a pattern like 
> {{KSTREAM-JOIN-suffix_index}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3167) Use local to the workspace Gradle cache and recreate it on every build

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3167 started by Ismael Juma.
--
> Use local to the workspace Gradle cache and recreate it on every build
> --
>
> Key: KAFKA-3167
> URL: https://issues.apache.org/jira/browse/KAFKA-3167
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> Kafka builds often fail with "Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin"
> I filed INFRA-11083 and Andrew Bayer suggested:
> "Can you change your builds to use a local-to-the-workspace cache and then 
> nuke it/recreate it on every build?"
> This issue is about changing the Jenkins config for one of the trunk builds 
> to do the above to see if it helps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3167) Use local to the workspace Gradle cache and recreate it on every build

2016-06-06 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317422#comment-15317422
 ] 

Ismael Juma commented on KAFKA-3167:


I changed this so that we do the following before gradle is executed:

{code}
# Delete gradle cache to workaround cache corruption bugs, see KAFKA-3167
rm -rf ${WORKSPACE}/.gradle
{code}

The corrupted cache seems to be in `.gradle`, so the previous attempt wasn't 
helping. The Gradle cache was not corrupted after the change so far (unless 
noted, the tests passed):

https://builds.apache.org/job/kafka-trunk-jdk7/1348/
https://builds.apache.org/job/kafka-trunk-jdk8/683/
https://builds.apache.org/job/kafka-0.10.0-jdk7/120/ (failed due to an 
unrelated transient failure)
https://builds.apache.org/job/kafka_0.9.0_jdk7/133/
https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4140/

> Use local to the workspace Gradle cache and recreate it on every build
> --
>
> Key: KAFKA-3167
> URL: https://issues.apache.org/jira/browse/KAFKA-3167
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> Kafka builds often fail with "Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin"
> I filed INFRA-11083 and Andrew Bayer suggested:
> "Can you change your builds to use a local-to-the-workspace cache and then 
> nuke it/recreate it on every build?"
> This issue is about changing the Jenkins config for one of the trunk builds 
> to do the above to see if it helps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3796) SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk

2016-06-06 Thread Rekha Joshi (JIRA)
Rekha Joshi created KAFKA-3796:
--

 Summary: SslTransportLayerTest.testInvalidEndpointIdentification 
fails on trunk
 Key: KAFKA-3796
 URL: https://issues.apache.org/jira/browse/KAFKA-3796
 Project: Kafka
  Issue Type: Bug
  Components: clients, security
Affects Versions: 0.10.0.0
Reporter: Rekha Joshi
Assignee: Rekha Joshi


org.apache.kafka.common.network.SslTransportLayerTest > 
testEndpointIdentificationDisabled FAILED
java.net.BindException: Can't assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at 
org.apache.kafka.common.network.NioEchoServer.(NioEchoServer.java:48)
at 
org.apache.kafka.common.network.SslTransportLayerTest.testEndpointIdentificationDisabled(SslTransportLayerTest.java:120)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3795) Transient system test failure upgrade_test.TestUpgrade

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3795:
---
Description: 
From a recent build running on the 0.10.0 branch:

{code}
test_id:
2016-06-06--001.kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=0.9.0.1.compression_types=.snappy.new_consumer=True
status: FAIL
run time:   3 minutes 29.166 seconds


3522 acked message did not make it to the Consumer. They are: 476524, 
476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 476539, 
476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 476552, ...plus 
3482 more. Total Acked: 110437, Total Consumed: 127470. The first 1000 missing 
messages were validated to ensure they are in Kafka's data files. 1000 were 
missing. This suggests data loss. Here are some of the messages not found in 
the data files: [477184, 477185, 477187, 477188, 477190, 477191, 477193, 
477194, 477196, 477197]

Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
 line 106, in run_all_tests
data = self.run_single_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
 line 162, in run_single_test
return self.current_test_context.function(self.current_test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/mark/_mark.py",
 line 331, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 113, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 79, in run_produce_consume_validate
raise e
AssertionError: 3522 acked message did not make it to the Consumer. They are: 
476524, 476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 
476539, 476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 476552, 
...plus 3482 more. Total Acked: 110437, Total Consumed: 127470. The first 1000 
missing messages were validated to ensure they are in Kafka's data files. 1000 
were missing. This suggests data loss. Here are some of the messages not found 
in the data files: [477184, 477185, 477187, 477188, 477190, 477191, 477193, 
477194, 477196, 477197]
{code}

Here's a link to the test data: 
http://testing.confluent.io/confluent-kafka-0-10-0-system-test-results/?prefix=2016-06-06--001.1465234069--apache--0.10.0--6500b53/

  was:
From a recent build running on the 0.9.0 branch:

{code}
test_id:
2016-06-06--001.kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=0.9.0.1.compression_types=.snappy.new_consumer=True
status: FAIL
run time:   3 minutes 29.166 seconds


3522 acked message did not make it to the Consumer. They are: 476524, 
476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 476539, 
476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 476552, ...plus 
3482 more. Total Acked: 110437, Total Consumed: 127470. The first 1000 missing 
messages were validated to ensure they are in Kafka's data files. 1000 were 
missing. This suggests data loss. Here are some of the messages not found in 
the data files: [477184, 477185, 477187, 477188, 477190, 477191, 477193, 
477194, 477196, 477197]

Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
 line 106, in run_all_tests
data = self.run_single_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
 line 162, in run_single_test
return self.current_test_context.function(self.current_test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/mark/_mark.py",
 line 331, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 113, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 

[jira] [Created] (KAFKA-3795) Transient system test failure upgrade_test.TestUpgrade

2016-06-06 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-3795:
--

 Summary: Transient system test failure upgrade_test.TestUpgrade
 Key: KAFKA-3795
 URL: https://issues.apache.org/jira/browse/KAFKA-3795
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Jason Gustafson


From a recent build running on the 0.9.0 branch:

{code}
test_id:
2016-06-06--001.kafkatest.tests.core.upgrade_test.TestUpgrade.test_upgrade.from_kafka_version=0.9.0.1.to_message_format_version=0.9.0.1.compression_types=.snappy.new_consumer=True
status: FAIL
run time:   3 minutes 29.166 seconds


3522 acked message did not make it to the Consumer. They are: 476524, 
476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 476539, 
476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 476552, ...plus 
3482 more. Total Acked: 110437, Total Consumed: 127470. The first 1000 missing 
messages were validated to ensure they are in Kafka's data files. 1000 were 
missing. This suggests data loss. Here are some of the messages not found in 
the data files: [477184, 477185, 477187, 477188, 477190, 477191, 477193, 
477194, 477196, 477197]

Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
 line 106, in run_all_tests
data = self.run_single_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
 line 162, in run_single_test
return self.current_test_context.function(self.current_test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/mark/_mark.py",
 line 331, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/core/upgrade_test.py",
 line 113, in test_upgrade
self.run_produce_consume_validate(core_test_action=lambda: 
self.perform_upgrade(from_kafka_version,
  File 
"/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 79, in run_produce_consume_validate
raise e
AssertionError: 3522 acked message did not make it to the Consumer. They are: 
476524, 476525, 476527, 476528, 476530, 476531, 476533, 476534, 476536, 476537, 
476539, 476540, 476542, 476543, 476545, 476546, 476548, 476549, 476551, 476552, 
...plus 3482 more. Total Acked: 110437, Total Consumed: 127470. The first 1000 
missing messages were validated to ensure they are in Kafka's data files. 1000 
were missing. This suggests data loss. Here are some of the messages not found 
in the data files: [477184, 477185, 477187, 477188, 477190, 477191, 477193, 
477194, 477196, 477197]
{code}

Here's a link to the test data: 
http://testing.confluent.io/confluent-kafka-0-10-0-system-test-results/?prefix=2016-06-06--001.1465234069--apache--0.10.0--6500b53/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-06-06 Thread Becket Qin
Guozhang and Jason,

I think we are on the same page that having rebalances done in the
background thread has a much bigger impact on users. So I agree that it is
probably better to start with 1) and 2). We can add 3) later if
necessary.
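
For reference, the three timeouts above expressed as consumer configs (only
session.timeout.ms exists in released consumers; the other two names are the
proposals under discussion, not shipped settings):

import java.util.Properties;

final class Kip62TimeoutsSketch {
    static Properties consumerTimeouts() {
        Properties props = new Properties();
        props.put("session.timeout.ms", "10000");     // 1) consumer liveness via background heartbeats
        props.put("max.poll.interval.ms", "300000");  // 2) proposed: application liveness between poll() calls
        props.put("rebalance.timeout.ms", "60000");   // 3) proposed: separate bound on rebalance completion
        return props;
    }
}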

Another implementation detail I am not quite sure about is making the
NetworkClient work with two threads. The KIP implies that this will be done
by synchronizing on ConsumerNetworkClient. I am not sure that is enough:
what if a poll() from ConsumerNetworkClient receives a FetchResponse or
OffsetFetchResponse that is supposed to be handled by the user thread? This
is an implementation detail but may be worth thinking about a bit more.

Thanks,

Jiangjie (Becket) Qin


On Mon, Jun 6, 2016 at 11:27 AM, Guozhang Wang  wrote:

> Jiangjie:
>
> About doing the rebalance in the background thread, I'm a bit concerned as
> it will change a lot of the concurrency guarantees that consumer currently
> provides (think of a consumer caller thread committing externally while the
> rebalance is happening in the background thread), and hence if we are
> considering changing that now or in the future, we need to think through
> all the corner cases.
>
> So in general, I'd still prefer we reserve a third config for rebalance
> timeout in this KIP.
>
> Guozhang
>
>
> On Mon, Jun 6, 2016 at 11:25 AM, Guozhang Wang  wrote:
>
> > (+ Matthias)
> >
> > Hello Henry,
> >
> > Specifically to your question regarding Kafka Streams:
> >
> > 1. Currently restoreActiveState() is triggered in the onPartitionsAssigned
> > callback, which is after the rebalance is completed from the coordinator's
> > point of view, and hence is covered by the process timeout value in this
> > new KIP.
> >
> > 2. That is a good question, and I think it is the general root cause behind
> > the directory-locking failures already reported by more than one user.
> > Currently I believe the main reason a second rebalance is triggered while
> > the processors are still completing restoreActiveState() from the previous
> > rebalance is the session timeout (default 30 seconds), which will matter
> > much less with a larger process timeout; however, with complex topologies
> > restoreActiveState() for all states may still take a long time with tens /
> > hundreds of state stores, and other cases can also cause consumers to
> > re-join the group right after a previous rebalance, for example 1) regex
> > subscription where the topic metadata has changed, 2) consecutive consumer
> > failures, or new consumers (i.e. new KStream instances / threads) being
> > added.
> >
> > For such cases we can do a better job to "fail fast" if the consumer
> > detects another join is needed. I think in one of your local commit you
> > are already doing sth similar, which we can merge back to trunk.
> >
> >
> >
> > Guozhang
> >
> >
> > On Sun, Jun 5, 2016 at 11:24 PM, Henry Cai 
> > wrote:
> >
> >> I have a question on the KIP on long stall during
> >> ProcessorStateManager.restoreActiveState(), this can be a long stall
> when
> >> we need to rebuild the RocksDB state on a new node.
> >>
> >> 1. Is restoreActiveState() considered as post rebalance since this is
> >> invoked on application rebalance listener?
> >> 2. When the node A was spending long time rebuilding the state in
> >> restoreActiveState() from the previous rebalance, a new node (node B)
> send
> >> a new JoinGroup request to the co-ordinator, how long should the
> >> coordinator wait for node A to finish the restoreActiveState from the
> >> previous rebalance, the restoreActiveState can take more than 10 minutes
> >> for a big state.
> >>
> >>
> >> On Sun, Jun 5, 2016 at 10:46 PM, Becket Qin 
> wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > Thanks for this very useful KIP.  In general I am with Guozhang on the
> >> > purpose of of the three timeouts.
> >> > 1) session timeout for consumer liveness,
> >> > 2) process timeout (or maybe we should rename it to
> >> max.poll.interval.ms)
> >> > for application liveness,
> >> > 3) rebalance timeout for faster rebalance in some failure cases.
> >> >
> >> > It seems the current discussion is mainly about whether we need 3) as
> a
> >> > separate timeout or not. The current KIP proposal is to combine 2) and
> >> 3),
> >> > i.e. just use process timeout as rebalance timeout. That means we need
> >> to
> >> > either increase rebalance timeout out to let it adapt to process
> >> timeout,
> >> > or the reverse. It would be helpful to understand the impact of these
> >> two
> >> > cases. Here are my two cents.
> >> >
> >> > For users who are consuming data from Kafka, usually they either care
> >> about
> >> > throughput or care about latency.
> >> >
> >> > If users care about the latency, they would probably care more about
> >> > average latency instead of 99.99 percentile latency which can be
> >> affected
> >> > by many other more 

[jira] [Created] (KAFKA-3794) Add Stream / Table prefix in print functions

2016-06-06 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-3794:


 Summary: Add Stream / Table prefix in print functions
 Key: KAFKA-3794
 URL: https://issues.apache.org/jira/browse/KAFKA-3794
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Currently the KTable/KStream.print() operator will print the key-value pair as 
it was forwarded to this operator. However, if there are multiple operators in 
the topologies with the same {{PrintStream}} (e.g. stdout), their printed 
key-value pairs will be interleaved on that stream channel.

Hence it is better to add a prefix for different KStream/KTable.print 
operators. One proposal:

1) For KTable, it inherits a table name when created, and we can use that name 
as the prefix as {{[table-name]: key, value}}.

2) For KStream, we can overload the function with an additional "name" 
parameter that we use as the prefix; if it is not specified, then we can use 
the parent processor node name, which has a pattern like 
{{KSTREAM-JOIN-suffix_index}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317175#comment-15317175
 ] 

ASF GitHub Bot commented on KAFKA-1981:
---

Github user ewasserman closed the pull request at:

https://github.com/apache/kafka/pull/1168


> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Beyond that we don't give you fine-grained control over when compaction 
> occurs, and we never compact the active segment (since it is still being 
> written to).
> The result is that you can't guarantee that a consumer will get every update 
> to a compacted topic--if the consumer falls behind a bit it might just get 
> the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set a message-count, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some additional overrides 
> to factor in when calculating that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1168: KAFKA-1981 Make log compaction point configurable

2016-06-06 Thread ewasserman
Github user ewasserman closed the pull request at:

https://github.com/apache/kafka/pull/1168




Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy

2016-06-06 Thread Jason Gustafson
Hi Vahid,

Can you add some detail to the KIP on the structure of the user data? I'm
guessing it would be something like this:

ProtocolName => "sticky"

ProtocolMetadata => Version Subscription UserData
  Version => int16
  Subscription => [Topic]
Topic => string
  UserData => CurrentAssignment
CurrentAssignment => [Topic [Partition]]
  Topic => string
  Partition => int32
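
As a rough illustration only (one possible layout, not a committed wire format),
the UserData above could be serialized along these lines:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.*;

final class UserDataSketch {
    // Encodes CurrentAssignment => [Topic [Partition]] as:
    // topicCount, then per topic: nameLength, nameBytes, partitionCount, partitions.
    static ByteBuffer encode(Map<String, List<Integer>> currentAssignment) {
        int size = 4;
        Map<String, byte[]> topicBytes = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : currentAssignment.entrySet()) {
            byte[] t = e.getKey().getBytes(StandardCharsets.UTF_8);
            topicBytes.put(e.getKey(), t);
            size += 2 + t.length + 4 + 4 * e.getValue().size();
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(currentAssignment.size());
        for (Map.Entry<String, List<Integer>> e : currentAssignment.entrySet()) {
            byte[] t = topicBytes.get(e.getKey());
            buf.putShort((short) t.length);
            buf.put(t);
            buf.putInt(e.getValue().size());
            for (int p : e.getValue()) buf.putInt(p);
        }
        buf.flip();
        return buf;
    }
}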

It would also be helpful to include a little more detail on the algorithm.
From what I can tell, it looks like you're adopting some of the strategies
from KIP-49 to handle differing subscriptions better. If so, then I wonder
if it makes sense to combine the two KIPs? Or do you think there would be
an advantage to having the "fair" assignment strategy without the overhead
of the sticky assignor?

Thanks,
Jason



On Fri, Jun 3, 2016 at 11:33 AM, Guozhang Wang  wrote:

> Sorry for being late on this thread.
>
> The assign() function is auto-triggered during the rebalance by one of the
> consumers when it receives all subscription information collected from the
> server-side coordinator.
>
> More details can be found here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal#KafkaClient-sideAssignmentProposal-ConsumerEmbeddedProtocol
>
> As for Kafka Streams, the way it did "stickiness" is by 1) let all
> consumers put their current assigned topic-partitions and server ids into
> the "metadata" field of the JoinGroupRequest, 2) when the selected consumer
> triggers assign() along with all the subscriptions as well as their
> metadata, it can parse the metadata to learn about the existing assignment
> map; and hence when making the new assignment it will try to assign
> partitions to its current owners "with best effort".
>
>
> Hope this helps.
>
>
> Guozhang
>
>
> On Thu, May 26, 2016 at 4:56 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi Guozhang,
> >
> > I was looking at the implementation of StreamsPartitionAssignor through
> > its unit tests and expected to find some tests that
> > - verify stickiness by making at least two calls to the assign() method
> > (so we check the second assign() call output preserves the assignments
> > coming from the first assign() call output); or
> > - start off by a preset assignment, call assign() after some subscription
> > change, and verify the previous assignment are preserved.
> > But none of the methods seem to do these. Did I overlook them, or
> > stickiness is being tested in some other fashion?
> >
> > Also, if there is a high-level write-up about how this assignor works
> > could you please point me to it? Thanks.
> >
> > Regards.
> > --Vahid
> >
> >
> >
> >
> > From:   Guozhang Wang 
> > To: "dev@kafka.apache.org" 
> > Date:   05/02/2016 10:34 AM
> > Subject:Re: [DISCUSS] KIP-54 Sticky Partition Assignment Strategy
> >
> >
> >
> > Just FYI, the StreamsPartitionAssignor in Kafka Streams are already doing
> > some sort of sticky partitioning mechanism. This is done through the
> > userData field though; i.e. all group members send their current
> "assigned
> > partitions" in their join group request, which will be grouped and send
> to
> > the leader, the leader then does best-effort for sticky-partitioning.
> >
> >
> > Guozhang
> >
> > On Fri, Apr 29, 2016 at 9:48 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > I think I'm unclear how we leverage the
> > > onPartitionsRevoked/onPartitionsAssigned here in any way that's
> > different
> > > from our normal usage -- certainly you can use them to generate a diff,
> > but
> > > you still need to commit when partitions are revoked and that has a
> > > non-trivial cost. Are we just saying that you might be able to save
> some
> > > overhead, e.g. closing/reopening some other resources by doing a flush
> > but
> > > not a close() or something? You still need to flush any output and
> > commit
> > > offsets before returning from onPartitionsRevoked, right? Otherwise you
> > > couldn't guarantee clean handoff of partitions.
> > >
> > > In terms of the rebalancing, the basic requirements in the KIP seem
> > sound.
> > > Passing previous assignment data via UserData also seems reasonable
> > since
> > > it avoids redistributing all assignment data to all members and doesn't
> > > rely on the next generation leader being a member of the current
> > > generation. Hopefully this shouldn't be surprising since I think I
> > > discussed this w/ Jason before he updated the relevant wiki pages :)
> > >
> > > -Ewen
> > >
> > >
> > > On Mon, Apr 18, 2016 at 9:34 AM, Vahid S Hashemian <
> > > vahidhashem...@us.ibm.com> wrote:
> > >
> > > > HI Jason,
> > > >
> > > > Thanks for your feedback.
> > > >
> > > > I believe your suggestion on how to take advantage of this assignor
> is
> > > > valid. We can leverage onPartitionsRevoked() and
> > onPartitionsAssigned()
> > > > callbacks and do a 

[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-06 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317147#comment-15317147
 ] 

Ewen Cheslack-Postava commented on KAFKA-2394:
--

[~cotedm] Sounds good. Maybe we should add a small note in the docs (I think 
there's an upgrade/compatibility section in there) so that when a release includes 
this, people have some additional warning even if they don't follow the mailing lists?

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.
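
For reference, a minimal sketch of what the RollingFileAppender variant could look
like in log4j.properties; the appender name and file follow the bundled config, but
the size and backup limits here are illustrative rather than proposed defaults:

    # size-based rolling instead of daily rolling
    log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
    log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
    log4j.appender.kafkaAppender.MaxFileSize=10MB
    log4j.appender.kafkaAppender.MaxBackupIndex=10
    log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

With both limits set, disk usage is capped at roughly MaxFileSize * (MaxBackupIndex + 1)
per log file.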



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #684

2016-06-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-2948; Remove unused topics from producer metadata set

--
[...truncated 3293 lines...]

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
PASSED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig STARTED

kafka.integration.MinIsrConfigTest > testDefaultKafkaConfig PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
STARTED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionEnabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testCleanLeaderElectionDisabledByTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
STARTED
ERROR: Could not install GRADLE_2_4_RC_2_HOME
java.lang.NullPointerException
at 
hudson.plugins.toolenv.ToolEnvBuildWrapper$1.buildEnvVars(ToolEnvBuildWrapper.java:46)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:947)
at hudson.plugins.git.GitSCM.getParamExpandedRepos(GitSCM.java:390)
at 
hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:577)
at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:527)
at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:381)
at hudson.scm.SCM.poll(SCM.java:398)
at hudson.model.AbstractProject._poll(AbstractProject.java:1453)
at hudson.model.AbstractProject.poll(AbstractProject.java:1356)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:526)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:555)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR: Could not install JDK1_8_0_66_HOME
java.lang.NullPointerException
at 
hudson.plugins.toolenv.ToolEnvBuildWrapper$1.buildEnvVars(ToolEnvBuildWrapper.java:46)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:947)
at hudson.plugins.git.GitSCM.getParamExpandedRepos(GitSCM.java:390)
at 
hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:577)
at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:527)
at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:381)
at hudson.scm.SCM.poll(SCM.java:398)
at hudson.model.AbstractProject._poll(AbstractProject.java:1453)
at hudson.model.AbstractProject.poll(AbstractProject.java:1356)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:526)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:555)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

kafka.integration.UncleanLeaderElectionTest > testUncleanLeaderElectionDisabled 
PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionInvalidTopicOverride PASSED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride STARTED

kafka.integration.UncleanLeaderElectionTest > 
testUncleanLeaderElectionEnabledByTopicOverride PASSED

kafka.integration.FetcherTest > testFetcher STARTED

kafka.integration.FetcherTest > testFetcher PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED


[jira] [Comment Edited] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-06 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316992#comment-15316992
 ] 

Dustin Cote edited comment on KAFKA-2394 at 6/6/16 6:57 PM:


[~ewencp] I only got positive feedback on the change from the mailing list, so 
sounds like this is a desirable change?  I think release notes for 
documentation would be sufficient based on that sentiment.


was (Author: cotedm):
[~ewencp] I only got positive feedback on the change from the mailing list, so 
sounds like this is a desirable change?

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2394) Use RollingFileAppender by default in log4j.properties

2016-06-06 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316992#comment-15316992
 ] 

Dustin Cote commented on KAFKA-2394:


[~ewencp] I only got positive feedback on the change from the mailing list, so 
sounds like this is a desirable change?

> Use RollingFileAppender by default in log4j.properties
> --
>
> Key: KAFKA-2394
> URL: https://issues.apache.org/jira/browse/KAFKA-2394
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Dustin Cote
>Priority: Minor
>  Labels: newbie
> Attachments: log4j.properties.patch
>
>
> The default log4j.properties bundled with Kafka uses ConsoleAppender and 
> DailyRollingFileAppender, which offer no protection to users from spammy 
> logging. In extreme cases (such as when issues like KAFKA-1461 are 
> encountered), the logs can exhaust the local disk space. This could be a 
> problem for Kafka adoption since new users are less likely to adjust the 
> logging properties themselves, and are more likely to have configuration 
> problems which result in log spam. 
> To fix this, we can use RollingFileAppender, which offers two settings for 
> controlling the maximum space that log files will use.
> maxBackupIndex: how many backup files to retain
> maxFileSize: the max size of each log file
> One question is whether this change is a compatibility concern? The backup 
> strategy and filenames used by RollingFileAppender are different from those 
> used by DailyRollingFileAppender, so any tools which depend on the old format 
> will break. If we think this is a serious problem, one solution would be to 
> provide two versions of log4j.properties and add a flag to enable the new 
> one. Another solution would be to include the RollingFileAppender 
> configuration in the default log4j.properties, but commented out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2948) Kafka producer does not cope well with topic deletions

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-2948:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Kafka producer does not cope well with topic deletions
> --
>
> Key: KAFKA-2948
> URL: https://issues.apache.org/jira/browse/KAFKA-2948
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
> Fix For: 0.10.1.0
>
>
> Kafka producer gets metadata for topics when send is invoked and thereafter 
> it attempts to keep the metadata up-to-date without any explicit requests 
> from the client. This works well in static environments, but when topics are 
> added or deleted, list of topics in Metadata grows but never shrinks. Apart 
> from being a memory leak, this results in constant requests for metadata for 
> deleted topics.
> We are running into this issue with the Confluent REST server where topic 
> deletion from tests are filling up logs with warnings about unknown topics. 
> Auto-create is turned off in our Kafka cluster.
> I am happy to provide a fix, but am not sure what the right fix is. Does it 
> make sense to remove topics from the metadata list when 
> UNKNOWN_TOPIC_OR_PARTITION response is received if there are no outstanding 
> sends? It doesn't look very straightforward to do this, so any alternative 
> suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2948) Kafka producer does not cope well with topic deletions

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316988#comment-15316988
 ] 

ASF GitHub Bot commented on KAFKA-2948:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/645


> Kafka producer does not cope well with topic deletions
> --
>
> Key: KAFKA-2948
> URL: https://issues.apache.org/jira/browse/KAFKA-2948
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
> Fix For: 0.10.1.0
>
>
> Kafka producer gets metadata for topics when send is invoked and thereafter 
> it attempts to keep the metadata up-to-date without any explicit requests 
> from the client. This works well in static environments, but when topics are 
> added or deleted, list of topics in Metadata grows but never shrinks. Apart 
> from being a memory leak, this results in constant requests for metadata for 
> deleted topics.
> We are running into this issue with the Confluent REST server where topic 
> deletion from tests are filling up logs with warnings about unknown topics. 
> Auto-create is turned off in our Kafka cluster.
> I am happy to provide a fix, but am not sure what the right fix is. Does it 
> make sense to remove topics from the metadata list when 
> UNKNOWN_TOPIC_OR_PARTITION response is received if there are no outstanding 
> sends? It doesn't look very straightforward to do this, so any alternative 
> suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3693) Race condition between highwatermark-checkpoint thread and handleLeaderAndIsrRequest at broker start-up

2016-06-06 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316991#comment-15316991
 ] 

Maysam Yabandeh commented on KAFKA-3693:


My apology. That was a copy-paste error. Broker 16 started shutting down at 
05:40:46,845
{code}
2016-05-10 05:40:46,845 INFO server.ReplicaFetcherThread: 
[ReplicaFetcherThread-0-15], Shutting down
{code}
until it completely shutdown at 06:17:27,160
{code}
2016-05-10 06:17:27,160 INFO server.KafkaServer: [Kafka Server 16], shut down 
completed
{code}
So it seems that the controller is sending the LeaderAndIsr request but the 
message is not delivered since broker 16 is being shut down. Instead the 
controller retries until the broker is fully restarted and then the message is 
delivered.

Did this correction resolve the confusion?

> Race condition between highwatermark-checkpoint thread and 
> handleLeaderAndIsrRequest at broker start-up
> ---
>
> Key: KAFKA-3693
> URL: https://issues.apache.org/jira/browse/KAFKA-3693
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
>
> Upon broker start-up, a race between highwatermark-checkpoint thread to write 
> replication-offset-checkpoint file and handleLeaderAndIsrRequest thread 
> reading from it causes the highwatermark for some partitions to be reset to 
> 0. In the good case, this results in the replica truncating its entire log to 0 
> and hence initiating a fetch of terabytes of data from the lead broker, which 
> sometimes leads to hours of downtime. We observed bad cases where the reset 
> offset can propagate to the recovery-point-offset-checkpoint file, making a 
> lead broker truncate the file. This seems to have the potential to lead to 
> data loss if the truncation happens at both follower and leader brokers.
> This is the particular faulty scenario manifested in our tests:
> # The broker restarts and receive LeaderAndIsr from the controller
> # LeaderAndIsr message however does not contain all the partitions (probably 
> because other brokers were churning at the same time)
> # becomeLeaderOrFollower calls getOrCreatePartition and updates the 
> allPartitions with the partitions included in the LeaderAndIsr message {code}
>   def getOrCreatePartition(topic: String, partitionId: Int): Partition = {
> var partition = allPartitions.get((topic, partitionId))
> if (partition == null) {
>   allPartitions.putIfNotExists((topic, partitionId), new Partition(topic, 
> partitionId, time, this))
> {code}
> # replication-offset-checkpoint jumps in taking a snapshot of (the partial) 
> allReplicas' high watermark into replication-offset-checkpoint file {code}  
> def checkpointHighWatermarks() {
> val replicas = 
> allPartitions.values.map(_.getReplica(config.brokerId)).collect{case 
> Some(replica) => replica}{code} hence rewriting the previous highwatermarks.
> # Later becomeLeaderOrFollower calls makeLeaders and makeFollowers which read 
> the (now partial) file through Partition::getOrCreateReplica {code}
>   val checkpoint = 
> replicaManager.highWatermarkCheckpoints(log.dir.getParentFile.getAbsolutePath)
>   val offsetMap = checkpoint.read
>   if (!offsetMap.contains(TopicAndPartition(topic, partitionId)))
> info("No checkpointed highwatermark is found for partition 
> [%s,%d]".format(topic, partitionId))
> {code}
> We are not entirely sure whether the initial LeaderAndIsr message including a 
> subset of partitions is critical in making this race condition manifest or 
> not. But it is an important detail since it clarifies that a solution based 
> on not letting the highwatermark-checkpoint thread jumping in the middle of 
> processing a LeaderAndIsr message would not suffice.
> The solution we are thinking of is to force initializing allPartitions by the 
> partitions listed in the replication-offset-checkpoint (and perhaps 
> recovery-point-offset-checkpoint file too) when a server starts.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #645: KAFKA-2948: Remove unused topics from producer meta...

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/645


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3748) Add consumer-property to console tools consumer (similar to --producer-property)

2016-06-06 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated KAFKA-3748:
--
Status: Patch Available  (was: In Progress)

> Add consumer-property to console tools consumer (similar to 
> --producer-property)
> 
>
> Key: KAFKA-3748
> URL: https://issues.apache.org/jira/browse/KAFKA-3748
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.9.0.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>  Labels: newbie
> Attachments: KAFKA-3748.PATCH
>
>
> Add --consumer-property to the console consumer.
> Creating this task from the comment given in KAFKA-3567.
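
For illustration, usage of the new flag would presumably mirror the existing producer
flag; the topic, group and property values below are made up:

    bin/kafka-console-consumer.sh --new-consumer --bootstrap-server localhost:9092 \
      --topic test \
      --consumer-property auto.offset.reset=earliest \
      --consumer-property group.id=console-group-1

i.e. each --consumer-property KEY=VALUE pair is passed straight through to the
underlying consumer configuration, the same way --producer-property works for the
console producer.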



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3748) Add consumer-property to console tools consumer (similar to --producer-property)

2016-06-06 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated KAFKA-3748:
--
Attachment: KAFKA-3748.PATCH

Attaching the patch file.

> Add consumer-property to console tools consumer (similar to 
> --producer-property)
> 
>
> Key: KAFKA-3748
> URL: https://issues.apache.org/jira/browse/KAFKA-3748
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.9.0.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>  Labels: newbie
> Attachments: KAFKA-3748.PATCH
>
>
> Add --consumer-property to the console consumer.
> Creating this task from the comment given in KAFKA-3567.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-06-06 Thread Guozhang Wang
Jiangjie:

About doing the rebalance in the background thread, I'm a bit concerned as
it will change a lot of the concurrency guarantees that the consumer currently
provides (think of a consumer caller thread committing externally while the
rebalance is happening in the background thread), and hence if we are
considering changing that now or in the future, we need to think through
all the corner cases.

So in general, I'd still prefer we reserve a third config for rebalance
timeout in this KIP.

Guozhang


On Mon, Jun 6, 2016 at 11:25 AM, Guozhang Wang  wrote:

> (+ Matthias)
>
> Hello Henry,
>
> Specifically to your question regarding Kafka Streams:
>
> 1. Currently restoreActiveState() is triggered in the onPartitionAssigned
> callback, which is after the rebalance is completed from the coordinator's
> point of view, and hence is covered in the process timeout value in this
> new KIP.
>
> 2. That is a good question, and I think it is a general root cause we saw
> failures of directory locking reported by more than one use case already.
> Currently I believe the main reason that a second rebalance is triggered
> while the processors are still completing restoreActiveState() of the
> previous rebalance is due to session timeout (default 30 seconds), which
> will be largely reduced with a larger processor timeout; however with
> complex topologies we restoreActiveState() for all states may still be
> taking long time with tens / hundreds of state stores, and other cases
> that also can cause consumers to re-join the groups right after a previous
> rebalance, for example 1) regex subscription where the topic metadata has
> changed, 2) consecutive consumer failures, or new consumers (i.e. new
> KStream instances / threads) added.
>
> For such cases we can do a better job to "fail fast" if the consumer
> detects another join is needed. I think in one of your local commit you
> are already doing sth similar, which we can merge back to trunk.
>
>
>
> Guozhang
>
>
> On Sun, Jun 5, 2016 at 11:24 PM, Henry Cai 
> wrote:
>
>> I have a question on the KIP on long stall during
>> ProcessorStateManager.restoreActiveState(), this can be a long stall when
>> we need to rebuild the RocksDB state on a new node.
>>
>> 1. Is restoreActiveState() considered as post rebalance since this is
>> invoked on application rebalance listener?
>> 2. When the node A was spending long time rebuilding the state in
>> restoreActiveState() from the previous rebalance, a new node (node B) send
>> a new JoinGroup request to the co-ordinator, how long should the
>> coordinator wait for node A to finish the restoreActiveState from the
>> previous rebalance, the restoreActiveState can take more than 10 minutes
>> for a big state.
>>
>>
>> On Sun, Jun 5, 2016 at 10:46 PM, Becket Qin  wrote:
>>
>> > Hi Jason,
>> >
>> > Thanks for this very useful KIP.  In general I am with Guozhang on the
> > purpose of the three timeouts.
>> > 1) session timeout for consumer liveness,
>> > 2) process timeout (or maybe we should rename it to
>> max.poll.interval.ms)
>> > for application liveness,
>> > 3) rebalance timeout for faster rebalance in some failure cases.
>> >
>> > It seems the current discussion is mainly about whether we need 3) as a
>> > separate timeout or not. The current KIP proposal is to combine 2) and
>> 3),
>> > i.e. just use process timeout as rebalance timeout. That means we need
>> to
>> > either increase rebalance timeout out to let it adapt to process
>> timeout,
>> > or the reverse. It would be helpful to understand the impact of these
>> two
>> > cases. Here are my two cents.
>> >
>> > For users who are consuming data from Kafka, usually they either care
>> about
>> > throughput or care about latency.
>> >
>> > If users care about the latency, they would probably care more about
>> > average latency instead of 99.99 percentile latency which can be
>> affected
>> > by many other more common reasons other than consumer failure. Because
>> all
>> > the timeout we are discussing here only have impact on the 99.99
>> percentile
>> > latency, I don't think it would really make a difference for latency
>> > sensitive users.
>> >
>> > The majority of the use cases for Kafka Connect and Mirror Maker are
>> > throughput sensitive. Ewen raised a good example where Kafka Connect
>> needs
>> > to process the previous data on rebalance therefore requires a higher
>> > rebalance timeout than process timeout. This is essentially the same in
>> > Mirror Maker, where each rebalance needs to flush all the messages in
>> the
>> > accumulator in the producer. That could take some time depending on how
>> > many messages are there. In this case, we may need to increase the
>> process
>> > timeout to make it the same as rebalance timeout. But this is probably
>> > fine. The downside of increasing process timeout is a longer detection
>> time
>> > of a consumer failure.  Detecting a consumer 

Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-06-06 Thread Guozhang Wang
(+ Matthias)

Hello Henry,

Specifically to your question regarding Kafka Streams:

1. Currently restoreActiveState() is triggered in the onPartitionAssigned
callback, which is after the rebalance is completed from the coordinator's
point of view, and hence is covered in the process timeout value in this
new KIP.

2. That is a good question, and I think it is the general root cause of the
directory-locking failures reported by more than one use case already.
Currently I believe the main reason that a second rebalance is triggered
while the processors are still completing restoreActiveState() from the
previous rebalance is the session timeout (default 30 seconds), which
will be largely mitigated by a larger processor timeout; however, with
complex topologies, restoreActiveState() for all states may still take a
long time with tens or hundreds of state stores, and there are other cases
that can also cause consumers to re-join the group right after a previous
rebalance, for example 1) regex subscription where the topic metadata has
changed, 2) consecutive consumer failures, or new consumers (i.e. new
KStream instances / threads) being added.

For such cases we can do a better job of "failing fast" if the consumer
detects that another join is needed. I think in one of your local commits you
are already doing something similar, which we can merge back to trunk.



Guozhang


On Sun, Jun 5, 2016 at 11:24 PM, Henry Cai 
wrote:

> I have a question on the KIP on long stall during
> ProcessorStateManager.restoreActiveState(), this can be a long stall when
> we need to rebuild the RocksDB state on a new node.
>
> 1. Is restoreActiveState() considered as post rebalance since this is
> invoked on application rebalance listener?
> 2. When the node A was spending long time rebuilding the state in
> restoreActiveState() from the previous rebalance, a new node (node B) send
> a new JoinGroup request to the co-ordinator, how long should the
> coordinator wait for node A to finish the restoreActiveState from the
> previous rebalance, the restoreActiveState can take more than 10 minutes
> for a big state.
>
>
> On Sun, Jun 5, 2016 at 10:46 PM, Becket Qin  wrote:
>
> > Hi Jason,
> >
> > Thanks for this very useful KIP.  In general I am with Guozhang on the
> > purpose of the three timeouts.
> > 1) session timeout for consumer liveness,
> > 2) process timeout (or maybe we should rename it to max.poll.interval.ms
> )
> > for application liveness,
> > 3) rebalance timeout for faster rebalance in some failure cases.
> >
> > It seems the current discussion is mainly about whether we need 3) as a
> > separate timeout or not. The current KIP proposal is to combine 2) and
> 3),
> > i.e. just use process timeout as rebalance timeout. That means we need to
> > either increase rebalance timeout out to let it adapt to process timeout,
> > or the reverse. It would be helpful to understand the impact of these two
> > cases. Here are my two cents.
> >
> > For users who are consuming data from Kafka, usually they either care
> about
> > throughput or care about latency.
> >
> > If users care about the latency, they would probably care more about
> > average latency instead of 99.99 percentile latency which can be affected
> > by many other more common reasons other than consumer failure. Because
> all
> > the timeout we are discussing here only have impact on the 99.99
> percentile
> > latency, I don't think it would really make a difference for latency
> > sensitive users.
> >
> > The majority of the use cases for Kafka Connect and Mirror Maker are
> > throughput sensitive. Ewen raised a good example where Kafka Connect
> needs
> > to process the previous data on rebalance therefore requires a higher
> > rebalance timeout than process timeout. This is essentially the same in
> > Mirror Maker, where each rebalance needs to flush all the messages in the
> > accumulator in the producer. That could take some time depending on how
> > many messages are there. In this case, we may need to increase the
> process
> > timeout to make it the same as rebalance timeout. But this is probably
> > fine. The downside of increasing process timeout is a longer detection
> time
> > of a consumer failure.  Detecting a consumer failure a little later only
> > has limited impact because the rest of the consumers in the same group
> are
> > still working fine. So the total throughput is unlikely to drop
> > significantly. As long as the rebalance is not taking longer it should be
> > fine. The reason we care more about how fast rebalance can finish is
> > because during rebalance no consumer in the group is consuming, i.e.
> > throughput is zero. So we want to make the rebalance finish as quickly as
> > possible.
> >
> > Compare with increasing process timeout to rebalance timeout, it seems a
> > more common case where user wants a longer process timeout, but smaller
> > rebalance timeout. I am more worried about this case where we 

Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-06-06 Thread Jason Gustafson
Hi Becket,

Thanks for the comments. It sounds like we are in basic agreement on this
KIP. On the rebalance timeout, I share the same concern about the impact of
tying the process and rebalance timeouts together. To be clear, this
problem exists in the current consumer (with the session timeout), so the
question is primarily whether this KIP goes far enough or solves the right
problem. My initial thought, if we exposed the rebalance timeout, was to
let the consumer fall out of the group if its processing prevented it from
rejoining in time. After thinking about it, you might be right that we
should really have the consumer rebalance in the background in that case.
Otherwise, the group could be unstable around rebalances and it will be
difficult to explain the behavior to users. That said, until it's clear
that we need to go that far, I'm reluctant to do so for the same reasons
you mentioned. The advantage of the current KIP is that it has a low impact
on existing users.

Also, I agree that max.poll.interval.ms seems clearer. It also fits neatly
with max.poll.records. Unless there are objections, I'll go ahead and adopt
this.
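
To make the split concrete, a consumer configuration under this proposal might look
something like the following; max.poll.interval.ms is the proposed new name and the
values are only illustrative:

    # liveness of the consumer process (heartbeats sent from the background thread)
    session.timeout.ms=10000
    # upper bound on the time between poll() calls, i.e. application liveness
    max.poll.interval.ms=300000
    # bound the records returned per poll() so the interval above is easier to respect
    max.poll.records=500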

Thanks,
Jason



On Sun, Jun 5, 2016 at 11:24 PM, Henry Cai 
wrote:

> I have a question on the KIP on long stall during
> ProcessorStateManager.restoreActiveState(), this can be a long stall when
> we need to rebuild the RocksDB state on a new node.
>
> 1. Is restoreActiveState() considered as post rebalance since this is
> invoked on application rebalance listener?
> 2. When the node A was spending long time rebuilding the state in
> restoreActiveState() from the previous rebalance, a new node (node B) send
> a new JoinGroup request to the co-ordinator, how long should the
> coordinator wait for node A to finish the restoreActiveState from the
> previous rebalance, the restoreActiveState can take more than 10 minutes
> for a big state.
>
>
> On Sun, Jun 5, 2016 at 10:46 PM, Becket Qin  wrote:
>
> > Hi Jason,
> >
> > Thanks for this very useful KIP.  In general I am with Guozhang on the
> > purpose of the three timeouts.
> > 1) session timeout for consumer liveness,
> > 2) process timeout (or maybe we should rename it to max.poll.interval.ms
> )
> > for application liveness,
> > 3) rebalance timeout for faster rebalance in some failure cases.
> >
> > It seems the current discussion is mainly about whether we need 3) as a
> > separate timeout or not. The current KIP proposal is to combine 2) and
> 3),
> > i.e. just use process timeout as rebalance timeout. That means we need to
> > either increase rebalance timeout out to let it adapt to process timeout,
> > or the reverse. It would be helpful to understand the impact of these two
> > cases. Here are my two cents.
> >
> > For users who are consuming data from Kafka, usually they either care
> about
> > throughput or care about latency.
> >
> > If users care about the latency, they would probably care more about
> > average latency instead of 99.99 percentile latency which can be affected
> > by many other more common reasons other than consumer failure. Because
> all
> > the timeout we are discussing here only have impact on the 99.99
> percentile
> > latency, I don't think it would really make a difference for latency
> > sensitive users.
> >
> > The majority of the use cases for Kafka Connect and Mirror Maker are
> > throughput sensitive. Ewen raised a good example where Kafka Connect
> needs
> > to process the previous data on rebalance therefore requires a higher
> > rebalance timeout than process timeout. This is essentially the same in
> > Mirror Maker, where each rebalance needs to flush all the messages in the
> > accumulator in the producer. That could take some time depending on how
> > many messages are there. In this case, we may need to increase the
> process
> > timeout to make it the same as rebalance timeout. But this is probably
> > fine. The downside of increasing process timeout is a longer detection
> time
> > of a consumer failure.  Detecting a consumer failure a little later only
> > has limited impact because the rest of the consumers in the same group
> are
> > still working fine. So the total throughput is unlikely to drop
> > significantly. As long as the rebalance is not taking longer it should be
> > fine. The reason we care more about how fast rebalance can finish is
> > because during rebalance no consumer in the group is consuming, i.e.
> > throughput is zero. So we want to make the rebalance finish as quickly as
> > possible.
> >
> > Compare with increasing process timeout to rebalance timeout, it seems a
> > more common case where user wants a longer process timeout, but smaller
> > rebalance timeout. I am more worried about this case where we have to
> > shoehorn the rebalance timeout into process timeout. For users care about
> > throughput, that might cause the rebalance to take unnecessarily longer.
> > Admittedly this only has impact when a 

[GitHub] kafka pull request #1474: KAFKA-3748: Add consumer-property to console tools...

2016-06-06 Thread bharatviswa504
GitHub user bharatviswa504 opened a pull request:

https://github.com/apache/kafka/pull/1474

KAFKA-3748: Add consumer-property to console tools consumer

@ijuma @harshach @edoardocomar Can you please review the changes.

@edoardocomar I have addressed your comment of extra space.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bharatviswa504/kafka Kafka-3748

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1474


commit 10812e9b1db4312d7e28276f144025da979a9ce2
Author: Bharat Viswanadham 
Date:   2016-06-06T18:05:44Z

KAFKA-3748: Add consumer-property to console tools consumer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3748) Add consumer-property to console tools consumer (similar to --producer-property)

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316906#comment-15316906
 ] 

ASF GitHub Bot commented on KAFKA-3748:
---

GitHub user bharatviswa504 opened a pull request:

https://github.com/apache/kafka/pull/1474

KAFKA-3748: Add consumer-property to console tools consumer

@ijuma @harshach @edoardocomar Can you please review the changes.

@edoardocomar I have addressed your comment of extra space.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bharatviswa504/kafka Kafka-3748

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1474


commit 10812e9b1db4312d7e28276f144025da979a9ce2
Author: Bharat Viswanadham 
Date:   2016-06-06T18:05:44Z

KAFKA-3748: Add consumer-property to console tools consumer




> Add consumer-property to console tools consumer (similar to 
> --producer-property)
> 
>
> Key: KAFKA-3748
> URL: https://issues.apache.org/jira/browse/KAFKA-3748
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.9.0.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>  Labels: newbie
>
> Add --consumer-property to the console consumer.
> Creating this task from the comment given in KAFKA-3567.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3748) Add consumer-property to console tools consumer (similar to --producer-property)

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316903#comment-15316903
 ] 

ASF GitHub Bot commented on KAFKA-3748:
---

Github user bharatviswa504 closed the pull request at:

https://github.com/apache/kafka/pull/1459


> Add consumer-property to console tools consumer (similar to 
> --producer-property)
> 
>
> Key: KAFKA-3748
> URL: https://issues.apache.org/jira/browse/KAFKA-3748
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.9.0.0
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>  Labels: newbie
>
> Add --consumer-property to the console consumer.
> Creating this task from the comment given in KAFKA-3567.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1459: KAFKA-3748: Add consumer-property to console tools...

2016-06-06 Thread bharatviswa504
Github user bharatviswa504 closed the pull request at:

https://github.com/apache/kafka/pull/1459


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-06 Thread Ismael Juma
On Mon, Jun 6, 2016 at 6:48 PM, Guozhang Wang  wrote:
>
> About using Instrumentation.getObjectSize, yeah we were worried a lot about
> its efficiency as well as accuracy when discussing internally, but not a
> better solution was proposed. So if people have better ideas, please throw
> them here, as it is also the purpose for us to call out such KIP discussion
> threads.
>

Note that this requires a Java agent to be configured. A few links:

https://github.com/apache/spark/blob/b0ce0d13127431fa7cd4c11064762eb0b12e3436/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/utils/ObjectSizes.java
https://github.com/jbellis/jamm
http://openjdk.java.net/projects/code-tools/jol/
https://github.com/DimitrisAndreou/memory-measurer

OK, maybe that's more than what you wanted. :)

Ismael


Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-06 Thread Guozhang Wang
About the distribution of memory: I think it is good to evenly distribute
the memory across threads, while what I was calling out for consideration is
the distribution of memory within a thread, to its tasks and the caches of
each task. The number of tasks and the number of caches per task can vary
from rebalance to rebalance, so the cache size (if distributed evenly
again) will be different from time to time as well, and it may be
sub-optimal to still do an even distribution within a thread.

Henry's suggestion of having a global cache per thread (i.e. we are still
partitioning the cache across threads, but within a thread we use a single
cache) sounds interesting, and we still need to figure out how to avoid
conflicts for the same key objects from different operators (assuming this
global cache is a map from Object to Object).

About using Instrumentation.getObjectSize, yeah we were worried a lot about
its efficiency as well as accuracy when discussing internally, but no
better solution was proposed. So if people have better ideas, please throw
them out here, as that is also the purpose of calling out such KIP discussion
threads.
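
For reference, Instrumentation.getObjectSize reports only a shallow,
implementation-specific size and is only reachable from a Java agent; a minimal
sketch of the wiring, with a hypothetical class name and manifest entry:

    import java.lang.instrument.Instrumentation;

    // Packaged in a jar whose manifest contains "Premain-Class: ObjectSizeAgent"
    // and loaded with -javaagent:object-size-agent.jar
    public final class ObjectSizeAgent {
        private static volatile Instrumentation instrumentation;

        public static void premain(String agentArgs, Instrumentation inst) {
            instrumentation = inst;
        }

        // Shallow size of a single object; a "deep" estimate would have to walk
        // the reference graph itself (handling cycles and shared sub-objects).
        public static long sizeOf(Object o) {
            Instrumentation inst = instrumentation;
            if (inst == null)
                throw new IllegalStateException("agent not loaded via -javaagent");
            return inst.getObjectSize(o);
        }
    }

which is part of why the accuracy / efficiency concern above is a real one.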


Guozhang



On Mon, Jun 6, 2016 at 8:49 AM, Jay Kreps  wrote:

> For the threads, I think it might be reasonable to just give each thread
> cache_bytes/num_threads for simplicity? Presumably you wouldn't want to
> have to make the cache threadsafe so you'd want to avoid sharing? Giving
> each thread the same cache isn't quite right, as you point out, but
> depending on how random task assignment is might be okay? This type of
> randomization actually seems important from a CPU load balancing
> point-of-view too (i.e. you don't want all the tasks for some sub-topology
> going to the same cpu core).
>
> For Instrumentation.getObjectSize, does it really give the size of the
> referenced object graph or is it more like size_of in C? Is it fast enough
> to use for this kind of thing (seems like computing the "deep" object size
> could be easy to overwhelm the cost of the hash table lookup)?
>
> -Jay
>
>
>
> On Sun, Jun 5, 2016 at 12:44 PM, Guozhang Wang  wrote:
>
> > There are some details needed to be figured out if we go global:
> >
> > A KafkaStreams instance could have M threads, and each thread could
> various
> > number (let's say N, but in practice it may be different from thread to
> > thread) tasks, and each task contains a sub-topology with P caches (again
> > in practice it may be different depending on which sub-topology it
> > contains).
> >
> > Say if user specified X Gb for this KafkaStreams instance, then each
> cache
> > will get X / M / N / P Gb. But remember N and P can change from rebalance
> > to rebalance, and threads does not communicate with each other during
> their
> > life time. So it is hard to determine M and N dynamically.
> >
> > Plus, different caches may have different cache hit rate, so distributing
> > the memory evenly to them may not be an optimal solution (some caches may
> > be flushed much more frequently than others), and also since we are
> > considering to use instrumentation.getObjectSize which is approximate, it
> > may exaggerate the imbalance.
> >
> >
> > Guozhang
> >
> >
> > On Sat, Jun 4, 2016 at 11:54 PM, Eno Thereska 
> > wrote:
> >
> > > Hi Jay,
> > >
> > > We can make it global instead of per-processor, sounds good.
> > >
> > > Thanks
> > > Eno
> > >
> > >
> > > > On 3 Jun 2016, at 23:15, Jay Kreps  wrote:
> > > >
> > > > Hey Eno,
> > > >
> > > > Should the config be the global memory use rather than the
> > per-processor?
> > > > That is, let’s say I know I have fixed a 1GB heap because that is
> what
> > I
> > > > set for Java, and want to use 100MB for caching, it seems like right
> > now
> > > > I’d have to do some math that depends on my knowing a bit about how
> > > caching
> > > > works to figure out how to set that parameter so I don't run out of
> > > memory.
> > > > Does it also depend on the number of partitions assigned (and hence
> the
> > > > number of task), if so that makes it even harder to set since each
> time
> > > > rebalancing happens that changes so it is then pretty hard to set
> > safely.
> > > >
> > > > You could theoretically argue for either bottom up (you know how much
> > > cache
> > > > you need per processor as you have it and you want to get exactly
> that)
> > > or
> > > > top down (you know how much memory you have to spare but can't be
> > > bothered
> > > > to work out what that amounts to per-processor). I think our
> experience
> > > has
> > > > been that 99% of people never change the default and if it runs out
> of
> > > > memory they really struggle to fix it and kind of blame us, so I
> think
> > > top
> > > > down and a global config might be better. :-)
> > > >
> > > > Example: https://issues.apache.org/jira/browse/KAFKA-3775
> > > >
> > > > -Jay
> > > >
> > > > On Fri, Jun 3, 2016 at 2:39 PM, Eno Thereska 

Build failed in Jenkins: kafka-0.10.0-jdk7 #120

2016-06-06 Thread Apache Jenkins Server
See 

--
[...truncated 3367 lines...]

kafka.log.LogSegmentTest > testReadFromGap PASSED

kafka.log.LogSegmentTest > testTruncate PASSED

kafka.log.LogSegmentTest > testReadBeforeFirstOffset PASSED

kafka.log.LogSegmentTest > testCreateWithInitFileSizeAppendMessage PASSED

kafka.log.LogSegmentTest > testChangeFileSuffixes PASSED

kafka.log.LogSegmentTest > testMaxOffset PASSED

kafka.log.LogSegmentTest > testNextOffsetCalculation PASSED

kafka.log.LogSegmentTest > testReadOnEmptySegment PASSED

kafka.log.LogSegmentTest > testReadAfterLast PASSED

kafka.log.LogSegmentTest > testCreateWithInitFileSizeClearShutdown PASSED

kafka.log.LogSegmentTest > testTruncateFull PASSED

kafka.log.LogConfigTest > testFromPropsEmpty PASSED

kafka.log.LogConfigTest > testKafkaConfigToProps PASSED

kafka.log.LogConfigTest > testFromPropsInvalid PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailingSlash PASSED

kafka.coordinator.MemberMetadataTest > testMatchesSupportedProtocols PASSED

kafka.coordinator.MemberMetadataTest > testMetadata PASSED

kafka.coordinator.MemberMetadataTest > testMetadataRaisesOnUnsupportedProtocol 
PASSED

kafka.coordinator.MemberMetadataTest > testVoteForPreferredProtocol PASSED

kafka.coordinator.MemberMetadataTest > testVoteRaisesOnNoSupportedProtocols 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatWrongCoordinator 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testDescribeGroupStable PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatIllegalGeneration 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testDescribeGroupWrongCoordinator PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testDescribeGroupRebalancing 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testLeaderFailureInSyncGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testGenerationIdIncrementsOnRebalance PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testSyncGroupFromIllegalGeneration PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testInvalidGroupId PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatUnknownGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testListGroupsIncludesStableGroups PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testHeartbeatDuringRebalanceCausesRebalanceInProgress PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testSessionTimeout PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupInconsistentGroupProtocol PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupSessionTimeoutTooLarge PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupSessionTimeoutTooSmall PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testSyncGroupEmptyAssignment 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testCommitOffsetWithDefaultGeneration PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatMaintainsSession 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupFromUnchangedLeaderShouldRebalance PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testHeartbeatRebalanceInProgress PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testLeaveGroupUnknownGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testListGroupsIncludesRebalancingGroups PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testSyncGroupFollowerAfterLeader PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testCommitOffsetInAwaitingSync 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testJoinGroupWrongCoordinator 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupUnknownConsumerExistingGroup PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testSyncGroupFromUnknownGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupInconsistentProtocolType PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testCommitOffsetFromUnknownGroup PASSED


Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-06 Thread Jay Kreps
For the threads, I think it might be reasonable to just give each thread
cache_bytes/num_threads for simplicity? Presumably you wouldn't want to
have to make the cache threadsafe so you'd want to avoid sharing? Giving
each thread the same cache isn't quite right, as you point out, but
depending on how random task assignment is, it might be okay? This type of
randomization actually seems important from a CPU load balancing
point-of-view too (i.e. you don't want all the tasks for some sub-topology
going to the same cpu core).

For Instrumentation.getObjectSize, does it really give the size of the
referenced object graph or is it more like size_of in C? Is it fast enough
to use for this kind of thing (it seems like computing the "deep" object size
could easily overwhelm the cost of the hash table lookup)?

-Jay



On Sun, Jun 5, 2016 at 12:44 PM, Guozhang Wang  wrote:

> There are some details needed to be figured out if we go global:
>
> A KafkaStreams instance could have M threads, and each thread could various
> number (let's say N, but in practice it may be different from thread to
> thread) tasks, and each task contains a sub-topology with P caches (again
> in practice it may be different depending on which sub-topology it
> contains).
>
> Say if user specified X Gb for this KafkaStreams instance, then each cache
> will get X / M / N / P Gb. But remember N and P can change from rebalance
> to rebalance, and threads does not communicate with each other during their
> life time. So it is hard to determine M and N dynamically.
>
> Plus, different caches may have different cache hit rate, so distributing
> the memory evenly to them may not be an optimal solution (some caches may
> be flushed much more frequently than others), and also since we are
> considering to use instrumentation.getObjectSize which is approximate, it
> may exaggerate the imbalance.
>
>
> Guozhang
>
>
> On Sat, Jun 4, 2016 at 11:54 PM, Eno Thereska 
> wrote:
>
> > Hi Jay,
> >
> > We can make it global instead of per-processor, sounds good.
> >
> > Thanks
> > Eno
> >
> >
> > > On 3 Jun 2016, at 23:15, Jay Kreps  wrote:
> > >
> > > Hey Eno,
> > >
> > > Should the config be the global memory use rather than the
> per-processor?
> > > That is, let’s say I know I have fixed a 1GB heap because that is what
> I
> > > set for Java, and want to use 100MB for caching, it seems like right
> now
> > > I’d have to do some math that depends on my knowing a bit about how
> > caching
> > > works to figure out how to set that parameter so I don't run out of
> > memory.
> > > Does it also depend on the number of partitions assigned (and hence the
> > > number of task), if so that makes it even harder to set since each time
> > > rebalancing happens that changes so it is then pretty hard to set
> safely.
> > >
> > > You could theoretically argue for either bottom up (you know how much
> > cache
> > > you need per processor as you have it and you want to get exactly that)
> > or
> > > top down (you know how much memory you have to spare but can't be
> > bothered
> > > to work out what that amounts to per-processor). I think our experience
> > has
> > > been that 99% of people never change the default and if it runs out of
> > > memory they really struggle to fix it and kind of blame us, so I think
> > top
> > > down and a global config might be better. :-)
> > >
> > > Example: https://issues.apache.org/jira/browse/KAFKA-3775
> > >
> > > -Jay
> > >
> > > On Fri, Jun 3, 2016 at 2:39 PM, Eno Thereska 
> > wrote:
> > >
> > >> Hi Gwen,
> > >>
> > >> Yes. As an example, if cache.max.bytes.buffering set to X, and if
> users
> > >> have A aggregation operators and T KTable.to() operators, then X*(A +
> T)
> > >> total bytes will be allocated for caching.
> > >>
> > >> Eno
> > >>
> > >>> On 3 Jun 2016, at 21:37, Gwen Shapira  wrote:
> > >>>
> > >>> Just to clarify: "cache.max.bytes.buffering" is per processor?
> > >>>
> > >>>
> > >>> On Thu, Jun 2, 2016 at 11:30 AM, Eno Thereska <
> eno.there...@gmail.com>
> > >> wrote:
> >  Hi there,
> > 
> >  I have created KIP-63: Unify store and downstream caching in streams
> > 
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams
> > >>>
> > 
> > 
> >  Feedback is appreciated.
> > 
> >  Thank you
> >  Eno
> > >>
> > >>
> >
> >
>
>
> --
> -- Guozhang
>
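
A note on the Instrumentation.getObjectSize question above: java.lang.instrument.Instrumentation.getObjectSize returns an implementation-specific, shallow size of a single object (it does not follow references), so it behaves more like sizeof in C; estimating a "deep" size means walking the object graph on top of it, and that traversal is the part that can easily dwarf a hash-table lookup. A minimal sketch, assuming a hypothetical agent class (the Instrumentation instance is only reachable through a -javaagent premain/agentmain hook):

{code}
// Hypothetical helper, illustrative only. Instrumentation.getObjectSize
// reports a shallow, implementation-specific size of one object.
import java.lang.instrument.Instrumentation;

public final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    // Called by the JVM when started with -javaagent:object-size-agent.jar
    // (the agent jar's manifest must declare "Premain-Class: ObjectSizeAgent").
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of one object; objects it references are NOT included,
    // so this is closer to sizeof than to the retained size of a graph.
    public static long shallowSizeOf(Object o) {
        if (instrumentation == null) {
            throw new IllegalStateException("agent not loaded via -javaagent");
        }
        return instrumentation.getObjectSize(o);
    }
}
{code}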


[jira] [Commented] (KAFKA-3693) Race condition between highwatermark-checkpoint thread and handleLeaderAndIsrRequest at broker start-up

2016-06-06 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316652#comment-15316652
 ] 

Jun Rao commented on KAFKA-3693:


Yes, if the controller detects that a broker is down, it will remove it from 
its broker list and stop sending requests to that broker.

It's still not very clear to me what happened here. Earlier you were saying 
"broker 16 has been shutting down since 05:40:46,845 until 06:17:01,701". Now 
you are saying "So broker 16 was shutting down at 06:17:02,012 (not down 
yet).".  Was broker 16 restarted at 06:17:01,701 and shut down again at 
06:17:02,012?

> Race condition between highwatermark-checkpoint thread and 
> handleLeaderAndIsrRequest at broker start-up
> ---
>
> Key: KAFKA-3693
> URL: https://issues.apache.org/jira/browse/KAFKA-3693
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
>
> Upon broker start-up, a race between highwatermark-checkpoint thread to write 
> replication-offset-checkpoint file and handleLeaderAndIsrRequest thread 
> reading from it causes the highwatermark for some partitions to be reset to 
> 0. In the good case, this results the replica to truncate its entire log to 0 
> and hence initiates fetching of terabytes of data from the lead broker, which 
> sometimes leads to hours of downtime. We observed the bad cases that the 
> reset offset can propagate to recovery-point-offset-checkpoint file, making a 
> lead broker to truncate the file. This seems to have the potential to lead to 
> data loss if the truncation happens at both follower and leader brokers.
> This is the particular faulty scenario manifested in our tests:
> # The broker restarts and receive LeaderAndIsr from the controller
> # LeaderAndIsr message however does not contain all the partitions (probably 
> because other brokers were churning at the same time)
> # becomeLeaderOrFollower calls getOrCreatePartition and updates the 
> allPartitions with the partitions included in the LeaderAndIsr message {code}
>   def getOrCreatePartition(topic: String, partitionId: Int): Partition = {
> var partition = allPartitions.get((topic, partitionId))
> if (partition == null) {
>   allPartitions.putIfNotExists((topic, partitionId), new Partition(topic, 
> partitionId, time, this))
> {code}
> # replication-offset-checkpoint jumps in taking a snapshot of (the partial) 
> allReplicas' high watermark into replication-offset-checkpoint file {code}  
> def checkpointHighWatermarks() {
> val replicas = 
> allPartitions.values.map(_.getReplica(config.brokerId)).collect{case 
> Some(replica) => replica}{code} hence rewriting the previous highwatermarks.
> # Later becomeLeaderOrFollower calls makeLeaders and makeFollowers which read 
> the (now partial) file through Partition::getOrCreateReplica {code}
>   val checkpoint = 
> replicaManager.highWatermarkCheckpoints(log.dir.getParentFile.getAbsolutePath)
>   val offsetMap = checkpoint.read
>   if (!offsetMap.contains(TopicAndPartition(topic, partitionId)))
> info("No checkpointed highwatermark is found for partition 
> [%s,%d]".format(topic, partitionId))
> {code}
> We are not entirely sure whether the initial LeaderAndIsr message including a 
> subset of partitions is critical in making this race condition manifest or 
> not. But it is an important detail since it clarifies that a solution based 
> on not letting the highwatermark-checkpoint thread jumping in the middle of 
> processing a LeaderAndIsr message would not suffice.
> The solution we are thinking of is to force initializing allPartitions by the 
> partitions listed in the replication-offset-checkpoint (and perhaps 
> recovery-point-offset-checkpoint file too) when a server starts.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-06 Thread Henry Cai
If we are going with a global setting, can the cache just be one gigantic
LRU cache instead of many smaller ones?  (partition_name, task_name) can be
part of the cache key.

If there are many smaller caches, it's hard to achieve efficient global
resource utilization (some caches are busy, some caches are very idle, and
the number of caches can change because of rebalancing).

On Mon, Jun 6, 2016 at 2:59 AM, Eno Thereska  wrote:

> Hi Guozhang,
>
> About your first point: the alternative is not knowing how much memory a
> KafkaStreams instance will consume, since, as you mention, M and N can
> change. I agree the implementation is slightly harder since each cache now
> can change size dynamically (and the Kafka Streams instance needs to
> coordinate that).
>
> About the different cache hit rates argument, I agree, a more
> sophisticated solution would provide variable-sized caches. But precisely
> this argument leads to a global configuration parameter being better in my
> opinion, since the Kafka Streams instance would get a total memory budget
> and do what's best with it. Note I am not suggesting we attempt this for
> the V1 implementation, just pointing out that it is possible to do
> variable-size caches with the global config.
>
> So overall we have a tradeoff between a more complex implementation but a
> guarantee on total memory usage (global setting), and a simple
> implementation but with variable memory usage (per-cache setting).
>
> Eno
>
> > On 5 Jun 2016, at 20:44, Guozhang Wang  wrote:
> >
> > There are some details needed to be figured out if we go global:
> >
> > A KafkaStreams instance could have M threads, and each thread could
> various
> > number (let's say N, but in practice it may be different from thread to
> > thread) tasks, and each task contains a sub-topology with P caches (again
> > in practice it may be different depending on which sub-topology it
> > contains).
> >
> > Say if user specified X Gb for this KafkaStreams instance, then each
> cache
> > will get X / M / N / P Gb. But remember N and P can change from rebalance
> > to rebalance, and threads does not communicate with each other during
> their
> > life time. So it is hard to determine M and N dynamically.
> >
> > Plus, different caches may have different cache hit rate, so distributing
> > the memory evenly to them may not be an optimal solution (some caches may
> > be flushed much more frequently than others), and also since we are
> > considering to use instrumentation.getObjectSize which is approximate, it
> > may exaggerate the imbalance.
> >
> >
> > Guozhang
> >
> >
> > On Sat, Jun 4, 2016 at 11:54 PM, Eno Thereska 
> > wrote:
> >
> >> Hi Jay,
> >>
> >> We can make it global instead of per-processor, sounds good.
> >>
> >> Thanks
> >> Eno
> >>
> >>
> >>> On 3 Jun 2016, at 23:15, Jay Kreps  wrote:
> >>>
> >>> Hey Eno,
> >>>
> >>> Should the config be the global memory use rather than the
> per-processor?
> >>> That is, let’s say I know I have fixed a 1GB heap because that is what
> I
> >>> set for Java, and want to use 100MB for caching, it seems like right
> now
> >>> I’d have to do some math that depends on my knowing a bit about how
> >> caching
> >>> works to figure out how to set that parameter so I don't run out of
> >> memory.
> >>> Does it also depend on the number of partitions assigned (and hence the
> >>> number of task), if so that makes it even harder to set since each time
> >>> rebalancing happens that changes so it is then pretty hard to set
> safely.
> >>>
> >>> You could theoretically argue for either bottom up (you know how much
> >> cache
> >>> you need per processor as you have it and you want to get exactly that)
> >> or
> >>> top down (you know how much memory you have to spare but can't be
> >> bothered
> >>> to work out what that amounts to per-processor). I think our experience
> >> has
> >>> been that 99% of people never change the default and if it runs out of
> >>> memory they really struggle to fix it and kind of blame us, so I think
> >> top
> >>> down and a global config might be better. :-)
> >>>
> >>> Example: https://issues.apache.org/jira/browse/KAFKA-3775
> >>>
> >>> -Jay
> >>>
> >>> On Fri, Jun 3, 2016 at 2:39 PM, Eno Thereska 
> >> wrote:
> >>>
>  Hi Gwen,
> 
>  Yes. As an example, if cache.max.bytes.buffering set to X, and if
> users
>  have A aggregation operators and T KTable.to() operators, then X*(A +
> T)
>  total bytes will be allocated for caching.
> 
>  Eno
> 
> > On 3 Jun 2016, at 21:37, Gwen Shapira  wrote:
> >
> > Just to clarify: "cache.max.bytes.buffering" is per processor?
> >
> >
> > On Thu, Jun 2, 2016 at 11:30 AM, Eno Thereska <
> eno.there...@gmail.com>
>  wrote:
> >> Hi there,
> >>
> >> I have created KIP-63: Unify store and downstream caching in streams
> 
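
To make the single-global-LRU idea above concrete, here is a minimal sketch of a byte-bounded LRU map keyed by (taskId, key), assuming a caller-supplied size estimator. It is illustrative only, not the design proposed in KIP-63, and every name in it is hypothetical:

{code}
// Illustrative sketch only: one process-wide, byte-bounded LRU cache keyed by
// (taskId, key) instead of many small per-processor caches.
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public final class GlobalLruCache<V> {

    // Caller-supplied, deterministic size estimate per entry
    // (e.g. based on serialized key/value sizes).
    public interface SizeEstimator<T> {
        long sizeOf(String taskId, String key, T value);
    }

    private final long maxBytes;
    private final SizeEstimator<V> estimator;
    private long currentBytes = 0L;
    // accessOrder = true makes the iteration order least-recently-used first.
    private final LinkedHashMap<String, V> entries =
            new LinkedHashMap<String, V>(16, 0.75f, true);

    public GlobalLruCache(long maxBytes, SizeEstimator<V> estimator) {
        this.maxBytes = maxBytes;
        this.estimator = estimator;
    }

    // Simple composite key for the sketch; assumes taskId contains no '|'.
    private static String cacheKey(String taskId, String key) {
        return taskId + "|" + key;
    }

    public synchronized void put(String taskId, String key, V value) {
        V old = entries.put(cacheKey(taskId, key), value);
        if (old != null) {
            currentBytes -= estimator.sizeOf(taskId, key, old);
        }
        currentBytes += estimator.sizeOf(taskId, key, value);
        // Evict least-recently-used entries until we are back under the budget
        // (this may evict the new entry itself if it alone exceeds the budget).
        Iterator<Map.Entry<String, V>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, V> eldest = it.next();
            String[] parts = eldest.getKey().split("\\|", 2);
            currentBytes -= estimator.sizeOf(parts[0], parts[1], eldest.getValue());
            it.remove();
        }
    }

    public synchronized V get(String taskId, String key) {
        return entries.get(cacheKey(taskId, key));
    }
}
{code}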

[jira] [Commented] (KAFKA-3792) Fix log spam in clients when topic doesn't exist

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316575#comment-15316575
 ] 

ASF GitHub Bot commented on KAFKA-3792:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1473

KAFKA-3792: Fix log spam in clients for unknown topics



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-3792

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1473


commit f3d0f7cdd946ba35486d4a741d31045ca88e5f87
Author: rsivaram 
Date:   2015-12-09T00:16:18Z

KAFKA-2948: Remove unused topics from producer metadata set

commit 50fb852dbeec50c99505fb982773a3d616740d09
Author: Rajini Sivaram 
Date:   2016-06-06T14:13:15Z

KAFKA-3792: Remove repetitive warnings for unknown topics




> Fix log spam in clients when topic doesn't exist
> 
>
> Key: KAFKA-3792
> URL: https://issues.apache.org/jira/browse/KAFKA-3792
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>
> Kafka producer logs an error message for every retry of metadata when an app 
> attempts to publish to a topic that doesn't exist. Kafka consumer manual 
> assign logs error messages forever if topic doesn't exist.
> See discussion in the PR for KAFKA-2948 
> (https://github.com/apache/kafka/pull/645) for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
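
For context, a minimal sketch of the manual-assign pattern described above that triggers the repeated error logging when the topic does not exist; the broker address, topic name and class name are placeholders:

{code}
// Reproduction sketch for the consumer side of KAFKA-3792: manually assigning
// a partition of a topic that does not exist and polling in a loop.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class UnknownTopicAssignExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        try {
            // "no-such-topic" is assumed not to exist on the cluster.
            consumer.assign(Collections.singletonList(new TopicPartition("no-such-topic", 0)));
            for (int i = 0; i < 100; i++) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                // records stays empty; per the JIRA description, the unfixed
                // client keeps logging an error for the unknown topic.
            }
        } finally {
            consumer.close();
        }
    }
}
{code}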


[GitHub] kafka pull request #1473: KAFKA-3792: Fix log spam in clients for unknown to...

2016-06-06 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/1473

KAFKA-3792: Fix log spam in clients for unknown topics



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-3792

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1473


commit f3d0f7cdd946ba35486d4a741d31045ca88e5f87
Author: rsivaram 
Date:   2015-12-09T00:16:18Z

KAFKA-2948: Remove unused topics from producer metadata set

commit 50fb852dbeec50c99505fb982773a3d616740d09
Author: Rajini Sivaram 
Date:   2016-06-06T14:13:15Z

KAFKA-3792: Remove repetitive warnings for unknown topics




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-0.10.0-jdk7 #119

2016-06-06 Thread Apache Jenkins Server
See 

--
[...truncated 3367 lines...]

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED
ERROR: Could not install GRADLE_2_4_RC_2_HOME
java.lang.NullPointerException
at 
hudson.plugins.toolenv.ToolEnvBuildWrapper$1.buildEnvVars(ToolEnvBuildWrapper.java:46)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:947)
at hudson.plugins.git.GitSCM.getParamExpandedRepos(GitSCM.java:390)
at 
hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:577)
at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:527)
at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:381)
at hudson.scm.SCM.poll(SCM.java:398)
at hudson.model.AbstractProject._poll(AbstractProject.java:1453)
at hudson.model.AbstractProject.poll(AbstractProject.java:1356)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:526)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:555)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR: Could not install JDK_1_7U51_HOME
java.lang.NullPointerException
at 
hudson.plugins.toolenv.ToolEnvBuildWrapper$1.buildEnvVars(ToolEnvBuildWrapper.java:46)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:947)
at hudson.plugins.git.GitSCM.getParamExpandedRepos(GitSCM.java:390)
at 
hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:577)
at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:527)
at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:381)
at hudson.scm.SCM.poll(SCM.java:398)
at hudson.model.AbstractProject._poll(AbstractProject.java:1453)
at hudson.model.AbstractProject.poll(AbstractProject.java:1356)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:526)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:555)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailingSlash PASSED

kafka.coordinator.MemberMetadataTest > testMatchesSupportedProtocols PASSED

kafka.coordinator.MemberMetadataTest > testMetadata PASSED

kafka.coordinator.MemberMetadataTest > testMetadataRaisesOnUnsupportedProtocol 
PASSED

kafka.coordinator.MemberMetadataTest > testVoteForPreferredProtocol PASSED

kafka.coordinator.MemberMetadataTest > testVoteRaisesOnNoSupportedProtocols 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatWrongCoordinator 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testDescribeGroupStable PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatIllegalGeneration 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testDescribeGroupWrongCoordinator PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testDescribeGroupRebalancing 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testLeaderFailureInSyncGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testGenerationIdIncrementsOnRebalance PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testSyncGroupFromIllegalGeneration PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testInvalidGroupId PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testHeartbeatUnknownGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testListGroupsIncludesStableGroups PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testHeartbeatDuringRebalanceCausesRebalanceInProgress PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 

Jenkins build is back to normal : kafka-trunk-jdk8 #683

2016-06-06 Thread Apache Jenkins Server
See 



[jira] [Resolved] (KAFKA-3764) Error processing append operation on partition

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3764.

Resolution: Duplicate

Marking as duplicate of KAFKA-3789 as the latter describes the problem more 
clearly (which was only possible due to this JIRA).

> Error processing append operation on partition
> --
>
> Key: KAFKA-3764
> URL: https://issues.apache.org/jira/browse/KAFKA-3764
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Martin Nowak
>Assignee: Grant Henke
>
> After updating Kafka from 0.9.0.1 to 0.10.0.0 I'm getting plenty of `Error 
> processing append operation on partition` errors. This happens with 
> ruby-kafka as producer and enabled snappy compression.
> {noformat}
> [2016-05-27 20:00:11,074] ERROR [Replica Manager on Broker 2]: Error 
> processing append operation on partition m2m-0 (kafka.server.ReplicaManager)
> kafka.common.KafkaException: 
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:159)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:85)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNextOuter(ByteBufferMessageSet.scala:357)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNext(ByteBufferMessageSet.scala:369)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNext(ByteBufferMessageSet.scala:324)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
> at 
> kafka.message.ByteBufferMessageSet.validateMessagesAndAssignOffsets(ByteBufferMessageSet.scala:427)
> at kafka.log.Log.liftedTree1$1(Log.scala:339)
> at kafka.log.Log.append(Log.scala:338)
> at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:443)
> at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:429)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:237)
> at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:406)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:392)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:392)
> at 
> kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:328)
> at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:405)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: failed to read chunk
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:433)
> at 
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:167)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readLong(DataInputStream.java:416)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.readMessageFromStream(ByteBufferMessageSet.scala:118)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:153)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-0.10.0-jdk7 #118

2016-06-06 Thread Apache Jenkins Server
See 

--
[...truncated 3170 lines...]

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides PASSED

kafka.network.SocketServerTest > testSocketsCloseOnShutdown PASSED

kafka.network.SocketServerTest > testSslSocketServer PASSED

kafka.network.SocketServerTest > tooBigRequestIsRejected PASSED

kafka.log.OffsetIndexTest > lookupExtremeCases PASSED

kafka.log.OffsetIndexTest > appendTooMany PASSED

kafka.log.OffsetIndexTest > randomLookupTest PASSED

kafka.log.OffsetIndexTest > testReopen PASSED

kafka.log.OffsetIndexTest > appendOutOfOrder PASSED

kafka.log.OffsetIndexTest > truncate PASSED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.CleanerTest > testBuildOffsetMapFakeLarge PASSED

kafka.log.CleanerTest > testSegmentGrouping PASSED

kafka.log.CleanerTest > testCleanSegmentsWithAbort PASSED

kafka.log.CleanerTest > testSegmentGroupingWithSparseOffsets PASSED

kafka.log.CleanerTest > testRecoveryAfterCrash PASSED

kafka.log.CleanerTest > testLogToClean PASSED

kafka.log.CleanerTest > testCleaningWithDeletes PASSED

kafka.log.CleanerTest > testCleanSegments PASSED

kafka.log.CleanerTest > testCleaningWithUnkeyedMessages PASSED

kafka.log.FileMessageSetTest > testTruncate PASSED

kafka.log.FileMessageSetTest > testIterationOverPartialAndTruncation PASSED

kafka.log.FileMessageSetTest > testRead PASSED

kafka.log.FileMessageSetTest > testFileSize PASSED

kafka.log.FileMessageSetTest > testIteratorWithLimits PASSED

kafka.log.FileMessageSetTest > testPreallocateTrue PASSED

kafka.log.FileMessageSetTest > testIteratorIsConsistent PASSED

kafka.log.FileMessageSetTest > testFormatConversionWithPartialMessage PASSED

kafka.log.FileMessageSetTest > testIterationDoesntChangePosition PASSED

kafka.log.FileMessageSetTest > testWrittenEqualsRead PASSED

kafka.log.FileMessageSetTest > testWriteTo PASSED

kafka.log.FileMessageSetTest > testPreallocateFalse PASSED

kafka.log.FileMessageSetTest > testPreallocateClearShutdown PASSED

kafka.log.FileMessageSetTest > testMessageFormatConversion PASSED

kafka.log.FileMessageSetTest > testSearch PASSED

kafka.log.FileMessageSetTest > testSizeInBytes PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testDeleteOldSegmentsMethod PASSED

kafka.log.LogTest > testParseTopicPartitionNameForNull PASSED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator PASSED

kafka.log.LogTest > testCorruptIndexRebuild PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[0] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[1] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[2] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest 

[jira] [Assigned] (KAFKA-3167) Use local to the workspace Gradle cache and recreate it on every build

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reassigned KAFKA-3167:
--

Assignee: Ismael Juma

> Use local to the workspace Gradle cache and recreate it on every build
> --
>
> Key: KAFKA-3167
> URL: https://issues.apache.org/jira/browse/KAFKA-3167
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> Kafka builds often fail with "Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin"
> I filed INFRA-11083 and Andrew Bayer suggested:
> "Can you change your builds to use a local-to-the-workspace cache and then 
> nuke it/recreate it on every build?"
> This issue is about changing the Jenkins config for one of the trunk builds 
> to do the above to see if it helps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-trunk-jdk7 #1348

2016-06-06 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-63: Unify store and downstream caching in streams

2016-06-06 Thread Eno Thereska
Hi Guozhang,

About your first point: the alternative is not knowing how much memory a 
KafkaStreams instance will consume, since, as you mention, M and N can change. 
I agree the implementation is slightly harder since each cache now can change 
size dynamically (and the Kafka Streams instance needs to coordinate that).

About the different cache hit rates argument, I agree, a more sophisticated 
solution would provide variable-sized caches. But precisely this argument leads 
to a global configuration parameter being better in my opinion, since the Kafka 
Streams instance would get a total memory budget and do what's best with it. 
Note I am not suggesting we attempt this for the V1 implementation, just 
pointing out that it is possible to do variable-size caches with the global 
config.

So overall we have a tradeoff between a more complex implementation with a 
guarantee on total memory usage (global setting) and a simpler implementation 
with variable memory usage (per-cache setting).

Eno

> On 5 Jun 2016, at 20:44, Guozhang Wang  wrote:
> 
> There are some details needed to be figured out if we go global:
> 
> A KafkaStreams instance could have M threads, and each thread could various
> number (let's say N, but in practice it may be different from thread to
> thread) tasks, and each task contains a sub-topology with P caches (again
> in practice it may be different depending on which sub-topology it
> contains).
> 
> Say if user specified X Gb for this KafkaStreams instance, then each cache
> will get X / M / N / P Gb. But remember N and P can change from rebalance
> to rebalance, and threads does not communicate with each other during their
> life time. So it is hard to determine M and N dynamically.
> 
> Plus, different caches may have different cache hit rate, so distributing
> the memory evenly to them may not be an optimal solution (some caches may
> be flushed much more frequently than others), and also since we are
> considering to use instrumentation.getObjectSize which is approximate, it
> may exaggerate the imbalance.
> 
> 
> Guozhang
> 
> 
> On Sat, Jun 4, 2016 at 11:54 PM, Eno Thereska 
> wrote:
> 
>> Hi Jay,
>> 
>> We can make it global instead of per-processor, sounds good.
>> 
>> Thanks
>> Eno
>> 
>> 
>>> On 3 Jun 2016, at 23:15, Jay Kreps  wrote:
>>> 
>>> Hey Eno,
>>> 
>>> Should the config be the global memory use rather than the per-processor?
>>> That is, let’s say I know I have fixed a 1GB heap because that is what I
>>> set for Java, and want to use 100MB for caching, it seems like right now
>>> I’d have to do some math that depends on my knowing a bit about how
>> caching
>>> works to figure out how to set that parameter so I don't run out of
>> memory.
>>> Does it also depend on the number of partitions assigned (and hence the
>>> number of task), if so that makes it even harder to set since each time
>>> rebalancing happens that changes so it is then pretty hard to set safely.
>>> 
>>> You could theoretically argue for either bottom up (you know how much
>> cache
>>> you need per processor as you have it and you want to get exactly that)
>> or
>>> top down (you know how much memory you have to spare but can't be
>> bothered
>>> to work out what that amounts to per-processor). I think our experience
>> has
>>> been that 99% of people never change the default and if it runs out of
>>> memory they really struggle to fix it and kind of blame us, so I think
>> top
>>> down and a global config might be better. :-)
>>> 
>>> Example: https://issues.apache.org/jira/browse/KAFKA-3775
>>> 
>>> -Jay
>>> 
>>> On Fri, Jun 3, 2016 at 2:39 PM, Eno Thereska 
>> wrote:
>>> 
 Hi Gwen,
 
 Yes. As an example, if cache.max.bytes.buffering set to X, and if users
 have A aggregation operators and T KTable.to() operators, then X*(A + T)
 total bytes will be allocated for caching.
 
 Eno
 
> On 3 Jun 2016, at 21:37, Gwen Shapira  wrote:
> 
> Just to clarify: "cache.max.bytes.buffering" is per processor?
> 
> 
> On Thu, Jun 2, 2016 at 11:30 AM, Eno Thereska 
 wrote:
>> Hi there,
>> 
>> I have created KIP-63: Unify store and downstream caching in streams
>> 
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
 <
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams
> 
>> 
>> 
>> Feedback is appreciated.
>> 
>> Thank you
>> Eno
 
 
>> 
>> 
> 
> 
> -- 
> -- Guozhang
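
To make the two accounting schemes in this thread concrete, a small worked example with made-up numbers (X, A, T, M, N and P are the placeholders used above, not recommended values):

{code}
// Worked example only; all numbers are made up.
public class CacheBudgetExample {
    public static void main(String[] args) {
        // Per-processor setting (the X*(A+T) rule discussed in this thread):
        long X = 10L * 1024 * 1024;   // cache.max.bytes.buffering = 10 MB
        int A = 3;                    // aggregation operators
        int T = 2;                    // KTable.to() operators
        long perProcessorTotal = X * (A + T);   // 50 MB total, grows with the topology

        // Global alternative: a fixed instance-wide budget split across
        // M threads, N tasks per thread and P caches per task.
        long globalBudget = 100L * 1024 * 1024;              // 100 MB for the instance
        int M = 2, N = 5, P = 5;
        long perCacheUnderGlobal = globalBudget / (M * N * P);   // 2 MB per cache;
                                                                 // shrinks as N or P grow
        System.out.println("per-processor total: " + perProcessorTotal);
        System.out.println("per-cache under a global budget: " + perCacheUnderGlobal);
    }
}
{code}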



Build failed in Jenkins: kafka-trunk-jdk7 #1346

2016-06-06 Thread Apache Jenkins Server
See 

--
Started by user ijuma
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 79aaf19f24bb48f90404a3e3896d115107991f4c 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 79aaf19f24bb48f90404a3e3896d115107991f4c
 > git rev-list 79aaf19f24bb48f90404a3e3896d115107991f4c # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson4885997215240018867.sh
+ rm -rf 
GRADLE_USER_HOME=
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 14.291 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson8927350999717682991.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 
-Dorg.gradle.project.testLoggingEvents=started,passed,skipped,failed clean 
jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 239
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean UP-TO-DATE
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk7:clients:compileJava
:jar_core_2_10 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Failed to capture snapshot of input files for task 'compileJava' during 
up-to-date check.
> Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin 
> (

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 17.955 secs
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


Build failed in Jenkins: kafka-trunk-jdk7 #1345

2016-06-06 Thread Apache Jenkins Server
See 

--
Started by user ijuma
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 79aaf19f24bb48f90404a3e3896d115107991f4c 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 79aaf19f24bb48f90404a3e3896d115107991f4c
 > git rev-list 79aaf19f24bb48f90404a3e3896d115107991f4c # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson9164275551798784037.sh
+ export 
GRADLE_USER_HOME=
+ 
GRADLE_USER_HOME=
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Download 
https://repo1.maven.org/maven2/org/ajoberstar/grgit/1.5.0/grgit-1.5.0.pom
Download 
https://jcenter.bintray.com/com/github/ben-manes/gradle-versions-plugin/0.12.0/gradle-versions-plugin-0.12.0.pom
Download 
https://repo1.maven.org/maven2/org/scoverage/gradle-scoverage/2.0.1/gradle-scoverage-2.0.1.pom
Download 
https://repo1.maven.org/maven2/org/eclipse/jgit/org.eclipse.jgit/4.1.1.201511131810-r/org.eclipse.jgit-4.1.1.201511131810-r.pom
Download 
https://repo1.maven.org/maven2/org/eclipse/jgit/org.eclipse.jgit-parent/4.1.1.201511131810-r/org.eclipse.jgit-parent-4.1.1.201511131810-r.pom
Download 
https://repo1.maven.org/maven2/org/eclipse/jgit/org.eclipse.jgit.ui/4.1.1.201511131810-r/org.eclipse.jgit.ui-4.1.1.201511131810-r.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.jsch/0.0.9/jsch.agentproxy.jsch-0.0.9.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy/0.0.9/jsch.agentproxy-0.0.9.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.pageant/0.0.9/jsch.agentproxy.pageant-0.0.9.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.sshagent/0.0.9/jsch.agentproxy.sshagent-0.0.9.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.usocket-jna/0.0.9/jsch.agentproxy.usocket-jna-0.0.9.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.usocket-nc/0.0.9/jsch.agentproxy.usocket-nc-0.0.9.pom
Download 
https://repo1.maven.org/maven2/org/eclipse/jdt/org.eclipse.jdt.annotation/1.1.0/org.eclipse.jdt.annotation-1.1.0.pom
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.core/0.0.9/jsch.agentproxy.core-0.0.9.pom
Download 
https://repo1.maven.org/maven2/net/java/dev/jna/jna-platform/4.1.0/jna-platform-4.1.0.pom
Download 
https://repo1.maven.org/maven2/org/ajoberstar/grgit/1.5.0/grgit-1.5.0.jar
Download 
https://jcenter.bintray.com/com/github/ben-manes/gradle-versions-plugin/0.12.0/gradle-versions-plugin-0.12.0.jar
Download 
https://repo1.maven.org/maven2/org/scoverage/gradle-scoverage/2.0.1/gradle-scoverage-2.0.1.jar
Download 
https://repo1.maven.org/maven2/org/eclipse/jgit/org.eclipse.jgit/4.1.1.201511131810-r/org.eclipse.jgit-4.1.1.201511131810-r.jar
Download 
https://repo1.maven.org/maven2/org/eclipse/jgit/org.eclipse.jgit.ui/4.1.1.201511131810-r/org.eclipse.jgit.ui-4.1.1.201511131810-r.jar
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.jsch/0.0.9/jsch.agentproxy.jsch-0.0.9.jar
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.pageant/0.0.9/jsch.agentproxy.pageant-0.0.9.jar
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.sshagent/0.0.9/jsch.agentproxy.sshagent-0.0.9.jar
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.usocket-jna/0.0.9/jsch.agentproxy.usocket-jna-0.0.9.jar
Download 
https://repo1.maven.org/maven2/com/jcraft/jsch.agentproxy.usocket-nc/0.0.9/jsch.agentproxy.usocket-nc-0.0.9.jar
Download 
https://repo1.maven.org/maven2/org/eclipse/jdt/org.eclipse.jdt.annotation/1.1.0/org.eclipse.jdt.annotation-1.1.0.jar

Build failed in Jenkins: kafka-trunk-jdk7 #1344

2016-06-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3771; Improving Kafka core code

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 79aaf19f24bb48f90404a3e3896d115107991f4c 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 79aaf19f24bb48f90404a3e3896d115107991f4c
 > git rev-list 2c7fae0a4a2b59133e53eb02a533cf8ffd0f864c # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson3796546511018528466.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 15.013 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson7393019425057583545.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 
-Dorg.gradle.project.testLoggingEvents=started,passed,skipped,failed clean 
jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 239
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean UP-TO-DATE
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk7:clients:compileJava
:jar_core_2_10 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Failed to capture snapshot of input files for task 'compileJava' during 
up-to-date check.
> Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin 
> (

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 17.067 secs
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


Build failed in Jenkins: kafka-trunk-jdk8 #681

2016-06-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3771; Improving Kafka core code

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-us1 (Ubuntu ubuntu ubuntu-us golang-ppa) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 79aaf19f24bb48f90404a3e3896d115107991f4c 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 79aaf19f24bb48f90404a3e3896d115107991f4c
 > git rev-list 2c7fae0a4a2b59133e53eb02a533cf8ffd0f864c # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson613926652203067984.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 17.12 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson5226516385887488897.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 
-Dorg.gradle.project.testLoggingEvents=started,passed,skipped,failed clean 
jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 239
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean UP-TO-DATE
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJava
:jar_core_2_10 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Failed to capture snapshot of input files for task 'compileJava' during 
up-to-date check.
> Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin 
> (

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 14.321 secs
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66


[jira] [Resolved] (KAFKA-3771) Improving Kafka code

2016-06-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3771.

   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1451
[https://github.com/apache/kafka/pull/1451]

> Improving Kafka code
> 
>
> Key: KAFKA-3771
> URL: https://issues.apache.org/jira/browse/KAFKA-3771
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
> Fix For: 0.10.1.0
>
>
> Improve Kafka core code :
> Remove redundant val modifier for case class constructor
> Use flatMap instead of map and flatten
> Use isEmpty, NonEmpty, isDefined as appropriate
> Use head, keys and keySet where appropriate
> Use contains, diff and find where appropriate
> toString has no parameters, no side effect hence without () use consistently
> Remove unnecessary return and semi colons, parentheses



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3771) Improving Kafka code

2016-06-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316292#comment-15316292
 ] 

ASF GitHub Bot commented on KAFKA-3771:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1451


> Improving Kafka code
> 
>
> Key: KAFKA-3771
> URL: https://issues.apache.org/jira/browse/KAFKA-3771
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
> Fix For: 0.10.1.0
>
>
> Improve Kafka core code :
> Remove redundant val modifier for case class constructor
> Use flatMap instead of map and flatten
> Use isEmpty, NonEmpty, isDefined as appropriate
> Use head, keys and keySet where appropriate
> Use contains, diff and find where appropriate
> toString has no parameters, no side effect hence without () use consistently
> Remove unnecessary return and semi colons, parentheses



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1451: KAFKA-3771; Improving Kafka core code

2016-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1451


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-06-06 Thread Henry Cai
I have a question on the KIP about the long stall during
ProcessorStateManager.restoreActiveState(); this stall can be long when
we need to rebuild the RocksDB state on a new node.

1. Is restoreActiveState() considered post-rebalance, since it is
invoked from the application's rebalance listener? (A generic sketch of
such a listener follows below.)
2. While node A is spending a long time rebuilding the state in
restoreActiveState() from the previous rebalance and a new node (node B)
sends a new JoinGroup request to the coordinator, how long should the
coordinator wait for node A to finish restoreActiveState() from the
previous rebalance? restoreActiveState() can take more than 10 minutes
for a big state.
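
For reference on question 1, here is a generic sketch (not Kafka Streams' actual code) of a rebalance listener that restores local state inside onPartitionsAssigned(); everything in that callback runs as part of completing the rebalance, which is why a long restore interacts with whichever timeout bounds the rebalance. The store interface and names are hypothetical:

{code}
// Generic sketch, NOT Kafka Streams' implementation: a listener whose
// onPartitionsAssigned() does a potentially long local-state restore.
// The callback runs while the rebalance is completing, so its duration
// counts against whatever bounds the rebalance (in the KIP-62 discussion:
// the process timeout / proposed max.poll.interval.ms, or a separate
// rebalance timeout if one is introduced).
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class StateRestoringListener implements ConsumerRebalanceListener {

    private final LocalStateStore store;   // hypothetical local store (e.g. RocksDB-backed)

    public StateRestoringListener(LocalStateStore store) {
        this.store = store;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        store.flush(partitions);           // hypothetical: flush before giving up partitions
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Hypothetical restore step standing in for restoreActiveState():
        // replaying a changelog into a local store can take many minutes
        // for a large state.
        store.restoreFrom(partitions);
    }
}

// Hypothetical store interface, only to make the sketch self-contained.
interface LocalStateStore {
    void flush(Collection<TopicPartition> partitions);
    void restoreFrom(Collection<TopicPartition> partitions);
}
{code}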


On Sun, Jun 5, 2016 at 10:46 PM, Becket Qin  wrote:

> Hi Jason,
>
> Thanks for this very useful KIP.  In general I am with Guozhang on the
> purpose of the three timeouts.
> 1) session timeout for consumer liveness,
> 2) process timeout (or maybe we should rename it to max.poll.interval.ms)
> for application liveness,
> 3) rebalance timeout for faster rebalance in some failure cases.
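To make timeouts 1) and 2) concrete, here is a minimal, hypothetical consumer
configuration sketch in Scala, assuming KIP-62 is adopted with the proposed
max.poll.interval.ms name; the broker address, group id and timeout values
are placeholders rather than recommendations.

import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer

object TimeoutConfigSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder
    props.put("group.id", "example-group")           // placeholder
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    // 1) Session timeout: how long the coordinator waits for heartbeats
    //    (sent from the background thread under KIP-62) before declaring
    //    the consumer dead.
    props.put("session.timeout.ms", "10000")

    // 2) Process timeout, i.e. the proposed max.poll.interval.ms: the
    //    maximum allowed gap between successive poll() calls before the
    //    consumer is considered stalled and its partitions are reassigned.
    props.put("max.poll.interval.ms", "300000")

    val consumer = new KafkaConsumer[String, String](props)
    // ... subscribe and poll here ...
    consumer.close()
  }
}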
>
> It seems the current discussion is mainly about whether we need 3) as a
> separate timeout or not. The current KIP proposal is to combine 2) and 3),
> i.e. just use process timeout as rebalance timeout. That means we need to
> either increase the rebalance timeout to let it adapt to the process timeout,
> or the reverse. It would be helpful to understand the impact of these two
> cases. Here are my two cents.
>
> Users who are consuming data from Kafka usually care either about
> throughput or about latency.
>
> If users care about latency, they would probably care more about average
> latency than about 99.99 percentile latency, which can be affected by many
> more common causes than consumer failure. Because all the timeouts we are
> discussing here only affect the 99.99 percentile latency, I don't think
> they would really make a difference for latency-sensitive users.
>
> The majority of the use cases for Kafka Connect and Mirror Maker are
> throughput-sensitive. Ewen raised a good example where Kafka Connect needs
> to process the previously read data on a rebalance and therefore requires a
> higher rebalance timeout than process timeout. This is essentially the same
> in Mirror Maker, where each rebalance needs to flush all the messages in
> the producer's accumulator. That could take some time depending on how
> many messages there are. In this case, we may need to increase the process
> timeout to make it the same as rebalance timeout. But this is probably
> fine. The downside of increasing process timeout is a longer detection time
> of a consumer failure.  Detecting a consumer failure a little later only
> has limited impact because the rest of the consumers in the same group are
> still working fine. So the total throughput is unlikely to drop
> significantly. As long as the rebalance is not taking longer it should be
> fine. The reason we care more about how fast rebalance can finish is
> because during rebalance no consumer in the group is consuming, i.e.
> throughput is zero. So we want to make the rebalance finish as quickly as
> possible.
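To illustrate the flush-on-rebalance pattern described above, here is a
hypothetical Scala sketch of a ConsumerRebalanceListener that flushes a
producer and commits offsets before partitions are revoked; the producer,
consumer and offset bookkeeping are assumed to be set up elsewhere, and the
class and field names are made up.

import java.util.{Collection => JCollection}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{ConsumerRebalanceListener, KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.common.TopicPartition

// Hypothetical mirror-style pipeline: consume, produce, flush on rebalance.
class FlushOnRebalanceListener(
    producer: KafkaProducer[Array[Byte], Array[Byte]],
    consumer: KafkaConsumer[Array[Byte], Array[Byte]],
    pendingOffsets: scala.collection.mutable.Map[TopicPartition, OffsetAndMetadata]
) extends ConsumerRebalanceListener {

  override def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit = {
    // Block until every record handed to the producer has been acknowledged;
    // this is the step that can make the rebalance take a while.
    producer.flush()
    // Commit the offsets of the records we know were delivered, while this
    // member still owns the partitions.
    consumer.commitSync(pendingOffsets.asJava)
    pendingOffsets.clear()
  }

  override def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit = {
    // Nothing to do on assignment in this sketch.
  }
}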
>
> Compared with increasing the process timeout to the rebalance timeout, a
> more common case seems to be one where the user wants a longer process
> timeout but a smaller rebalance timeout. I am more worried about this case,
> where we have to shoehorn the rebalance timeout into the process timeout.
> For users who care about throughput, that might cause the rebalance to take
> unnecessarily long. Admittedly this only has an impact when a consumer has
> a problem during rebalance, but depending on how high the process timeout
> is set, the rebalance could potentially take forever, as Guozhang mentioned.
>
> I agree with Guozhang that we can start with 1) and 2) and add 3) later if
> needed. But adding a rebalance timeout is more involved than just adding a
> configuration. It also means the rebalance has to be done in the background
> heartbeat thread. Hence we have to synchronize the rebalance and
> consumer.poll() as we did in the old consumer. Otherwise the user may lose
> messages if auto commit is enabled, or a manual commit might fail after a
> consumer.poll() because the partitions might have been reassigned. So
> having a separate rebalance timeout also potentially means a big change
> for users as well.
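As a small sketch of the manual-commit failure mode mentioned above, the
snippet below catches the CommitFailedException that the consumer raises
when a commit arrives after the group has already rebalanced; the consumer
and its poll loop are assumed to be set up elsewhere.

import org.apache.kafka.clients.consumer.{CommitFailedException, Consumer}

object ManualCommitSketch {
  // `consumer` is an already-configured consumer whose poll loop runs
  // elsewhere; this only shows the commit-failure handling.
  def commitProcessedBatch(consumer: Consumer[String, String]): Unit = {
    try {
      consumer.commitSync()
    } catch {
      case _: CommitFailedException =>
        // The group rebalanced while the batch was being processed; the
        // partitions may have been reassigned, so the batch will be
        // re-consumed by whichever member now owns them.
    }
  }
}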
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Fri, Jun 3, 2016 at 11:45 AM, Jason Gustafson 
> wrote:
>
> > Hey Ewen,
> >
> > I confess your comments caught me off guard. It never occurred to me that
> > anyone would ask for a rebalance timeout so that it could be set _larger_
> > than the process timeout. Even with buffered or batch processing, I would
> > usually expect flushing before a rebalance to take no more time than a
> > periodic flush. Otherwise, I'd probably try to see if there was some
> > workload I could push into periodic flushes so that rebalances could
> >