[GitHub] kafka pull request #2573: MINOR: Cleanup org.apache.kafka.streams.kstream.in...

2017-02-18 Thread original-brownbear
GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2573

MINOR: Cleanup 
org.apache.kafka.streams.kstream.internals.AbstractKTableKTableJoinValueGetterSupplier#storeNames
 (performance and clarity)

Just a trivial fix improving the performance and clarity of 
`org.apache.kafka.streams.kstream.internals.AbstractKTableKTableJoinValueGetterSupplier#storeNames`.

The method is essentially just an array concatenation. This shouldn't be 
done via an `ArrayList`, so I replaced it with direct array instantiation + copy.
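
For illustration, a minimal sketch of the change described (hypothetical names; the actual method lives in the Streams internals): concatenating two string arrays directly instead of routing them through an `ArrayList`.

```java
import java.util.ArrayList;
import java.util.List;

final class StoreNamesSketch {
    // Before (sketch): collect into an ArrayList, then copy out again.
    static String[] concatViaList(String[] first, String[] second) {
        List<String> names = new ArrayList<>(first.length + second.length);
        for (String s : first) names.add(s);
        for (String s : second) names.add(s);
        return names.toArray(new String[0]);
    }

    // After (sketch): allocate the result once and copy both arrays into it.
    static String[] concatDirect(String[] first, String[] second) {
        String[] result = new String[first.length + second.length];
        System.arraycopy(first, 0, result, 0, first.length);
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }
}
```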

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka cleanup-store-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2573


commit 3f96418c685d66eee4266b3c221ef2ba6beb5fd0
Author: Armin Braun 
Date:   2017-02-19T07:30:44Z

MINOR: Cleanup 
org.apache.kafka.streams.kstream.internals.AbstractKTableKTableJoinValueGetterSupplier#storeNames
 (performance and clarity)




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-2296) Not able to delete topic on latest kafka

2017-02-18 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873505#comment-15873505
 ] 

Armin Braun commented on KAFKA-2296:


I think we should mark this as duplicate accordingly then :)

> Not able to delete topic on latest kafka
> 
>
> Key: KAFKA-2296
> URL: https://issues.apache.org/jira/browse/KAFKA-2296
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Andrew M
>
> Was able to reproduce [inability to delete 
> topic|https://issues.apache.org/jira/browse/KAFKA-1397?focusedCommentId=14491442&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14491442]
>  on running cluster with kafka 0.8.2.1.
> Cluster consists of 2 c3.xlarge AWS instances with sufficient storage 
> attached. All communication between nodes goes through an AWS VPC.
> Some warnings from the logs:
> {noformat}
> [Controller-1234-to-broker-4321-send-thread], Controller 1234 epoch 20 fails to send request 
> Name:UpdateMetadataRequest;Version:0;Controller:1234;ControllerEpoch:20;CorrelationId:24047;ClientId:id_1234-host_1.2.3.4-port_6667;AliveBrokers:id:1234,host:1.2.3.4,port:6667,id:4321,host:4.3.2.1,port:6667;PartitionState:[topic_name,45]
>  -> (LeaderAndIsrInfo:(Leader:-2,ISR:4321,1234,LeaderEpoch:0,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:1234,4321),[topic_name,27]
>  -> (LeaderAndIsrInfo:(Leader:-2,ISR:4321,1234,LeaderEpoch:0,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:1234,4321),[topic_name,17]
>  -> ... (dozens more [topic_name,N] -> (LeaderAndIsrInfo:...) entries of the same shape follow; the message is truncated in the archive)
> {noformat}

Re: [DISCUSS] KIP-124: Request rate quotas

2017-02-18 Thread Dong Lin
I realized the main concern with this proposal is how a user can interpret
this CPU-percentage based quota. Since this quota is exposed to the user, we
need to explain how this quota is going to impact their application's
performance and convince them that the quota is not too low for their
application. We are able to do this with a byte-rate based quota, but I am
not sure how we can do this with a CPU-percentage based quota. For example,
how is a user going to understand whether 1% CPU is OK?

On Fri, Feb 17, 2017 at 10:11 AM, Onur Karaman  wrote:

> Overall a big fan of the KIP.
>
> I'd have to agree with Dong. I'm not sure about the decision of using the
> percentage over the window as opposed to request rate. It's pretty hard to
> reason about. I just spoke to one of our SREs and he agrees.
>
> Also I may have missed it, but I couldn't find information in the KIP on
> where this window would be configured.
>
> On Fri, Feb 17, 2017 at 9:45 AM, Dong Lin  wrote:
>
> > To correct the typo above: It seems to me that determination of request
> > rate is not any more difficult than determination of *byte* rate as both
> > metrics are commonly used to measure performance and provide guarantee to
> > user.
> >
> > On Fri, Feb 17, 2017 at 9:40 AM, Dong Lin  wrote:
> >
> > > Hey Rajini,
> > >
> > > Thanks for the KIP. I have some questions:
> > >
> > > - I am wondering why throttling based on request rate is listed as a
> > > rejected alternative. Can you provide a more specific reason why it is
> > > difficult for administrators to decide request rates to allocate? It seems
> > > to me that determination of request rate is not any more difficult than
> > > determination of request rate, as both metrics are commonly used to measure
> > > performance and provide guarantees to users. On the other hand, the
> > > percentage of processing time provides a vague guarantee to the user. For
> > > example, what performance can a user expect if you provide a 1% processing
> > > time quota to this user? How is the administrator going to decide this quota?
> > > Should the Kafka administrator continue to reduce this percentage quota as
> > > the number of users grows?
> > >
> > > - The KIP suggests that LeaderAndIsrRequest and MetadataRequest will also
> > > be throttled by this quota. What is the motivation for throttling these
> > > requests? It is also inconsistent with the rate-based quota, which is only
> > > applied to ProduceRequest and FetchRequest. IMO it will be simpler to only
> > > throttle ProduceRequest and FetchRequest.
> > >
> > > - Do you think we should also throttle the inter-broker traffic using this
> > > quota as well, similar to KIP-73?
> > >
> > > Thanks,
> > > Dong
> > >
> > >
> > >
> > > On Fri, Feb 17, 2017 at 9:05 AM, Rajini Sivaram <rajinisiva...@gmail.com>
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I have just created KIP-124 to introduce request rate quotas to Kafka:
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
> > >>
> > >> The proposal is for a simple percentage request handling time quota that
> > >> can be allocated to *<client-id>*, *<user>* or *<user, client-id>*. There
> > >> are a few other suggestions also under "Rejected alternatives". Feedback
> > >> and suggestions are welcome.
> > >>
> > >> Thank you...
> > >>
> > >> Regards,
> > >>
> > >> Rajini
> > >>
> > >
> > >
> >
>


Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-18 Thread Dong Lin
Hey Jun,

Could you please let me know if the solutions above could address your
concern? I really want to move the discussion forward.

Thanks,
Dong


On Tue, Feb 14, 2017 at 8:17 PM, Dong Lin  wrote:

> Hey Jun,
>
> Thanks for all your help and time to discuss this KIP. When you get the
> time, could you let me know if the previous answers address the concern?
>
> I think the more interesting question in your last email is where we
> should store the "created" flag in ZK. I proposed the solution that I like
> most, i.e. store it together with the replica assignment data in the 
> /brokers/topics/[topic].
> In order to expedite discussion, let me provide another two ideas to
> address the concern just in case the first idea doesn't work:
>
> - We can avoid the extra controller ZK read when there is no disk failure
> (95% of the time?). When the controller starts, it doesn't
> read controller_managed_state in ZK and sends LeaderAndIsrRequest with
> "create = false". Only if the LeaderAndIsrResponse shows a failure for a
> replica will the controller read controller_managed_state for that
> partition and re-send LeaderAndIsrRequest with "create=true" if the
> replica has not been created (see the sketch after this list).
>
> - We can significantly reduce this ZK read time by making
> controller_managed_state topic-level information in ZK, e.g.
> /brokers/topics/[topic]/state. Given that most topics have 10+ partitions,
> the extra ZK read time should be less than 10% of the existing total ZK
> read time during controller failover.
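
A minimal Java sketch of the first idea above, under loudly illustrative assumptions: `Broker`, `Zk`, `sendLeaderAndIsr`, and `isReplicaCreated` are hypothetical stand-ins, not Kafka classes; they only model the described flow of skipping the ZK read on the happy path and consulting controller_managed_state solely for replicas that fail the first attempt.

```java
import java.util.ArrayList;
import java.util.List;

final class ControllerRetrySketch {
    interface Broker {
        // Returns the replicas the broker rejected because they were never created.
        List<String> sendLeaderAndIsr(List<String> replicas, boolean create);
    }

    interface Zk {
        // Stands in for reading controller_managed_state for one replica.
        boolean isReplicaCreated(String replica);
    }

    // Happy path performs zero extra ZK reads; a failure triggers a targeted
    // state read and a re-send with create=true for replicas not yet created.
    static void controllerStartup(Broker broker, Zk zk, List<String> replicas) {
        List<String> failed = broker.sendLeaderAndIsr(replicas, false);
        List<String> toCreate = new ArrayList<>();
        for (String replica : failed) {
            if (!zk.isReplicaCreated(replica))
                toCreate.add(replica);
        }
        if (!toCreate.isEmpty())
            broker.sendLeaderAndIsr(toCreate, true);
    }
}
```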
>
> Thanks!
> Dong
>
>
> On Tue, Feb 14, 2017 at 7:30 AM, Dong Lin  wrote:
>
>> Hey Jun,
>>
>> I just realized that you may be suggesting that a tool for listing
>> offline directories is necessary for KIP-112 by asking whether KIP-112 and
>> KIP-113 will be in the same release. I think such a tool is useful but
>> doesn't have to be included in KIP-112. This is because, as of now, the admin
>> needs to log into the broker machine and check the broker log to figure out the
>> cause of a broker failure and the bad log directory in case of disk failure.
>> KIP-112 won't make this harder, since the admin can still figure out the bad
>> log directory by doing the same thing. Thus it is probably OK to just
>> include this script in KIP-113. Regardless, my hope is to finish both KIPs
>> ASAP and make them in the same release since both KIPs are needed for the
>> JBOD setup.
>>
>> Thanks,
>> Dong
>>
>> On Mon, Feb 13, 2017 at 5:52 PM, Dong Lin  wrote:
>>
>>> And the test plan has also been updated to simulate disk failure by
>>> changing log directory permission to 000.
>>>
>>> On Mon, Feb 13, 2017 at 5:50 PM, Dong Lin  wrote:
>>>
 Hi Jun,

 Thanks for the reply. These comments are very helpful. Let me answer
 them inline.


 On Mon, Feb 13, 2017 at 3:25 PM, Jun Rao  wrote:

> Hi, Dong,
>
> Thanks for the reply. A few more replies and new comments below.
>
> On Fri, Feb 10, 2017 at 4:27 PM, Dong Lin  wrote:
>
> > Hi Jun,
> >
> > Thanks for the detailed comments. Please see answers inline:
> >
> > On Fri, Feb 10, 2017 at 3:08 PM, Jun Rao  wrote:
> >
> > > Hi, Dong,
> > >
> > > Thanks for the updated wiki. A few comments below.
> > >
> > > 1. Topics get created
> > > 1.1 Instead of storing successfully created replicas in ZK, could we
> > > store unsuccessfully created replicas in ZK? Since the latter is less
> > > common, it probably reduces the load on ZK.
> > >
> >
> > We can store unsuccessfully created replicas in ZK. But I am not sure if
> > that can reduce the write load on ZK.
> >
> > If we want to reduce the write load on ZK by storing unsuccessfully created
> > replicas in ZK, then the broker should not write to ZK if all replicas are
> > successfully created. It means that if
> > /brokers/topics/[topic]/partitions/[partitionId]/controller_managed_state
> > doesn't exist in ZK for a given partition, we have to assume all replicas of
> > this partition have been successfully created and send LeaderAndIsrRequest
> > with create = false. This becomes a problem if the controller crashes before
> > receiving the LeaderAndIsrResponse to validate whether a replica has been
> > created.
> >
> > I think this approach can reduce the number of bytes stored in ZK. But I am
> > not sure if this is a concern.
> >
> >
> >
> I was mostly concerned about the controller failover time. Currently,
> the
> controller failover is likely dominated by the cost of reading
> topic/partition level information from ZK. If we add another partition
> level path in ZK, it probably will double the controller failover
> time. If
> the approach of representing the non-created replicas doesn't work,
> have
> you considered just adding the created flag in the leaderAndIsr path
> in ZK?
>
>
 Yes, I hav

[jira] [Commented] (KAFKA-4776) Implement graceful handling for improperly formed compressed message sets

2017-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873354#comment-15873354
 ] 

ASF GitHub Bot commented on KAFKA-4776:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2572

KAFKA-4776: Implement graceful handling for improperly formed compressed 
message sets



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-4776

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2572


commit 7402b3c0e079a2f033b642aad7ae6c113015ab49
Author: Jason Gustafson 
Date:   2017-02-18T22:21:52Z

KAFKA-4776: Implement graceful handling for improperly formed compressed 
message sets




> Implement graceful handling for improperly formed compressed message sets
> -
>
> Key: KAFKA-4776
> URL: https://issues.apache.org/jira/browse/KAFKA-4776
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1, 0.10.2.0
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Minor
>
> This affects validation of compressed message sets. It is possible for a 
> buggy client to send both a null compressed message set (i.e. a wrapper 
> message with a null value), and an empty compressed message set (i.e. a 
> wrapper message with valid compressed data in the value field, but no actual 
> records). In both cases, this causes an unexpected exception raised from the 
> deep iteration, which is returned to the client as an UNKNOWN_ERROR. It would 
> be better to return a CORRUPT_MESSAGE error.
> Note also that the behavior of the empty case was potentially more 
> problematic in versions prior to 0.10.2.0. Although we properly handled the 
> null case, the broker would accept the empty message set and write it to the 
> log. The impact of this appears to be minor, but may cause unexpected 
> behavior in cases where we assume compressed message sets would contain some 
> records.
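
As a rough illustration of the validation being made graceful (hypothetical helper types, not the broker's actual classes): a wrapper with a null value and a wrapper whose decompressed payload yields no records should both map to CORRUPT_MESSAGE instead of escaping as an unexpected exception.

```java
import java.util.Iterator;

// Illustrative sketch only; Errors here stands in for the broker's error codes.
final class CompressedSetValidationSketch {
    enum Errors { NONE, CORRUPT_MESSAGE }

    // records == null models a wrapper message with a null value; an empty
    // iterator models valid compressed data that contains no actual records.
    static Errors validateDeepEntries(Iterator<byte[]> records) {
        if (records == null)
            return Errors.CORRUPT_MESSAGE;   // null compressed message set
        if (!records.hasNext())
            return Errors.CORRUPT_MESSAGE;   // empty compressed message set
        return Errors.NONE;                  // at least one inner record
    }
}
```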



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2572: KAFKA-4776: Implement graceful handling for improp...

2017-02-18 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2572

KAFKA-4776: Implement graceful handling for improperly formed compressed 
message sets



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-4776

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2572


commit 7402b3c0e079a2f033b642aad7ae6c113015ab49
Author: Jason Gustafson 
Date:   2017-02-18T22:21:52Z

KAFKA-4776: Implement graceful handling for improperly formed compressed 
message sets






Re: [VOTE] 0.10.2.0 RC2

2017-02-18 Thread Ewen Cheslack-Postava
This vote passes with 12 +1 votes (4 binding) and no 0 or -1 votes.

+1 votes
PMC Members:
* Guozhang Wang
* Jun Rao
* Gwen Shapira
* Neha Narkhede

Committers:
* Ismael Juma
* Sriram Subramanian

Community:
* Tom Crayford
* Magnus Edenhill
* Mathieu Fenniak
* Rajini Sivaram
* Manikumar Reddy
* Vahid Hashemian

0 votes
* No votes

-1 votes
* No votes

I'll continue with the release process and the release announcement will
follow in the next few days.

-Ewen

On Thu, Feb 16, 2017 at 10:26 AM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote:

> +1 (non-binding)
>
> Built from the source and ran the quickstart successfully on Ubuntu, Mac,
> Windows (64 bit).
>
> Thank you Ewen for running the release.
>
> --Vahid
>
>
>
> From:   Ewen Cheslack-Postava 
> To: dev@kafka.apache.org, "us...@kafka.apache.org"
> , "kafka-clie...@googlegroups.com"
> 
> Date:   02/14/2017 10:40 AM
> Subject:[VOTE] 0.10.2.0 RC2
>
>
>
> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.2.0)
> for more details. A few feature
> highlights: SASL-SCRAM support, improved client compatibility to allow use
> of clients newer than the broker, session windows and global tables in the
> Kafka Streams API, single message transforms in the Kafka Connect
> framework.
>
> Important note: in addition to the artifacts generated using JDK7 for
> Scala
> 2.10 and 2.11, this release also includes experimental artifacts built
> using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests:
> https://builds.apache.org/job/kafka-0.10.2-jdk7/77/
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka-0.10.2/29/
>
>
> Thanks,
> Ewen
>
>
>
>
>


Jenkins build is back to normal : kafka-trunk-jdk7 #1952

2017-02-18 Thread Apache Jenkins Server
See 



[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873290#comment-15873290
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/18/17 6:49 PM:
-

Agreed. I did not see {{this.sender.forceClose();}} earlier.

So what needs fixing is the interrupt status.


was (Author: bgedik):
Agreed. I did not see ``this.sender.forceClose();`` earlier.

So what needs fixing is the interrupt status.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt to the IO thread
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) { // can't reuse 't': it is still in scope
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the caller's interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873290#comment-15873290
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/18/17 6:49 PM:
-

Agreed. I did not see ``this.sender.forceClose();`` earlier.

So what needs fixing is the interrupt status.


was (Author: bgedik):
Agreed. I did not see ``this.sender.forceClose();`` earlier.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt to the IO thread
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) { // can't reuse 't': it is still in scope
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the caller's interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873290#comment-15873290
 ] 

Buğra Gedik commented on KAFKA-4767:


Agreed. I did not see ``this.sender.forceClose();`` earlier.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     firstException.compareAndSet(null, t);
>     log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the io thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt to the IO thread
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException t2) { // can't reuse 't': it is still in scope
>         firstException.compareAndSet(null, t2);
>         log.error("Interrupted while joining ioThread", t2);
>     } finally {
>         // make sure we maintain the caller's interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}





Build failed in Jenkins: kafka-trunk-jdk7 #1951

2017-02-18 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Replace for with foreach loop in common module

--
[...truncated 19981 lines...]

org.apache.kafka.streams.state.internals.SegmentIteratorTest > 
shouldOnlyIterateOverSegmentsInRange STARTED

org.apache.kafka.streams.state.internals.SegmentIteratorTest > 
shouldOnlyIterateOverSegmentsInRange PASSED

org.apache.kafka.streams.state.internals.MergedSortedCacheWrappedWindowStoreIteratorTest
 > shouldIterateOverValueFromBothIterators STARTED

org.apache.kafka.streams.state.internals.MergedSortedCacheWrappedWindowStoreIteratorTest
 > shouldIterateOverValueFromBothIterators PASSED

org.apache.kafka.streams.state.internals.MergedSortedCacheWrappedWindowStoreIteratorTest
 > shouldPeekNextKey STARTED

org.apache.kafka.streams.state.internals.MergedSortedCacheWrappedWindowStoreIteratorTest
 > shouldPeekNextKey PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldReturnEmptyListIfNoStoresFoundWithName STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldReturnEmptyListIfNoStoresFoundWithName PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldThrowInvalidStoreExceptionIfKVStoreClosed STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldThrowInvalidStoreExceptionIfKVStoreClosed PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldThrowInvalidStoreExceptionIfNotAllStoresAvailable STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldThrowInvalidStoreExceptionIfNotAllStoresAvailable PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldFindWindowStores STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldFindWindowStores PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldReturnEmptyListIfStoreExistsButIsNotOfTypeValueStore STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldReturnEmptyListIfStoreExistsButIsNotOfTypeValueStore PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldFindKeyValueStores STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldFindKeyValueStores PASSED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldThrowInvalidStoreExceptionIfWindowStoreClosed STARTED

org.apache.kafka.streams.state.internals.StreamThreadStateStoreProviderTest > 
shouldThrowInvalidStoreExceptionIfWindowStoreClosed PASSED

org.apache.kafka.streams.state.StoresTest > 
shouldCreateInMemoryStoreSupplierWithLoggedConfig STARTED

org.apache.kafka.streams.state.StoresTest > 
shouldCreateInMemoryStoreSupplierWithLoggedConfig PASSED

org.apache.kafka.streams.state.StoresTest > 
shouldCreatePersistenStoreSupplierNotLogged STARTED

org.apache.kafka.streams.state.StoresTest > 
shouldCreatePersistenStoreSupplierNotLogged PASSED

org.apache.kafka.streams.state.StoresTest > 
shouldCreatePersistenStoreSupplierWithLoggedConfig STARTED

org.apache.kafka.streams.state.StoresTest > 
shouldCreatePersistenStoreSupplierWithLoggedConfig PASSED

org.apache.kafka.streams.state.StoresTest > 
shouldCreateInMemoryStoreSupplierNotLogged STARTED

org.apache.kafka.streams.state.StoresTest > 
shouldCreateInMemoryStoreSupplierNotLogged PASSED

org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldReduce STARTED

org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldReduce PASSED

org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldGroupByKey STARTED

org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldGroupByKey PASSED

org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldReduceWindowed STARTED

org.apache.kafka.streams.integration.KStreamAggregationDedupIntegrationTest > 
shouldReduceWindowed PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows STARTED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows PASSED

org.apache.kafka.streams.integration.KStreamAggregationIntegrationTest

[GitHub] kafka pull request #2571: HOTFIX: ClassCastException in Request logging

2017-02-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2571




[jira] [Resolved] (KAFKA-4777) Kafka client Heartbeat thread use all the cpu.

2017-02-18 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-4777.

   Resolution: Fixed
Fix Version/s: 0.10.3.0

Issue resolved by pull request 2564
[https://github.com/apache/kafka/pull/2564]

> Kafka client Heartbeat thread use all the cpu.
> --
>
> Key: KAFKA-4777
> URL: https://issues.apache.org/jira/browse/KAFKA-4777
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Allen Xiang
> Fix For: 0.10.3.0
>
> Attachments: Screen Shot 2017-02-17 at 12.31.38 PM.png
>
>
> When network goes down, Kafka client Heartbeat thread tries to send heartbeat 
> without waiting and uses all the cpu.
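
A hypothetical sketch of the failure mode described (illustrative names only; none of these come from the actual client code): when every send fails immediately, retrying without a pause spins a core at 100%, and the fix direction is to back off between attempts.

```java
// Illustrative sketch only, not the consumer's heartbeat thread.
final class HeartbeatBackoffSketch {
    private volatile boolean running = true;
    private final long retryBackoffMs = 100L;

    // Bug shape: a failed send is retried immediately in a tight loop.
    void busyLoop() {
        while (running) {
            trySendHeartbeat();
        }
    }

    // Fix shape: sleep for the retry backoff after a failed attempt.
    void backoffLoop() throws InterruptedException {
        while (running) {
            if (!trySendHeartbeat())
                Thread.sleep(retryBackoffMs);
        }
    }

    private boolean trySendHeartbeat() {
        return false; // stands in for network I/O that fails while the link is down
    }
}
```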





[jira] [Commented] (KAFKA-4777) Kafka client Heartbeat thread use all the cpu.

2017-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873267#comment-15873267
 ] 

ASF GitHub Bot commented on KAFKA-4777:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2564


> Kafka client Heartbeat thread use all the cpu.
> --
>
> Key: KAFKA-4777
> URL: https://issues.apache.org/jira/browse/KAFKA-4777
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.1
>Reporter: Allen Xiang
> Fix For: 0.10.3.0
>
> Attachments: Screen Shot 2017-02-17 at 12.31.38 PM.png
>
>
> When network goes down, Kafka client Heartbeat thread tries to send heartbeat 
> without waiting and uses all the cpu.





[GitHub] kafka pull request #2564: KAFKA-4777 fix client heartbeat non-stop retry iss...

2017-02-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2564




[GitHub] kafka pull request #2571: HOTFIX: ClassCastException in Request logging

2017-02-18 Thread original-brownbear
GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2571

HOTFIX: ClassCastException in Request logging

Coming from 
[here](https://github.com/apache/kafka/pull/2570#issuecomment-280859637)

Fixed ClassCastException resulting from missing type hint in request 
logging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka 
fix-logging-err-response

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2571


commit 4cbf63a54d7233e8fadddce19b7cabdd8a2d57fa
Author: Armin Braun 
Date:   2017-02-18T17:24:58Z

Fixed KafkaAPI Error Response






Build failed in Jenkins: kafka-trunk-jdk8 #1285

2017-02-18 Thread Apache Jenkins Server
See 

Changes:

[ismael] MINOR: Replace for with foreach loop in common module

[ismael] MINOR: Increase consumer init timeout in throttling test

[ismael] KAFKA-4774; Inner classes which don't need a reference to the outer c…

--
[...truncated 8428 lines...]

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.Plainte

[jira] [Commented] (KAFKA-4196) Transient test failure: DeleteConsumerGroupTest.testConsumptionOnRecreatedTopicAfterTopicWideDeleteInZK

2017-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873239#comment-15873239
 ] 

ASF GitHub Bot commented on KAFKA-4196:
---

GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2570

KAFKA-4196: Improved Test Stability by Disabling ZK Fsync and Fixed 
KafkaAPI Error Response

This addresses https://issues.apache.org/jira/browse/KAFKA-4196

What I found was the below warning accompanying all the failures I was seeing 
from this test (I reproduced the instability by putting the system under load):

```sh
[2017-02-18 16:17:42,892] WARN fsync-ing the write ahead log in 
SyncThread:0 took 20632ms which will adversely effect operation latency. See 
the ZooKeeper troubleshooting guide 
(org.apache.zookeeper.server.persistence.FileTxnLog:338)
```

ZK at times keeps blocking for multiple seconds in tests (not only this one, 
but it's very frequent in this one for some reason). In this case (20s) the ZK 
blocking lasted longer than the test timeout, which waits only 15s 
(`org.apache.kafka.test.TestUtils#DEFAULT_MAX_WAIT_MS`) for the path 
`/admin/delete_topic/topic` to be deleted.
As far as I know, the only way to really fix this in a portable manner (it 
should mainly hit ext3 users) is to turn off ZK fsyncing, which isn't really 
needed in unit tests anyway.
I did that here, as described in 
https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html, by setting

```scala
  sys.props.put("zookeeper.observer.syncEnabled", "false")
```

This should also help general test performance in my opinion.

Also fixed an issue (only ever observed here) where the resulting error was not 
properly logged by `KafkaApis`, since no type param was given in the changed 
line

```scala
  error("Error when handling request %s".format(request.body), e)
```

that then threw:

```sh
java.lang.ClassCastException: Expected request with type class 
scala.runtime.Nothing$, but found class 
org.apache.kafka.common.requests.UpdateMetadataRequest
at kafka.network.RequestChannel$Request.body(RequestChannel.scala:118)
at kafka.server.KafkaApis$$anonfun$handle$4.apply(KafkaApis.scala:120)
at kafka.server.KafkaApis$$anonfun$handle$4.apply(KafkaApis.scala:120)
at kafka.utils.Logging$class.error(Logging.scala:105)
at kafka.server.KafkaApis.error(KafkaApis.scala:56)
at kafka.server.KafkaApis.handle(KafkaApis.scala:120)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:64)
at java.lang.Thread.run(Thread.java:745)
```

added the type hint there and (without the fsync fix) got proper errors logged :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka KAFKA-4196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2570


commit 99dd9c62f63960bac35effd6a2514cd5ba61d66a
Author: Armin Braun 
Date:   2017-02-18T16:24:31Z

KAFKA-4196 Improved Test Stability by Disabling ZK Fsync and Fixed KafkaAPI 
Error Response




> Transient test failure: 
> DeleteConsumerGroupTest.testConsumptionOnRecreatedTopicAfterTopicWideDeleteInZK
> ---
>
> Key: KAFKA-4196
> URL: https://issues.apache.org/jira/browse/KAFKA-4196
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>  Labels: transient-unit-test-failure
>
> The error:
> {code}
> java.lang.AssertionError: Admin path /admin/delete_topic/topic path not 
> deleted even after a replica is restarted
>   at org.junit.Assert.fail(Assert.java:88)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:752)
>   at kafka.utils.TestUtils$.verifyTopicDeletion(TestUtils.scala:1017)
>   at 
> kafka.admin.DeleteConsumerGroupTest.testConsumptionOnRecreatedTopicAfterTopicWideDeleteInZK(DeleteConsumerGroupTest.scala:156)
> {code}
> Caused by a broken invariant in the Controller: a partition exists in 
> `ControllerContext.partitionLeadershipInfo`, but not 
> `controllerContext.partitionReplicaAssignment`.
> {code}
> [2016-09-20 06:45:13,967] ERROR [BrokerChangeListener on Controller 1]: Error 
> while handling broker changes 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener:103)
> java.util.NoSuchElementException: key not found: [topic,0]
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:58)
>   at scala.colle

[GitHub] kafka pull request #2570: KAFKA-4196: Improved Test Stability by Disabling Z...

2017-02-18 Thread original-brownbear
GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2570

KAFKA-4196: Improved Test Stability by Disabling ZK Fsync and Fixed 
KafkaAPI Error Response

This addresses https://issues.apache.org/jira/browse/KAFKA-4196

What I found was the below warning accompanying all the failures I was seeing 
from this test (I reproduced the instability by putting the system under load):

```sh
[2017-02-18 16:17:42,892] WARN fsync-ing the write ahead log in 
SyncThread:0 took 20632ms which will adversely effect operation latency. See 
the ZooKeeper troubleshooting guide 
(org.apache.zookeeper.server.persistence.FileTxnLog:338)
```

ZK at times keeps blocking for multiple seconds in tests (not only this one, 
but it's very frequent in this one for some reason). In this case (20s) the ZK 
blocking lasted longer than the test timeout, which waits only 15s 
(`org.apache.kafka.test.TestUtils#DEFAULT_MAX_WAIT_MS`) for the path 
`/admin/delete_topic/topic` to be deleted.
As far as I know, the only way to really fix this in a portable manner (it 
should mainly hit ext3 users) is to turn off ZK fsyncing, which isn't really 
needed in unit tests anyway.
I did that here, as described in 
https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html, by setting

```scala
  sys.props.put("zookeeper.observer.syncEnabled", "false")
```

This should also help general test performance in my opinion.

Also fixed an issue (only ever observed here) where the resulting error was not 
properly logged by `KafkaApis`, since no type param was given in the changed 
line

```scala
  error("Error when handling request %s".format(request.body), e)
```

that then threw:

```sh
java.lang.ClassCastException: Expected request with type class 
scala.runtime.Nothing$, but found class 
org.apache.kafka.common.requests.UpdateMetadataRequest
at kafka.network.RequestChannel$Request.body(RequestChannel.scala:118)
at kafka.server.KafkaApis$$anonfun$handle$4.apply(KafkaApis.scala:120)
at kafka.server.KafkaApis$$anonfun$handle$4.apply(KafkaApis.scala:120)
at kafka.utils.Logging$class.error(Logging.scala:105)
at kafka.server.KafkaApis.error(KafkaApis.scala:56)
at kafka.server.KafkaApis.handle(KafkaApis.scala:120)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:64)
at java.lang.Thread.run(Thread.java:745)
```

added the type hint there and (without the fsync fix) got proper errors logged :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka KAFKA-4196

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2570


commit 99dd9c62f63960bac35effd6a2514cd5ba61d66a
Author: Armin Braun 
Date:   2017-02-18T16:24:31Z

KAFKA-4196 Improved Test Stability by Disabling ZK Fsync and Fixed KafkaAPI 
Error Response






[jira] [Commented] (KAFKA-4774) Inner classes which don't need a reference to the outer class should be static

2017-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873190#comment-15873190
 ] 

ASF GitHub Bot commented on KAFKA-4774:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2558


> Inner classes which don't need a reference to the outer class should be static
> --
>
> Key: KAFKA-4774
> URL: https://issues.apache.org/jira/browse/KAFKA-4774
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> Inner classes which don't need a reference to the outer class should be 
> static.  This takes up less space in memory, generates less load on the 
> garbage collector, and eliminates a findbugs warning.





[GitHub] kafka pull request #2558: KAFKA-4774. Inner classes which don't need a refer...

2017-02-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2558




[GitHub] kafka pull request #2567: MINOR: Increase consumer init timeout in throttlin...

2017-02-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2567




[GitHub] kafka pull request #2530: MINOR: Replacing for with foreach loop in common m...

2017-02-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2530




[GitHub] kafka pull request #2569: MINOR: Add build_eclipse to .gitignore

2017-02-18 Thread cshannon
GitHub user cshannon opened a pull request:

https://github.com/apache/kafka/pull/2569

MINOR: Add build_eclipse to .gitignore

build_eclipse is the configured output directory for Eclipse when using
the Gradle eclipse plugin and should be ignored.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cshannon/kafka eclipse-gitignore

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2569.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2569


commit 7090c72fac8d93f07a3a01c17480edf8f6a6dc82
Author: Christopher L. Shannon 
Date:   2017-02-18T14:26:42Z

MINOR: Add build_eclipse to .gitignore

build_eclipse is the configured output directory for eclipse when using
the gradle eclipse plugin and should be ignored






[jira] [Commented] (KAFKA-4198) Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures

2017-02-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873134#comment-15873134
 ] 

ASF GitHub Bot commented on KAFKA-4198:
---

GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2568

Kafka 4198: Fix Race Condition in KafkaServer Shutdown

Fixes the initially reported issue in 
https://issues.apache.org/jira/browse/KAFKA-4198.

The relevant part in fixing the initial issue here is the change to 
`kafka.server.KafkaServer#shutdown`.

It contained this step:

```scala
  val canShutdown = isShuttingDown.compareAndSet(false, true)
  if (canShutdown && shutdownLatch.getCount > 0) {
```

without any fallback for the case of `shutdownLatch.getCount == 0`. So in 
the case of `shutdownLatch.getCount == 0`  (when a previous call to the 
shutdown method was right about to finish) you would set `isShuttingDown` to 
true again without any possibility of ever getting the server started (since 
`startup` will check `isShuttingDown` before setting up a new latch with count 
1).

Long story short: concurrent calls to shutdown can get the server locked in 
a broken state.
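
A minimal Java rendering of the kind of fallback described above (illustrative only, not the actual Scala `KafkaServer` code or the actual fix): if the CAS wins but the latch has already hit zero, the flag must be cleared again, otherwise `startup` is locked out forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownGuardSketch {
    private final AtomicBoolean isShuttingDown = new AtomicBoolean(false);
    private volatile CountDownLatch shutdownLatch = new CountDownLatch(1);

    void shutdown() {
        boolean canShutdown = isShuttingDown.compareAndSet(false, true);
        if (canShutdown && shutdownLatch.getCount() > 0) {
            // ... actual shutdown work would go here ...
            shutdownLatch.countDown();
            isShuttingDown.set(false);
        } else if (canShutdown) {
            // A previous shutdown already completed (latch count == 0): undo
            // the flag so a later startup() is not blocked forever.
            isShuttingDown.set(false);
        }
    }
}
```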

This fixes the reported error:

```sh
java.lang.IllegalStateException: Kafka server is still shutting down, 
cannot re-start!
at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at 
kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
at 
kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
at 
kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
```

That said, this error (reported in a comment on the JIRA) still remains 
even with this fix:

```sh
kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED
java.lang.IllegalArgumentException: You can only check the position for 
partitions assigned to this consumer.
at 
org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1271)
at 
kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:96)
at 
kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:69)
```

... I think this one should get a separate JIRA though. It seems to me that 
the behaviour of the call to `position` when a broker just died is a separate 
issue from the one initially reported.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka KAFKA-4198

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2568


commit 08460a669b4c737c20793129b01c7a3676452dac
Author: Armin Braun 
Date:   2017-02-18T08:11:10Z

KAFKA-4198: Cleaner ExecutorService Handling

commit 128db5ea5afeb940f108e95c9bcaad84c9889b10
Author: Armin Braun 
Date:   2017-02-18T09:12:15Z

KAFKA-4198: Ensure Fresh MetaData

commit 847d001e17c2baaba18936cc5c497756154c931a
Author: Armin Braun 
Date:   2017-02-18T10:27:37Z

KAFKA-4198: Revert Test Change

commit 005ff8f4a180a7c2c45313accff6627e90e9983a
Author: Armin Braun 
Date:   2017-02-18T11:39:47Z

KAFKA-4198: Fix RunCondition in KafkaServer#shutdown

commit 9559ad387bba6d24a0ba5f244aae5f6d32a897f1
Author: Armin Braun 
Date:   2017-02-18T11:41:00Z

KAFKA-4198: Revert Experimental Change to KafkaConsumer

commit d2f138c9f01800219fcd02a625a9f89b9315fd73
Author: Armin Braun 
Date:   2017-02-18T12:04:14Z

KAFKA-4198: Revert Experimental Change to KafkaServerTestHarness

commit 8cfb45240eda64cc358303c2533aef6c50f69225
Author: Armin Braun 
Date:   2017-02-18T12:06:28Z

KAFKA-4198: Revert Experimental Change to ConsumerBounceTest




> Transient test failure: ConsumerBounceTest.testConsumptionWithBrokerFailures
> 
>
> Key: KAFKA-

[GitHub] kafka pull request #2568: Kafka 4198: Fix Race Condition in KafkaServer Shut...

2017-02-18 Thread original-brownbear
GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2568

Kafka 4198: Fix Race Condition in KafkaServer Shutdown

Fixes the initially reported issue in 
https://issues.apache.org/jira/browse/KAFKA-4198.

The relevant part in fixing the initial issue here is the change to 
`kafka.server.KafkaServer#shutdown`.

It contained this step:

```scala
  val canShutdown = isShuttingDown.compareAndSet(false, true)
  if (canShutdown && shutdownLatch.getCount > 0) {
```

without any fallback for the case of `shutdownLatch.getCount == 0`. So in 
the case of `shutdownLatch.getCount == 0`  (when a previous call to the 
shutdown method was right about to finish) you would set `isShuttingDown` to 
true again without any possibility of ever getting the server started (since 
`startup` will check `isShuttingDown` before setting up a new latch with count 
1).

Long story short: concurrent calls to shutdown can get the server locked in 
a broken state.

This fixes the reported error:

```sh
java.lang.IllegalStateException: Kafka server is still shutting down, 
cannot re-start!
at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply$mcVI$sp(KafkaServerTestHarness.scala:117)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
at 
kafka.integration.KafkaServerTestHarness$$anonfun$restartDeadBrokers$2.apply(KafkaServerTestHarness.scala:116)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at 
kafka.integration.KafkaServerTestHarness$class.restartDeadBrokers(KafkaServerTestHarness.scala:116)
at 
kafka.api.ConsumerBounceTest.restartDeadBrokers(ConsumerBounceTest.scala:34)
at 
kafka.api.ConsumerBounceTest$BounceBrokerScheduler.doWork(ConsumerBounceTest.scala:158)
```

That said, this error (reported in a comment on the JIRA) still remains 
even with this fix:

```sh
kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures FAILED
java.lang.IllegalArgumentException: You can only check the position for 
partitions assigned to this consumer.
at 
org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1271)
at 
kafka.api.ConsumerBounceTest.consumeWithBrokerFailures(ConsumerBounceTest.scala:96)
at 
kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures(ConsumerBounceTest.scala:69)
```

... I think this one should get a separate JIRA though. It seems to me that 
the behaviour of the call to `position` when a broker just died is a separate 
issue from the one initially reported.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka KAFKA-4198

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2568


commit 08460a669b4c737c20793129b01c7a3676452dac
Author: Armin Braun 
Date:   2017-02-18T08:11:10Z

KAFKA-4198: Cleaner ExecutorService Handling

commit 128db5ea5afeb940f108e95c9bcaad84c9889b10
Author: Armin Braun 
Date:   2017-02-18T09:12:15Z

KAFKA-4198: Ensure Fresh MetaData

commit 847d001e17c2baaba18936cc5c497756154c931a
Author: Armin Braun 
Date:   2017-02-18T10:27:37Z

KAFKA-4198: Revert Test Change

commit 005ff8f4a180a7c2c45313accff6627e90e9983a
Author: Armin Braun 
Date:   2017-02-18T11:39:47Z

KAFKA-4198: Fix RunCondition in KafkaServer#shutdown

commit 9559ad387bba6d24a0ba5f244aae5f6d32a897f1
Author: Armin Braun 
Date:   2017-02-18T11:41:00Z

KAFKA-4198: Revert Experimental Change to KafkaConsumer

commit d2f138c9f01800219fcd02a625a9f89b9315fd73
Author: Armin Braun 
Date:   2017-02-18T12:04:14Z

KAFKA-4198: Revert Experimental Change to KafkaServerTestHarness

commit 8cfb45240eda64cc358303c2533aef6c50f69225
Author: Armin Braun 
Date:   2017-02-18T12:06:28Z

KAFKA-4198: Revert Experimental Change to ConsumerBounceTest



