Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-04-12 Thread Rajini Sivaram
Jun's tweaked proposal sounds good to me. In terms of completing KIP-43,
this changes the format of the request/response used for exchanging
mechanisms, but not the overall logic. Since the request format in KIP-43 is
worth changing anyway, I will update the KIP and the PR.
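
For reference, the connection sequence being settled on in the quoted mail below,
written as illustrative client-side pseudocode (the Conn interface and every method
name here are hypothetical stand-ins, not actual client internals):

interface Conn {
    void completeTlsHandshake();                     // SSL handshake bytes (SASL_SSL only)
    void sendApiVersionsRequest();                   // optional, discovers broker versions
    void sendSaslHandshakeRequest(String mechanism); // agree on the SASL mechanism
    void exchangeSaslTokens();                       // opaque tokens from the SASL library
    void startRegularRequests();                     // Produce, Fetch, Metadata, ...
}

class SaslConnectSequence {
    static void connect(Conn conn, boolean tls, boolean sasl) {
        if (tls) {
            conn.completeTlsHandshake();
        }
        conn.sendApiVersionsRequest();               // "optional" in Jun's tweak
        if (sasl) {
            conn.sendSaslHandshakeRequest("GSSAPI");
            conn.exchangeSaslTokens();
        }
        conn.startRegularRequests();
    }
}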

On Tue, Apr 12, 2016 at 2:13 AM, Jun Rao  wrote:

> Magnus,
>
> That sounds reasonable. To reduce the changes on the server side, I'd
> suggest the following minor tweaks on the proposal.
>
> 1. Continue supporting the separate SASL and SASL_SSL port.
>
> On SASL port, we support the new sequence
> ApiVersionRequest (optional), SaslHandshakeRequest, SASL tokens,
> regular
> requests
>
> On SASL_SSL port, we support the new sequence
> SSL handshake bytes, ApiVersionRequest (optional),
> SaslHandshakeRequest,
> SASL tokens, regular requests
>
> 2. We don't wrap SASL tokens in Kafka protocol. Similar to your argument
> about SSL handshake, those SASL tokens are generated by SASL library and
> Kafka doesn't really control its versioning. Kafka only controls the
> selection of SASL mechanism, which will be versioned in
> SaslHandshakeRequest.
>
> Does that work for you?
>
> Thanks,
>
> Jun
>
>
> On Mon, Apr 11, 2016 at 11:15 AM, Magnus Edenhill 
> wrote:
>
> > Hey Jun, see inline
> >
> > 2016-04-11 19:19 GMT+02:00 Jun Rao :
> >
> > > Hi, Magnus,
> > >
> > > Let me understand your proposal in more details just from the client's
> > > perspective. My understanding of your proposal is the following.
> > >
> > > On plaintext port, the client will send the following bytes in order.
> > > ApiVersionRequest, SaslHandshakeRequest, SASL tokens (if SASL is
> > > enabled), regular requests
> > >
> > > On SSL port, the client will send the following bytes in order.
> > > SSL handshake bytes, ApiVersionRequest, SaslHandshakeRequest, SASL
> > > tokens (if SASL is enabled), regular requests
> > >
> >
> >
> > Yup!
> > "SASL tokens" is a series of proper Kafka protocol SaslHandshakeRequests
> > until
> > the handshake is done.
> >
> >
> >
> > >
> > > Is that right? Since we can use either SSL or SASL for authentication,
> > it's
> > > weird that in one case, we require ApiVersionRequest to happen before
> > > authentication and in another case we require the reverse.
> > >
> >
> > Since SSL/TLS is standardised and taken care of for us by the SSL
> > libraries, it doesn't make sense to reimplement that on top of Kafka, so
> > it isn't really comparable.
> > But for SASL there is no standardised handshake protocol, so we must
> > either conceive one from scratch or use the protocol that we already
> > have (Kafka).
> > For the initial SASL implementation in 0.9 the first option was chosen,
> > and while it required a new protocol implementation in supporting
> > clients and the broker, it served its purpose. But not for long: it
> > already needs to evolve, and this gives us a golden[1] opportunity to
> > make the implementation reusable, evolvable, less complex and in line
> > with all our other protocol work, by using the protocol stack of Kafka
> > which the broker and all clients already have in place.
> >
> > Not taking this chance and instead diverging the custom SASL handshake
> > protocol
> > even further from Kafka seems to me a weird choice.
> >
> > The current KIP-43 proposal does not have a clear compatibility story; it
> > doesn't seem to be possible to upgrade clients before brokers. While this
> > might be okay for the Java client, the KIP-35 discussion has hopefully
> > proven that this assumption can't be made for the entire ecosystem.
> >
> > Let me be clear that there isn't anything technically wrong with the
> > KIP-43 proposal (well, except for the hack to check byte[0] for 0x60
> > perhaps), but I'm worried the proposal will eventually lead to
> > reimplementing Api Versioning, KIP-35, etc., in the custom SASL
> > handshake, and this is just redundant; there is no technical reason for
> > doing so and it'll just make protocol semantics and implementations more
> > complex.
> >
> >
> > Regards,
> > Magnus
> >
> > [1]: Timing is good for this change since only two clients, Java and C,
> > currently support the existing SASL handshake.
> >
> >
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Apr 11, 2016 at 12:20 AM, Magnus Edenhill 
> > > wrote:
> > >
> > > > 2016-04-11 3:01 GMT+02:00 Jun Rao :
> > > >
> > > > > Thinking about ApiVersionRequest a bit more. There are quite a few
> > > things
> > > > > special about it. In the ideal case, (1) its version should never
> > > change;
> > > > >
> > > >
> > > > The only thing we know of the future is that we don't know anything; we
> > > > can't think of every possible future use case. That's why we need to be
> > > > able to evolve interfaces as requirements and use cases change. This is
> > > > the gist of KIP-35, and hampering KIP-35 itself, by not letting it also
> > > > evolve, does not seem right to me.
> 

[jira] [Created] (KAFKA-3547) Broker does not disconnect client on unknown request

2016-04-12 Thread Magnus Edenhill (JIRA)
Magnus Edenhill created KAFKA-3547:
--

 Summary: Broker does not disconnect client on unknown request
 Key: KAFKA-3547
 URL: https://issues.apache.org/jira/browse/KAFKA-3547
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.1, 0.9.0.0
Reporter: Magnus Edenhill
Priority: Critical
 Fix For: 0.10.0.0


A regression in 0.9.0 causes the broker to not close a client connection when 
receiving an unsupported request.

Two effects of this are:
 - the client is not informed that the request was not supported (even though a 
closed connection is a blunt indication, it is in fact some indication); the 
request will (hopefully) just time out on the client after some time, stalling 
subsequent operations.
 - the broker leaks the connection until the connection reaper brings it down 
or the client closes the connection
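
A hedged sketch of the expected behavior (the names maxSupportedApiKey and dispatch 
are hypothetical; this only illustrates closing the socket on an unknown api key):

{code}
import java.io.IOException;
import java.nio.channels.SocketChannel;

class UnknownRequestHandling {
    // Illustrative bound only, not the broker's real api-key table.
    static final short maxSupportedApiKey = 16;

    static void handle(short apiKey, SocketChannel channel) throws IOException {
        if (apiKey < 0 || apiKey > maxSupportedApiKey) {
            // Close immediately so the client gets feedback instead of a silent
            // timeout; this is the pre-0.9 behavior the ticket reports as regressed.
            channel.close();
            return;
        }
        dispatch(apiKey, channel); // normal request handling path
    }

    static void dispatch(short apiKey, SocketChannel channel) { /* ... */ }
}
{code}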





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3547) Broker does not disconnect client on unknown request

2016-04-12 Thread Magnus Edenhill (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236783#comment-15236783
 ] 

Magnus Edenhill commented on KAFKA-3547:


Fixed in
https://github.com/apache/kafka/commit/bb643f83a95e9ebb3a9f8331fe9998c8722c07d4#diff-d0332a0ff31df50afce3809d90505b25R80

> Broker does not disconnect client on unknown request
> 
>
> Key: KAFKA-3547
> URL: https://issues.apache.org/jira/browse/KAFKA-3547
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.0, 0.9.0.1
>Reporter: Magnus Edenhill
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> A regression in 0.9.0 causes the broker to not close a client connection when 
> receiving an unsupported request.
> Two effects of this are:
>  - the client is not informed that the request was not supported (even though 
> a closed connection is a blunt indication, it is in fact some indication); the 
> request will (hopefully) just time out on the client after some time, 
> stalling subsequent operations.
>  - the broker leaks the connection until the connection reaper brings it down 
> or the client closes the connection



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-04-12 Thread Ismael Juma
Hi Jun,

I understand the point about the SASL tokens being similar to the SSL
handshake in a way. However, is there any SASL library that handles the
network communication for these tokens? I couldn't find any and without
that, there isn't much benefit in deviating from Kafka's protocol (we
basically save the space taken by the request header). It's worth
mentioning that we are already adding the message size before the opaque
bytes provided by the library, so one could say we are already extending
the protocol.
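
To make the comparison concrete, here is a rough sketch of the two framings
(illustrative only; the helper names are made up, but the header layout follows the
standard Kafka request header: api_key, api_version, correlation_id, client_id):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SaslFraming {
    // What KIP-43 currently does: a 4-byte size prefix, then the opaque SASL token.
    static ByteBuffer frameBare(byte[] saslToken) {
        ByteBuffer buf = ByteBuffer.allocate(4 + saslToken.length);
        buf.putInt(saslToken.length);
        buf.put(saslToken);
        buf.flip();
        return buf;
    }

    // The alternative discussed here: prepend the standard Kafka request header
    // before the token bytes, so the message carries client id and correlation id.
    static ByteBuffer frameWithHeader(short apiKey, short apiVersion, int correlationId,
                                      String clientId, byte[] saslToken) {
        byte[] clientIdBytes = clientId.getBytes(StandardCharsets.UTF_8);
        int payload = 2 + 2 + 4 + 2 + clientIdBytes.length + saslToken.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + payload);
        buf.putInt(payload);                        // size prefix (excludes itself)
        buf.putShort(apiKey);                       // INT16 api key
        buf.putShort(apiVersion);                   // INT16 api version
        buf.putInt(correlationId);                  // INT32 correlation id
        buf.putShort((short) clientIdBytes.length); // client_id as a Kafka STRING
        buf.put(clientIdBytes);
        buf.put(saslToken);                         // opaque bytes from the SASL library
        buf.flip();
        return buf;
    }
}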

If we leave versioning aside, adding the standard Kafka request header to
those messages may also help from a debugging perspective, as we would then
include the client id and correlation id along with the message.

Ismael

On Tue, Apr 12, 2016 at 2:13 AM, Jun Rao  wrote:

> Magnus,
>
> That sounds reasonable. To reduce the changes on the server side, I'd
> suggest the following minor tweaks on the proposal.
>
> 1. Continue supporting the separate SASL and SASL_SSL port.
>
> On SASL port, we support the new sequence
> ApiVersionRequest (optional), SaslHandshakeRequest, SASL tokens,
> regular
> requests
>
> On SASL_SSL port, we support the new sequence
> SSL handshake bytes, ApiVersionRequest (optional),
> SaslHandshakeRequest,
> SASL tokens, regular requests
>
> 2. We don't wrap SASL tokens in Kafka protocol. Similar to your argument
> about SSL handshake, those SASL tokens are generated by SASL library and
> Kafka doesn't really control its versioning. Kafka only controls the
> selection of SASL mechanism, which will be versioned in
> SaslHandshakeRequest.
>
> Does that work for you?
>
> Thanks,
>
> Jun
>
>
> On Mon, Apr 11, 2016 at 11:15 AM, Magnus Edenhill 
> wrote:
>
> > Hey Jun, see inline
> >
> > 2016-04-11 19:19 GMT+02:00 Jun Rao :
> >
> > > Hi, Magnus,
> > >
> > > Let me understand your proposal in more details just from the client's
> > > perspective. My understanding of your proposal is the following.
> > >
> > > On plaintext port, the client will send the following bytes in order.
> > > ApiVersionRequest, SaslHandshakeRequest, SASL tokens (if SASL is
> > > enabled), regular requests
> > >
> > > On SSL port, the client will send the following bytes in order.
> > > SSL handshake bytes, ApiVersionRequest, SaslHandshakeRequest, SASL
> > > tokens (if SASL is enabled), regular requests
> > >
> >
> >
> > Yup!
> > "SASL tokens" is a series of proper Kafka protocol SaslHandshakeRequests
> > until
> > the handshake is done.
> >
> >
> >
> > >
> > > Is that right? Since we can use either SSL or SASL for authentication,
> > it's
> > > weird that in one case, we require ApiVersionRequest to happen before
> > > authentication and in another case we require the reverse.
> > >
> >
> > Since the SSL/TLS is standardised and taken care of for us by the SSL
> > libraries it
> > doesnt make sense to reimplement that on top of Kafka, so it isn't really
> > comparable.
> > But for SASL there is no standardised handshake protocol so we must
> either
> > conceive one from scratch, or use the protocol that we already have
> > (Kafka).
> > For the initial SASL implementation in 0.9 the first option was chosen
> and
> > while
> > it required a new protocol implementation in supporting clients and the
> > broker
> > it served its purpose. But not for long,  it already needs to evolve,
> > and this gives us a golden[1] opportunity to make the implementation
> > reusable, evolvable, less complex
> > and in line with all our other protocol work, by using the  protocol
> stack
> > of Kafka which the
> > broker and all clients already have in place.
> >
> > Not taking this chance and instead diverging the custom SASL handshake
> > protocol
> > even further from Kafka seems to me a weird choice.
> >
> > The current KIP-43 proposal does not have a clear compatibility story; it
> > doesnt seem to be possible
> > to upgrade clients before brokers, while this might be okay for the Java
> > client, the KIP-35 discussion
> > has hopefully proven that this assumption can't be made for the entire
> > eco-system.
> >
> > Let me be clear that there isn't anything technically wrong with the
> KIP-43
> > proposal (well,
> > except for the hack to check byte[0] for 0x60 perhaps), but I'm worried
> the
> > proposal will eventually lead to
> > reimplementing Api Versioning, KIP-35, etc, in the custom SASL handshake,
> > and this is just redundant,
> > there is no technical reason for doing so and it'll just make protocol
> > semantics and implementations more complex.
> >
> >
> > Regards,
> > Magnus
> >
> > [1]: Timing is good for this change since only two clients, Java and C,
> > currently supports
> > the existing SASL handshake so far.
> >
> >
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Apr 11, 2016 at 12:20 AM, Magnus Edenhill 
> > > wrote:
> > >
> > > > 2016-04-11 3:01 GMT+02:00 Jun Rao :
> > > >
> > > > > Thinking about ApiVersionRequest a bit more. There are quite a few
> > > things
> > > > > special about i

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-04-12 Thread Rajini Sivaram
Ismael,

My only concern about wrapping SASL tokens in Kafka headers is backward
compatibility. We would either have a different format for GSSAPI alone to
match 0.9.0.x or we would need to support two different wire protocols for
GSSAPI. Neither sounds ideal.

On Tue, Apr 12, 2016 at 9:18 AM, Ismael Juma  wrote:

> Hi Jun,
>
> I understand the point about the SASL tokens being similar to the SSL
> handshake in a way. However, is there any SASL library that handles the
> network communication for these tokens? I couldn't find any and without
> that, there isn't much benefit in deviating from Kafka's protocol (we
> basically save the space taken by the request header). It's worth
> mentioning that we are already adding the message size before the opaque
> bytes provided by the library, so one could say we are already extending
> the protocol.
>
> If we leave versioning aside, adding the standard Kafka request header to
> those messages may also help from a debugging perspective as would then
> include client id and correlation id along with the message.
>
> Ismael
>
> On Tue, Apr 12, 2016 at 2:13 AM, Jun Rao  wrote:
>
> > Magnus,
> >
> > That sounds reasonable. To reduce the changes on the server side, I'd
> > suggest the following minor tweaks on the proposal.
> >
> > 1. Continue supporting the separate SASL and SASL_SSL port.
> >
> > On SASL port, we support the new sequence
> > ApiVersionRequest (optional), SaslHandshakeRequest, SASL tokens,
> > regular
> > requests
> >
> > On SASL_SSL port, we support the new sequence
> > SSL handshake bytes, ApiVersionRequest (optional),
> > SaslHandshakeRequest,
> > SASL tokens, regular requests
> >
> > 2. We don't wrap SASL tokens in Kafka protocol. Similar to your argument
> > about SSL handshake, those SASL tokens are generated by SASL library and
> > Kafka doesn't really control its versioning. Kafka only controls the
> > selection of SASL mechanism, which will be versioned in
> > SaslHandshakeRequest.
> >
> > Does that work for you?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Apr 11, 2016 at 11:15 AM, Magnus Edenhill 
> > wrote:
> >
> > > Hey Jun, see inline
> > >
> > > 2016-04-11 19:19 GMT+02:00 Jun Rao :
> > >
> > > > Hi, Magnus,
> > > >
> > > > Let me understand your proposal in more details just from the
> client's
> > > > perspective. My understanding of your proposal is the following.
> > > >
> > > > On plaintext port, the client will send the following bytes in order.
> > > > ApiVersionRequest, SaslHandshakeRequest, SASL tokens (if SASL is
> > > > enabled), regular requests
> > > >
> > > > On SSL port, the client will send the following bytes in order.
> > > > SSL handshake bytes, ApiVersionRequest, SaslHandshakeRequest,
> SASL
> > > > tokens (if SASL is enabled), regular requests
> > > >
> > >
> > >
> > > Yup!
> > > "SASL tokens" is a series of proper Kafka protocol
> SaslHandshakeRequests
> > > until
> > > the handshake is done.
> > >
> > >
> > >
> > > >
> > > > Is that right? Since we can use either SSL or SASL for
> authentication,
> > > it's
> > > > weird that in one case, we require ApiVersionRequest to happen before
> > > > authentication and in another case we require the reverse.
> > > >
> > >
> > > Since the SSL/TLS is standardised and taken care of for us by the SSL
> > > libraries it
> > > doesnt make sense to reimplement that on top of Kafka, so it isn't
> really
> > > comparable.
> > > But for SASL there is no standardised handshake protocol so we must
> > either
> > > conceive one from scratch, or use the protocol that we already have
> > > (Kafka).
> > > For the initial SASL implementation in 0.9 the first option was chosen
> > and
> > > while
> > > it required a new protocol implementation in supporting clients and the
> > > broker
> > > it served its purpose. But not for long,  it already needs to evolve,
> > > and this gives us a golden[1] opportunity to make the implementation
> > > reusable, evolvable, less complex
> > > and in line with all our other protocol work, by using the  protocol
> > stack
> > > of Kafka which the
> > > broker and all clients already have in place.
> > >
> > > Not taking this chance and instead diverging the custom SASL handshake
> > > protocol
> > > even further from Kafka seems to me a weird choice.
> > >
> > > The current KIP-43 proposal does not have a clear compatibility story;
> it
> > > doesnt seem to be possible
> > > to upgrade clients before brokers, while this might be okay for the
> Java
> > > client, the KIP-35 discussion
> > > has hopefully proven that this assumption can't be made for the entire
> > > eco-system.
> > >
> > > Let me be clear that there isn't anything technically wrong with the
> > KIP-43
> > > proposal (well,
> > > except for the hack to check byte[0] for 0x60 perhaps), but I'm worried
> > the
> > > proposal will eventually lead to
> > > reimplementing Api Versioning, KIP-35, etc, in the custom SASL
> handshake,
> > > and this is just redundant,
> > > there is

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-04-12 Thread Ismael Juma
Hi Rajini,

Yes, I agree that it's not ideal. However, doing this once at the broker is
more manageable than pushing the additional complexity to the clients.
Between the two options you outlined, the second one seemed the least bad
(we do something similar for controlled shutdown because it was missing a
header before 0.9.0.0).

Ismael

On Tue, Apr 12, 2016 at 9:56 AM, Rajini Sivaram <
rajinisiva...@googlemail.com> wrote:

> Ismael,
>
> My only concern about wrapping SASL tokens in Kafka headers is backward
> compatibility. We would either have a different format for GSSAPI alone to
> match 0.9.0.x or we would need to support two different wire protocols for
> GSSAPI. Neither sounds ideal.
>
> On Tue, Apr 12, 2016 at 9:18 AM, Ismael Juma  wrote:
>
> > Hi Jun,
> >
> > I understand the point about the SASL tokens being similar to the SSL
> > handshake in a way. However, is there any SASL library that handles the
> > network communication for these tokens? I couldn't find any and without
> > that, there isn't much benefit in deviating from Kafka's protocol (we
> > basically save the space taken by the request header). It's worth
> > mentioning that we are already adding the message size before the opaque
> > bytes provided by the library, so one could say we are already extending
> > the protocol.
> >
> > If we leave versioning aside, adding the standard Kafka request header to
> > those messages may also help from a debugging perspective as would then
> > include client id and correlation id along with the message.
> >
> > Ismael
> >
> > On Tue, Apr 12, 2016 at 2:13 AM, Jun Rao  wrote:
> >
> > > Magnus,
> > >
> > > That sounds reasonable. To reduce the changes on the server side, I'd
> > > suggest the following minor tweaks on the proposal.
> > >
> > > 1. Continue supporting the separate SASL and SASL_SSL port.
> > >
> > > On SASL port, we support the new sequence
> > > ApiVersionRequest (optional), SaslHandshakeRequest, SASL tokens,
> > > regular
> > > requests
> > >
> > > On SASL_SSL port, we support the new sequence
> > > SSL handshake bytes, ApiVersionRequest (optional),
> > > SaslHandshakeRequest,
> > > SASL tokens, regular requests
> > >
> > > 2. We don't wrap SASL tokens in Kafka protocol. Similar to your
> argument
> > > about SSL handshake, those SASL tokens are generated by SASL library
> and
> > > Kafka doesn't really control its versioning. Kafka only controls the
> > > selection of SASL mechanism, which will be versioned in
> > > SaslHandshakeRequest.
> > >
> > > Does that work for you?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Apr 11, 2016 at 11:15 AM, Magnus Edenhill 
> > > wrote:
> > >
> > > > Hey Jun, see inline
> > > >
> > > > 2016-04-11 19:19 GMT+02:00 Jun Rao :
> > > >
> > > > > Hi, Magnus,
> > > > >
> > > > > Let me understand your proposal in more details just from the
> > client's
> > > > > perspective. My understanding of your proposal is the following.
> > > > >
> > > > > On plaintext port, the client will send the following bytes in
> order.
> > > > > ApiVersionRequest, SaslHandshakeRequest, SASL tokens (if SASL
> is
> > > > > enabled), regular requests
> > > > >
> > > > > On SSL port, the client will send the following bytes in order.
> > > > > SSL handshake bytes, ApiVersionRequest, SaslHandshakeRequest,
> > SASL
> > > > > tokens (if SASL is enabled), regular requests
> > > > >
> > > >
> > > >
> > > > Yup!
> > > > "SASL tokens" is a series of proper Kafka protocol
> > SaslHandshakeRequests
> > > > until
> > > > the handshake is done.
> > > >
> > > >
> > > >
> > > > >
> > > > > Is that right? Since we can use either SSL or SASL for
> > authentication,
> > > > it's
> > > > > weird that in one case, we require ApiVersionRequest to happen
> before
> > > > > authentication and in another case we require the reverse.
> > > > >
> > > >
> > > > Since the SSL/TLS is standardised and taken care of for us by the SSL
> > > > libraries it
> > > > doesnt make sense to reimplement that on top of Kafka, so it isn't
> > really
> > > > comparable.
> > > > But for SASL there is no standardised handshake protocol so we must
> > > either
> > > > conceive one from scratch, or use the protocol that we already have
> > > > (Kafka).
> > > > For the initial SASL implementation in 0.9 the first option was
> chosen
> > > and
> > > > while
> > > > it required a new protocol implementation in supporting clients and
> the
> > > > broker
> > > > it served its purpose. But not for long,  it already needs to evolve,
> > > > and this gives us a golden[1] opportunity to make the implementation
> > > > reusable, evolvable, less complex
> > > > and in line with all our other protocol work, by using the  protocol
> > > stack
> > > > of Kafka which the
> > > > broker and all clients already have in place.
> > > >
> > > > Not taking this chance and instead diverging the custom SASL
> handshake
> > > > protocol
> > > > even further from Kafka seems to me a weird choice.
> > > >
> > > >

[jira] [Resolved] (KAFKA-3205) Error in I/O with host (java.io.EOFException) raised in producer

2016-04-12 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved KAFKA-3205.
-
   Resolution: Won't Fix
 Reviewer: Flavio Junqueira
Fix Version/s: (was: 0.10.0.0)

The changes currently in 0.9+ don't produce as many of these messages because 
both ends, client and server, enforce the connection timeout. The change 
discussed in the pull request suppresses the message in the case of a passive close 
initiated by the server (in 0.9 the timeout is enforced), which is desirable 
only because the message otherwise pollutes the logs. It is better that we keep these 
messages in 0.9 and later so we are informed of connections being closed. They are 
not supposed to happen very often, but if it turns out to be a problem, we can 
revisit this issue.
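
For reference, and assuming default configuration, the broker-side idle-connection 
timeout in 0.9 that triggers these closes is the following setting (10 minutes by 
default):

{noformat}
connections.max.idle.ms=600000
{noformat}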

> Error in I/O with host (java.io.EOFException) raised in producer
> 
>
> Key: KAFKA-3205
> URL: https://issues.apache.org/jira/browse/KAFKA-3205
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.8.2.1, 0.9.0.0
>Reporter: Jonathan Raffre
>
> In a situation with a Kafka broker on 0.9 and producers still on 0.8.2.x, 
> producers seem to raise the following after a variable amount of time since 
> start:
> {noformat}
> 2016-01-29 14:33:13,066 WARN [] o.a.k.c.n.Selector: Error in I/O with 
> 172.22.2.170
> java.io.EOFException: null
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
>  ~[org.apache.kafka.kafka-clients-0.8.2.0.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:248) 
> ~[org.apache.kafka.kafka-clients-0.8.2.0.jar:na]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192) 
> [org.apache.kafka.kafka-clients-0.8.2.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191) 
> [org.apache.kafka.kafka-clients-0.8.2.0.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122) 
> [org.apache.kafka.kafka-clients-0.8.2.0.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66-internal]
> {noformat}
> This can be reproduced successfully by doing the following:
>  * Start a 0.8.2 producer connected to the 0.9 broker
>  * Wait 15 minutes, exactly
>  * See the error in the producer logs.
> Oddly, this also shows up in an active producer but after 10 minutes of 
> activity.
> Kafka's server.properties :
> {noformat}
> broker.id=1
> listeners=PLAINTEXT://:9092
> port=9092
> num.network.threads=2
> num.io.threads=2
> socket.send.buffer.bytes=1048576
> socket.receive.buffer.bytes=1048576
> socket.request.max.bytes=104857600
> log.dirs=/mnt/data/kafka
> num.partitions=4
> auto.create.topics.enable=false
> delete.topic.enable=true
> num.recovery.threads.per.data.dir=1
> log.retention.hours=48
> log.retention.bytes=524288000
> log.segment.bytes=52428800
> log.retention.check.interval.ms=6
> log.roll.hours=24
> log.cleanup.policy=delete
> log.cleaner.enable=true
> zookeeper.connect=127.0.0.1:2181
> zookeeper.connection.timeout.ms=100
> {noformat}
> Producer's configuration :
> {noformat}
>   compression.type = none
>   metric.reporters = []
>   metadata.max.age.ms = 30
>   metadata.fetch.timeout.ms = 6
>   acks = all
>   batch.size = 16384
>   reconnect.backoff.ms = 10
>   bootstrap.servers = [127.0.0.1:9092]
>   receive.buffer.bytes = 32768
>   retry.backoff.ms = 500
>   buffer.memory = 33554432
>   timeout.ms = 3
>   key.serializer = class 
> org.apache.kafka.common.serialization.StringSerializer
>   retries = 3
>   max.request.size = 500
>   block.on.buffer.full = true
>   value.serializer = class 
> org.apache.kafka.common.serialization.StringSerializer
>   metrics.sample.window.ms = 3
>   send.buffer.bytes = 131072
>   max.in.flight.requests.per.connection = 5
>   metrics.num.samples = 2
>   linger.ms = 0
>   client.id = 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3548) Locale is not handled properly in kafka-consumer

2016-04-12 Thread Tanju Cataltepe (JIRA)
Tanju Cataltepe created KAFKA-3548:
--

 Summary: Locale is not handled properly in kafka-consumer
 Key: KAFKA-3548
 URL: https://issues.apache.org/jira/browse/KAFKA-3548
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.9.0.1
Reporter: Tanju Cataltepe
Assignee: Neha Narkhede


If the JVM's locale language is Turkish, which has a different upper case for the 
lower-case letter i, the result is a runtime error caused by 
org.apache.kafka.clients.consumer.OffsetResetStrategy. More specifically, an 
enum constant *EARLİEST* is generated, which does not match *EARLIEST* (note the 
_dotted capital i_).

If the locale for the JVM is explicitly set to en_US, the example runs as 
expected.

A sample error log is below:
{noformat}
[akka://ReactiveKafka/user/$a] Failed to construct kafka consumer
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:172)
at akka.actor.ActorCell.create(ActorCell.scala:606)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
at akka.dispatch.Mailbox.run(Mailbox.scala:223)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka 
consumer
at 
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:648)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:542)
at 
com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer$lzycompute(ReactiveKafkaConsumer.scala:31)
at 
com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer(ReactiveKafkaConsumer.scala:30)
at 
com.softwaremill.react.kafka.KafkaActorPublisher.<init>(KafkaActorPublisher.scala:17)
at 
com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
at 
com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
at 
akka.actor.TypedCreatorFunctionConsumer.produce(IndirectActorProducer.scala:87)
at akka.actor.Props.newActor(Props.scala:214)
at akka.actor.ActorCell.newActor(ActorCell.scala:562)
at akka.actor.ActorCell.create(ActorCell.scala:588)
... 7 more
Caused by: java.lang.IllegalArgumentException: No enum constant 
org.apache.kafka.clients.consumer.OffsetResetStrategy.EARLİEST
at java.lang.Enum.valueOf(Enum.java:238)
at 
org.apache.kafka.clients.consumer.OffsetResetStrategy.valueOf(OffsetResetStrategy.java:15)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:588)
... 17 more
{noformat}
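
A minimal reproduction of the underlying JDK behavior (standalone Java, independent 
of Kafka):

{code}
import java.util.Locale;

public class TurkishLocaleDemo {
    public static void main(String[] args) {
        Locale.setDefault(new Locale("tr", "TR"));  // Turkish default locale
        // The no-argument toUpperCase() uses the default locale, so the dotted 'i'
        // becomes the dotted capital 'İ' (U+0130) and the enum lookup fails.
        String value = "earliest".toUpperCase();
        System.out.println(value);                  // prints EARLİEST, not EARLIEST
        // OffsetResetStrategy.valueOf(value) would then throw IllegalArgumentException.
    }
}
{code}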




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3523) Capture org.apache.kafka.clients.consumer.CommitFailedException in UncaughtExceptionHandler

2016-04-12 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska reassigned KAFKA-3523:
---

Assignee: Eno Thereska

> Capture org.apache.kafka.clients.consumer.CommitFailedException in 
> UncaughtExceptionHandler
> ---
>
> Key: KAFKA-3523
> URL: https://issues.apache.org/jira/browse/KAFKA-3523
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: newbie, user-experience
> Fix For: 0.10.0.0
>
>
> When the sync commit fails due to an ongoing rebalance, the exception is thrown all 
> the way up to the main thread and causes the whole Kafka Streams application to 
> stop, even if users set an UncaughtExceptionHandler. We need to be able to catch 
> this exception in that handler as well.
> Example stack trace (with UncaughtExceptionHandler set, but not been able to 
> capture this exception):
> {code}
> [2016-04-06 17:49:33,891] WARN Failed to commit StreamTask #0_0 in thread 
> [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread:485)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured session.timeout.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:567)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:508)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:659)
> {code}
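
A hedged sketch of the containment being asked for: catch CommitFailedException at 
the commit site (shown here with a plain consumer; the streams runtime would do the 
equivalent internally) instead of letting it escape to the thread's 
UncaughtExceptionHandler:

{code}
import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class CommitFailureHandling {
    static void pollAndCommit(KafkaConsumer<String, String> consumer) {
        ConsumerRecords<String, String> records = consumer.poll(100);
        // ... process records ...
        try {
            consumer.commitSync();
        } catch (CommitFailedException e) {
            // The group already rebalanced and the partitions were reassigned, so this
            // commit cannot succeed; log it and let the next poll/rebalance cycle recover.
        }
    }
}
{code}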



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3548) Locale is not handled properly in kafka-consumer

2016-04-12 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3548:
---
Fix Version/s: 0.10.0.0

> Locale is not handled properly in kafka-consumer
> 
>
> Key: KAFKA-3548
> URL: https://issues.apache.org/jira/browse/KAFKA-3548
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Tanju Cataltepe
>Assignee: Neha Narkhede
> Fix For: 0.10.0.0
>
>
> If the JVM local language is Turkish, which has different upper case for the 
> lower case letter i, the result is a runtime error caused by 
> org.apache.kafka.clients.consumer.OffsetResetStrategy. More specifically an 
> enum constant *EARLİEST* is generated which does not match *EARLIEST* (note 
> the _dotted capital i_).
> If the locale for the JVM is explicitly set to en_US, the example runs as 
> expected.
> A sample error log is below:
> {noformat}
> [akka://ReactiveKafka/user/$a] Failed to construct kafka consumer
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:172)
> at akka.actor.ActorCell.create(ActorCell.scala:606)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
> at akka.dispatch.Mailbox.run(Mailbox.scala:223)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka 
> consumer
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:648)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:542)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer$lzycompute(ReactiveKafkaConsumer.scala:31)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer(ReactiveKafkaConsumer.scala:30)
> at 
> com.softwaremill.react.kafka.KafkaActorPublisher.(KafkaActorPublisher.scala:17)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> akka.actor.TypedCreatorFunctionConsumer.produce(IndirectActorProducer.scala:87)
> at akka.actor.Props.newActor(Props.scala:214)
> at akka.actor.ActorCell.newActor(ActorCell.scala:562)
> at akka.actor.ActorCell.create(ActorCell.scala:588)
> ... 7 more
> Caused by: java.lang.IllegalArgumentException: No enum constant 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.EARLİEST
> at java.lang.Enum.valueOf(Enum.java:238)
> at 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.valueOf(OffsetResetStrategy.java:15)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:588)
> ... 17 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3548) Locale is not handled properly in kafka-consumer

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237186#comment-15237186
 ] 

Ismael Juma commented on KAFKA-3548:


A possible solution is to pass a `Locale` to `String.toUpperCase()`. However, 
we should probably fix all cases of this instead of the single one.
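
A minimal sketch of that suggestion, assuming the parsing stays a simple valueOf 
lookup (the wrapper class name is made up for illustration):

{code}
import java.util.Locale;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

class ResetStrategyParsing {
    // Locale.ROOT makes the case conversion locale-independent, so a Turkish
    // default locale can no longer turn "earliest" into "EARLİEST".
    static OffsetResetStrategy parse(String configValue) {
        return OffsetResetStrategy.valueOf(configValue.toUpperCase(Locale.ROOT));
    }
}
{code}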

> Locale is not handled properly in kafka-consumer
> 
>
> Key: KAFKA-3548
> URL: https://issues.apache.org/jira/browse/KAFKA-3548
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Tanju Cataltepe
>Assignee: Neha Narkhede
> Fix For: 0.10.0.0
>
>
> If the JVM local language is Turkish, which has different upper case for the 
> lower case letter i, the result is a runtime error caused by 
> org.apache.kafka.clients.consumer.OffsetResetStrategy. More specifically an 
> enum constant *EARLİEST* is generated which does not match *EARLIEST* (note 
> the _dotted capital i_).
> If the locale for the JVM is explicitly set to en_US, the example runs as 
> expected.
> A sample error log is below:
> {noformat}
> [akka://ReactiveKafka/user/$a] Failed to construct kafka consumer
> akka.actor.ActorInitializationException: exception during creation
> at akka.actor.ActorInitializationException$.apply(Actor.scala:172)
> at akka.actor.ActorCell.create(ActorCell.scala:606)
> at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
> at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
> at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
> at akka.dispatch.Mailbox.run(Mailbox.scala:223)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka 
> consumer
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:648)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:542)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer$lzycompute(ReactiveKafkaConsumer.scala:31)
> at 
> com.softwaremill.react.kafka.ReactiveKafkaConsumer.consumer(ReactiveKafkaConsumer.scala:30)
> at 
> com.softwaremill.react.kafka.KafkaActorPublisher.(KafkaActorPublisher.scala:17)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> com.softwaremill.react.kafka.ReactiveKafka$$anonfun$consumerActorProps$1.apply(ReactiveKafka.scala:270)
> at 
> akka.actor.TypedCreatorFunctionConsumer.produce(IndirectActorProducer.scala:87)
> at akka.actor.Props.newActor(Props.scala:214)
> at akka.actor.ActorCell.newActor(ActorCell.scala:562)
> at akka.actor.ActorCell.create(ActorCell.scala:588)
> ... 7 more
> Caused by: java.lang.IllegalArgumentException: No enum constant 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.EARLİEST
> at java.lang.Enum.valueOf(Enum.java:238)
> at 
> org.apache.kafka.clients.consumer.OffsetResetStrategy.valueOf(OffsetResetStrategy.java:15)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:588)
> ... 17 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3511) Provide built-in aggregators sum() and avg() in Kafka Streams DSL

2016-04-12 Thread Michael Noll (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237291#comment-15237291
 ] 

Michael Noll commented on KAFKA-3511:
-

[~guozhang]: Could we leverage the parent class 
[java.lang.Number|https://docs.oracle.com/javase/7/docs/api/java/lang/Number.html]
 somehow?
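
A minimal sketch of that idea, simply widening every Number to double (illustrative 
only; a real built-in would need to preserve long precision and pick serdes per 
concrete type):

{code}
class NumberSum {
    static double sum(Iterable<? extends Number> values) {
        double total = 0.0;
        for (Number n : values) {
            total += n.doubleValue();   // works for Integer, Long, Double, ...
        }
        return total;
    }
}
{code}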

> Provide built-in aggregators sum() and avg() in Kafka Streams DSL
> -
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: api, newbie
> Fix For: 0.10.0.0
>
>
> Currently we only have one built-in aggregate function count() in the Kafka 
> Streams DSL, but we want to add more aggregation functions like sum() and 
> avg().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-4 Metadata Schema (Round 2)

2016-04-12 Thread Dana Powers
+1
On Apr 11, 2016 21:55, "Gwen Shapira"  wrote:

> +1
>
> On Mon, Apr 11, 2016 at 10:42 AM, Grant Henke  wrote:
> > Based on the discussion in the previous vote thread
> > <
> http://search-hadoop.com/m/uyzND1xlaiU10QlYX&subj=+VOTE+KIP+4+Metadata+Schema
> >
> > I also would like to include a behavior change to the MetadataResponse. I
> > have updated the wiki
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> >
> > and pull request  to include
> > this change.
> >
> > The change as described on the wiki is:
> >
> >> The behavior of the replicas and isr arrays will be changed in order to
> >> support the admin tools, and better represent the state of the cluster:
> >>
> >>- In version 0, if a broker is down the replicas and isr array will
> >>omit the brokers entry and add a REPLICA_NOT_AVAILABLE error code.
> >>- In version 1, no error code will be set and the broker id will be
> >>included in the replicas and isr array.
> >>   - Note: A user can still detect if the replica is not available,
> by
> >>   checking if the broker is in the returned broker list.
> >>
> >>
> >
> > Being optimistic that this doesn't require too much discussion, I would
> like
> > to re-start the voting process on this thread. If more discussion is
> > needed, please don't hesitate to bring it up here.
> >
> > Ismael, Gwen, Guozhang could you please review and revote based on the
> > changes.
> >
> > Thank you,
> > Grant
> >
> > On Sat, Apr 9, 2016 at 1:03 PM, Guozhang Wang 
> wrote:
> >
> >> +1
> >>
> >> On Fri, Apr 8, 2016 at 4:36 PM, Gwen Shapira  wrote:
> >>
> >> > +1
> >> >
> >> > On Fri, Apr 8, 2016 at 2:41 PM, Grant Henke 
> wrote:
> >> >
> >> > > I would like to re-initiate the voting process for the "KIP-4
> Metadata
> >> > > Schema changes". This is not a vote for all of KIP-4, but
> specifically
> >> > for
> >> > > the metadata changes. I have included the exact changes below for
> >> > clarity:
> >> > > >
> >> > > > Metadata Request (version 1)
> >> > > >
> >> > > >
> >> > > >
> >> > > > MetadataRequest => [topics]
> >> > > >
> >> > > > Stays the same as version 0 however behavior changes.
> >> > > > In version 0 there was no way to request no topics, and an empty
> >> > > > list signified all topics.
> >> > > > In version 1 a null topics list (size -1 on the wire) will
> indicate
> >> > that
> >> > > a
> >> > > > user wants *ALL* topic metadata. Compared to an empty list (size
> 0)
> >> > which
> >> > > > indicates metadata for *NO* topics should be returned.
> >> > > > Metadata Response (version 1)
> >> > > >
> >> > > >
> >> > > >
> >> > > > MetadataResponse => [brokers] controllerId [topic_metadata]
> >> > > >   brokers => node_id host port rack
> >> > > > node_id => INT32
> >> > > > host => STRING
> >> > > > port => INT32
> >> > > > rack => NULLABLE_STRING
> >> > > >   controllerId => INT32
> >> > > >   topic_metadata => topic_error_code topic is_internal
> >> > > [partition_metadata]
> >> > > > topic_error_code => INT16
> >> > > > topic => STRING
> >> > > > is_internal => BOOLEAN
> >> > > > partition_metadata => partition_error_code partition_id leader
> >> > > [replicas] [isr]
> >> > > >   partition_error_code => INT16
> >> > > >   partition_id => INT32
> >> > > >   leader => INT32
> >> > > >   replicas => INT32
> >> > > >   isr => INT32
> >> > > >
> >> > > > Adds rack, controller_id, and is_internal to the version 0
> response.
> >> > > >
> >> > >
> >> > > The KIP is available here for reference (linked to the Metadata
> schema
> >> > > section):
> >> > > *
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> >> > > <
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> >> > > >*
> >> > >
> >> > > A pull request is available implementing the proposed changes here:
> >> > > https://github.com/apache/kafka/pull/1095
> >> > >
> >> > > Here are some links to past discussions on the mailing list:
> >> > >
> >> http://search-hadoop.com/m/uyzND1pd4T52H1m0u1&subj=Re+KIP+4+Wiki+Update
> >> > >
> >> > >
> >> >
> >>
> http://search-hadoop.com/m/uyzND1J2IXeSNXAT&subj=Metadata+and+ACLs+wire+protocol+review+KIP+4+
> >> > >
> >> > > Here is the previous vote discussion (please take a look and discuss
> >> > > there):
> >> > >
> >> > >
> >> >
> >>
> http://search-hadoop.com/m/uyzND1xlaiU10QlYX&subj=+VOTE+KIP+4+Metadata+Schema
> >> > >
> >> > > Thank you,
> >> > > Grant
> >> > > --
> >> > > Grant Henke
> >> > > Software Engineer | Cloudera
> >> > > gr...@clou

Re: [DISCUSS] KIP-43: Kafka SASL enhancements

2016-04-12 Thread Ismael Juma
Hi Jun,

Comments inline.

On Mon, Apr 11, 2016 at 1:57 AM, Jun Rao  wrote:

> Yes, that should be fine right? Since the new api key will start with a 0
> byte, it actually guarantees that it's different from 0x60 (1st byte in the
> old protocol) even if we change the request version id in the future.


Yes, this is true. Also, the GSS API library will throw an exception if the
first byte is not 0x60 (for the case where newer clients connect to older
brokers):

https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/security/jgss/GSSHeader.java#L97


And the DEFECTIVE_TOKEN status code is specified in both RFC 2743[1] and
RFC 5653[2]. Section 3.1 of RFC 2743 specifies that the token tag consists
of the following elements, in order:

1. 0x60 -- Tag for [APPLICATION 0] SEQUENCE; indicates that
  -- constructed form, definite length encoding follows.

2. Token length octets ...


Ismael

[1] Generic Security Service Application Program Interface Version 2,
Update 1: https://tools.ietf.org/html/rfc2743
[2] Generic Security Service API Version 2: Java Bindings Update:
https://tools.ietf.org/html/rfc5653

Ismael


[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236055#comment-15236055
 ] 

Flavio Junqueira edited comment on KAFKA-3042 at 4/12/16 2:58 PM:
--

Thanks for the logs. I still don’t have an answer, but I wanted to report some 
progress.

I tracked one of the partitions that I’ve chosen arbitrarily for which broker 1 
was complaining about the zk version: [tec1.en2.frontend.syncPing,7]. Here are 
a few things that I’ve observed:

Broker 1 is the leader of the partition:

{noformat}
[2016-04-09 00:40:52,883] TRACE Broker 1 cached leader info 
(LeaderAndIsrInfo:(Leader:1,ISR:1,LeaderEpoch:363,ControllerEpoch:414),ReplicationFactor:3),AllReplicas:1,2,4)
 for partition [tec1.en2.frontend.syncPing,7] in response to UpdateMetadata 
request sent by controller 1 epoch 414 with correlation id 876 
(state.change.logger)
{noformat}

but soon after it fails to release leadership to broker 4:

{noformat}
[2016-04-09 00:40:58,106] TRACE Broker 1 received LeaderAndIsr request 
(LeaderAndIsrInfo:(Leader:4,ISR:4,LeaderEpoch:364,ControllerEpoch:415),ReplicationFactor:3),AllReplicas:1,2,4)
 correlation id 0 from controller 5 epoch 416 for partition 
[tec1.en2.frontend.syncPing,7] (state.change.logger)

[2016-04-09 00:40:58,139] TRACE Broker 1 handling LeaderAndIsr request 
correlationId 0 from controller 5 epoch 416 starting the become-follower 
transition for partition [tec1.en2.frontend.syncPing,7] (state.change.logger)

[2016-04-09 00:40:58,144] ERROR Broker 1 received LeaderAndIsrRequest with 
correlation id 0 from controller 5 epoch 415 for partition 
[tec1.en2.frontend.syncPing,7] but cannot become follower since the new leader 
4 is unavailable. (state.change.logger)
{noformat}

Now, a bit later in the log, the broker says that it is caching the leader info 
for the partition:

{noformat}
[2016-04-09 00:42:02,456] TRACE Broker 1 cached leader info 
(LeaderAndIsrInfo:(Leader:4,ISR:4,LeaderEpoch:364,ControllerEpoch:415),ReplicationFactor:3),AllReplicas:1,2,4)
 for partition [tec1.en2.frontend.syncPing,7] in response to UpdateMetadata 
request sent by controller 5 epoch 416 with correlation id 1473 
(state.change.logger)
{noformat}

but it keeps printing the “Cached zkVersion…” errors, which indicate that the 
broker still believes it is the leader of the partition, or at least the 
variable {{leaderReplicaIdOpt}} is set this way.

I also inspected other partitions, and the behavior doesn’t seem to be 
consistent. I’ve seen at least one partition in broker 2 for which the broker 
made the appropriate transition:

{noformat}
[2016-04-09 00:39:23,840] TRACE Broker 2 received LeaderAndIsr request 
(LeaderAndIsrInfo:(Leader:3,ISR:2,3,LeaderEpoch:305,ControllerEpoch:414),ReplicationFactor:3),AllReplicas:3,2,4)
 correlation id 535 from controller 1 epoch 414 for partition 
[tec1.ono_qe1.bodydata.recordings,20] (state.change.logger)
[2016-04-09 00:39:23,841] TRACE Broker 2 handling LeaderAndIsr request 
correlationId 535 from controller 1 epoch 414 starting the become-follower 
transition for partition [tec1.ono_qe1.bodydata.recordings,20] 
(state.change.logger)
[2016-04-09 00:39:23,841] TRACE Broker 2 stopped fetchers as part of 
become-follower request from controller 1 epoch 414 with correlation id 535 for 
partition [tec1.ono_qe1.bodydata.recordings,20] (state.change.logger)
[2016-04-09 00:39:23,856] TRACE Broker 2 truncated logs and checkpointed 
recovery boundaries for partition [tec1.ono_qe1.bodydata.recordings,20] as part 
of become-follower request with correlation id 535 from controller 1 epoch 414 
(state.change.logger)
[2016-04-09 00:39:23,856] TRACE Broker 2 started fetcher to new leader as part 
of become-follower request from controller 1 epoch 414 with correlation id 535 
for partition [tec1.ono_qe1.bodydata.recordings,20] (state.change.logger)
[2016-04-09 00:39:23,856] TRACE Broker 2 completed LeaderAndIsr request 
correlationId 535 from controller 1 epoch 414 for the become-follower 
transition for partition [tec1.ono_qe1.bodydata.recordings,20] 
(state.change.logger)
{noformat}

Actually, the state-change log of broker 2 seems to have a gap starting at 
{{[2016-04-09 00:39:46,246]}}. Is that when you restarted the broker?


was (Author: fpj):
Thanks for the logs. I still don’t have an answer, but I wanted to report some 
progress.

I tracked one of the partitions that I’ve chosen arbitrarily for which broker 1 
was complaining about the zk version: [tec1.en2.frontend.syncPing,7]. Here are 
a few things that I’ve observed:

Broker 1 is the leader of the partition:

{noformat}
[2016-04-09 00:40:52,883] TRACE Broker 1 cached leader info 
(LeaderAndIsrInfo:(Leader:1,ISR:1,LeaderEpoch:363,ControllerEpoch:414),ReplicationFactor:3),AllReplicas:1,2,4)
 for partition [tec1.en2.frontend.syncPing,7] in response to UpdateMetadata 
request sent by controller 1 e

Re: [DISCUSS] KIP-35 - Retrieve protocol version

2016-04-12 Thread Jun Rao
Hi, Ismael,

The SASL engine that we used is the SASL library, right? How did the C
client generate those SASL tokens? Once a SASL mechanism is chosen, the
subsequent tokens are determined, right? So, my feeling is that those
tokens are part of SaslHandshakeRequest and are just extended across
multiple network packets. So modeling those as independent requests feels
weird. When documenting them, we really need to document those as a
sequence, not individual isolated requests that can be issued
in arbitrary order. The version id will only add confusion since we can't
version the tokens independently. We could explicitly add the client id and
correlation id in the header, but I am not sure if it's worth the
additional complexity.

Thanks,

Jun

On Tue, Apr 12, 2016 at 1:18 AM, Ismael Juma  wrote:

> Hi Jun,
>
> I understand the point about the SASL tokens being similar to the SSL
> handshake in a way. However, is there any SASL library that handles the
> network communication for these tokens? I couldn't find any and without
> that, there isn't much benefit in deviating from Kafka's protocol (we
> basically save the space taken by the request header). It's worth
> mentioning that we are already adding the message size before the opaque
> bytes provided by the library, so one could say we are already extending
> the protocol.
>
> If we leave versioning aside, adding the standard Kafka request header to
> those messages may also help from a debugging perspective as would then
> include client id and correlation id along with the message.
>
> Ismael
>
> On Tue, Apr 12, 2016 at 2:13 AM, Jun Rao  wrote:
>
> > Magnus,
> >
> > That sounds reasonable. To reduce the changes on the server side, I'd
> > suggest the following minor tweaks on the proposal.
> >
> > 1. Continue supporting the separate SASL and SASL_SSL port.
> >
> > On SASL port, we support the new sequence
> > ApiVersionRequest (optional), SaslHandshakeRequest, SASL tokens,
> > regular
> > requests
> >
> > On SASL_SSL port, we support the new sequence
> > SSL handshake bytes, ApiVersionRequest (optional),
> > SaslHandshakeRequest,
> > SASL tokens, regular requests
> >
> > 2. We don't wrap SASL tokens in Kafka protocol. Similar to your argument
> > about SSL handshake, those SASL tokens are generated by SASL library and
> > Kafka doesn't really control its versioning. Kafka only controls the
> > selection of SASL mechanism, which will be versioned in
> > SaslHandshakeRequest.
> >
> > Does that work for you?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Apr 11, 2016 at 11:15 AM, Magnus Edenhill 
> > wrote:
> >
> > > Hey Jun, see inline
> > >
> > > 2016-04-11 19:19 GMT+02:00 Jun Rao :
> > >
> > > > Hi, Magnus,
> > > >
> > > > Let me understand your proposal in more details just from the
> client's
> > > > perspective. My understanding of your proposal is the following.
> > > >
> > > > On plaintext port, the client will send the following bytes in order.
> > > > ApiVersionRequest, SaslHandshakeRequest, SASL tokens (if SASL is
> > > > enabled), regular requests
> > > >
> > > > On SSL port, the client will send the following bytes in order.
> > > > SSL handshake bytes, ApiVersionRequest, SaslHandshakeRequest,
> SASL
> > > > tokens (if SASL is enabled), regular requests
> > > >
> > >
> > >
> > > Yup!
> > > "SASL tokens" is a series of proper Kafka protocol
> SaslHandshakeRequests
> > > until
> > > the handshake is done.
> > >
> > >
> > >
> > > >
> > > > Is that right? Since we can use either SSL or SASL for
> authentication,
> > > it's
> > > > weird that in one case, we require ApiVersionRequest to happen before
> > > > authentication and in another case we require the reverse.
> > > >
> > >
> > > Since the SSL/TLS is standardised and taken care of for us by the SSL
> > > libraries it
> > > doesnt make sense to reimplement that on top of Kafka, so it isn't
> really
> > > comparable.
> > > But for SASL there is no standardised handshake protocol so we must
> > either
> > > conceive one from scratch, or use the protocol that we already have
> > > (Kafka).
> > > For the initial SASL implementation in 0.9 the first option was chosen
> > and
> > > while
> > > it required a new protocol implementation in supporting clients and the
> > > broker
> > > it served its purpose. But not for long,  it already needs to evolve,
> > > and this gives us a golden[1] opportunity to make the implementation
> > > reusable, evolvable, less complex
> > > and in line with all our other protocol work, by using the  protocol
> > stack
> > > of Kafka which the
> > > broker and all clients already have in place.
> > >
> > > Not taking this chance and instead diverging the custom SASL handshake
> > > protocol
> > > even further from Kafka seems to me a weird choice.
> > >
> > > The current KIP-43 proposal does not have a clear compatibility story;
> it
> > > doesnt seem to be possible
> > > to upgrade clients before brokers, while this might be okay for the
> Java
> > > c

[jira] [Resolved] (KAFKA-3547) Broker does not disconnect client on unknown request

2016-04-12 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3547.

Resolution: Fixed
  Assignee: Grant Henke

And this is the relevant commit for trunk: 
https://github.com/apache/kafka/commit/d7fc7cf6154592b7fea494a092e42ad9d45b98a0

> Broker does not disconnect client on unknown request
> 
>
> Key: KAFKA-3547
> URL: https://issues.apache.org/jira/browse/KAFKA-3547
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.0, 0.9.0.1
>Reporter: Magnus Edenhill
>Assignee: Grant Henke
>Priority: Critical
> Fix For: 0.10.0.0
>
>
> A regression in 0.9.0 causes the broker to not close a client connection when 
> receiving an unsupported request.
> Two effects of this are:
>  - the client is not informed that the request was not supported (a closed 
> connection is a blunt signal, but it is at least some indication); the request 
> will (hopefully) just time out on the client after some time, stalling 
> subsequent operations.
>  - the broker leaks the connection until the connection reaper brings it down 
> or the client closes the connection
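A hedged sketch of the intended behaviour, using plain java.nio with hypothetical names (this is not the actual broker code): a request with an unrecognised api key should close the connection rather than being dropped silently, so the client gets immediate feedback instead of waiting for a request timeout.

{code}
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.Set;

// Hedged sketch: close the connection on an unknown request instead of leaking it.
final class UnknownRequestPolicy {
    // apiKey values this toy server understands (assumption for the sketch)
    private static final Set<Short> SUPPORTED = Set.of((short) 0, (short) 1, (short) 3);

    // payload is the request bytes after the 4-byte size prefix; the header
    // starts with apiKey (INT16) followed by apiVersion (INT16)
    static void onRequest(SocketChannel client, ByteBuffer payload) throws java.io.IOException {
        short apiKey = payload.getShort(0);
        if (!SUPPORTED.contains(apiKey)) {
            client.close();   // blunt, but an immediate signal and no leaked connection
            return;
        }
        // ... dispatch to the normal handler ...
    }
}
{code}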



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237383#comment-15237383
 ] 

Flavio Junqueira commented on KAFKA-3042:
-

I had a look at the zookeeper logs, and I couldn’t see anything unusual. There 
are session expirations, but it is expected that sessions expire.

Using the same topic-partition I used in my last comment, 
[tec1.en2.frontend.syncPing,7], I found that the reason seems to be that 
controller 5 is telling broker 1 that the partition leader is 4, but neither 5 
nor 1 think that broker 4 is up. Here are some relevant log lines from broker 5:

{noformat} 
[2016-04-09 00:37:54,079] INFO [BrokerChangeListener on Controller 5]: Broker 
change listener fired for path /brokers/ids with children 1,2,3,5 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2016-04-09 00:37:53,709] DEBUG [Partition state machine on Controller 5]: 
After leader election, leader cache is updated to Map…. 
[tec1.en2.frontend.syncPing,7] -> 
(Leader:2,ISR:2,LeaderEpoch:361,ControllerEpoch:410)
[2016-04-09 00:37:53,765] INFO [Partition state machine on Controller 5]: 
Started partition state machine with initial state -> Map… 
[tec1.en2.frontend.syncPing,7] -> OnlinePartition

[2016-04-09 00:40:58,415] DEBUG [Partition state machine on Controller 5]: 
After leader election, leader cache is updated to Map… 
[tec1.en2.frontend.syncPing,7] -> 
(Leader:4,ISR:4,LeaderEpoch:364,ControllerEpoch:415)

[2016-04-09 00:41:35,442] INFO [BrokerChangeListener on Controller 5]: Broker 
change listener fired for path /brokers/ids with children 1,4,5 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
{noformat}

Interestingly, broker 3 is the controller for epoch 415 (see the last leader 
cache update), and this is the information that broker 1 receives from broker 5 
(see the previous comment). It looks like broker 5 ignored the fact that broker 
4 is down, or at least not in its list of live brokers.

Broker 3 seems to behave correctly with respect to the partition; here are some 
relevant log lines:

{noformat}
[2016-04-09 00:39:57,004] INFO [Controller 3]: Controller 3 incremented epoch 
to 415 (kafka.controller.KafkaController)

[2016-04-09 00:40:46,633] INFO [BrokerChangeListener on Controller 3]: Broker 
change listener fired for path /brokers/ids with children 3,4,5 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2016-04-09 00:40:46,638] INFO [BrokerChangeListener on Controller 3]: Newly 
added brokers: 4, deleted brokers: , all live brokers: 3,4,5 
(kafka.controller.ReplicaStateMachine

[2016-04-09 00:40:50,911] DEBUG [OfflinePartitionLeaderSelector]: No broker in 
ISR is alive for [tec1.en2.frontend.syncPing,7]. Pick the leader from the alive 
assigned replicas: 4 (kafka.controller.OfflinePartitionLeaderSelector)
[2016-04-09 00:40:50,911] WARN [OfflinePartitionLeaderSelector]: No broker in 
ISR is alive for [tec1.en2.frontend.syncPing,7]. Elect leader 4 from live 
brokers 4. There's potential data loss. 
(kafka.controller.OfflinePartitionLeaderSelector)
[2016-04-09 00:40:50,911] INFO [OfflinePartitionLeaderSelector]: Selected new 
leader and ISR {"leader":4,"leader_epoch":364,"isr":[4]} for offline partition 
[tec1.en2.frontend.syncPing,7] (kafka.controller.OfflinePartitionLeaderSelector)

State-change log
[2016-04-09 00:40:50,909] TRACE Controller 3 epoch 415 started leader election 
for partition [tec1.en2.frontend.syncPing,7] (state.change.logger)
[2016-04-09 00:40:50,911] TRACE Controller 3 epoch 415 elected leader 4 for 
Offline partition [tec1.en2.frontend.syncPing,7] (state.change.logger)
[2016-04-09 00:40:50,930] TRACE Controller 3 epoch 415 changed partition 
[tec1.en2.frontend.syncPing,7] from OfflinePartition to OnlinePartition with 
leader 4 (state.change.logger)
{noformat}

To summarize, the problem seems to be that controller 5 tells broker 1 that 
the partition leader is an unavailable broker, and broker 1 fails to change the 
partition leader. As it fails to update the leader to broker 4, broker 1 
remains the leader, which causes it to keep trying to update the ISR and to 
print out the “Cached zkVersion…” messages. Broker 1 does not receive any 
controller update that enables it to correct the problem later on, and 
consequently it incorrectly remains stuck as the partition leader.

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVer

[jira] [Updated] (KAFKA-3511) Provide built-in aggregators sum() and avg() in Kafka Streams DSL

2016-04-12 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3511:
-
Fix Version/s: (was: 0.10.0.0)
   0.10.1.0

> Provide built-in aggregators sum() and avg() in Kafka Streams DSL
> -
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: api, newbie
> Fix For: 0.10.1.0
>
>
> Currently we only have one built-in aggregate function count() in the Kafka 
> Streams DSL, but we want to add more aggregation functions like sum() and 
> avg().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-4 Metadata Schema (Round 2)

2016-04-12 Thread Jason Gustafson
+1 (non-binding)

On Mon, Apr 11, 2016 at 6:21 PM, Ismael Juma  wrote:

> +1 (non-binding)
>
> Ismael
>
> On Tue, Apr 12, 2016 at 2:19 AM, Jun Rao  wrote:
>
> > Grant,
> >
> > Thanks for the updated version. +1 from me.
> >
> > Jun
> >
> > On Mon, Apr 11, 2016 at 10:42 AM, Grant Henke 
> wrote:
> >
> > > Based on the discussion in the previous vote thread
> > > <
> > >
> >
> http://search-hadoop.com/m/uyzND1xlaiU10QlYX&subj=+VOTE+KIP+4+Metadata+Schema
> > > >
> > > I also would like to include a behavior change to the MetadataResponse. I
> > > have updated the wiki
> > > <
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> > > >
> > > and pull request  to include this change.
> > >
> > > The change as described on the wiki is:
> > >
> > > > The behavior of the replicas and isr arrays will be changed in order to
> > > > support the admin tools, and better represent the state of the cluster:
> > > >
> > > >- In version 0, if a broker is down the replicas and isr array will
> > > >omit the broker's entry and add a REPLICA_NOT_AVAILABLE error code.
> > > >- In version 1, no error code will be set and the broker id will be
> > > >included in the replicas and isr array.
> > > >   - Note: A user can still detect if the replica is not available, by
> > > >   checking if the broker is in the returned broker list.
> > > >
> > > >
> > >
> > > Being optimistic that this doesn't require too much discussion, I would
> > > like to re-start the voting process on this thread. If more discussion is
> > > needed, please don't hesitate to bring it up here.
> > >
> > > Ismael, Gwen, Guozhang could you please review and revote based on the
> > > changes.
> > >
> > > Thank you,
> > > Grant
> > >
> > > On Sat, Apr 9, 2016 at 1:03 PM, Guozhang Wang 
> > wrote:
> > >
> > > > +1
> > > >
> > > > On Fri, Apr 8, 2016 at 4:36 PM, Gwen Shapira 
> > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Fri, Apr 8, 2016 at 2:41 PM, Grant Henke 
> > > wrote:
> > > > >
> > > > > > I would like to re-initiate the voting process for the "KIP-4
> > > > > > Metadata Schema changes". This is not a vote for all of KIP-4, but
> > > > > > specifically for the metadata changes. I have included the exact
> > > > > > changes below for clarity:
> > > > > > >
> > > > > > > Metadata Request (version 1)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > MetadataRequest => [topics]
> > > > > > >
> > > > > > > Stays the same as version 0; however, the behavior changes.
> > > > > > > In version 0 there was no way to request no topics, and an empty
> > > > > > > list signified all topics.
> > > > > > > In version 1 a null topics list (size -1 on the wire) will indicate
> > > > > > > that a user wants *ALL* topic metadata, compared to an empty list
> > > > > > > (size 0) which indicates metadata for *NO* topics should be
> > > > > > > returned.
> > > > > > > Metadata Response (version 1)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > MetadataResponse => [brokers] controllerId [topic_metadata]
> > > > > > >   brokers => node_id host port rack
> > > > > > > node_id => INT32
> > > > > > > host => STRING
> > > > > > > port => INT32
> > > > > > > rack => NULLABLE_STRING
> > > > > > >   controllerId => INT32
> > > > > > >   topic_metadata => topic_error_code topic is_internal [partition_metadata]
> > > > > > > topic_error_code => INT16
> > > > > > > topic => STRING
> > > > > > > is_internal => BOOLEAN
> > > > > > > partition_metadata => partition_error_code partition_id leader [replicas] [isr]
> > > > > > >   partition_error_code => INT16
> > > > > > >   partition_id => INT32
> > > > > > >   leader => INT32
> > > > > > >   replicas => INT32
> > > > > > >   isr => INT32
> > > > > > >
> > > > > > > Adds rack, controller_id, and is_internal to the version 0 response.
> > > > > > >
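For illustration, a hedged sketch of the null-versus-empty topics encoding described above; the array and string layout follows the standard Kafka protocol primitives, but the class and method names are assumptions, not the actual client serializer.

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hedged sketch: how a v1 MetadataRequest topics array could be encoded.
final class MetadataTopicsEncoding {
    // Kafka arrays are length-prefixed with an INT32; -1 denotes a null array.
    static void writeTopics(ByteBuffer buf, List<String> topics) {
        if (topics == null) {
            buf.putInt(-1);               // v1: null array => metadata for ALL topics
            return;
        }
        buf.putInt(topics.size());        // v1: size 0 => metadata for NO topics
        for (String topic : topics) {
            byte[] bytes = topic.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) bytes.length);  // STRING = INT16 length + UTF-8 bytes
            buf.put(bytes);
        }
    }
}
{code}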
> > > > > >
> > > > > > The KIP is available here for reference (linked to the Metadata
> > > schema
> > > > > > section):
> > > > > > *
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataSchema
> > > > > > >*
> > > > > >
> > > > > > A pull request is available implementing the proposed changes
> here:
> > > > > > https://github.com/apache/kafka/pull/1095
> > > > > >
> > > > > > Here are some links to past d

[jira] [Commented] (KAFKA-3513) Transient failure of OffsetValidationTest

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237505#comment-15237505
 ] 

Ismael Juma commented on KAFKA-3513:


We have had failures with various different parameters:

{code}
Module: kafkatest.tests.consumer_test
Class:  OffsetValidationTest
Method: test_consumer_bounce
Arguments:
{
  "bounce_mode": "all",
  "clean_shutdown": true
}

Module: kafkatest.tests.client.consumer_test
Class:  OffsetValidationTest
Method: test_broker_failure
Arguments:
{
  "clean_shutdown": false,
  "enable_autocommit": true
}

Module: kafkatest.tests.client.consumer_test
Class:  OffsetValidationTest
Method: test_broker_failure
Arguments:
{
  "clean_shutdown": true,
  "enable_autocommit": false
}

Module: kafkatest.tests.consumer_test
Class:  OffsetValidationTest
Method: test_consumer_bounce
Arguments:
{
  "bounce_mode": "rolling",
  "clean_shutdown": true
}

Module: kafkatest.tests.client.consumer_test
Class:  OffsetValidationTest
Method: test_consumer_bounce
Arguments:
{
  "bounce_mode": "all",
  "clean_shutdown": false
}
{code}

> Transient failure of OffsetValidationTest
> -
>
> Key: KAFKA-3513
> URL: https://issues.apache.org/jira/browse/KAFKA-3513
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, system tests
>Reporter: Ewen Cheslack-Postava
>Assignee: Jason Gustafson
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-04-05--001.1459840046--apache--trunk--31e263e/report.html
> The version of the test fails in this case is:
> Module: kafkatest.tests.client.consumer_test
> Class:  OffsetValidationTest
> Method: test_broker_failure
> Arguments:
> {
>   "clean_shutdown": true,
>   "enable_autocommit": false
> }
> and others passed. It's unclear if the parameters actually have any impact on 
> the failure.
> I did some initial triage and it looks like the test code isn't seeing all 
> the group members join the group (receive partition assignments), but it 
> appears from the logs that they all did. This could indicate a simple timing 
> issue, but I haven't been able to verify that yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3518) Transient failure in ReplicationTest.test_replication_with_broker_failure with ConsumerTimeoutException

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237508#comment-15237508
 ] 

Ismael Juma commented on KAFKA-3518:


We have had failures with various different parameters:

{code}
Module: kafkatest.tests.replication_test
Class:  ReplicationTest
Method: test_replication_with_broker_failure
Arguments:
{
  "broker_type": "leader",
  "failure_mode": "clean_bounce",
  "security_protocol": "SASL_PLAINTEXT"
}

Module: kafkatest.tests.core.replication_test
Class:  ReplicationTest
Method: test_replication_with_broker_failure
Arguments:
{
  "broker_type": "leader",
  "failure_mode": "hard_bounce",
  "security_protocol": "PLAINTEXT"
}

Module: kafkatest.tests.replication_test
Class:  ReplicationTest
Method: test_replication_with_broker_failure
Arguments:
{
  "broker_type": "controller",
  "failure_mode": "hard_bounce",
  "security_protocol": "PLAINTEXT"
}

Module: kafkatest.tests.replication_test
Class:  ReplicationTest
Method: test_replication_with_broker_failure
Arguments:
{
  "broker_type": "controller",
  "failure_mode": "hard_bounce",
  "security_protocol": "SASL_SSL"
}
{code}

> Transient failure in ReplicationTest.test_replication_with_broker_failure 
> with ConsumerTimeoutException
> ---
>
> Key: KAFKA-3518
> URL: https://issues.apache.org/jira/browse/KAFKA-3518
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Ewen Cheslack-Postava
>
> From 
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-04-06--001.1459956800--apache--trunk--99d2329/report.html
>  this test failed on a recent run:
> Module: kafkatest.tests.core.replication_test
> Class:  ReplicationTest
> Method: test_replication_with_broker_failure
> Arguments:
> {
>   "broker_type": "leader",
>   "failure_mode": "clean_bounce",
>   "security_protocol": "SASL_PLAINTEXT"
> }
> It is missing 7 messages on the consumer trying to validate the messages. The 
> consumer hit its timeout, which should be sufficiently long to capture any 
> successfully produced messages. I didn't notice anything obviously wrong in 
> the logs. Could be some sort of consumer stall?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: MINOR: Add missing `@Override` to `KStreamImpl...

2016-04-12 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1216

MINOR: Add missing `@Override` to `KStreamImpl.through`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka add-missing-override-to-through

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1216.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1216


commit 96ba16fa23d4f391ba20143be6f1dfa81a0931cb
Author: Ismael Juma 
Date:   2016-04-12T16:49:16Z

Add missing `@Override` to `KStreamImpl.through`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3117) Fail test at: PlaintextConsumerTest. testAutoCommitOnRebalance

2016-04-12 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3117:
---
Issue Type: Sub-task  (was: Bug)
Parent: KAFKA-2054

> Fail test at: PlaintextConsumerTest. testAutoCommitOnRebalance 
> ---
>
> Key: KAFKA-3117
> URL: https://issues.apache.org/jira/browse/KAFKA-3117
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Affects Versions: 0.9.0.0
> Environment: oracle java764bit
> ubuntu 13.10 
>Reporter: edwardt
>Assignee: Neha Narkhede
>  Labels: newbie, test
>
> java.lang.AssertionError: Expected partitions [topic-0, topic-1, topic2-0, 
> topic2-1] but actually got [topic-0, topic-1]
>   at org.junit.Assert.fail(Assert.java:88)
>   at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:730)
>   at 
> kafka.api.BaseConsumerTest.testAutoCommitOnRebalance(BaseConsumerTest.scala:125)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237604#comment-15237604
 ] 

James Cheng commented on KAFKA-3042:


Thanks [~fpj]. Do you need any additional info from us? I don't think we have 
any other logs, but let us know if you have any questions.

About your findings:
From your comment at https://issues.apache.org/jira/browse/KAFKA-3042, you 
said that broker 3 failed to release leadership to broker 4 because broker 4 
was offline:
{noformat}
[2016-04-09 00:40:58,144] ERROR Broker 1 received LeaderAndIsrRequest with 
correlation id 0 from controller 5 epoch 415 for partition 
[tec1.en2.frontend.syncPing,7] but cannot become follower since the new leader 
4 is unavailable. (state.change.logger)
{noformat}
What is the correct behavior for that scenario? Should broker 3 continue 
leadership? Or should it give up leadership completely until a controller comes 
back and tells it who is the new leader? Does broker 3 send back a response (or 
error) to the controller saying that it was unable to accept that change?

What happens in this scenario?
1) Broker 1 is leader of a partition.
2) Controller sends a LeaderAndIsrRequest to brokers 1 and 2 and 3, saying that 
broker 4 is the new leader.
3) Brokers 2 and 3 receive the LeaderAndIsrRequest and accept the change.
4) LeaderAndIsrRequest is delayed due to network latency en route to broker 1.

During this delay, won't different brokers have different ideas of who the 
leader is? Broker 1 thinks it is the leader. Brokers 2, 3, 4, and 5 think that 
broker 4 is the leader. Or did I miss something?




> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who is the leader



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237604#comment-15237604
 ] 

James Cheng edited comment on KAFKA-3042 at 4/12/16 5:37 PM:
-

Thanks [~fpj]. Do you need any additional info from us? I don't think we have 
any other logs, but let us know if you have any questions.

About your findings:
From your comment at 
https://issues.apache.org/jira/browse/KAFKA-3042?focusedCommentId=15236055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15236055,
you said that broker 3 failed to release leadership to broker 4 because 
broker 4 was offline:
{noformat}
[2016-04-09 00:40:58,144] ERROR Broker 1 received LeaderAndIsrRequest with 
correlation id 0 from controller 5 epoch 415 for partition 
[tec1.en2.frontend.syncPing,7] but cannot become follower since the new leader 
4 is unavailable. (state.change.logger)
{noformat}
What is the correct behavior for that scenario? Should broker 3 continue 
leadership? Or should it give up leadership completely until a controller comes 
back and tells it who is the new leader? Does broker 3 send back a response (or 
error) to the controller saying that it was unable to accept that change?

What happens in this scenario?
1) Broker 1 is leader of a partition.
2) Controller sends a LeaderAndIsrRequest to brokers 1 and 2 and 3, saying that 
broker 4 is the new leader.
3) Brokers 2 and 3 receive the LeaderAndIsrRequest and accept the change.
4) LeaderAndIsrRequest is delayed due to network latency en route to broker 1.

During this delay, won't different brokers have different ideas of who the 
leader is? Broker 1 thinks it is the leader. Brokers 2, 3, 4, and 5 think that 
broker 4 is the leader. Or did I miss something?





was (Author: wushujames):
Thanks [~fpj]. Do you need any additional info from us? I don't think we have 
any other logs, but let us know if you have any questions.

About your findings:
From your comment at https://issues.apache.org/jira/browse/KAFKA-3042, you 
said that broker 3 failed to release leadership to broker 4 because broker 4 
was offline:
{noformat}
[2016-04-09 00:40:58,144] ERROR Broker 1 received LeaderAndIsrRequest with 
correlation id 0 from controller 5 epoch 415 for partition 
[tec1.en2.frontend.syncPing,7] but cannot become follower since the new leader 
4 is unavailable. (state.change.logger)
{noformat}
What is the correct behavior for that scenario? Should broker 3 continue 
leadership? Or should it give up leadership completely until a controller comes 
back and tells it who is the new leader? Does broker 3 send back a response (or 
error) to the controller saying that it was unable to accept that change?

What happens in this scenario?
1) Broker 1 is leader of a partition.
2) Controller sends a LeaderAndIsrRequest to brokers 1 and 2 and 3, saying that 
broker 4 is the new leader.
3) Brokers 2 and 3 receives the LeaderAndIsrRequest and accepts the change.
4) LeaderAndIsrRequest is delayed due to network latency enroute to broker 1.

During this delay, won't different brokers have different ideas of who the 
leader is? Broker 1 thinks it is leader. Brokers 2 3 4 5 think that broker 4 is 
the leader. Or did I miss something?




> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who is the leader



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: MINOR: Add missing `@Override` to `KStreamImpl...

2016-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1216


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237782#comment-15237782
 ] 

Ismael Juma commented on KAFKA-1981:


Thanks for the PR. This introduces new configs so I think it needs a small KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> Other than this we don't really give you much control over when compaction 
> will happen. The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.
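A minimal sketch of the idea, with hypothetical names (this is not the log cleaner code): bound the cleanable range so that anything written within the last maxLagMs is left uncompacted, which is what would give a consumer that is no more than that far behind every message.

{code}
// Hedged sketch: compute the compaction end point as the minimum of the
// existing dirty-ratio end point and a time-based bound.
final class CompactionBound {
    /**
     * @param segmentBaseOffsets    base offset of each segment, ordered by offset
     * @param segmentMaxTimestamps  largest timestamp (ms) seen in each segment
     * @param dirtyEndOffset        end point the dirty-ratio logic already computed
     */
    static long cleanableEndOffset(long[] segmentBaseOffsets,
                                   long[] segmentMaxTimestamps,
                                   long nowMs, long maxLagMs,
                                   long dirtyEndOffset) {
        long timeBound = dirtyEndOffset;
        for (int i = 0; i < segmentBaseOffsets.length; i++) {
            if (segmentMaxTimestamps[i] > nowMs - maxLagMs) {
                // this segment contains data newer than the lag bound: stop before it
                timeBound = segmentBaseOffsets[i];
                break;
            }
        }
        return Math.min(dirtyEndOffset, timeBound);
    }
}
{code}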



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3549) Close consumers instantiated in consumer tests

2016-04-12 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-3549:
--

 Summary: Close consumers instantiated in consumer tests
 Key: KAFKA-3549
 URL: https://issues.apache.org/jira/browse/KAFKA-3549
 Project: Kafka
  Issue Type: Improvement
Reporter: Grant Henke
Assignee: Grant Henke


Close consumers instantiated in consumer tests. Since these consumers often use 
the default group.id of "", they could cause transient failures like those seen 
in KAFKA-3117 and KAFKA-2933. I have not been able to prove that this change 
will fix those failures, but closing the consumers is a good practice 
regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse

2016-04-12 Thread Dana Powers (JIRA)
Dana Powers created KAFKA-3550:
--

 Summary: Broker does not honor MetadataRequest api version; always 
returns v0 MetadataResponse
 Key: KAFKA-3550
 URL: https://issues.apache.org/jira/browse/KAFKA-3550
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.1, 0.8.2.2, 0.8.1.1, 0.8.0
Reporter: Dana Powers


To reproduce:

Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234).
The expected behavior is for the broker to reject the request as unrecognized.
Broker (incorrectly) responds with MetadataResponse v0.

The problem here is that any request for a "new" MetadataRequest (i.e., KIP-4) 
sent to an old broker will generate an incorrect response.
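A hedged sketch of the missing check (hypothetical names, not the actual broker code): validate the header's apiVersion against the supported range for that apiKey before dispatching, instead of silently parsing the request as version 0.

{code}
import java.util.Map;

// Hedged sketch: reject requests whose apiVersion falls outside the supported
// range for the given apiKey, instead of silently answering with a v0 response.
final class ApiVersionValidation {
    // per-apiKey maximum supported version; the values here are assumptions
    private static final Map<Short, Short> MAX_VERSION =
            Map.of((short) 3 /* Metadata */, (short) 0);

    static boolean isSupported(short apiKey, short apiVersion) {
        Short max = MAX_VERSION.get(apiKey);
        return max != null && apiVersion >= 0 && apiVersion <= max;
    }
}
{code}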



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237791#comment-15237791
 ] 

Ismael Juma commented on KAFKA-3550:


cc [~granthenke]

> Broker does not honor MetadataRequest api version; always returns v0 
> MetadataResponse
> -
>
> Key: KAFKA-3550
> URL: https://issues.apache.org/jira/browse/KAFKA-3550
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0, 0.8.1.1, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>
> To reproduce:
> Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234).
> The expected behavior is for the broker to reject the request as unrecognized.
> Broker (incorrectly) responds with MetadataResponse v0.
> The problem here is that any request for a "new" MetadataRequest (i.e., 
> KIP-4) sent to an old broker will generate an incorrect response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3549) Close consumers instantiated in consumer tests

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237795#comment-15237795
 ] 

Ismael Juma commented on KAFKA-3549:


Interesting. It should have been obvious given that we were leaking producers 
and zkclients until recently. ;)

> Close consumers instantiated in consumer tests
> --
>
> Key: KAFKA-3549
> URL: https://issues.apache.org/jira/browse/KAFKA-3549
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Close consumers instantiated in consumer tests. Since these consumers often 
> use the default group.id of "", they could cause transient failures like 
> those seen in KAFKA-3117 and KAFKA-2933. I have not been able to prove that 
> this change will fix those failures, but closing the consumers is a good 
> practice regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse

2016-04-12 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KAFKA-3550:
--

Assignee: Grant Henke

> Broker does not honor MetadataRequest api version; always returns v0 
> MetadataResponse
> -
>
> Key: KAFKA-3550
> URL: https://issues.apache.org/jira/browse/KAFKA-3550
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0, 0.8.1.1, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>Assignee: Grant Henke
>
> To reproduce:
> Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234).
> The expected behavior is for the broker to reject the request as unrecognized.
> Broker (incorrectly) responds with MetadataResponse v0.
> The problem here is that any request for a "new" MetadataRequest (i.e., 
> KIP-4) sent to an old broker will generate an incorrect response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse

2016-04-12 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237794#comment-15237794
 ] 

Grant Henke commented on KAFKA-3550:


I will look into this and provide a detailed summary. 

> Broker does not honor MetadataRequest api version; always returns v0 
> MetadataResponse
> -
>
> Key: KAFKA-3550
> URL: https://issues.apache.org/jira/browse/KAFKA-3550
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0, 0.8.1.1, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>Assignee: Grant Henke
>
> To reproduce:
> Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234).
> The expected behavior is for the broker to reject the request as unrecognized.
> Broker (incorrectly) responds with MetadataResponse v0.
> The problem here is that any request for a "new" MetadataRequest (i.e., 
> KIP-4) sent to an old broker will generate an incorrect response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3550) Broker does not honor MetadataRequest api version; always returns v0 MetadataResponse

2016-04-12 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237800#comment-15237800
 ] 

Grant Henke commented on KAFKA-3550:


Note that KAFKA-2512 has an older pull request open from [~becket_qin] to 
validate the apiVersion of all requests here: 
https://github.com/apache/kafka/pull/200

> Broker does not honor MetadataRequest api version; always returns v0 
> MetadataResponse
> -
>
> Key: KAFKA-3550
> URL: https://issues.apache.org/jira/browse/KAFKA-3550
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.0, 0.8.1.1, 0.8.2.2, 0.9.0.1
>Reporter: Dana Powers
>Assignee: Grant Henke
>
> To reproduce:
> Send a MetadataRequest (api key 3) with incorrect api version (e.g., 1234).
> The expected behavior is for the broker to reject the request as unrecognized.
> Broker (incorrectly) responds with MetadataResponse v0.
> The problem here is that any request for a "new" MetadataRequest (i.e., 
> KIP-4) sent to an old broker will generate an incorrect response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3549: Close consumers instantiated in co...

2016-04-12 Thread granthenke
GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1217

KAFKA-3549: Close consumers instantiated in consumer tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka close-consumers

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1217


commit 0008c9dc4495bd8830d5245350a3dd61807d8c21
Author: Grant Henke 
Date:   2016-04-12T19:08:50Z

KAFKA-3549: Close consumers instantiated in consumer tests




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3549) Close consumers instantiated in consumer tests

2016-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237803#comment-15237803
 ] 

ASF GitHub Bot commented on KAFKA-3549:
---

GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1217

KAFKA-3549: Close consumers instantiated in consumer tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka close-consumers

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1217.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1217


commit 0008c9dc4495bd8830d5245350a3dd61807d8c21
Author: Grant Henke 
Date:   2016-04-12T19:08:50Z

KAFKA-3549: Close consumers instantiated in consumer tests




> Close consumers instantiated in consumer tests
> --
>
> Key: KAFKA-3549
> URL: https://issues.apache.org/jira/browse/KAFKA-3549
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Close consumers instantiated in consumer tests. Since these consumers often 
> use the default group.id of "", they could cause transient failures like 
> those seen in KAFKA-3117 and KAFKA-2933. I have not been able to prove that 
> this change will fix those failures, but closing the consumers is a good 
> practice regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3549) Close consumers instantiated in consumer tests

2016-04-12 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3549:
---
Status: Patch Available  (was: Open)

> Close consumers instantiated in consumer tests
> --
>
> Key: KAFKA-3549
> URL: https://issues.apache.org/jira/browse/KAFKA-3549
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Close consumers instantiated in consumer tests. Since these consumers often 
> use the default group.id of "", they could cause transient failures like 
> those seen in KAFKA-3117 and KAFKA-2933. I have not been able to prove that 
> this change will fix those failures, but closing the consumers is a good 
> practice regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3549) Close consumers instantiated in consumer tests

2016-04-12 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237807#comment-15237807
 ] 

Grant Henke commented on KAFKA-3549:


[~ijuma] This patch tries to clean up a lot of them, but I am sure there are 
some leaks remaining. It would be nice if there were a generic way to 
automatically detect any closable resource that is still open.
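A hedged sketch of the kind of helper being described here (hypothetical names, not an existing Kafka test utility): register every closable a test creates and close whatever is still open in tearDown.

{code}
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a per-test tracker for closable resources.
final class ClosableTracker {
    private final List<AutoCloseable> open = new ArrayList<>();

    synchronized <T extends AutoCloseable> T register(T resource) {
        open.add(resource);
        return resource;
    }

    // call from tearDown: best-effort close of everything still registered
    synchronized void closeAll() {
        for (AutoCloseable c : open) {
            try {
                c.close();
            } catch (Exception e) {
                // ignore: tearDown should not fail because a resource was already closed
            }
        }
        open.clear();
    }
}
{code}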

> Close consumers instantiated in consumer tests
> --
>
> Key: KAFKA-3549
> URL: https://issues.apache.org/jira/browse/KAFKA-3549
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> Close consumers instantiated in consumer tests. Since these consumers often 
> use the default group.id of "", they could cause transient failures like 
> those seen in KAFKA-3117 and KAFKA-2933. I have not been able to prove that 
> this change will fix those failures, but closing the consumers is a good 
> practice regardless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : kafka-0.10.0-jdk7 #36

2016-04-12 Thread Apache Jenkins Server
See 



[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237824#comment-15237824
 ] 

Flavio Junqueira commented on KAFKA-3042:
-

hey [~wushujames]

bq. you said that broker 3 failed to release leadership to broker 4 because 
broker 4 was offline

it is actually broker 1 that failed to release leadership.

bq. What is the correct behavior for that scenario?

The behavior isn't incorrect in the following sense. We cannot completely 
prevent a single broker from being partitioned from the other replicas. If this 
broker is the leader before the partition, then it may remain in this state for 
some time. In the meantime, the other replicas may form a new ISR and make 
progress independently. But, importantly, the partitioned broker won't be able 
to commit anything on its own, assuming that the minimum ISR is at least two.

In the scenario we are discussing, we don't have a network partition, but the 
behavior is equivalent: broker 1 will remain the leader until it is able to 
follow successfully. The bad part is that broker 1 isn't partitioned away, it 
is talking to other controllers, and the broker should be brought back into a 
state in which it can make progress with that partition and others that are 
equally stuck. The bottom line is that it is safe, but we clearly want the 
broker up and making progress with those partitions.

Let me point out that from the logs, it looks like you have unclean leader 
election enabled because of this log message:

{noformat}
[2016-04-09 00:40:50,911] WARN [OfflinePartitionLeaderSelector]: No broker in 
ISR is alive for [tec1.en2.frontend.syncPing,7]. 
Elect leader 4 from live brokers 4. There's potential data loss. 
(kafka.controller.OfflinePartitionLeaderSelector)
{noformat} 

and no minimum ISR set:

{noformat}
[2016-04-09 00:56:53,009] WARN [Controller 5]: Cannot remove replica 1 from ISR 
of partition [tec1.en2.frontend.syncPing,7]
since it is not in the ISR. Leader = 4 ; ISR = List(4) 
{noformat}

Those options can cause some data loss.
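For reference, a hedged example of the safer settings alluded to above, shown as broker-level defaults (both are also available as per-topic overrides):

{noformat}
# disable unclean leader election so an out-of-ISR replica cannot be elected
unclean.leader.election.enable=false
# require at least two in-sync replicas to acknowledge a write (with acks=all producers)
min.insync.replicas=2
{noformat}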

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who is the leader



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Robert Christ (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237839#comment-15237839
 ] 

Robert Christ commented on KAFKA-3042:
--

We turned on unclean leader election due to encountering KAFKA-3410.  We really 
want
to turn it back off but we need to get a bit of stability before we do.  We can 
live with the data
loss for the moment. 

We are experimenting with minimum ISR and how it behaves in various failure 
cases.
We expect to increase it for all of our replicated topics.  We overlooked it 
when we were starting
out and now we are trying to catch up.


> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact 
> it's a follower.
> So after several failed tries, it needs to find out who is the leader



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3439) Document possible exception thrown in public APIs

2016-04-12 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-3439.
-
   Resolution: Fixed
Fix Version/s: (was: 0.10.0.0)
   0.10.1.0

Issue resolved by pull request 1213
[https://github.com/apache/kafka/pull/1213]

> Document possible exception thrown in public APIs
> -
>
> Key: KAFKA-3439
> URL: https://issues.apache.org/jira/browse/KAFKA-3439
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: api, docs
> Fix For: 0.10.1.0
>
>
> Candidate interfaces include all the ones in "kstream", "processor" and 
> "state" packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3439: Added exceptions thrown

2016-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1213


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3439) Document possible exception thrown in public APIs

2016-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237867#comment-15237867
 ] 

ASF GitHub Bot commented on KAFKA-3439:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1213


> Document possible exception thrown in public APIs
> -
>
> Key: KAFKA-3439
> URL: https://issues.apache.org/jira/browse/KAFKA-3439
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: api, docs
> Fix For: 0.10.1.0
>
>
> Candidate interfaces include all the ones in "kstream", "processor" and 
> "state" packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #521

2016-04-12 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] MINOR: Add missing `@Override` to `KStreamImpl.through`

--
[...truncated 3152 lines...]
kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsWrongSetValue PASSED

kafka.KafkaTest > testKafkaSslPasswords PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgs PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheEnd PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsOnly PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheBegging PASSED

kafka.metrics.KafkaTimerTest > testKafkaTimer PASSED

kafka.utils.UtilsTest > testAbs PASSED

kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testCircularIterator PASSED

kafka.utils.UtilsTest > testReadBytes PASSED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue PASSED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg PASSED

kafka.utils.CommandLineUtilsTest > testParseSingleArg PASSED

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.message.MessageCompressionTest > testCompressSize PASSED

kafka.message.MessageCompressionTest > testSimpleCompressDecompress PASSED

kafka.message.MessageWriterTest > testWithNoCompressionAttribute PASSED

kafka.message.MessageWriterTest > testWithCompressionAttribute PASSED

kafka.message.MessageWriterTest > testBufferingOutputStream PASSED

kafka.message.MessageWriterTest > testWithKey PASSED

kafka.message.MessageTest > testChecksum PASSED

kafka.message.MessageTest > testInvalidTimestamp PASSED

kafka.message.MessageTest > testIsHashable PASSED

kafka.message.MessageTest > testInvalidTimestampAndMagicValueCombination PASSED

kafka.message.MessageTest > testExceptionMapping PASSED

kafka.message.MessageTest > testFieldValues PASSED

kafka.message.MessageTest > testInvalidMagicByte PASSED

kafka.message.MessageTest > testEquality PASSED

kafka.message.MessageTest > testMessageFormatConversion PASSED

kafka.message.ByteBufferMessageSetTest > testMessageWithProvidedOffsetSeq PASSED

kafka.message.ByteBufferMessageSetTest > testValidBytes PASSED

kafka.message.ByteBufferMessageSetTest > testValidBytesWithCompression PASSED

kafka.message.ByteBufferMessageSetTest > 
testOffsetAssignmentAfterMessageFormatConversion PASSED

kafka.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.message.ByteBufferMessageSetTest > testAbsoluteOffsetAssignment PASSED

kafka.message.ByteBufferMessageSetTest > testCreateTime PASSED

kafka.message.ByteBufferMessageSetTest > testInvalidCreateTime PASSED

kafka.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.message.ByteBufferMessageSetTest > testLogAppendTime PASSED

kafka.message.ByteBufferMessageSetTest > testWriteTo PASSED

kafka.message.

[jira] [Created] (KAFKA-3551) Update rocksdb, snappy-java, slf4j

2016-04-12 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-3551:
--

 Summary: Update rocksdb, snappy-java, slf4j
 Key: KAFKA-3551
 URL: https://issues.apache.org/jira/browse/KAFKA-3551
 Project: Kafka
  Issue Type: Improvement
  Components: build
Reporter: Ismael Juma
Assignee: Ismael Juma


Maybe rocksdb 4.2.0 helps with the segfaults we are seeing in Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: MINOR: Remove unused hadoop version

2016-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1214


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-3461: Fix typos in Kafka web documentati...

2016-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1138


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3461) Fix typos in Kafka web documentations

2016-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237952#comment-15237952
 ] 

ASF GitHub Bot commented on KAFKA-3461:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1138


> Fix typos in Kafka web documentations
> -
>
> Key: KAFKA-3461
> URL: https://issues.apache.org/jira/browse/KAFKA-3461
> Project: Kafka
>  Issue Type: Bug
>  Components: website
>Reporter: Dongjoon Hyun
>Priority: Trivial
> Fix For: 0.10.1.0
>
>
> This issue fixes the following typos.
> * docs/api.html: compatability => compatibility
> * docs/connect.html: simultaneoulsy => simultaneously
> * docs/implementation.html: LATIEST_TIME => LATEST_TIME, nPartions => 
> nPartitions
> * docs/migration.html: Decomission => Decommission
> * docs/ops.html: stoping => stopping, ConumserGroupCommand => 
> ConsumerGroupCommand, youre => you're



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3461) Fix typos in Kafka web documentations

2016-04-12 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira resolved KAFKA-3461.
-
   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1138
[https://github.com/apache/kafka/pull/1138]

> Fix typos in Kafka web documentations
> -
>
> Key: KAFKA-3461
> URL: https://issues.apache.org/jira/browse/KAFKA-3461
> Project: Kafka
>  Issue Type: Bug
>  Components: website
>Reporter: Dongjoon Hyun
>Priority: Trivial
> Fix For: 0.10.1.0
>
>
> This issue fixes the following typos.
> * docs/api.html: compatability => compatibility
> * docs/connect.html: simultaneoulsy => simultaneously
> * docs/implementation.html: LATIEST_TIME => LATEST_TIME, nPartions => 
> nPartitions
> * docs/migration.html: Decomission => Decommission
> * docs/ops.html: stoping => stopping, ConumserGroupCommand => 
> ConsumerGroupCommand, youre => you're



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3551) Update rocksdb to 4.2.0

2016-04-12 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3551:
---
Summary: Update rocksdb to 4.2.0  (was: Update rocksdb, snappy-java, slf4j)

> Update rocksdb to 4.2.0
> ---
>
> Key: KAFKA-3551
> URL: https://issues.apache.org/jira/browse/KAFKA-3551
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> Maybe rocksdb 4.2.0 helps with the segfaults we are seeing in Jenkins.





[jira] [Resolved] (KAFKA-3551) Update rocksdb to 4.2.0

2016-04-12 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-3551.

Resolution: Won't Fix

RocksDB 4.2.0 is even worse: running the test suite segfaulted on two 
consecutive runs locally.

> Update rocksdb to 4.2.0
> ---
>
> Key: KAFKA-3551
> URL: https://issues.apache.org/jira/browse/KAFKA-3551
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> Maybe rocksdb 4.2.0 helps with the segfaults we are seeing in Jenkins.





[GitHub] kafka pull request: MINOR: Patch version updates for snappy and sl...

2016-04-12 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/1218

MINOR: Patch version updates for snappy and slf4j

* slf4j 1.7.21 includes thread-safety fixes: http://www.slf4j.org/news.html
* snappy 1.1.2.4 includes performance improvements requested by Spark: 
https://github.com/xerial/snappy-java/blob/master/Milestone.md

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka update-snappy-slf4j

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1218


commit 76fe3ec6d324eca5ceb53e09db1dfc139949b30d
Author: Ismael Juma 
Date:   2016-04-12T21:46:22Z

Patch version updates for snappy and slf4j






Jenkins build is back to normal : kafka-trunk-jdk8 #522

2016-04-12 Thread Apache Jenkins Server
See 



[jira] [Created] (KAFKA-3552) New Consumer: java.lang.OutOfMemoryError: Direct buffer memory

2016-04-12 Thread Kanak Biscuitwala (JIRA)
Kanak Biscuitwala created KAFKA-3552:


 Summary: New Consumer: java.lang.OutOfMemoryError: Direct buffer 
memory
 Key: KAFKA-3552
 URL: https://issues.apache.org/jira/browse/KAFKA-3552
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.9.0.1
Reporter: Kanak Biscuitwala
Assignee: Neha Narkhede


I'm running Kafka's new consumer with message handlers that can sometimes take 
a lot of time to return, and combining that with manual offset management (to 
get at-least-once semantics). Since poll() is the only way to heartbeat with 
the consumer, I have a thread that runs every 500 milliseconds that does the 
following:

1) Pause all partitions
2) Call poll(0)
3) Resume all partitions

For the record, all accesses to KafkaConsumer are protected by synchronized 
blocks. This generally works, but I'm occasionally seeing messages like this:

{code}
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
{code}

In addition, when I'm reporting offsets, I'm seeing:
{code}
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
{code}

Given that I'm just calling the library, this behavior is unexpected.
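
For reference, here is a minimal sketch of the keep-alive thread described above. It assumes the 0.9.x consumer API (where pause()/resume() take varargs) and that all access to the shared consumer goes through synchronized blocks; the class name is only illustrative and is not taken from the reporter's code.

{code}
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public final class ConsumerKeepAlive {
    private final KafkaConsumer<byte[], byte[]> consumer;  // shared with the processing thread
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public ConsumerKeepAlive(KafkaConsumer<byte[], byte[]> consumer) {
        this.consumer = consumer;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            // all KafkaConsumer access is guarded by the same lock as the processing thread
            synchronized (consumer) {
                Set<TopicPartition> assigned = consumer.assignment();
                TopicPartition[] partitions = assigned.toArray(new TopicPartition[assigned.size()]);
                consumer.pause(partitions);   // 1) pause all partitions
                consumer.poll(0);             // 2) poll(0) only to heartbeat; paused partitions return no records
                consumer.resume(partitions);  // 3) resume all partitions
            }
        }, 0, 500, TimeUnit.MILLISECONDS);
    }
}
{code}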





[jira] [Assigned] (KAFKA-3552) New Consumer: java.lang.OutOfMemoryError: Direct buffer memory

2016-04-12 Thread Liquan Pei (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liquan Pei reassigned KAFKA-3552:
-

Assignee: Liquan Pei  (was: Neha Narkhede)

> New Consumer: java.lang.OutOfMemoryError: Direct buffer memory
> --
>
> Key: KAFKA-3552
> URL: https://issues.apache.org/jira/browse/KAFKA-3552
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Kanak Biscuitwala
>Assignee: Liquan Pei
>
> I'm running Kafka's new consumer with message handlers that can sometimes 
> take a lot of time to return, and combining that with manual offset 
> management (to get at-least-once semantics). Since poll() is the only way to 
> heartbeat with the consumer, I have a thread that runs every 500 milliseconds 
> that does the following:
> 1) Pause all partitions
> 2) Call poll(0)
> 3) Resume all partitions
> For the record, all accesses to KafkaConsumer are protected by synchronized 
> blocks. This generally works, but I'm occasionally seeing messages like this:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {code}
> In addition, when I'm reporting offsets, I'm seeing:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
> {code}
> Given that I'm just calling the library, this behavior is unexpected.





Build failed in Jenkins: kafka-trunk-jdk8 #523

2016-04-12 Thread Apache Jenkins Server
See 

Changes:

[cshapi] MINOR: Remove unused hadoop version

[cshapi] KAFKA-3461: Fix typos in Kafka web documentations.

--
[...truncated 3813 lines...]

org.apache.kafka.common.SerializeCompatibilityTopicPartitionTest > 
testTopiPartitionSerializationCompatibility PASSED

org.apache.kafka.common.serialization.SerializationTest > testSerdeFromUnknown 
PASSED

org.apache.kafka.common.serialization.SerializationTest > testDoubleSerializer 
PASSED

org.apache.kafka.common.serialization.SerializationTest > testStringSerializer 
PASSED

org.apache.kafka.common.serialization.SerializationTest > testIntegerSerializer 
PASSED

org.apache.kafka.common.serialization.SerializationTest > testLongSerializer 
PASSED

org.apache.kafka.common.serialization.SerializationTest > testSerdeFrom PASSED

org.apache.kafka.common.serialization.SerializationTest > testSerdeFromNotNull 
PASSED

org.apache.kafka.common.serialization.SerializationTest > 
testByteBufferSerializer PASSED

org.apache.kafka.common.config.AbstractConfigTest > testOriginalsWithPrefix 
PASSED

org.apache.kafka.common.config.AbstractConfigTest > testConfiguredInstances 
PASSED

org.apache.kafka.common.config.ConfigDefTest > testBasicTypes PASSED

org.apache.kafka.common.config.ConfigDefTest > testNullDefault PASSED

org.apache.kafka.common.config.ConfigDefTest > testParseForValidate PASSED

org.apache.kafka.common.config.ConfigDefTest > testInvalidDefaultRange PASSED

org.apache.kafka.common.config.ConfigDefTest > testGroupInference PASSED

org.apache.kafka.common.config.ConfigDefTest > testValidators PASSED

org.apache.kafka.common.config.ConfigDefTest > testValidateCannotParse PASSED

org.apache.kafka.common.config.ConfigDefTest > testValidate PASSED

org.apache.kafka.common.config.ConfigDefTest > testInvalidDefaultString PASSED

org.apache.kafka.common.config.ConfigDefTest > testSslPasswords PASSED

org.apache.kafka.common.config.ConfigDefTest > testInvalidDefault PASSED

org.apache.kafka.common.config.ConfigDefTest > testMissingRequired PASSED

org.apache.kafka.common.config.ConfigDefTest > testNullDefaultWithValidator 
PASSED

org.apache.kafka.common.config.ConfigDefTest > testDefinedTwice PASSED

org.apache.kafka.common.config.ConfigDefTest > testBadInputs PASSED

org.apache.kafka.common.config.ConfigDefTest > testValidateMissingConfigKey 
PASSED

org.apache.kafka.common.protocol.ErrorsTest > testForExceptionDefault PASSED

org.apache.kafka.common.protocol.ErrorsTest > testUniqueExceptions PASSED

org.apache.kafka.common.protocol.ErrorsTest > testForExceptionInheritance PASSED

org.apache.kafka.common.protocol.ErrorsTest > testNoneException PASSED

org.apache.kafka.common.protocol.ErrorsTest > testUniqueErrorCodes PASSED

org.apache.kafka.common.protocol.ErrorsTest > testExceptionsAreNotGeneric PASSED

org.apache.kafka.common.protocol.ApiKeysTest > testForIdWithInvalidIdLow PASSED

org.apache.kafka.common.protocol.ApiKeysTest > testForIdWithInvalidIdHigh PASSED

org.apache.kafka.common.protocol.ProtoUtilsTest > schemaVersionOutOfRange PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > testNulls 
PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testReadStringSizeTooLarge PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testNullableDefault PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testReadNegativeStringSize PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testReadArraySizeTooLarge PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > testDefault 
PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testReadNegativeBytesSize PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testReadBytesSizeTooLarge PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > testSimple 
PASSED

org.apache.kafka.common.protocol.types.ProtocolSerializationTest > 
testReadNegativeArraySize PASSED

org.apache.kafka.common.requests.RequestResponseTest > testSerialization PASSED

org.apache.kafka.common.requests.RequestResponseTest > fetchResponseVersionTest 
PASSED

org.apache.kafka.common.requests.RequestResponseTest > 
produceResponseVersionTest PASSED

org.apache.kafka.common.requests.RequestResponseTest > 
testControlledShutdownResponse PASSED

org.apache.kafka.common.requests.RequestResponseTest > 
testRequestHeaderWithNullClientId PASSED

org.apache.kafka.common.security.auth.KafkaPrincipalTest > 
testPrincipalNameCanContainSeparator PASSED

org.apache.kafka.common.security.auth.KafkaPrincipalTest > 
testEqualsAndHashCode PASSED

org.apache.kafka.common.security.kerberos.KerberosNameTest > testParse PASSED

org.apache.kafka.common.security.ssl.SslFactoryTest > testClientMode PASSED

org.apache.kafka.common.security.ssl.SslFactoryTest > 
testSslFactoryC

Reg: Issue with Kafka Kerberos (Kafka Version 0.9.0.1)

2016-04-12 Thread BigData dev
Hi All,
I am facing an issue with a Kerberized Kafka cluster.

I followed the steps to enable SASL on Kafka from the link below:
http://docs.confluent.io/2.0.0/kafka/sasl.html

After this, when I start the Kafka server I get the error below.
[2016-04-12 16:59:26,201] ERROR [KafkaApi-1001] error when handling request
Name:LeaderAndIsrRequest;Version:0;Controller:1001;ControllerEpoch:3;CorrelationId:3;ClientId:1001;Leaders:BrokerEndPoint(1001,
hostname.com,6667);PartitionState:(t1,0) ->
(LeaderAndIsrInfo:(Leader:1001,ISR:1001,LeaderEpoch:1,ControllerEpoch:3),ReplicationFactor:1),AllReplicas:1001),(ambari_kafka_service_check,0)
->
(LeaderAndIsrInfo:(Leader:1001,ISR:1001,LeaderEpoch:2,ControllerEpoch:3),ReplicationFactor:1),AllReplicas:1001)
(kafka.server.KafkaApis)
kafka.common.ClusterAuthorizationException: Request
Request(0,9.30.150.20:6667-9.30.150.20:37550,Session(User:kafka,/9.30.150.20),null,1460505566200,SASL_PLAINTEXT)
is not authorized.
at kafka.server.KafkaApis.authorizeClusterAction(KafkaApis.scala:910)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:113)
at kafka.server.KafkaApis.handle(KafkaApis.scala:72)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)


Regards,
Bharat


[jira] [Commented] (KAFKA-3544) Missing topics on startup

2016-04-12 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238307#comment-15238307
 ] 

Guozhang Wang commented on KAFKA-3544:
--

The topics that Kafka Streams reads as sources or writes as sinks, or more 
explicitly,

{code}
KStream<?, ?> stream = builder.stream("topic1");
KTable<?, ?> table = builder.table("topic2");

// either a KStream or a KTable can be piped through / written to a topic
stream.through("topic3");
table.to("topic4");
{code}

In these cases, topic1/2/3/4 are currently considered "external" to Kafka 
Streams: unlike the changelog and repartition topics created for aggregations, 
they are not auto-created by the library, and users are supposed to create them 
themselves. It's just that by default, Kafka brokers will auto-create topics 
when they receive metadata requests from clients (in this case, from Kafka 
Streams), and depending on the timing this may succeed, but sometimes it may not.

As I mentioned before, we plan to auto-embed "through" for repartitioning 
before joining, and hence those topics will be considered internal topics as 
well, but for now they are outside the library's "black box".
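
Since users are expected to create these source/sink topics themselves, one way to avoid depending on broker-side auto-creation timing is to fail fast before starting the topology. A rough sketch follows; the bootstrap address, topic names and class name are placeholders rather than anything provided by the library:

{code}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public final class TopicPreflightCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        // a throwaway consumer used only to list existing topics
        try (KafkaConsumer<byte[], byte[]> probe = new KafkaConsumer<>(props)) {
            Map<String, List<PartitionInfo>> topics = probe.listTopics();
            for (String required : new String[]{"topic1", "topic2", "topic3", "topic4"}) {
                if (!topics.containsKey(required)) {
                    throw new IllegalStateException("Source/sink topic missing: " + required);
                }
            }
        }
    }
}
{code}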

> Missing topics on startup
> -
>
> Key: KAFKA-3544
> URL: https://issues.apache.org/jira/browse/KAFKA-3544
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Greg Fodor
>Assignee: Guozhang Wang
>  Labels: semantics
>
> When running a relatively complex job with multiple tasks and state stores, 
> on the first run I get errors due to some of the intermediate topics not 
> existing. Subsequent runs work OK. My assumption is streams may be creating 
> topics lazily, so if downstream tasks are initializing before their parents 
> have had a chance to create their necessary topics then the children will 
> attempt to start consuming from topics that do not exist yet.





[GitHub] kafka pull request: KAFKA-3504: Log compaction for changelog parti...

2016-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1203




[jira] [Commented] (KAFKA-3504) Changelog partition configured to enable log compaction

2016-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238329#comment-15238329
 ] 

ASF GitHub Bot commented on KAFKA-3504:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1203


> Changelog partition configured to enable log compaction
> ---
>
> Key: KAFKA-3504
> URL: https://issues.apache.org/jira/browse/KAFKA-3504
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> Today Kafka Streams automatically configures changelog topics for state 
> stores; however, these changelog topics are not configured with log compaction 
> enabled. We should set the right configs when auto-creating these internal 
> topics.





[jira] [Resolved] (KAFKA-3504) Changelog partition configured to enable log compaction

2016-04-12 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3504.
--
Resolution: Fixed

Issue resolved by pull request 1203
[https://github.com/apache/kafka/pull/1203]

> Changelog partition configured to enable log compaction
> ---
>
> Key: KAFKA-3504
> URL: https://issues.apache.org/jira/browse/KAFKA-3504
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: user-experience
> Fix For: 0.10.0.0
>
>
> Today Kafka Streams automatically configures changelog topics for state 
> stores; however, these changelog topics are not configured with log compaction 
> enabled. We should set the right configs when auto-creating these internal 
> topics.





[jira] [Commented] (KAFKA-3552) New Consumer: java.lang.OutOfMemoryError: Direct buffer memory

2016-04-12 Thread Liquan Pei (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238372#comment-15238372
 ] 

Liquan Pei commented on KAFKA-3552:
---

Hi Kanak,

Can you share the consumer configuration file with us? Also, it would be 
helpful to know how many topic partitions you are consuming from. Do you see 
heavy GC activity when this happens? It would also help if you could provide 
the GC log and a heap dump.

> New Consumer: java.lang.OutOfMemoryError: Direct buffer memory
> --
>
> Key: KAFKA-3552
> URL: https://issues.apache.org/jira/browse/KAFKA-3552
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
>Reporter: Kanak Biscuitwala
>Assignee: Liquan Pei
>
> I'm running Kafka's new consumer with message handlers that can sometimes 
> take a lot of time to return, and combining that with manual offset 
> management (to get at-least-once semantics). Since poll() is the only way to 
> heartbeat with the consumer, I have a thread that runs every 500 milliseconds 
> that does the following:
> 1) Pause all partitions
> 2) Call poll(0)
> 3) Resume all partitions
> For the record, all accesses to KafkaConsumer are protected by synchronized 
> blocks. This generally works, but I'm occasionally seeing messages like this:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
> {code}
> In addition, when I'm reporting offsets, I'm seeing:
> {code}
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658)
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
> at 
> org.apache.kafka.clients.consumer.

Build failed in Jenkins: kafka-trunk-jdk8 #524

2016-04-12 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3504; Log compaction for changelog partition

--
[...truncated 3842 lines...]

kafka.log.BrokerCompressionTest > testBrokerSideCompression[3] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[4] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[5] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[6] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[7] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] PASSED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailingSlash PASSED

kafka.log.LogSegmentTest > testRecoveryWithCorruptMessage PASSED

kafka.log.LogSegmentTest > testRecoveryFixesCorruptIndex PASSED

kafka.log.LogSegmentTest > testReadFromGap PASSED

kafka.log.LogSegmentTest > testTruncate PASSED

kafka.log.LogSegmentTest > testReadBeforeFirstOffset PASSED

kafka.log.LogSegmentTest > testCreateWithInitFileSizeAppendMessage PASSED

kafka.log.LogSegmentTest > testChangeFileSuffixes PASSED

kafka.log.LogSegmentTest > testMaxOffset PASSED

kafka.log.LogSegmentTest > testNextOffsetCalculation PASSED

kafka.log.LogSegmentTest > testReadOnEmptySegment PASSED

kafka.log.LogSegmentTest > testReadAfterLast PASSED

kafka.log.LogSegmentTest > testCreateWithInitFileSizeClearShutdown PASSED

kafka.log.LogSegmentTest > testTruncateFull PASSED

kafka.log.FileMessageSetTest > testTruncate PASSED

kafka.log.FileMessageSetTest > testIterationOverPartialAndTruncation PASSED

kafka.log.FileMessageSetTest > testRead PASSED

kafka.log.FileMessageSetTest > testFileSize PASSED

kafka.log.FileMessageSetTest > testIteratorWithLimits PASSED

kafka.log.FileMessageSetTest > testPreallocateTrue PASSED

kafka.log.FileMessageSetTest > testIteratorIsConsistent PASSED

kafka.log.FileMessageSetTest > testFormatConversionWithPartialMessage PASSED

kafka.log.FileMessageSetTest > testIterationDoesntChangePosition PASSED

kafka.log.FileMessageSetTest > testWrittenEqualsRead PASSED

kafka.log.FileMessageSetTest > testWriteTo PASSED

kafka.log.FileMessageSetTest > testPreallocateFalse PASSED

kafka.log.FileMessageSetTest > testPreallocateClearShutdown PASSED

kafka.log.FileMessageSetTest > testMessageFormatConversion PASSED

kafka.log.FileMessageSetTest > testSearch PASSED

kafka.log.FileMessageSetTest > testSizeInBytes PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

[jira] [Commented] (KAFKA-3377) add REST interface to JMX

2016-04-12 Thread Chien Le (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238536#comment-15238536
 ] 

Chien Le commented on KAFKA-3377:
-

Might also want to check out 
https://github.com/arnobroekhof/kafka-http-metrics-reporter as a quick drop-in 
jar.
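
For context on the Jolokia idea mentioned in the issue below, here is a rough sketch of reading a single broker MBean over such a REST endpoint. The broker host, the default Jolokia agent port (8778) and the MBean/attribute chosen are illustrative assumptions, not something Kafka ships today:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public final class JolokiaMetricProbe {
    public static void main(String[] args) throws Exception {
        // assumes a Jolokia JVM agent is attached to the broker (host and port are placeholders)
        URL url = new URL("http://broker-host:8778/jolokia/read/"
                + "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec/OneMinuteRate");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            StringBuilder json = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
            System.out.println(json);  // JSON payload containing the metric value
        } finally {
            conn.disconnect();
        }
    }
}
{code}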

> add REST interface to JMX
> -
>
> Key: KAFKA-3377
> URL: https://issues.apache.org/jira/browse/KAFKA-3377
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Christian Posta
>
> Would be awesome if we could get JMX metrics w/out having to use the JMX 
> APIs.. would there be any interest in adding something like 
> https://jolokia.org to Kafka? I'll happily volunteer :)





[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238651#comment-15238651
 ] 

Jun Rao commented on KAFKA-3042:


The issue seems to be the following. In 0.9.0, we changed the logic a bit in 
ReplicaManager.makeFollowers() to ensure that the new leader is in the 
liveBrokers of metadataCache. However, during a controller failover, the new 
controller first sends LeaderAndIsrRequests, followed by an UpdateMetadataRequest. 
So it is possible that when a broker receives a LeaderAndIsrRequest, the 
liveBrokers in metadataCache are stale and don't include the leader, which 
causes the become-follower logic to error out. Indeed, from broker 1's 
state-change log, the last UpdateMetadataRequest before the become-follower 
error came from controller 1.

{code}
[2016-04-09 00:40:52,929] TRACE Broker 1 cached leader info (LeaderAndIsrInfo:(Leader:1,ISR:1,LeaderEpoch:330,ControllerEpoch:414),ReplicationFactor:3),AllReplicas:2,1,4) for partition [tec1.usqe1.frontend.syncPing,1] in response to UpdateMetadata request sent by controller 1 epoch 414 with correlation id 877 (state.change.logger)
{code}

In controller 1's log, the last time it updated the live broker list is the 
following and it didn't include broker 4 in the live broker list.
{code}
[2016-04-09 00:39:33,005] INFO [BrokerChangeListener on Controller 1]: Newly 
added brokers: , deleted brokers: 2, all live brokers: 1,3,5 
(kafka.controller.ReplicaStateMachine$BrokerChangeListener)
{code}

To fix this, we should probably send an UpdateMetadataRequest before any 
leaderAndIsrRequest during controller failover.

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it's 
> a follower.
> So after several failed tries, it needs to find out who the leader is.





[jira] [Commented] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238721#comment-15238721
 ] 

Ismael Juma commented on KAFKA-3042:


[~junrao], some people have said that they have seen this issue in 0.8.2 too, 
so that suggests there may be two different problems?

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it's 
> a follower.
> So after several failed tries, it needs to find out who the leader is.





[jira] [Updated] (KAFKA-3042) updateIsr should stop after failed several times due to zkVersion issue

2016-04-12 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3042:
---
Fix Version/s: 0.10.0.0

> updateIsr should stop after failed several times due to zkVersion issue
> ---
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
>Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01, 
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact it's 
> a follower.
> So after several failed tries, it needs to find out who the leader is.


