[GitHub] kafka pull request #1629: KAFKA-3960 - Committed offset not set after first ...

2016-07-18 Thread 13h3r
GitHub user 13h3r opened a pull request:

https://github.com/apache/kafka/pull/1629

KAFKA-3960 - Committed offset not set after first assign

Fixes https://issues.apache.org/jira/browse/KAFKA-3960

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/13h3r/kafka kafka-3960

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1629


commit 7708b0c39f3dc5ceb7e74422aba3708acfea57a9
Author: Alexey Romanchuk 
Date:   2016-07-18T08:27:46Z

Fix KAFKA-3960 - Committed offset not set after first assign






[jira] [Commented] (KAFKA-3960) Committed offset not set after first assign

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381888#comment-15381888
 ] 

ASF GitHub Bot commented on KAFKA-3960:
---

GitHub user 13h3r opened a pull request:

https://github.com/apache/kafka/pull/1629

KAFKA-3960 - Committed offset not set after first assign

Fixes https://issues.apache.org/jira/browse/KAFKA-3960

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/13h3r/kafka kafka-3960

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1629


commit 7708b0c39f3dc5ceb7e74422aba3708acfea57a9
Author: Alexey Romanchuk 
Date:   2016-07-18T08:27:46Z

Fix KAFKA-3960 - Committed offset not set after first assign




> Committed offset not set after first assign
> ---
>
> Key: KAFKA-3960
> URL: https://issues.apache.org/jira/browse/KAFKA-3960
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.0.0
>Reporter: Alexey Romanchuk
>Priority: Blocker
>
> Committed offset is not set after the first assign. Here is a minimal example 
> (Scala):
> {code}
>   val props = new Properties()
>   props.put("bootstrap.servers", "localhost:9092")
>   props.put("client.id", "client1")
>   props.put("group.id", "client1")
>   props.put("enable.auto.commit", "false")
>   props.put("key.deserializer", 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer")
>   props.put("value.deserializer", 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer")
>   val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
>   import scala.collection.JavaConversions._
>   def dumpPositionAndCommitted() = {
> consumer.assignment().foreach { tp =>
>   println(tp)
>   println(s"Position - ${consumer.position(tp)}")
>   println(s"Committed - ${consumer.committed(tp)}")
> }
> println("---")
>   }
>   consumer.assign(Collections.singleton(new TopicPartition("topic", 0)))
>   dumpPositionAndCommitted()
>   Thread.sleep(3000)
>   val ps = Collections.singleton(new TopicPartition("topic", 1)) ++ 
> consumer.assignment()
>   consumer.assign(ps)
>   dumpPositionAndCommitted()
>   Thread.sleep(3000)
>   dumpPositionAndCommitted()
> {code}
> and the result is
> {noformat}
> Position - 1211046445
> Committed - OffsetAndMetadata{offset=1211046445, metadata=''}
> ---
> topic-1
> Position - 1262864347
> Committed - null
> topic-0
> Position - 1211046445
> Committed - OffsetAndMetadata{offset=1211046445, metadata=''}
> ---
> topic-1
> Position - 1262864347
> Committed - null
> topic-0
> Position - 1211046445
> Committed - OffsetAndMetadata{offset=1211046445, metadata=''}
> ---
> {noformat}
> Pay attention to 
> {noformat}
> topic-1
> Position - 1262864347
> Committed - null
> {noformat}
> No committed offset is fetched from the broker, even though one exists there. 
> Looks like we should set {{needsFetchCommittedOffsets}} to {{true}} when 
> assigning the partition.
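
To make the suspected mechanism concrete, here is a small self-contained model of 
that flag logic (only {{needsFetchCommittedOffsets}} is named in the report; the 
class and everything else below are illustrative assumptions, not Kafka code):

{code}
import java.util.*;

// Models the suspected bug: the committed-offset cache is only refreshed
// while needsFetchCommittedOffsets is true, and assign() never re-arms it.
public class AssignFlagModel {
    static Map<String, Long> committedCache = new HashMap<>();
    static boolean needsFetchCommittedOffsets = true; // armed once, at startup

    static void assign(String... partitions) {
        for (String tp : partitions)
            committedCache.putIfAbsent(tp, null);
        // Suggested fix: needsFetchCommittedOffsets = true;
    }

    // Stands in for the coordinator's committed-offset fetch from the broker.
    static void refreshIfNeeded() {
        if (needsFetchCommittedOffsets) {
            committedCache.replaceAll((tp, v) -> v == null ? 1211046445L : v);
            needsFetchCommittedOffsets = false;
        }
    }

    public static void main(String[] args) {
        assign("topic-0");
        refreshIfNeeded();                  // topic-0 gets its committed offset
        assign("topic-0", "topic-1");
        refreshIfNeeded();                  // no-op: the flag was never re-armed
        System.out.println(committedCache); // topic-1=null, as in the report
    }
}
{code}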





Helping Spread the Word about Apachecon EU 2016

2016-07-18 Thread Sharan Foga
Hi Everyone

I'm forwarding the following message on behalf of Rich Bowen and the 
Apachecon team
==

As you are aware, we are holding ApacheCon in Seville in November. While this 
seems like a long way away, it is critical that we get on people's calendar 
now, so that they can plan, get budget approval, and spread the word to their 
contacts.

Here's how you can help.

If you Tweet, please consider using some of the following sample tweets to get 
the word out:


Save the date. #ApacheCon is coming to Seville, November 14-18 2016. 
http://apachecon.com/

Come join me at @ApacheCon in Seville in November. http://apachecon.com/

#ApacheBigData is the best place to learn what's next in the world of big data. 
November 14-16 in Seville http://apachecon.com/

 @TheASF is 300 projects strong and growing. Come learn about all of them at 
@ApacheCon in Seville - http://apachecon.com/


Follow @ApacheCon and @TheASF, and retweet mentions of ApacheCon, to spread the 
word. 


If you use other social media platforms, share the URLs of the events and their 
CFPs, to collect the broadest possible audience for our events, as well as 
getting the best content:


Big Data: Website: 
http://events.linuxfoundation.org/events/apache-big-data-europe
CFP: http://events.linuxfoundation.org/events/apache-big-data-europe/program/cfp


ApacheCon: Website: http://events.linuxfoundation.org/events/apachecon-europe
CFP: http://events.linuxfoundation.org/events/apachecon-europe/program/cfp


And, finally, if your employer benefits from the work that we do at Apache, or 
is looking for the brightest software developers in the industry, encourage 
them to sponsor the event. Sponsorship is available at all levels. Have them 
contact me at e...@apache.org for a prospectus, and I'll make the right 
introductions. Sponsors in the past include … well, everyone. You have to go 
pretty deep into the Fortune Technology list ( 
http://fortune.com/2015/06/13/fortune-500-tech/ ) to find a company that hasn't 
sponsored ApacheCon.

==

Thanks
Sharan



Re: [VOTE] KIP-67: Queryable state for Kafka Streams

2016-07-18 Thread Jim Jagielski
Very cool.

> On Jul 16, 2016, at 9:55 AM, Damian Guy  wrote:
> 
> Hi,
> The vote is now complete and KIP-67 has been accepted and adopted.
> Thanks everyone for the input etc.
> 
> Regards,
> Damian
> 
> On Sat, 16 Jul 2016 at 06:53 Damian Guy  wrote:
> 
>> Hi,
>> Jay's interpretation is correct.
>> Thanks,
>> Damian
>> 
>> 
>> On Fri, 15 Jul 2016 at 16:10, Jay Kreps  wrote:
>> 
>>> My interpretation was that you need an implementation of
>>> QueryableStoreType which anyone can do and QueryableStoreTypes is just
>>> a
>>> place to put the type objects for the types we ship with Kafka.
>>> 
>>> -Jay
>>> 
>>> On Fri, Jul 15, 2016 at 4:04 PM, Sriram Subramanian 
>>> wrote:
>>> 
 So, it looks like QueryableStoreTypes would be part of the streams
>>> library,
 right? If a developer needs to support a new store, would they need to
 change code in streams?
 
 On Fri, Jul 15, 2016 at 3:15 PM, Jay Kreps  wrote:
 
> Cool, I'm +1 after the updates.
> 
> -Jay
> 
> On Fri, Jul 15, 2016 at 1:50 PM, Damian Guy 
 wrote:
> 
>> Hi Guozhang, KIP updated.
>> 
>> Thanks,
>> Damian
>> 
>> On Fri, 15 Jul 2016 at 13:15 Guozhang Wang 
>>> wrote:
>> 
>>> Hi Damian,
>>> 
>>> Since the StateStoreProvider is moved into internal packages, how
 about
>>> just keeping the ReadOnlyXXStores interface for the queryAPI, and
>>> "QueryableStoreType" in the discoverAPI, and move the
> StateStoreProvider
>>> / QueryableStoreTypeMatcher and different implementations of the
> matcher
>>> like KeyValueStoreType / etc in a new section called "developer
>>> guide
> for
>>> customized stores"? This way we have a separate guidance for
>>> Streams
>> users,
>>> from Streams developers.
>>> 
>>> Other than that, all LGTM.
>>> 
>>> Guozhang
>>> 
>>> On Fri, Jul 15, 2016 at 9:57 AM, Damian Guy >>> 
>> wrote:
>>> 
 Hi All,
 
 I've updated the KIP with changes as discussed in this Thread.
 
 Thanks,
 Damian
 
 On Wed, 13 Jul 2016 at 16:51 Ismael Juma 
 wrote:
 
> I think it's important to distinguish the use cases of
>>> defining
 new
 stores
> (somewhat rare) versus using the `store` method (very common).
 The
 strategy
> employed here is a common way to use generics to ensure type
 safety
>> for
 the
> latter case. In the former case, there are all sorts of weird
> things
>>> one
> could do to defeat the type system, but spending a bit more
 effort
> to
>>> get
> it right so that the common case is safer and more pleasant is
> worth
>>> it,
 in
> my opinion.
> 
> Ismael
> 
> On Thu, Jul 14, 2016 at 12:23 AM, Damian Guy <
 damian@gmail.com
>> 
 wrote:
> 
>> Yes, you get compile time errors
>> 
>> On Wed, 13 Jul 2016 at 16:22 Damian Guy <
>>> damian@gmail.com>
>>> wrote:
>> 
>>> You wont get a runtime error as you wouldn't find a store
>>> of
> that
 type.
>>> The API would return null
>>> 
>>> On Wed, 13 Jul 2016 at 16:22 Jay Kreps 
> wrote:
>>> 
 But if "my-store" is not of type MyStoreType don't you
>>> still
>> get a
 run
 time
 error that in effect is the same as the class cast would
>>> be?
 Basically
>> the
 question I'm asking is whether this added complexity is
> actually
> moving
 runtime errors to compile time errors.
 
 -Jay
 
>> 
> 
 
>>> 
>>> 
>>> 
>>> --
>>> -- Guozhang
>>> 
>> 
> 
 
>>> 
>> 
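
To make the interface shape under discussion concrete, here is a sketch drawn from 
this thread and the KIP-67 write-up (the signatures are my reading of it, and the 
final API may differ):

{code}
// The extension point: anyone can implement this for a custom store type.
public interface QueryableStoreType<T> {
    // Used when looking a store up by name: does this store match the type?
    boolean accepts(StateStore stateStore);

    // Wrap the underlying store(s) in a read-only facade of type T.
    T create(StateStoreProvider storeProvider, String storeName);
}

// QueryableStoreTypes is then just a holder for the type objects of the
// stores shipped with Kafka, giving usage along the lines of:
//   ReadOnlyKeyValueStore<String, Long> store =
//       streams.store("my-store", QueryableStoreTypes.keyValueStore());
{code}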



Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-07-18 Thread Grant Henke
Hi Parth,

Are you still working on this? If you need any help please don't hesitate
to ask.

Thanks,
Grant

On Thu, Jun 30, 2016 at 4:35 PM, Jun Rao  wrote:

> Parth,
>
> Thanks for the reply.
>
> It makes sense to only allow the renewal by users that authenticated using
> *non* delegation token mechanism. Then, should we make the renewal a list?
> For example, in the case of rest proxy, it will be useful for every
> instance of rest proxy to be able to renew the tokens.
>
> It would be clearer if we can document the request protocol like
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicsRequest(KAFKA-2945):(VotedandPlannedforin0.10.1.0)
> .
>
> It would also be useful to document the client APIs.
>
> Thanks,
>
> Jun
>
> On Tue, Jun 28, 2016 at 2:55 PM, parth brahmbhatt <
> brahmbhatt.pa...@gmail.com> wrote:
>
> > Hi,
> >
> > I am suggesting that we will only allow the renewal by users that
> > authenticated using *non* delegation token mechanism. For example, If
> user
> > Alice authenticated using kerberos and requested delegation tokens, only
> > user Alice authenticated via non delegation token mechanism can renew.
> > Clients that have  access to delegation tokens can not issue renewal
> > request for renewing their own token and this is primarily important to
> > reduce the time window for which a compromised token will be valid.
> >
> > To clarify, Yes any authenticated user can request delegation tokens but
> > even here I would recommend to avoid creating a chain where a client
> > authenticated via delegation token request for more delegation tokens.
> > Basically anyone can request delegation token, as long as they
> authenticate
> > via a non delegation token mechanism.
> >
> > Aren't classes listed here
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka#KIP-48DelegationtokensupportforKafka-PublicInterfaces
> > >
> > sufficient?
> >
> > Thanks
> > Parth
> >
> >
> >
> > On Tue, Jun 21, 2016 at 4:33 PM, Jun Rao  wrote:
> >
> > > Parth,
> > >
> > > Thanks for the reply. A couple of comments inline below.
> > >
> > > On Tue, Jun 21, 2016 at 10:36 AM, parth brahmbhatt <
> > > brahmbhatt.pa...@gmail.com> wrote:
> > >
> > > > 1. Who / how are tokens renewed? By original requester only? or using
> > > > Kerberos
> > > > auth only?
> > > > My recommendation is to do this only using Kerberos auth and only
> through
> > > the
> > > > renewer specified during the acquisition request.
> > > >
> > > >
> > > Hmm, not sure that I follow this. Are you saying that any client
> > > authenticated with the delegation token can renew, i.e. there is no
> > renewer
> > > needed?
> > >
> > > Also, just to be clear, any authenticated client (either through SASL
> or
> > > SSL) can request a delegation token for the authenticated user, right?
> > >
> > >
> > > > 2. Are tokens stored on each broker or in ZK?
> > > > My recommendation is still to store in ZK or not store them at all.
> The
> > > > whole controller based distribution is too much overhead with not
> much
> > to
> > > > achieve.
> > > >
> > > > 3. How are tokens invalidated / expired?
> > > > Either by expiration time out or through an explicit request to
> > > invalidate.
> > > >
> > > > 4. Which encryption algorithm is used?
> > > > SCRAM
> > > >
> > > > 5. What is the impersonation proposal (it wasn't in the KIP but was
> > > > discussed
> > > > in this thread)?
> > > > There is no impersonation proposal. I tried and explained how it's a 
> > > > different problem and why it's not really necessary to discuss that as 
> > > part
> > > > of this KIP.  This KIP will not support any impersonation, it will
> just
> > > be
> > > > another way to authenticate.
> > > >
> > > > 6. Do we need new ACLs, if so - for what actions?
> > > > We do not need new ACLs.
> > > >
> > > >
> > > Could we document the format of the new request/response and their
> > > associated Resource and Operation for ACL?
> > >
> > >
> > > > 7. How would the delegation token be configured in the client?
> > > > Should be through config. I wasn't planning on supporting JAAS for
> > > tokens.
> > > > I don't believe hadoop does this either.
> > > >
> > > > Thanks
> > > > Parth
> > > >
> > > >
> > > >
> > > > On Thu, Jun 16, 2016 at 4:03 PM, Jun Rao  wrote:
> > > >
> > > > > Harsha,
> > > > >
> > > > > Another question.
> > > > >
> > > > > 9. How would the delegation token be configured in the client? The
> > > > standard
> > > > > way is to do this through JAAS. However, we will need to think
> > through
> > > if
> > > > > this is convenient in a shared environment. For example, when a new
> > > task
> > > > is
> > > > > added to a Storm worker node, do we need to dynamically add a new
> > > section
> > > > > in the JAAS file? It may be more convenient if we can pass in the
> > token
> > > > > through the config dir

[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-18 Thread Vincent Rischmann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382370#comment-15382370
 ] 

Vincent Rischmann commented on KAFKA-3894:
--

Not adding much to the conversation, but I've just been hit by this bug.

I'm in the process of upgrading my cluster to 0.9.0.1, and in one case the log 
cleaner dies because of this.

{{requirement failed: 1214976153 messages in segment 
__consumer_offsets-15/12560043.log but offset map can fit only 
40265317}}

If I'm not wrong, there's no way that many messages can fit in the offset map, 
since the dedupe buffer is limited to 2G per thread anyway. Right now I'm leaving 
it as is since the broker seems to be working, but it's not ideal.

I'm wondering: if I simply delete the log file while the broker is shut down, will 
it be re-fetched at startup from another replica without problems?
In my case, I believe this is only temporary: we never enabled the log cleaner 
when running 0.8.2.1 (mistake on my part) and now when migrating to 0.9.0.1 it 
does a giant cleanup at first startup.

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
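
The figures in these errors follow directly from the dedupe buffer size: each key 
costs one offset-map entry of roughly 24 bytes (a 16-byte MD5 of the key plus an 
8-byte offset), and the map is only filled up to a load factor. A quick check, 
assuming the defaults of a 128 MB buffer and a 0.9 load factor (both are 
assumptions here):

{code}
// Rough capacity check for the cleaner's offset map. Assumed constants:
// 24 bytes per entry, log.cleaner.dedupe.buffer.size = 128 MB,
// log.cleaner.io.buffer.load.factor = 0.9.
public class OffsetMapCapacity {
    public static void main(String[] args) {
        long dedupeBufferBytes = 128L * 1024 * 1024;
        double loadFactor = 0.9;
        long capacity = (long) (dedupeBufferBytes * loadFactor) / 24;
        // Prints 5033164 -- exactly the "offset map can fit only" figure
        // in the error message quoted above.
        System.out.println(capacity);
    }
}
{code}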





[GitHub] kafka pull request #1630: MINOR: Fix typo in Operations section

2016-07-18 Thread ssaamm
GitHub user ssaamm opened a pull request:

https://github.com/apache/kafka/pull/1630

MINOR: Fix typo in Operations section

This contribution is my original work, and I license the work to the 
project under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ssaamm/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1630


commit 7633cf38376d2bc221b94bf3794731f4f86ff244
Author: Samuel Taylor 
Date:   2016-07-18T14:42:46Z

MINOR: Fix typo in Operations section






[GitHub] kafka pull request #1631: KAFKA-3934: kafka-server-start.sh enables GC by de...

2016-07-18 Thread granthenke
GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1631

KAFKA-3934: kafka-server-start.sh enables GC by default with no way to disable

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka garbage-flag

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1631


commit f20a23870ac146a9d5e9ef43005d21d1d0e13c23
Author: Grant Henke 
Date:   2016-07-18T15:50:53Z

KAFKA-3934: kafka-server-start.sh enables GC by default with no way to 
disable






Compacted topic cannot accept message without key

2016-07-18 Thread Kafka
Hi, 
The server log on broker 0.9.0 shows the error below.
ERROR [Replica Manager on Broker 0]: Error processing append operation 
on partition [__consumer_offsets,5] (kafka.server.ReplicaManager)
kafka.message.InvalidMessageException: Compacted topic cannot accept message 
without key.

Why does this happen and what’s the solution?




[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-18 Thread Peter Davis (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382511#comment-15382511
 ] 

Peter Davis commented on KAFKA-3894:


Re: "the broker seems to be working"

You may regret not taking action now.  As Tim mentioned from the talk at the 
Kafka Summit 
(http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49), if 
__consumer_offsets is not compacted and has accumulated millions (or billions!) 
of messages, it can take many minutes for the broker to elect a new coordinator 
after any kind of hiccup.  *Your new consumers may be hung during this time!*

However, even shutting down brokers to change the configuration will cause 
coordinator elections which will cause an outage.  It seems like not having a 
"hot spare" for Offset Managers is a liability here…

We were bit by this bug and it caused all kinds of headaches until we managed 
to get __consumer_offsets cleaned up again.

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM





[jira] [Commented] (KAFKA-3934) kafka-server-start.sh enables GC by default with no way to disable

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382484#comment-15382484
 ] 

ASF GitHub Bot commented on KAFKA-3934:
---

GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1631

KAFKA-3934: kafka-server-start.sh enables GC by default with no way to disable

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka garbage-flag

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1631


commit f20a23870ac146a9d5e9ef43005d21d1d0e13c23
Author: Grant Henke 
Date:   2016-07-18T15:50:53Z

KAFKA-3934: kafka-server-start.sh enables GC by default with no way to 
disable




> kafka-server-start.sh enables GC by default with no way to disable
> --
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents gc logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS is made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure I understand why the existing code uses the "x" 
> thing when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround, the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs, since kafka-run-class.sh only honors 
> the -loggc parameter when that variable is unset: 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}





[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-18 Thread Vincent Rischmann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382516#comment-15382516
 ] 

Vincent Rischmann commented on KAFKA-3894:
--

Yeah, that's why I was hoping for a workaround :)

Right now it takes a ridiculous amount of time for the broker to load some 
partitions; it just took 1h+ to load a 300GB partition. In that case it didn't 
impact production, though.

I believe I have found a workaround in my case since, as said, it's a temporary 
thing: I note all the big partitions (more than a GB, say) and reassign them 
to brokers that are already cleaned up. The reassignment takes a long time, but 
in the end I think it'll remove the partition from the problematic broker.

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM





[jira] [Commented] (KAFKA-3969) kafka.admin.ConsumerGroupCommand doesn't show consumer groups

2016-07-18 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382518#comment-15382518
 ] 

Vahid Hashemian commented on KAFKA-3969:


Without diving much into the code, I think you might be running the new 
consumer, in which case the correct consumer group command would be
{{bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --new-consumer 
--list}}. Could you give this one a try?
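
Spelled out, the two variants look like this (both commands already appear in this 
thread; which one finds your group depends on which consumer created it, and I am 
assuming the bin script wraps the same ConsumerGroupCommand, as the quoted docs 
say):

{noformat}
# old consumer: group metadata lives in ZooKeeper
bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list

# new consumer: group metadata lives in Kafka itself, so ask a broker
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --new-consumer --list
{noformat}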

> kafka.admin.ConsumerGroupCommand doesn't show consumer groups
> -
>
> Key: KAFKA-3969
> URL: https://issues.apache.org/jira/browse/KAFKA-3969
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Dieter Plaetinck
>
> http://kafka.apache.org/documentation.html , at 
> http://kafka.apache.org/documentation.html#basic_ops_consumer_lag says 
> " Note, however, after 0.9.0, the kafka.tools.ConsumerOffsetChecker tool is 
> deprecated and you should use the kafka.admin.ConsumerGroupCommand (or the 
> bin/kafka-consumer-groups.sh script) to manage consumer groups, including 
> consumers created with the new consumer API."
> I'm sure that I have a consumer running, because I wrote an app that is 
> processing data, and I can see the data as well as the metrics that confirm 
> it's receiving data. I'm using Kafka 0.10.
> Yet when I run the command as instructed, it doesn't list any consumer groups
> $ /opt/kafka_2.11-0.10.0.0/bin/kafka-run-class.sh 
> kafka.admin.ConsumerGroupCommand --zookeeper localhost:2181 --list
> $
> So either something is wrong with the tool, or with the docs.





[jira] [Updated] (KAFKA-3934) kafka-server-start.sh enables GC by default with no way to disable

2016-07-18 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3934:
---
Status: Patch Available  (was: Open)

> kafka-server-start.sh enables GC by default with no way to disable
> --
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents gc logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS is made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure I understand why the existing code uses the "x" 
> thing when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround, the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs, since kafka-run-class.sh only honors 
> the -loggc parameter when that variable is unset: 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}





Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-07-18 Thread Harsha Chintalapani
Hi Grant,
  We are working on it. Will add the details to KIP about the
request protocol.

Thanks,
Harsha

On Mon, Jul 18, 2016 at 6:50 AM Grant Henke  wrote:

> Hi Parth,
>
> Are you still working on this? If you need any help please don't hesitate
> to ask.
>
> Thanks,
> Grant
>
> On Thu, Jun 30, 2016 at 4:35 PM, Jun Rao  wrote:
>
> > Parth,
> >
> > Thanks for the reply.
> >
> > It makes sense to only allow the renewal by users that authenticated
> using
> > *non* delegation token mechanism. Then, should we make the renewal a
> list?
> > For example, in the case of rest proxy, it will be useful for every
> > instance of rest proxy to be able to renew the tokens.
> >
> > It would be clearer if we can document the request protocol like
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicsRequest(KAFKA-2945):(VotedandPlannedforin0.10.1.0)
> > .
> >
> > It would also be useful to document the client APIs.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Jun 28, 2016 at 2:55 PM, parth brahmbhatt <
> > brahmbhatt.pa...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am suggesting that we will only allow the renewal by users that
> > > authenticated using *non* delegation token mechanism. For example, If
> > user
> > > Alice authenticated using kerberos and requested delegation tokens,
> only
> > > user Alice authenticated via non delegation token mechanism can renew.
> > > Clients that have  access to delegation tokens can not issue renewal
> > > request for renewing their own token and this is primarily important to
> > > reduce the time window for which a compromised token will be valid.
> > >
> > > To clarify, Yes any authenticated user can request delegation tokens
> but
> > > even here I would recommend to avoid creating a chain where a client
> > > authenticated via delegation token request for more delegation tokens.
> > > Basically anyone can request delegation token, as long as they
> > authenticate
> > > via a non delegation token mechanism.
> > >
> > > Aren't classes listed here
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka#KIP-48DelegationtokensupportforKafka-PublicInterfaces
> > > >
> > > sufficient?
> > >
> > > Thanks
> > > Parth
> > >
> > >
> > >
> > > On Tue, Jun 21, 2016 at 4:33 PM, Jun Rao  wrote:
> > >
> > > > Parth,
> > > >
> > > > Thanks for the reply. A couple of comments inline below.
> > > >
> > > > On Tue, Jun 21, 2016 at 10:36 AM, parth brahmbhatt <
> > > > brahmbhatt.pa...@gmail.com> wrote:
> > > >
> > > > > 1. Who / how are tokens renewed? By original requester only? or
> using
> > > > > Kerberos
> > > > > auth only?
> > > > > My recommendation is to do this only using Kerberos auth and only
> > through
> > > > the
> > > > > renewer specified during the acquisition request.
> > > > >
> > > > >
> > > > Hmm, not sure that I follow this. Are you saying that any client
> > > > authenticated with the delegation token can renew, i.e. there is no
> > > renewer
> > > > needed?
> > > >
> > > > Also, just to be clear, any authenticated client (either through SASL
> > or
> > > > SSL) can request a delegation token for the authenticated user,
> right?
> > > >
> > > >
> > > > > 2. Are tokens stored on each broker or in ZK?
> > > > > My recommendation is still to store in ZK or not store them at all.
> > The
> > > > > whole controller based distribution is too much overhead with not
> > much
> > > to
> > > > > achieve.
> > > > >
> > > > > 3. How are tokens invalidated / expired?
> > > > > Either by expiration time out or through an explicit request to
> > > > invalidate.
> > > > >
> > > > > 4. Which encryption algorithm is used?
> > > > > SCRAM
> > > > >
> > > > > 5. What is the impersonation proposal (it wasn't in the KIP but was
> > > > > discussed
> > > > > in this thread)?
> > > > > There is no impersonation proposal. I tried and explained how it's a
> > > > > different problem and why it's not really necessary to discuss that
> as
> > > > part
> > > > > of this KIP.  This KIP will not support any impersonation, it will
> > just
> > > > be
> > > > > another way to authenticate.
> > > > >
> > > > > 6. Do we need new ACLs, if so - for what actions?
> > > > > We do not need new ACLs.
> > > > >
> > > > >
> > > > Could we document the format of the new request/response and their
> > > > associated Resource and Operation for ACL?
> > > >
> > > >
> > > > > 7. How would the delegation token be configured in the client?
> > > > > Should be through config. I wasn't planning on supporting JAAS for
> > > > tokens.
> > > > > I don't believe hadoop does this either.
> > > > >
> > > > > Thanks
> > > > > Parth
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jun 16, 2016 at 4:03 PM, Jun Rao  wrote:
> > > > >
> > > > > > Harsha,
> > > > > >
> > > > > > Another question.
> > > > > >
> > > > > > 9. How would the del

[jira] [Assigned] (KAFKA-3855) Guard race conditions in TopologyBuilder

2016-07-18 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy reassigned KAFKA-3855:
-

Assignee: Damian Guy

> Guard race conditions in TopologyBuilder
> 
>
> Key: KAFKA-3855
> URL: https://issues.apache.org/jira/browse/KAFKA-3855
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: architecture
>
> The user-specified {{TopologyBuilder}} is shared among all stream threads to 
> build the processor topology instances, one for each thread. Its public 
> functions that can be accessed by the threads are not synchronized, and we 
> need to double-check whether this could cause race conditions, and if so, 
> guard against such concurrent access.





[jira] [Work started] (KAFKA-3855) Guard race conditions in TopologyBuilder

2016-07-18 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3855 started by Damian Guy.
-
> Guard race conditions in TopologyBuilder
> 
>
> Key: KAFKA-3855
> URL: https://issues.apache.org/jira/browse/KAFKA-3855
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: architecture
>
> The user-specified {{TopologyBuilder}} is shared among all stream threads to 
> build the processor topology instances, one for each thread. Its public 
> functions that can be accessed by the threads are not synchronized, and we 
> need to double-check whether this could cause race conditions, and if so, 
> guard against such concurrent access.





[GitHub] kafka pull request #1632: MINOR: MetadataCache brokerId is not set on first ...

2016-07-18 Thread granthenke
GitHub user granthenke opened a pull request:

https://github.com/apache/kafka/pull/1632

MINOR: MetadataCache brokerId is not set on first run with generated broker id

This is because the id passed into the MetadataCache is the value from the 
config before the real broker id is generated.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/granthenke/kafka metadata-id

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1632


commit 0459f23798231b38c486df5d42aa56319e5a066e
Author: Grant Henke 
Date:   2016-04-30T05:51:29Z

MINOR: MetadataCache brokerId is not set on first run with generated broker 
id

This is because the id passed into the MetadataCache is the value from the 
config before the real broker id is generated.






[jira] [Commented] (KAFKA-3969) kafka.admin.ConsumerGroupCommand doesn't show consumer groups

2016-07-18 Thread Dieter Plaetinck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382618#comment-15382618
 ] 

Dieter Plaetinck commented on KAFKA-3969:
-

Thanks, that works. Seems like the "checking consumer position" paragraph should 
either be explained better, or just merged into "Managing Consumer Groups", 
since that covers pretty much the same thing.

> kafka.admin.ConsumerGroupCommand doesn't show consumer groups
> -
>
> Key: KAFKA-3969
> URL: https://issues.apache.org/jira/browse/KAFKA-3969
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Dieter Plaetinck
>
> http://kafka.apache.org/documentation.html , at 
> http://kafka.apache.org/documentation.html#basic_ops_consumer_lag says 
> " Note, however, after 0.9.0, the kafka.tools.ConsumerOffsetChecker tool is 
> deprecated and you should use the kafka.admin.ConsumerGroupCommand (or the 
> bin/kafka-consumer-groups.sh script) to manage consumer groups, including 
> consumers created with the new consumer API."
> I'm sure that I have a consumer running, because I wrote an app that is 
> processing data, and I can see the data as well as the metrics that confirm 
> it's receiving data. I'm using Kafka 0.10.
> Yet when I run the command as instructed, it doesn't list any consumer groups
> $ /opt/kafka_2.11-0.10.0.0/bin/kafka-run-class.sh 
> kafka.admin.ConsumerGroupCommand --zookeeper localhost:2181 --list
> $
> So either something is wrong with the tool, or with the docs.





Re: Compacted topic cannot accept message without key

2016-07-18 Thread Dustin Cote
Compacted topics require keyed messages in order for compaction to
function.  The solution is to provide a key for your messages.  I would
suggest reading the wiki on log compaction.
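
For example, a keyed send with the Java producer (a minimal sketch; the topic name 
and serializers here are placeholders):

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A compacted topic keeps the latest value per key, so every record
            // must carry a key; a keyless (topic, value) record is what triggers
            // the error quoted below.
            producer.send(new ProducerRecord<>("my-compacted-topic", "user-42",
                    "latest state for user-42"));
        }
    }
}
{code}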


On Mon, Jul 18, 2016 at 12:03 PM, Kafka  wrote:

> Hi,
> The server log on broker 0.9.0 shows the error below.
> ERROR [Replica Manager on Broker 0]: Error processing append
> operation on partition [__consumer_offsets,5] (kafka.server.ReplicaManager)
> kafka.message.InvalidMessageException: Compacted topic cannot accept
> message without key.
>
> Why does this happen and what’s the solution?
>
>
>


-- 
Dustin Cote
confluent.io


[jira] [Created] (KAFKA-3970) Message Listener(that can hit callback registered)

2016-07-18 Thread rajan verma (JIRA)
rajan verma created KAFKA-3970:
--

 Summary: Message Listener(that can hit callback registered)
 Key: KAFKA-3970
 URL: https://issues.apache.org/jira/browse/KAFKA-3970
 Project: Kafka
  Issue Type: New Feature
Reporter: rajan verma
Priority: Minor


On my current project I am writing a message listener for Kafka.

- The message listener can register a callable (domain URL, headers, schema of 
the domain endpoint).

- Initially I wrote the message listener as a Java Callable; later I used Akka 
Futures (as Akka promises to scale well).

- Attaching the classes that I am using in my current project. These are a POC, 
not production ready.

Please review and let me know if this is a good idea, as it looks promising.








[jira] [Updated] (KAFKA-3970) Message Listener(that can hit callback registered)

2016-07-18 Thread rajan verma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajan verma updated KAFKA-3970:
---
Attachment: KafkaCallable.java
KafkaAkkaConsumerQueueManager.java
ConsumerEntry.java

> Message Listener(that can hit callback registered)
> --
>
> Key: KAFKA-3970
> URL: https://issues.apache.org/jira/browse/KAFKA-3970
> Project: Kafka
>  Issue Type: New Feature
>Reporter: rajan verma
>Priority: Minor
> Attachments: ConsumerEntry.java, KafkaAkkaConsumerQueueManager.java, 
> KafkaCallable.java
>
>
> On my current project I am writing a message listener for Kafka.
> - The message listener can register a callable (domain URL, headers, schema of 
> the domain endpoint).
> - Initially I wrote the message listener as a Java Callable; later I used Akka 
> Futures (as Akka promises to scale well).
> - Attaching the classes that I am using in my current project. These are a POC, 
> not production ready.
> Please review and let me know if this is a good idea, as it looks promising.





[GitHub] kafka pull request #1633: KAFKA-3855: Guard race conditions in TopologyBuild...

2016-07-18 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1633

KAFKA-3855: Guard race conditions in TopologyBuilder

Mark all public `TopologyBuilder` methods as synchronized as they can 
modify data-structures and these methods could be called from multiple threads
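
The pattern being applied is coarse-grained locking on the builder itself; a 
minimal self-contained illustration of the idea (an invented class, not the 
actual diff):

{code}
import java.util.HashMap;
import java.util.Map;

// Every public method touching the shared data-structures takes the
// builder's monitor, so concurrent stream threads cannot interleave.
public class SharedBuilder {
    private final Map<String, String[]> sources = new HashMap<>();

    // synchronized: several stream threads may call mutators concurrently
    public synchronized SharedBuilder addSource(String name, String... topics) {
        sources.put(name, topics);
        return this;
    }

    // readers take the same lock, so they never see a half-updated map
    public synchronized Map<String, String[]> sourceTopics() {
        return new HashMap<>(sources);
    }
}
{code}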

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-3855

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1633


commit 5178325457e09eb65a042cea3339cbb0c860beb1
Author: Damian Guy 
Date:   2016-07-18T16:55:48Z

mark all public TopologyBuilder methods as synchronized as they can modify 
data-structures and these methods could be called from multiple threads






[jira] [Commented] (KAFKA-3855) Guard race conditions in TopologyBuilder

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382744#comment-15382744
 ] 

ASF GitHub Bot commented on KAFKA-3855:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1633

KAFKA-3855: Guard race conditions in TopologyBuilder

Mark all public `TopologyBuilder` methods as synchronized as they can 
modify data-structures and these methods could be called from multiple threads

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-3855

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1633


commit 5178325457e09eb65a042cea3339cbb0c860beb1
Author: Damian Guy 
Date:   2016-07-18T16:55:48Z

mark all public TopologyBuilder methods as synchronized as they can modify 
data-structures and these methods could be called from multiple threads




> Guard race conditions in TopologyBuilder
> 
>
> Key: KAFKA-3855
> URL: https://issues.apache.org/jira/browse/KAFKA-3855
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: architecture
>
> The user-specified {{TopologyBuilder}} is shared among all stream threads to 
> build the processor topology instances, one for each thread. Its public 
> functions that can be accessed by the threads are not synchronized, and we 
> need to double-check whether this could cause race conditions, and if so, 
> guard against such concurrent access.





[jira] [Created] (KAFKA-3971) Consumers drop from coordinator and cannot reconnet

2016-07-18 Thread Lei Wang (JIRA)
Lei Wang created KAFKA-3971:
---

 Summary: Consumers drop from coordinator and cannot reconnet
 Key: KAFKA-3971
 URL: https://issues.apache.org/jira/browse/KAFKA-3971
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.9.0.1
 Environment: version 0.9.0.1
Reporter: Lei Wang


From time to time we create new topics, and all consumers will pick up 
those new topics. When starting to consume from these new topics, we often see 
that some random consumers cannot connect to the coordinator. The log is 
flooded with the following message tens of thousands of times every second:
{noformat}
16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
{noformat}

The servers seem to be working fine, and other consumers are also happy.

From the log, it looks like it's retrying multiple times every millisecond but 
always failing.
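
One thing worth decoding here: 2147483645 is not a real broker id. The Java client 
tracks the coordinator under a synthetic node id, which, as far as I can tell, is 
Integer.MAX_VALUE minus the broker id (so the coordinator connection stays separate 
from the regular broker connection). Treating that convention as an assumption:

{code}
// Decoding the coordinator id from the log flood above, assuming the
// client's synthetic-id convention of Integer.MAX_VALUE - brokerId:
public class DecodeCoordinatorId {
    public static void main(String[] args) {
        int loggedId = 2147483645;
        int brokerId = Integer.MAX_VALUE - loggedId;  // 2147483647 - 2147483645
        System.out.println("coordinator is broker " + brokerId); // broker 2
    }
}
{code}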





[jira] [Updated] (KAFKA-3924) Data loss due to halting when LEO is larger than leader's LEO

2016-07-18 Thread Maysam Yabandeh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maysam Yabandeh updated KAFKA-3924:
---
Description: 
Currently the follower broker panics when its LEO is larger than its leader's 
LEO,  and assuming that this is an impossible state to reach, halts the process 
to prevent any further damage.
{code}
if (leaderEndOffset < replica.logEndOffset.messageOffset) {
  // Prior to truncating the follower's log, ensure that doing so is not 
disallowed by the configuration for unclean leader election.
  // This situation could only happen if the unclean election configuration 
for a topic changes while a replica is down. Otherwise,
  // we should never encounter this situation since a non-ISR leader cannot 
be elected if disallowed by the broker configuration.
  if (!LogConfig.fromProps(brokerConfig.originals, 
AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
ConfigType.Topic, 
topicAndPartition.topic)).uncleanLeaderElectionEnable) {
// Log a fatal error and shutdown the broker to ensure that data loss 
does not unexpectedly occur.
fatal("...")
Runtime.getRuntime.halt(1)
  }
{code}

Firstly, this assumption is invalid: there are legitimate cases (examples 
below) in which this state can actually occur. Secondly, halting results in the 
broker losing its un-flushed data, and if multiple brokers halt simultaneously 
there is a chance that both the leader and the followers of a partition are among 
the halted brokers, which would result in permanent data loss.

Given that this is a legit case, we suggest replacing the halt with a graceful 
shutdown to avoid propagating data loss to the entire cluster.

Details:
One legit case in which this could actually occur is when a troubled broker 
shrinks its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In this 
case the broker has lost some data, but the controller still cannot elect the 
others as the leader. If the crashed broker comes back up, the controller elects 
it as the leader, and as a result all other brokers now following it halt, since 
they have LEOs larger than those of the shrunk partitions on the restarted broker. 
We actually had a case where bringing up a crashed broker took down the entire 
cluster at once, and as explained above this could result in data loss.

The other legit case is when multiple brokers shut down ungracefully at the same 
time. In this case both the leader and the followers lose their un-flushed 
data, but one of them has an HW larger than the other's. The controller elects 
the one that comes back up sooner as the leader, and if its LEO is less than its 
future follower's, the follower will halt (and probably lose more data). 
Simultaneous ungraceful shutdowns could happen due to hardware issues (e.g., rack 
power failure), operator errors, or software issues (e.g., the case above, which 
is further explained in KAFKA-3410 and KAFKA-3861 and causes simultaneous halts 
in multiple brokers)
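
The halt-versus-graceful distinction above comes down to plain JVM semantics: 
Runtime.halt() terminates without running shutdown hooks, so nothing buffered gets 
flushed, while an orderly exit runs them. A minimal illustration (JVM behavior 
only, not broker code):

{code}
public class HaltVsExit {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(
                // stands in for the broker flushing and closing its logs
                () -> System.out.println("flushing un-flushed data")));

        System.exit(1);                  // graceful: the hook above runs
        // Runtime.getRuntime().halt(1)  // would terminate with no hooks at all
    }
}
{code}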

  was:
Currently the follower broker panics when its LEO is less than its leader's 
LEO,  and assuming that this is an impossible state to reach, halts the process 
to prevent any further damage.
{code}
if (leaderEndOffset < replica.logEndOffset.messageOffset) {
  // Prior to truncating the follower's log, ensure that doing so is not 
disallowed by the configuration for unclean leader election.
  // This situation could only happen if the unclean election configuration 
for a topic changes while a replica is down. Otherwise,
  // we should never encounter this situation since a non-ISR leader cannot 
be elected if disallowed by the broker configuration.
  if (!LogConfig.fromProps(brokerConfig.originals, 
AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
ConfigType.Topic, 
topicAndPartition.topic)).uncleanLeaderElectionEnable) {
// Log a fatal error and shutdown the broker to ensure that data loss 
does not unexpectedly occur.
fatal("...")
Runtime.getRuntime.halt(1)
  }
{code}

Firstly this assumption is invalid and there are legitimate cases (examples 
below) that this case could actually occur. Secondly halt results into the 
broker losing its un-flushed data, and if multiple brokers halt simultaneously 
there is a chance that both leader and followers of a partition are among the 
halted brokers, which would result into permanent data loss.

Given that this is a legit case, we suggest to replace it with a graceful 
shutdown to avoid propagating data loss to the entire cluster.

Details:
One legit case that this could actually occur is when a troubled broker shrinks 
its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In this case 
the broker has lost some data but the controller cannot still elects the others 
as the leader. If the crashed broker comes back up, the controller elects it as 
the leader, and as a result all other 

[jira] [Assigned] (KAFKA-3782) Transient failure with kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_bounce.clean=True

2016-07-18 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson reassigned KAFKA-3782:
--

Assignee: Jason Gustafson  (was: Ewen Cheslack-Postava)

> Transient failure with 
> kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_bounce.clean=True
> -
>
> Key: KAFKA-3782
> URL: https://issues.apache.org/jira/browse/KAFKA-3782
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Liquan Pei
>Assignee: Jason Gustafson
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> For commit 946ae60
> max() arg is an empty sequence
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 106, in run_all_tests
> data = self.run_single_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/tests/runner.py",
>  line 162, in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.5.1-py2.7.egg/ducktape/mark/_mark.py",
>  line 331, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.0/kafka/tests/kafkatest/tests/connect/connect_distributed_test.py",
>  line 321, in test_bounce
> sink_seqno_max = max(sink_seqnos)
> ValueError: max() arg is an empty sequence



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3971) Consumers drop from coordinator and cannot reconnet

2016-07-18 Thread Lei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wang updated KAFKA-3971:

Description: 
From time to time we create new topics, and all consumers pick up those new 
topics. When starting to consume from these new topics, we often see that some 
random consumers cannot connect to the coordinator. The log is flooded with 
tens of thousands of the following log message every second:
{noformat}
16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
{noformat}

the servers seem to be working fine, and other consumers are also happy.

from the log, it looks like the client is retrying multiple times every 
millisecond but all attempts are failing.

the same process is consuming from many topics; some of them are still working 
well, but those random topics will fail.

  was:
From time to time we create new topics, and all consumers pick up those new 
topics. When starting to consume from these new topics, we often see that some 
random consumers cannot connect to the coordinator. The log is flooded with 
tens of thousands of the following log message every second:
{noformat}
16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483645 dead.
{noformat}

the servers seem to be working fine, and other consumers are also happy.

from the log, it looks like the client is retrying multiple times every 
millisecond but all attempts are failing.


> Consumers drop from coordinator and cannot reconnet
> ---
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
>
> From time to time we create new topics, and all consumers pick up 
> those new topics. When starting to consume from these new topics, we often 
> see that some random consumers cannot connect to the coordinator. The log 
> is flooded with tens of thousands of the following log message every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:1

[jira] [Updated] (KAFKA-3971) Consumers drop from coordinator and cannot reconnet

2016-07-18 Thread Lei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wang updated KAFKA-3971:

Attachment: KAFKA-3971.txt

stacktrace of the process

> Consumers drop from coordinator and cannot reconnet
> ---
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
> Attachments: KAFKA-3971.txt
>
>
> From time to time we create new topics, and all consumers pick up 
> those new topics. When starting to consume from these new topics, we often 
> see that some random consumers cannot connect to the coordinator. The log 
> is flooded with tens of thousands of the following log message every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> {noformat}
> the servers seem to be working fine, and other consumers are also happy.
> from the log, it looks like the client is retrying multiple times every 
> millisecond but all attempts are failing.
> the same process is consuming from many topics; some of them are still 
> working well, but those random topics will fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3971) Consumers drop from coordinator and cannot reconnet

2016-07-18 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382833#comment-15382833
 ] 

Lei Wang commented on KAFKA-3971:
-

consumer config:
{noformat}
Properties props = new Properties();
props.put("bootstrap.servers", "");
String me = clientId + UID;
props.put("group.id", me);
props.put("client.id", me);
props.put("auto.offset.reset", "earliest");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "3");
props.put("key.deserializer", 
"org.apache.kafka.common.serialization.LongDeserializer");
props.put("value.deserializer", 
"org.apache.kafka.common.serialization.ByteArrayDeserializer");

consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
{noformat}

> Consumers drop from coordinator and cannot reconnet
> ---
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
> Attachments: KAFKA-3971.txt
>
>
> From time to time we create new topics, and all consumers pick up 
> those new topics. When starting to consume from these new topics, we often 
> see that some random consumers cannot connect to the coordinator. The log 
> is flooded with tens of thousands of the following log message every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> {noformat}
> the servers seem to be working fine, and other consumers are also happy.
> from the log, it looks like the client is retrying multiple times every 
> millisecond but all attempts are failing.
> the same process is consuming from many topics; some of them are still 
> working well, but those random topics will fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3971) Consumers drop from coordinator and cannot reconnet

2016-07-18 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382840#comment-15382840
 ] 

Lei Wang commented on KAFKA-3971:
-

For different bad hosts, the coordinator ids are different:
{noformat}
16/07/18 19:00:45.960 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
16/07/18 19:00:45.961 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
16/07/18 19:00:45.961 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
16/07/18 19:00:45.961 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
16/07/18 19:00:45.961 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
16/07/18 19:00:45.961 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
16/07/18 19:00:45.962 INFO (AbstractCoordinator.java:529): Marking the 
coordinator 2147483641 dead.
{noformat}
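
As a side note on reading these logs: assuming the 0.9/0.10 Java client derives 
the coordinator's node id as Integer.MAX_VALUE - brokerId (an assumption, not 
something stated in this thread), the ids above map back to broker ids as in 
this small sketch:
{code}
// Hypothetical mapping of coordinator ids to broker ids, under the
// assumption that coordinator node id = Integer.MAX_VALUE - brokerId.
public class CoordinatorIds {
    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE - 2147483645); // 2
        System.out.println(Integer.MAX_VALUE - 2147483641); // 6
    }
}
{code}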

> Consumers drop from coordinator and cannot reconnet
> ---
>
> Key: KAFKA-3971
> URL: https://issues.apache.org/jira/browse/KAFKA-3971
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1
> Environment: version 0.9.0.1
>Reporter: Lei Wang
> Attachments: KAFKA-3971.txt
>
>
> From time to time we create new topics, and all consumers pick up 
> those new topics. When starting to consume from these new topics, we often 
> see that some random consumers cannot connect to the coordinator. The log 
> is flooded with tens of thousands of the following log message every second:
> {noformat}
> 16/07/18 18:18:36.003 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.004 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> 16/07/18 18:18:36.005 INFO (AbstractCoordinator.java:529): Marking the 
> coordinator 2147483645 dead.
> {noformat}
> the servers seem to be working fine, and other consumers are also happy.
> from the log, it looks like the client is retrying multiple times every 
> millisecond but all attempts are failing.
> the same process is consuming from many topics; some of them are still 
> working well, but those random topics will fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1526: KAFKA-3870: Expose state store names in DSL

2016-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1526


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3870) Expose state store names to DSL

2016-07-18 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3870.
--
Resolution: Fixed

Issue resolved by pull request 1526
[https://github.com/apache/kafka/pull/1526]

> Expose state store names to DSL
> ---
>
> Key: KAFKA-3870
> URL: https://issues.apache.org/jira/browse/KAFKA-3870
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
> Fix For: 0.10.1.0
>
>
> Currently in Kafka Streams the state store names are hidden in the DSL. This 
> JIRA proposes to make them visible to the DSL, so that developers can 
> subsequently query them. Sequence of steps:
> - Explicitly require names for all operations that create a state store, like 
> {{aggregate}}.
> - Explicitly require names for all created KTables (thus their changelog 
> state stores)
> - Materialize all KTables (today some KTables are materialized and some are 
> not)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3870) Expose state store names to DSL

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382875#comment-15382875
 ] 

ASF GitHub Bot commented on KAFKA-3870:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1526


> Expose state store names to DSL
> ---
>
> Key: KAFKA-3870
> URL: https://issues.apache.org/jira/browse/KAFKA-3870
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Eno Thereska
> Fix For: 0.10.1.0
>
>
> Currently in Kafka Streams the state store names are hidden in the DSL. This 
> JIRA proposes to make them visible to the DSL, so that developers can 
> subsequently query them. Sequence of steps:
> - Explicitly require names for all operations that create a state store, like 
> {{aggregate}}.
> - Explicitly require names for all created KTables (thus their changelog 
> state stores)
> - Materialize all KTables (today some KTables are materialized and some are 
> not)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1621: MINOR: Added simple streams benchmark to system te...

2016-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1621


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-18 Thread Vincent Rischmann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382917#comment-15382917
 ] 

Vincent Rischmann commented on KAFKA-3894:
--

Well, that doesn't work; in fact I just realized that the log cleaner threads 
all died on the migrated brokers. So yep, still need to find a workaround or 
wait for a fix. How did you manage to clean up the logs [~davispw]?

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
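
For the JVM-side monitoring mentioned in the last remediation, a minimal 
liveness probe could look like the sketch below (the thread-name prefix is an 
assumption based on the log line quoted in this issue; a real deployment would 
typically attach via remote JMX rather than run inside the broker process).
{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Illustrative sketch: report whether any log-cleaner thread is still alive.
public class LogCleanerProbe {
    public static boolean cleanerThreadAlive() {
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info.getThreadName().startsWith("kafka-log-cleaner-thread")) {
                return true;
            }
        }
        return false; // no cleaner thread found: crashed or never started
    }
}
{code}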



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3940) Log should check the return value of dir.mkdirs()

2016-07-18 Thread Jim Jagielski (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382927#comment-15382927
 ] 

Jim Jagielski commented on KAFKA-3940:
--

Let me know if I can add myself to the assignee list. Thx

> Log should check the return value of dir.mkdirs()
> -
>
> Key: KAFKA-3940
> URL: https://issues.apache.org/jira/browse/KAFKA-3940
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Ishita Mandhan
>  Labels: newbie
>
> In Log.loadSegments(), we call dir.mkdirs() w/o checking the return value and 
> just assume the directory will exist after the call. However, if the 
> directory can't be created (e.g. due to no space), we will hit 
> NullPointerException in the next statement, which will be confusing.
>for(file <- dir.listFiles if file.isFile) {
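
A hedged sketch of the suggested fix, written in Java for illustration (the 
actual Log.loadSegments() code is Scala, and the names below are hypothetical): 
check the result of mkdirs() and fail fast with a clear error instead of 
hitting a NullPointerException later.
{code}
import java.io.File;
import java.io.IOException;

// Illustrative sketch, not the Kafka source: validate directory creation.
public class LogDirCheck {
    public static void ensureDir(File dir) throws IOException {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("Failed to create log directory " + dir.getAbsolutePath());
        }
    }
}
{code}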



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3972) kafka java consumer poll returns 0 records after seekToBeginning

2016-07-18 Thread don caldwell (JIRA)
don caldwell created KAFKA-3972:
---

 Summary: kafka java consumer poll returns 0 records after 
seekToBeginning
 Key: KAFKA-3972
 URL: https://issues.apache.org/jira/browse/KAFKA-3972
 Project: Kafka
  Issue Type: Task
  Components: consumer
Affects Versions: 0.10.0.0
 Environment: docker image elasticsearch:latest, kafka scala version 
2.11, kafka version 0.10.0.0
Reporter: don caldwell


kafkacat successfully returns rows for the topic, but the following java source 
reliably fails to produce rows. I have the suspicion that I am missing some 
simple thing in my setup, but I have been unable to find a way out. I am using 
the current docker and using docker network commands to connect the processes 
in my cluster. The properties are:
bootstrap.servers: kafka01:9092,kafka02:9092,kafka03:9092
group.id: dhcp1
topic: dhcp
enable.auto.commit: false
auto.commit.interval.ms: 1000
session.timeout.ms 3
key.deserializer: org.apache.kafka.common.serialization.StringDeserializer
value.deserializer: org.apache.kafka.common.serialization.StringDeserializer

The Kafka consumer follows. One thing I find curious is that, although I seem 
to successfully make the call to seekToBeginning(), when I print offsets on 
failure I get large offsets for all partitions, although I had expected them 
to be 0 or at least some small number.
Here is the code:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.protocol.types.SchemaException;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.lang.Integer;
import java.lang.System;
import java.lang.Thread;
import java.lang.InterruptedException;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;


public class KConsumer {
private Properties prop;
private String topic;
private Integer polln;
private KafkaConsumer<String, String> consumer;
private String[] pna = {ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
ConsumerConfig.GROUP_ID_CONFIG,
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG,
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG};

public KConsumer(String pf) throws FileNotFoundException,
IOException {
this.setProperties(pf);
this.newClient();
}

public void setProperties(String p) throws FileNotFoundException,
IOException {
this.prop = new Properties();
this.prop.load(new FileInputStream(p));
this.topic = this.prop.getProperty("topic");
this.polln = new Integer(this.prop.getProperty("polln"));
}

public void setTopic(String t) {
this.topic = t;
}

public String getTopic() {
return this.topic;
}

public void newClient() {
System.err.println("creating consumer");
Properties kp = new Properties();
for(String p : pna) {
String v = this.prop.getProperty(p);
if(v != null) {
kp.put(p, v);
}
}
//this.consumer = new KafkaConsumer<>(this.prop);
this.consumer = new KafkaConsumer<>(kp);
//this.consumer.subscribe(Collections.singletonList(this.topic));
System.err.println("subscribing to " + this.topic);
this.consumer.subscribe(Arrays.asList(this.topic));
//this.seekToBeginning();
}

public void close() {
this.consumer.close();
this.consumer = null;
}

public void seekToBeginning() {
if(this.topic == null) {
System.err.println("KConsumer: topic not set");
System.exit(1);
}
System.err.println("setting partition offset to beginning");
java.util.Set<TopicPartition> tps = this.consumer.assignment();
this.consumer.seekToBeginning(tps);
}

public ConsumerRecords<String, String> nextBatch() throws KafkaException {
while(true) {
try {
System.err.printf("polling...");
ConsumerRecords<String, String> records =
this.consumer.poll(this.polln);

System.err.println("returned");
System.err.printf("record count %d\n", records.count());

   

[GitHub] kafka pull request #1634: KAFKA-3924: Replacing halt with exit upon LEO mism...

2016-07-18 Thread maysamyabandeh
GitHub user maysamyabandeh opened a pull request:

https://github.com/apache/kafka/pull/1634

KAFKA-3924: Replacing halt with exit upon LEO mismatch to trigger gra…

…ceful shutdown

The patch is pretty simple and the justification is explained in 
https://issues.apache.org/jira/browse/KAFKA-3924

I could not find Andrew Olson, who seems to be the contributor of this part 
of the code, on GitHub, so I am not sure whom I should ask to review the patch.

The contribution is my original work and I license the work to the project 
under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maysamyabandeh/kafka KAFKA-3924

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1634


commit cc59b8845219e21f150fa584581d066d5db1d9c6
Author: Maysam Yabandeh 
Date:   2016-07-18T20:41:00Z

KAFKA-3924: Replacing halt with exit upon LEO mismatch to trigger graceful 
shutdown




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3924) Data loss due to halting when LEO is larger than leader's LEO

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383035#comment-15383035
 ] 

ASF GitHub Bot commented on KAFKA-3924:
---

GitHub user maysamyabandeh opened a pull request:

https://github.com/apache/kafka/pull/1634

KAFKA-3924: Replacing halt with exit upon LEO mismatch to trigger gra…

…ceful shutdown

The patch is pretty simple and the justification is explained in 
https://issues.apache.org/jira/browse/KAFKA-3924

I could not find Andrew Olson, who seems to be the contributor of this part 
of the code, on GitHub, so I am not sure whom I should ask to review the patch.

The contribution is my original work and I license the work to the project 
under the project's open source license.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maysamyabandeh/kafka KAFKA-3924

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1634


commit cc59b8845219e21f150fa584581d066d5db1d9c6
Author: Maysam Yabandeh 
Date:   2016-07-18T20:41:00Z

KAFKA-3924: Replacing halt with exit upon LEO mismatch to trigger graceful 
shutdown




> Data loss due to halting when LEO is larger than leader's LEO
> -
>
> Key: KAFKA-3924
> URL: https://issues.apache.org/jira/browse/KAFKA-3924
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> Currently the follower broker panics when its LEO is larger than its leader's
> LEO, and, assuming that this is an impossible state to reach, halts the
> process to prevent any further damage.
> {code}
> if (leaderEndOffset < replica.logEndOffset.messageOffset) {
>   // Prior to truncating the follower's log, ensure that doing so is not
>   // disallowed by the configuration for unclean leader election.
>   // This situation could only happen if the unclean election configuration
>   // for a topic changes while a replica is down. Otherwise, we should never
>   // encounter this situation since a non-ISR leader cannot be elected if
>   // disallowed by the broker configuration.
>   if (!LogConfig.fromProps(brokerConfig.originals,
>       AdminUtils.fetchEntityConfig(replicaMgr.zkUtils, ConfigType.Topic,
>         topicAndPartition.topic)).uncleanLeaderElectionEnable) {
>     // Log a fatal error and shutdown the broker to ensure that data loss
>     // does not unexpectedly occur.
>     fatal("...")
>     Runtime.getRuntime.halt(1)
>   }
> {code}
> Firstly, this assumption is invalid: there are legitimate cases (examples
> below) in which this situation can actually occur. Secondly, halting causes
> the broker to lose its un-flushed data, and if multiple brokers halt
> simultaneously there is a chance that both the leader and the followers of a
> partition are among the halted brokers, which would result in permanent data
> loss.
> Given that this is a legitimate case, we suggest replacing the halt with a
> graceful shutdown to avoid propagating data loss to the entire cluster.
> Details:
> One legitimate case in which this can occur is when a troubled broker shrinks
> its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In this
> case the broker has lost some data, but the controller still cannot elect the
> others as the leader. If the crashed broker comes back up, the controller
> elects it as the leader, and as a result all other brokers that are now
> following it halt, since they have LEOs larger than that of the shrunk topics
> in the restarted broker. We actually had a case where bringing up a crashed
> broker simultaneously took down the entire cluster, and as explained above
> this could result in data loss.
> The other legitimate case is when multiple brokers ungracefully shut down at
> the same time. In this case both the leader and the followers lose their
> un-flushed data, but one of them has an HW larger than the other. The
> controller elects the one that comes back up sooner as the leader, and if its
> LEO is less than its future follower's, the follower will halt (and probably
> lose more data). Simultaneous ungraceful shutdowns can happen due to hardware
> issues (e.g., rack power failure), operator errors, or software issues (e.g.,
> the case above, which is further explained in KAFKA-3410 and KAFKA-3861 and
> causes simultaneous halts in multiple brokers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3924) Data loss due to halting when LEO is larger than leader's LEO

2016-07-18 Thread Maysam Yabandeh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maysam Yabandeh updated KAFKA-3924:
---
Status: Patch Available  (was: Open)

Submitting a simple patch that replaces halt with exit, which in turn will 
trigger the shutdown hook that gracefully shuts down the broker.

[~noslowerdna] You seem to be the most knowledgeable about this part of the 
code, but I did not manage to tag you on the GitHub pull request. Do you think 
you could review the patch, or perhaps redirect us to someone who could? Thanks.
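
For illustration, a standalone Java demo (not Kafka code) of the behavioral 
difference the patch relies on: Runtime.halt() skips JVM shutdown hooks, while 
System.exit() runs them, which is what lets a broker's shutdown hook close its 
logs cleanly.
{code}
// Illustrative demo: run with argument "halt" to see the hook being skipped.
public class HaltVsExit {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(
                () -> System.out.println("shutdown hook ran: flush and close resources")));
        if (args.length > 0 && args[0].equals("halt")) {
            Runtime.getRuntime().halt(1);   // terminates immediately; hook never runs
        } else {
            System.exit(1);                 // runs shutdown hooks before the JVM exits
        }
    }
}
{code}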

> Data loss due to halting when LEO is larger than leader's LEO
> -
>
> Key: KAFKA-3924
> URL: https://issues.apache.org/jira/browse/KAFKA-3924
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> Currently the follower broker panics when its LEO is larger than its leader's
> LEO, and, assuming that this is an impossible state to reach, halts the
> process to prevent any further damage.
> {code}
> if (leaderEndOffset < replica.logEndOffset.messageOffset) {
>   // Prior to truncating the follower's log, ensure that doing so is not
>   // disallowed by the configuration for unclean leader election.
>   // This situation could only happen if the unclean election configuration
>   // for a topic changes while a replica is down. Otherwise, we should never
>   // encounter this situation since a non-ISR leader cannot be elected if
>   // disallowed by the broker configuration.
>   if (!LogConfig.fromProps(brokerConfig.originals,
>       AdminUtils.fetchEntityConfig(replicaMgr.zkUtils, ConfigType.Topic,
>         topicAndPartition.topic)).uncleanLeaderElectionEnable) {
>     // Log a fatal error and shutdown the broker to ensure that data loss
>     // does not unexpectedly occur.
>     fatal("...")
>     Runtime.getRuntime.halt(1)
>   }
> {code}
> Firstly, this assumption is invalid: there are legitimate cases (examples
> below) in which this situation can actually occur. Secondly, halting causes
> the broker to lose its un-flushed data, and if multiple brokers halt
> simultaneously there is a chance that both the leader and the followers of a
> partition are among the halted brokers, which would result in permanent data
> loss.
> Given that this is a legitimate case, we suggest replacing the halt with a
> graceful shutdown to avoid propagating data loss to the entire cluster.
> Details:
> One legitimate case in which this can occur is when a troubled broker shrinks
> its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In this
> case the broker has lost some data, but the controller still cannot elect the
> others as the leader. If the crashed broker comes back up, the controller
> elects it as the leader, and as a result all other brokers that are now
> following it halt, since they have LEOs larger than that of the shrunk topics
> in the restarted broker. We actually had a case where bringing up a crashed
> broker simultaneously took down the entire cluster, and as explained above
> this could result in data loss.
> The other legitimate case is when multiple brokers ungracefully shut down at
> the same time. In this case both the leader and the followers lose their
> un-flushed data, but one of them has an HW larger than the other. The
> controller elects the one that comes back up sooner as the leader, and if its
> LEO is less than its future follower's, the follower will halt (and probably
> lose more data). Simultaneous ungraceful shutdowns can happen due to hardware
> issues (e.g., rack power failure), operator errors, or software issues (e.g.,
> the case above, which is further explained in KAFKA-3410 and KAFKA-3861 and
> causes simultaneous halts in multiple brokers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #754

2016-07-18 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3870: Expose state store names in DSL

[wangguoz] MINOR: Added simple streams benchmark to system tests

--
[...truncated 5317 lines...]

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testIndexRebuild STARTED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testLogRolls STARTED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testMessageSizeCheck STARTED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testAsyncDelete STARTED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange STARTED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException STARTED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap STARTED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll STARTED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog STARTED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck STARTED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation STARTED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints STARTED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset STARTED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets STARTED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testDeleteOldSegmentsMethod STARTED

kafka.log.LogTest > testDeleteOldSegmentsMethod PASSED

kafka.log.LogTest > testParseTopicPartitionNameForNull STARTED

kafka.log.LogTest > testParseTopicPartitionNameForNull PASSED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets STARTED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator PASSED

kafka.log.LogTest > testCorruptIndexRebuild STARTED

kafka.log.LogTest > testCorruptIndexRebuild PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved STARTED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages STARTED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload STARTED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog STARTED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset STARTED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate STARTED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName STARTED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles STARTED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testSizeBasedLogRoll STARTED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter STARTED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName STARTED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo STARTED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile STARTED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogConfigTest > testFromPropsEmpty STARTED

kafka.log.LogConfigTest > testFromPropsEmpty PASSED

kafka.log.LogConfigTest > testKafkaConfigToProps STARTED

kafka.log.LogConfigTest > testKafkaConfigToProps PASSED

kafka.log.LogConfigTest > testFromPropsInvalid STARTED

kafka.log.LogConfigTest > testFromPropsInvalid PASSED

kafka.log.CleanerTest > testBuildOffsetMap STARTED

kafka.log.CleanerTest > testBuildOffs

[jira] [Commented] (KAFKA-3924) Data loss due to halting when LEO is larger than leader's LEO

2016-07-18 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383107#comment-15383107
 ] 

Andrew Olson commented on KAFKA-3924:
-

Reviewed, looks good to me.

> Data loss due to halting when LEO is larger than leader's LEO
> -
>
> Key: KAFKA-3924
> URL: https://issues.apache.org/jira/browse/KAFKA-3924
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>
> Currently the follower broker panics when its LEO is larger than its leader's
> LEO, and, assuming that this is an impossible state to reach, halts the
> process to prevent any further damage.
> {code}
> if (leaderEndOffset < replica.logEndOffset.messageOffset) {
>   // Prior to truncating the follower's log, ensure that doing so is not
>   // disallowed by the configuration for unclean leader election.
>   // This situation could only happen if the unclean election configuration
>   // for a topic changes while a replica is down. Otherwise, we should never
>   // encounter this situation since a non-ISR leader cannot be elected if
>   // disallowed by the broker configuration.
>   if (!LogConfig.fromProps(brokerConfig.originals,
>       AdminUtils.fetchEntityConfig(replicaMgr.zkUtils, ConfigType.Topic,
>         topicAndPartition.topic)).uncleanLeaderElectionEnable) {
>     // Log a fatal error and shutdown the broker to ensure that data loss
>     // does not unexpectedly occur.
>     fatal("...")
>     Runtime.getRuntime.halt(1)
>   }
> {code}
> Firstly, this assumption is invalid: there are legitimate cases (examples
> below) in which this situation can actually occur. Secondly, halting causes
> the broker to lose its un-flushed data, and if multiple brokers halt
> simultaneously there is a chance that both the leader and the followers of a
> partition are among the halted brokers, which would result in permanent data
> loss.
> Given that this is a legitimate case, we suggest replacing the halt with a
> graceful shutdown to avoid propagating data loss to the entire cluster.
> Details:
> One legitimate case in which this can occur is when a troubled broker shrinks
> its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In this
> case the broker has lost some data, but the controller still cannot elect the
> others as the leader. If the crashed broker comes back up, the controller
> elects it as the leader, and as a result all other brokers that are now
> following it halt, since they have LEOs larger than that of the shrunk topics
> in the restarted broker. We actually had a case where bringing up a crashed
> broker simultaneously took down the entire cluster, and as explained above
> this could result in data loss.
> The other legitimate case is when multiple brokers ungracefully shut down at
> the same time. In this case both the leader and the followers lose their
> un-flushed data, but one of them has an HW larger than the other. The
> controller elects the one that comes back up sooner as the leader, and if its
> LEO is less than its future follower's, the follower will halt (and probably
> lose more data). Simultaneous ungraceful shutdowns can happen due to hardware
> issues (e.g., rack power failure), operator errors, or software issues (e.g.,
> the case above, which is further explained in KAFKA-3410 and KAFKA-3861 and
> causes simultaneous halts in multiple brokers).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2507) Replace ControlledShutdown{Request,Response} with org.apache.kafka.common.requests equivalent

2016-07-18 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-2507:
---
Fix Version/s: 0.11.0.0

> Replace ControlledShutdown{Request,Response} with 
> org.apache.kafka.common.requests equivalent
> -
>
> Key: KAFKA-2507
> URL: https://issues.apache.org/jira/browse/KAFKA-2507
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>Assignee: Grant Henke
> Fix For: 0.11.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3777) Extract the LRU cache out of RocksDBStore

2016-07-18 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-3777:

Assignee: Anna Povzner

> Extract the LRU cache out of RocksDBStore
> -
>
> Key: KAFKA-3777
> URL: https://issues.apache.org/jira/browse/KAFKA-3777
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Anna Povzner
> Fix For: 0.10.1.0
>
>
> The LRU cache currently lives inside the RocksDbStore class. As part of 
> KAFKA-3776 it needs to be extracted from RocksDbStore into a separate 
> component used in:
> 1. KGroupedStream.aggregate() / reduce(), 
> 2. KStream.aggregateByKey() / reduceByKey(),
> 3. KTable.to() (this will be done in KAFKA-3779).
> All of the above operators can have a cache on top to deduplicate writes to 
> the materialized state store in RocksDB.
> The scope of this JIRA is to extract the cache out of RocksDBStore and keep 
> it for items 1) and 2) above; this should be done together with, or after, 
> KAFKA-3780.
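
For reference, a record-count-bounded LRU cache can be sketched in a few lines 
of Java (illustrative only; this is not the Streams component being extracted):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: an access-ordered LinkedHashMap evicting the eldest entry.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true); // access-order gives LRU semantics
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
{code}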



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3973) Investigate feasibility of caching bytes vs. records

2016-07-18 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-3973:
---

 Summary: Investigate feasibility of caching bytes vs. records
 Key: KAFKA-3973
 URL: https://issues.apache.org/jira/browse/KAFKA-3973
 Project: Kafka
  Issue Type: Sub-task
  Components: streams
Reporter: Eno Thereska
Assignee: Bill Bejeck


Currently the cache stores and accounts for records, not bytes or objects. This 
investigation would be around measuring any performance overheads that come 
from storing bytes or objects. As an outcome we should know whether 1) we 
should store bytes or 2) we should store objects. 

If we store objects, the cache still needs to know their size (so that it can 
know if the object fits in the allocated cache space, e.g., if the cache is 
100MB and the object is 10MB, we'd have space for 10 such objects). The 
investigation needs to figure out how to find out the size of the object 
efficiently in Java.

If we store bytes, then we are serialising an object into bytes before caching 
it, i.e., we take a serialisation cost. The investigation needs to measure how 
high this cost can be, especially for the case when all objects fit in cache 
(and thus any extra serialisation cost would show).
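
As a starting point for that measurement, a rough sketch (illustrative, not a 
rigorous benchmark) of timing Java serialisation of an object into bytes:
{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch: average nanoseconds to serialise one object.
public class SerializationCost {
    public static long nanosToSerialize(Serializable obj, int iterations) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
        }
        return (System.nanoTime() - start) / iterations;
    }
}
{code}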



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3778) Avoiding using range queries of RocksDBWindowStore on KStream windowed aggregations

2016-07-18 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-3778:

Summary: Avoiding using range queries of RocksDBWindowStore on KStream 
windowed aggregations  (was: Avoidin using range queries of RocksDBWindowStore 
on KStream windowed aggregations)

> Avoiding using range queries of RocksDBWindowStore on KStream windowed 
> aggregations
> ---
>
> Key: KAFKA-3778
> URL: https://issues.apache.org/jira/browse/KAFKA-3778
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
> Fix For: 0.10.1.0
>
>
> RocksDbWindowStore currently does not use caches, but its window segments, 
> implemented as RocksDbStore instances, do. However, its range query {{fetch(key, 
> fromTime, toTime)}} will cause the caches of all touched segments to be flushed. 
> After KAFKA-3777, we should change the implementation of 
> KStreamWindowAggregation / KStreamWindowReduce to not use {{fetch}}, but 
> instead issue multiple {{get}} calls on the underlying segments, one for each 
> affected window range.
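
For illustration, the loop below sketches that idea in Java; the get function 
and window alignment are hypothetical stand-ins, not the Streams API.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Illustrative sketch: replace one range fetch(key, from, to) with one point
// get per aligned window start, so segment caches are not flushed.
public class WindowedGets {
    public static <K, V> List<V> perWindowGets(K key, long from, long to, long windowSize,
                                               BiFunction<K, Long, V> get) {
        List<V> results = new ArrayList<>();
        long firstWindowStart = from - (from % windowSize); // assumes non-negative timestamps
        for (long start = firstWindowStart; start <= to; start += windowSize) {
            V value = get.apply(key, start);
            if (value != null) {
                results.add(value);
            }
        }
        return results;
    }
}
{code}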



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3777) Extract the existing LRU cache out of RocksDBStore

2016-07-18 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-3777:

Summary: Extract the existing LRU cache out of RocksDBStore  (was: Extract 
the LRU cache out of RocksDBStore)

> Extract the existing LRU cache out of RocksDBStore
> --
>
> Key: KAFKA-3777
> URL: https://issues.apache.org/jira/browse/KAFKA-3777
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Anna Povzner
> Fix For: 0.10.1.0
>
>
> The LRU cache currently lives inside the RocksDbStore class. As part of 
> KAFKA-3776 it needs to be extracted from RocksDbStore into a separate 
> component used in:
> 1. KGroupedStream.aggregate() / reduce(), 
> 2. KStream.aggregateByKey() / reduceByKey(),
> 3. KTable.to() (this will be done in KAFKA-3779).
> All of the above operators can have a cache on top to deduplicate writes to 
> the materialized state store in RocksDB.
> The scope of this JIRA is to extract the cache out of RocksDBStore and keep 
> it for items 1) and 2) above; this should be done together with, or after, 
> KAFKA-3780.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3777) Extract the existing LRU cache out of RocksDBStore

2016-07-18 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-3777:

Description: 
The LRU cache currently lives inside the RocksDbStore class. As part of 
KAFKA-3776 it needs to be extracted from RocksDbStore into a separate component 
used in:

1. KGroupedStream.aggregate() / reduce(), 
2. KStream.aggregateByKey() / reduceByKey(),
3. KTable.to() (this will be done in KAFKA-3779).

All of the above operators can have a cache on top to deduplicate writes to the 
materialized state store in RocksDB.

The scope of this JIRA is to extract the cache out of RocksDBStore and keep it 
for items 1) and 2) above; this should be done together with, or after, 
KAFKA-3780.

Note it is NOT in the scope of this JIRA to re-write the cache, so this will 
basically stay the same record-based cache we currently have.

  was:
The LRU cache currently lives inside the RocksDbStore class. As part of 
KAFKA-3776 it needs to be extracted from RocksDbStore into a separate component 
used in:

1. KGroupedStream.aggregate() / reduce(), 
2. KStream.aggregateByKey() / reduceByKey(),
3. KTable.to() (this will be done in KAFKA-3779).

All of the above operators can have a cache on top to deduplicate writes to the 
materialized state store in RocksDB.

The scope of this JIRA is to extract the cache out of RocksDBStore and keep it 
for items 1) and 2) above; this should be done together with, or after, 
KAFKA-3780.


> Extract the existing LRU cache out of RocksDBStore
> --
>
> Key: KAFKA-3777
> URL: https://issues.apache.org/jira/browse/KAFKA-3777
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Anna Povzner
> Fix For: 0.10.1.0
>
>
> The LRU cache currently lives inside the RocksDbStore class. As part of 
> KAFKA-3776 it needs to be extracted from RocksDbStore into a separate 
> component used in:
> 1. KGroupedStream.aggregate() / reduce(), 
> 2. KStream.aggregateByKey() / reduceByKey(),
> 3. KTable.to() (this will be done in KAFKA-3779).
> All of the above operators can have a cache on top to deduplicate writes to 
> the materialized state store in RocksDB.
> The scope of this JIRA is to extract the cache out of RocksDBStore and keep 
> it for items 1) and 2) above; this should be done together with, or after, 
> KAFKA-3780.
> Note it is NOT in the scope of this JIRA to re-write the cache, so this will 
> basically stay the same record-based cache we currently have.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3974) LRU cache should store bytes/object and not records

2016-07-18 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-3974:
---

 Summary: LRU cache should store bytes/object and not records
 Key: KAFKA-3974
 URL: https://issues.apache.org/jira/browse/KAFKA-3974
 Project: Kafka
  Issue Type: Sub-task
Reporter: Eno Thereska
Assignee: Bill Bejeck


After the investigation in KAFKA-3973, if the outcome is either bytes or 
objects, the actual LRU cache needs to be modified to store bytes or objects 
(instead of records). The cache will have a total byte size as an input 
parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3101) Optimize Aggregation Outputs

2016-07-18 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska resolved KAFKA-3101.
-
Resolution: Duplicate

This is a duplicate of KAFKA-3776.

> Optimize Aggregation Outputs
> 
>
> Key: KAFKA-3101
> URL: https://issues.apache.org/jira/browse/KAFKA-3101
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: architecture
> Fix For: 0.10.1.0
>
>
> Today we emit one output record for each incoming message for Table / 
> Windowed Stream Aggregations. For example, say we have a sequence of 
> aggregate outputs computed from the input stream (assuming there is no agg 
> value for this key before):
> V1, V2, V3, V4, V5
> Then the aggregator will output the following sequence of Change<newValue, 
> oldValue>:
> <V1, null>, <V2, V1>, <V3, V2>, <V4, V3>, <V5, V4>
> where computing the intermediate results could cost a lot of CPU overhead. 
> Instead, if we can let the underlying state store "remember" the last 
> emitted old value, we can reduce the number of emits based on some configs. 
> More specifically, we can add one more field in the KV store engine storing 
> the last emitted old value, which only gets updated when we emit to the 
> downstream processor. For example:
> At Beginning: 
> Store: key => empty (no agg values yet)
> V1 computed: 
> Update Both in Store: key => (V1, V1), Emit <V1, null>
> V2 computed: 
> Update NewValue in Store: key => (V2, V1), No Emit
> V3 computed: 
> Update NewValue in Store: key => (V3, V1), No Emit
> V4 computed: 
> Update Both in Store: key => (V4, V4), Emit <V4, V1>
> V5 computed: 
> Update NewValue in Store: key => (V5, V4), No Emit
> One more thing to consider is that we need a "closing" time control on the 
> not-yet-emitted keys; when some time has elapsed (or the window is to be 
> closed), we need to check for any key whether their current materialized 
> pairs have not been emitted (for example <V5, V4> in the above example). 
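
As an illustration of the proposed bookkeeping, here is a toy Java sketch (all 
names hypothetical, not the Streams implementation) that stores the last 
emitted old value per key and emits a change only when asked:
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the dedup idea: keep (newValue, lastEmittedValue)
// per key and emit Change<new, old> only when the emit condition fires.
public class DedupAggregateBuffer<K, V> {
    public interface EmitCallback<K, V> { void emit(K key, V newValue, V oldValue); }

    private static final class Entry<V> { V newValue; V lastEmitted; }

    private final Map<K, Entry<V>> store = new HashMap<>();

    public void update(K key, V value, boolean emitNow, EmitCallback<K, V> callback) {
        Entry<V> entry = store.computeIfAbsent(key, k -> new Entry<>());
        entry.newValue = value;
        if (emitNow) {
            callback.emit(key, value, entry.lastEmitted); // old value = last *emitted* value
            entry.lastEmitted = value;
        }
    }
}
{code}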



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3101) Optimize Aggregation Outputs

2016-07-18 Thread Eno Thereska (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383256#comment-15383256
 ] 

Eno Thereska commented on KAFKA-3101:
-

Since this is a duplicate, will resolve.

> Optimize Aggregation Outputs
> 
>
> Key: KAFKA-3101
> URL: https://issues.apache.org/jira/browse/KAFKA-3101
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: architecture
> Fix For: 0.10.1.0
>
>
> Today we emit one output record for each incoming message for Table / 
> Windowed Stream Aggregations. For example, say we have a sequence of 
> aggregate outputs computed from the input stream (assuming there is no agg 
> value for this key before):
> V1, V2, V3, V4, V5
> Then the aggregator will output the following sequence of Change<newValue, 
> oldValue>:
> <V1, null>, <V2, V1>, <V3, V2>, <V4, V3>, <V5, V4>
> where computing the intermediate results could cost a lot of CPU overhead. 
> Instead, if we can let the underlying state store "remember" the last 
> emitted old value, we can reduce the number of emits based on some configs. 
> More specifically, we can add one more field in the KV store engine storing 
> the last emitted old value, which only gets updated when we emit to the 
> downstream processor. For example:
> At Beginning: 
> Store: key => empty (no agg values yet)
> V1 computed: 
> Update Both in Store: key => (V1, V1), Emit <V1, null>
> V2 computed: 
> Update NewValue in Store: key => (V2, V1), No Emit
> V3 computed: 
> Update NewValue in Store: key => (V3, V1), No Emit
> V4 computed: 
> Update Both in Store: key => (V4, V4), Emit <V4, V1>
> V5 computed: 
> Update NewValue in Store: key => (V5, V4), No Emit
> One more thing to consider is that we need a "closing" time control on the 
> not-yet-emitted keys; when some time has elapsed (or the window is to be 
> closed), we need to check for any key whether their current materialized 
> pairs have not been emitted (for example <V5, V4> in the above example). 





Re: Compacted topic cannot accept message without key

2016-07-18 Thread Kafka
Thanks for your answer. I know the necessity of keys for compacted topics, and as 
you know, __consumer_offsets is an internal compacted topic in Kafka whose key 
is a triple of <group, topic, partition>; these errors occurred when the 
consumer client tried to commit group offsets.
So why does this happen?


> On Jul 19, 2016, at 1:27 AM, Dustin Cote wrote:
> 
> Compacted topics require keyed messages in order for compaction to
> function.  The solution is to provide a key for your messages.  I would
> suggest reading the wiki on log compaction.
> 
> 
> On Mon, Jul 18, 2016 at 12:03 PM, Kafka  wrote:
> 
>> Hi,
>> The server log on broker 0.9.0 shows the following error:
>>ERROR [Replica Manager on Broker 0]: Error processing append
>> operation on partition [__consumer_offsets,5] (kafka.server.ReplicaManager)
>> kafka.message.InvalidMessageException: Compacted topic cannot accept
>> message without key.
>> 
>> Why does this happen and what’s the solution?
>> 
>> 
>> 
> 
> 
> -- 
> Dustin Cote
> confluent.io
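
For reference, a minimal sketch of producing keyed messages so compaction can work; the topic name and serializer choices below are illustrative:

{code}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
// A record with a null key is rejected on a compacted topic; always set one.
producer.send(new ProducerRecord[String, String]("my-compacted-topic", "some-key", "some-value"))
producer.close()
{code}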




Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-07-18 Thread Jason Gustafson
Hey Simon,

Sorry for the late response. The onPartitionsRevoked() hook is called
before the rebalance begins (that is, before the JoinGroup is sent) and is
intended to be used to flush uncommitted data and to commit corresponding
offsets. One of the main purposes of the KIP is to decouple the time
allowed to complete this from the session timeout, which now is used only
to detect failed or unreachable processes. This should give you more time
in onPartitionsRevoked() to clean up existing state without sacrificing
failure detection. Does that make sense?

Thanks,
Jason
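
As an illustration of that pattern, a minimal sketch of a rebalance listener that flushes and commits before the group rejoins (the flush step is application-specific and hypothetical):

{code}
import java.util.Collection
import org.apache.kafka.clients.consumer.{ConsumerRebalanceListener, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

// Sketch: flush in-flight work and commit processed offsets before the
// partitions are revoked and the rebalance proceeds.
class FlushingListener(consumer: KafkaConsumer[_, _]) extends ConsumerRebalanceListener {
  override def onPartitionsRevoked(partitions: Collection[TopicPartition]): Unit = {
    flushDownstream()     // hypothetical: drain any batches still being processed
    consumer.commitSync() // commit offsets for everything fully processed
  }
  override def onPartitionsAssigned(partitions: Collection[TopicPartition]): Unit = ()
  private def flushDownstream(): Unit = { /* application-specific */ }
}

// Registered at subscription time:
// consumer.subscribe(java.util.Collections.singletonList("topic"), new FlushingListener(consumer))
{code}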

On Tue, Jul 12, 2016 at 2:57 AM, Simon Souter 
wrote:

>  Hi,
>
> An issue I have regarding rebalancing is that a call to poll() triggers
> the JoinGroupRequest when rebalancing is in process.  In cases where a
> consumer is streaming more than a single batch at a time, there is no
> opportunity to flush any consumed batches through prior to the
> rebalance completing.  If onPartitionsRevoked were called via a
> background thread, or an alive() call, there would be an opportunity for a
> client to hold off from calling poll() until downstream messages are flushed,
> prior to calling poll() again to trigger the Join and onPartitionsAssigned.
>
> The current assumption appears to be that a call to poll() indicates that
> there are no more in-flight messages.  Attempting to decouple consumer and
> processor threads or the streaming of multiple batches results in
> unavoidable redeliveries during a rebalance.
>
> Regards
>
> Simon Souter
>
> https://github.com/cakesolutions/scala-kafka-client
>
>


[jira] [Commented] (KAFKA-3915) LogCleaner IO buffers do not account for potential size difference due to message format change

2016-07-18 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383610#comment-15383610
 ] 

James Cheng commented on KAFKA-3915:


We just saw this in one of our brokers as well. We are running Kafka 0.10.0.0.


> LogCleaner IO buffers do not account for potential size difference due to 
> message format change
> ---
>
> Key: KAFKA-3915
> URL: https://issues.apache.org/jira/browse/KAFKA-3915
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.0.0
>Reporter: Tommy Becker
>Assignee: Ismael Juma
>Priority: Blocker
> Fix For: 0.10.0.1
>
>
> We are upgrading from Kafka 0.8.1 to 0.10.0.0 and discovered an issue after 
> getting the following exception from the log cleaner:
> {code}
> [2016-06-28 10:02:18,759] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
>   at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
>   at 
> kafka.message.ByteBufferMessageSet$.writeMessage(ByteBufferMessageSet.scala:169)
>   at kafka.log.Cleaner$$anonfun$cleanInto$1.apply(LogCleaner.scala:435)
>   at kafka.log.Cleaner$$anonfun$cleanInto$1.apply(LogCleaner.scala:429)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
>   at kafka.log.Cleaner.cleanInto(LogCleaner.scala:429)
>   at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:380)
>   at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:376)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:376)
>   at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:343)
>   at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:342)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at kafka.log.Cleaner.clean(LogCleaner.scala:342)
>   at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:237)
>   at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:215)
>   at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> {code}
> At first this seems impossible because the input and output buffers are 
> identically sized. But in the case where the source messages are of an older 
> format, additional space may be required to write them out in the new one. 
> Since the message header is 8 bytes larger in 0.10.0, this failure can 
> happen. 
> We're planning to work around this by adding the following config:
> {code}log.message.format.version=0.8.1{code} but this definitely needs a fix.
> We could simply preserve the existing message format (since in this case we 
> can't retroactively add a timestamp anyway). Otherwise, the log cleaner would 
> have to be smarter about ensuring there is sufficient "slack space" in the 
> output buffer to account for the size difference * the number of messages in 
> the input buffer. 
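
A back-of-the-envelope version of that sizing (numbers purely illustrative):

{code}
// Illustrative only: worst-case output sizing when up-converting from the
// 0.8 message format to the 0.10 format during cleaning.
val ioBufferSize   = 1024 * 1024 // example log.cleaner.io.buffer.size
val headerGrowth   = 8           // extra bytes per message in the 0.10.0 format
val minMessageSize = 26          // assumed smallest possible 0.8 message incl. overhead
val maxMessages    = ioBufferSize / minMessageSize
val safeOutputSize = ioBufferSize + headerGrowth.toLong * maxMessages
{code}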


