[GitHub] kafka pull request #3406: KAFKA-5490: Retain empty batch for last sequence o...

2017-06-21 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3406

KAFKA-5490: Retain empty batch for last sequence of each producer



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-5490

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3406


commit cf27cc1d69de90c513d96895ec2f557a49b2b3b6
Author: Jason Gustafson 
Date:   2017-06-21T23:55:36Z

KAFKA-5490: Retain empty batch for last sequence of each producer




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-21 Thread Guozhang Wang
Thanks for the updated KIP; some more comments:

1. The config name is "default.deserialization.exception.handler" while the
interface class name is "RecordExceptionHandler", which is more general
than the intended purpose. Could we rename the class accordingly?

2. Could you describe the full implementation of "DefaultExceptionHandler"?
Currently it is not clear to me how it is implemented with the configured
value.

In addition, I think we do not need to include an additional
"DEFAULT_DESERIALIZATION_EXCEPTION_RESPONSE_CONFIG", as the configure()
function is mainly used for users to pass any customized parameters that are
outside the Streams library; plus, adding such an additional config sounds
over-complicated for a default exception handler. Instead I'd suggest we
just provide two handlers (or three if people feel strongly about the
LogAndThresholdExceptionHandler): a FailOnExceptionHandler and a
LogAndContinueOnExceptionHandler. And we can set
LogAndContinueOnExceptionHandler
by default.
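The two-handler suggestion above can be sketched roughly as below. Only the handler names (FailOnExceptionHandler, LogAndContinueOnExceptionHandler) come from the thread; the interface name follows the rename suggestion in comment 1, and the method signature is illustrative, not the final KIP-161 API.

```java
// Illustrative sketch, not the KIP-161 API: a fail-fast handler and a
// log-and-continue handler behind one deserialization-exception interface.
interface DeserializationExceptionHandler {
    enum Response { CONTINUE, FAIL }
    Response handle(byte[] rawKey, byte[] rawValue, Exception cause);
}

class FailOnExceptionHandler implements DeserializationExceptionHandler {
    @Override
    public Response handle(byte[] rawKey, byte[] rawValue, Exception cause) {
        return Response.FAIL; // fail fast: surface the error and stop processing
    }
}

class LogAndContinueOnExceptionHandler implements DeserializationExceptionHandler {
    @Override
    public Response handle(byte[] rawKey, byte[] rawValue, Exception cause) {
        // A real handler would log through the Streams logger; here we just report and skip.
        System.err.println("Skipping corrupt record: " + cause.getMessage());
        return Response.CONTINUE; // make progress: drop the bad record
    }
}
```

Users who want the fail-fast behavior would simply configure the other implementation; no extra response config is needed.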


Guozhang








On Wed, Jun 21, 2017 at 1:39 AM, Eno Thereska 
wrote:

> Thanks Guozhang,
>
> I’ve updated the KIP and hopefully addressed all the comments so far. In
> the process also changed the name of the KIP to reflect its scope better:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> deserialization+exception+handlers
>
> Any other feedback appreciated, otherwise I’ll start the vote soon.
>
> Thanks
> Eno
>
> > On Jun 12, 2017, at 6:28 AM, Guozhang Wang  wrote:
> >
> > Eno, thanks for bringing this proposal up and sorry for being late on
> > this. Here are my two cents:
> >
> > 1. First some meta comments regarding "fail fast" v.s. "making
> progress". I
> > agree that in general we should better "enforce user to do the right
> thing"
> > in system design, but we also need to keep in mind that Kafka is a
> > multi-tenant system, i.e. from a Streams app's pov you probably would not
> > control the whole streaming processing pipeline end-to-end. E.g. Your
> input
> > data may not be controlled by yourself; it could be written by another
> app,
> > or another team in your company, or even a different organization, and if
> > an error happens maybe you cannot fix "to do the right thing" just by
> > yourself in time. In such an environment I think it is important to leave
> > the door open to let users be more resilient. So I find the current
> > proposal which does leave the door open for either fail-fast or make
> > progress quite reasonable.
> >
> > 2. On the other hand, if the question is whether we should provide a
> > built-in "send to bad queue" handler from the library, I think that might
> > be an overkill: with some tweaks (see my detailed comments below) on the
> > API we can allow users to implement such handlers pretty easily. In
> fact, I
> > feel even "LogAndThresholdExceptionHandler" is not necessary as a
> built-in
> > handler, as it would then require users to specify the threshold via
> > configs, etc. I think letting people provide such "eco-libraries" may be
> > better.
> >
> > 3. Regarding the CRC error: today we validate CRC on both the broker end
> > upon receiving produce requests and on consumer end upon receiving fetch
> > responses; and if the CRC validation fails in the former case it would
> not
> > be appended to the broker logs. So if we do see a CRC failure on the
> > consumer side it has to be that either we have a flipped bit on the
> broker
> > disks or over the wire. For the first case it is fatal while for the
> second
> > it is retriable. Unfortunately we cannot tell which case it is when
> seeing
> > CRC validation failures. But in either case, just skipping and making
> > progress seems not a good choice here, and hence I would personally
> exclude
> > these errors from the general serde errors to NOT leave the door open of
> > making progress.
> >
> > Currently such errors are thrown as KafkaException that wraps an
> > InvalidRecordException, which may be too general and we could consider
> just
> > throwing the InvalidRecordException directly. But that could be an
> > orthogonal discussion if we agree that CRC failures should not be
> > considered in this KIP.
> >
> > 
> >
> > Now some detailed comments:
> >
> > 4. Could we consider adding the processor context in the handle() function
> > as well? This context would wrap the source node that is about to
> > process the record. This could expose more info like which task / source
> > node sees this error, the timestamp of the message, etc., and also can
> > allow users to implement their handlers by exposing some metrics, or by
> > calling context.forward() to implement the "send to bad queue" behavior, etc.
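The "send to bad queue" idea in point 4 can be illustrated with a toy: a handler that, given access to a context, forwards the failed record to a dead-letter destination and tells the caller to continue. Everything below (class names, the context stand-in, the topic name) is hypothetical; in Streams the real context would be the ProcessorContext discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a processor context: forward() just records what was
// routed to the hypothetical bad-record queue.
class BadQueueContext {
    final List<String> badQueue = new ArrayList<>();
    void forward(String topic, String record) { badQueue.add(topic + ":" + record); }
}

class SendToBadQueueHandler {
    private final BadQueueContext context;
    SendToBadQueueHandler(BadQueueContext context) { this.context = context; }

    // On a deserialization failure, route the raw record to a dead-letter
    // destination and keep processing.
    boolean handle(String rawRecord, Exception cause) {
        context.forward("bad-records", rawRecord);
        return true; // continue processing subsequent records
    }
}
```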
> >
> > 5. Could you add the string name of
> > StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER as well in the KIP?
> > Personally I find "defaul

Re: [DISCUSS] KIP-167: Add a restoreAll method to StateRestoreCallback

2017-06-21 Thread Guozhang Wang
More specifically, if we replaced the first parameter, the String store name,
with the store instance itself, would that be sufficient to cover
`StateRestoreNotification`?

On Wed, Jun 21, 2017 at 7:13 PM, Guozhang Wang  wrote:

> Bill,
>
> I'm wondering why we need the `StateRestoreNotification` while still
> having `StateRestoreListener`; couldn't the above setup be achieved just with
> `StateRestoreListener.onRestoreStart / onRestoreEnd`? I.e. it seems the
> latter can subsume any use cases intended for the former API.
>
> Guozhang
>
> On Mon, Jun 19, 2017 at 3:23 PM, Bill Bejeck  wrote:
>
>> I'm going to update the KIP with new interface StateRestoreNotification
>> containing two methods, startRestore and endRestore.
>>
>> While naming is very similar to methods already proposed on the
>> StateRestoreListener, the intent of these methods is not for user
>> notification of restore status.  Instead these new methods are for
>> internal
>> use by the state store to perform any required setup and teardown work due
>> to a batch restoration process.
>>
>> Here's one current use case: when using RocksDB we should optimize for a
>> bulk load by setting Options.prepareForBulkload().
>>
>>1. If the database has already been opened, we'll need to close it, set
>>the "prepareForBulkload" and re-open the database.
>>2. Once the restore is completed we'll need to close and re-open the
>>database with the "prepareForBulkload" option turned off.
>>
>> While we mention the RocksDB use case above, the addition of this
>> interface is not tied to any specific implementation of a persistent
>> state store.
>>
>> Additionally, a separate interface is needed so that any user can
>> implement
>> the state restore notification feature regardless of the state restore
>> callback used.
>>
>> I'll also remove the "getStateRestoreListener" method and stick with the
>> notion of a "global" restore listener for now.
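The setup/teardown hooks Bill describes can be sketched as follows. The interface and method names come from the message; the toy store and its flags are ours — a real implementation would close and re-open a RocksDB instance with bulk-load options enabled and then disabled.

```java
// Sketch of the proposed StateRestoreNotification: the store is told when a
// batch restore starts and ends so it can reconfigure itself (e.g. RocksDB
// bulk-load options). The store below is a toy, not a real state store.
interface StateRestoreNotification {
    void startRestore();
    void endRestore();
}

class BulkLoadingStore implements StateRestoreNotification {
    boolean open = true;
    boolean bulkLoadMode = false;

    @Override
    public void startRestore() {
        open = false;        // 1. close the already-open database...
        bulkLoadMode = true; // ...enable bulk-load options...
        open = true;         // ...and re-open it for the restore
    }

    @Override
    public void endRestore() {
        open = false;         // 2. close again once restore completes...
        bulkLoadMode = false; // ...turn the bulk-load option off...
        open = true;          // ...and re-open for normal use
    }
}
```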
>>
>> On Mon, Jun 19, 2017 at 1:05 PM, Bill Bejeck  wrote:
>>
>> > Yes it is, more of an oversight on my part, I'll remove it from the KIP.
>> >
>> >
>> > On Mon, Jun 19, 2017 at 12:48 PM, Matthias J. Sax <
>> matth...@confluent.io>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> I think for now it's good enough to start with a single global restore
>> >> listener. We can incrementally improve this later on if required. Of
>> >> course, if it's easy to do right away we can also be more fine grained.
>> >> But for KTable, we might want to add this after getting rid of all the
>> >> overloads we have atm.
>> >>
>> >> One question: what is the purpose of parameter "endOffset" in
>> >> #onRestoreEnd() -- isn't this the same value as provided in
>> >> #onRestoreStart() ?
>> >>
>> >>
>> >> -Matthias
>> >>
>> >>
>> >>
>> >> On 6/15/17 6:18 AM, Bill Bejeck wrote:
>> >> > Thinking about the custom StateRestoreListener approach and having a
>> get
>> >> > method on the interface will really only work for custom state
>> stores.
>> >> >
>> >> > So we'll need to provide another way for users to set behavior with
>> >> > provided state stores.  The only option that comes to mind now is
>> also
>> >> > adding a parameter to the StateStoreSupplier.
>> >> >
>> >> >
>> >> > Bill
>> >> >
>> >> >
>> >> > On Wed, Jun 14, 2017 at 5:39 PM, Bill Bejeck 
>> wrote:
>> >> >
>> >> >> Guozhang,
>> >> >>
>> >> >> Thanks for the comments.
>> >> >>
>> >> >> 1.  As for the granularity, I agree that having one global
>> >> >> StateRestoreListener could be restrictive.  But I think it's
>> important
>> >> to
>> >> >> have a "setStateRestoreListener" on KafkaStreams as this allows
>> users
>> >> to
>> >> >> define an anonymous instance that has access to local scope for
>> >> reporting
>> >> >> purposes.  This is a similar pattern we use for
>> >> >> KafkaStreams.setStateListener.
>> >> >>
>> >> >> As an alternative, what if we add a method to the
>> >> BatchingStateRestoreCallback
>> >> >> interface named "getStateStoreListener".   Then in an abstract
>> adapter
>> >> >> class we return null from getStateStoreListener.   But if users
>> want to
>> >> >> supply a different StateRestoreListener strategy per callback they
>> >> would
>> >> >> simply override the method to return an actual instance.
>> >> >>
>> >> >> WDYT?
>> >> >>
>> >> >> 2.  I'll make the required updates to pass in the ending offset at
>> the
>> >> >> start as well as the actual name of the state store.
>> >> >>
>> >> >> Bill
>> >> >>
>> >> >>
>> >> >> On Wed, Jun 14, 2017 at 3:53 PM, Guozhang Wang 
>> >> wrote:
>> >> >>
>> >> >>> Thanks Bill for the updated wiki. I have a couple of more comments:
>> >> >>>
>> >> >>> 1. Setting StateRestoreListener on the KafkaStreams granularity may
>> >> not be
>> >> >>> sufficient, as in the listener callback we do not know which store it is
>> >> >>> restoring right now: if the topic is a changelog topic then from
>> the
>> >> >>> `TopicPartition` we may be able to infer the state store name, but
>> if
>> >> the
>> >> >>> topic is the

Re: [DISCUSS] KIP-167: Add a restoreAll method to StateRestoreCallback

2017-06-21 Thread Guozhang Wang
Bill,

I'm wondering why we need the `StateRestoreNotification` while still having
`StateRestoreListener`; couldn't the above setup be achieved just with
`StateRestoreListener.onRestoreStart / onRestoreEnd`? I.e. it seems the
latter can subsume any use cases intended for the former API.

Guozhang

On Mon, Jun 19, 2017 at 3:23 PM, Bill Bejeck  wrote:

> I'm going to update the KIP with new interface StateRestoreNotification
> containing two methods, startRestore and endRestore.
>
> While naming is very similar to methods already proposed on the
> StateRestoreListener, the intent of these methods is not for user
> notification of restore status.  Instead these new methods are for internal
> use by the state store to perform any required setup and teardown work due
> to a batch restoration process.
>
> Here's one current use case: when using RocksDB we should optimize for a
> bulk load by setting Options.prepareForBulkload().
>
>1. If the database has already been opened, we'll need to close it, set
>the "prepareForBulkload" and re-open the database.
>2. Once the restore is completed we'll need to close and re-open the
>database with the "prepareForBulkload" option turned off.
>
> While we mention the RocksDB use case above, the addition of this
> interface is not tied to any specific implementation of a persistent
> state store.
>
> Additionally, a separate interface is needed so that any user can implement
> the state restore notification feature regardless of the state restore
> callback used.
>
> I'll also remove the "getStateRestoreListener" method and stick with the
> notion of a "global" restore listener for now.
>
> On Mon, Jun 19, 2017 at 1:05 PM, Bill Bejeck  wrote:
>
> > Yes it is, more of an oversight on my part, I'll remove it from the KIP.
> >
> >
> > On Mon, Jun 19, 2017 at 12:48 PM, Matthias J. Sax  >
> > wrote:
> >
> >> Hi,
> >>
> >> I think for now it's good enough to start with a single global restore
> >> listener. We can incrementally improve this later on if required. Of
> >> course, if it's easy to do right away we can also be more fine grained.
> >> But for KTable, we might want to add this after getting rid of all the
> >> overloads we have atm.
> >>
> >> One question: what is the purpose of parameter "endOffset" in
> >> #onRestoreEnd() -- isn't this the same value as provided in
> >> #onRestoreStart() ?
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 6/15/17 6:18 AM, Bill Bejeck wrote:
> >> > Thinking about the custom StateRestoreListener approach and having a
> get
> >> > method on the interface will really only work for custom state stores.
> >> >
> >> > So we'll need to provide another way for users to set behavior with
> >> > provided state stores.  The only option that comes to mind now is also
> >> > adding a parameter to the StateStoreSupplier.
> >> >
> >> >
> >> > Bill
> >> >
> >> >
> >> > On Wed, Jun 14, 2017 at 5:39 PM, Bill Bejeck 
> wrote:
> >> >
> >> >> Guozhang,
> >> >>
> >> >> Thanks for the comments.
> >> >>
> >> >> 1.  As for the granularity, I agree that having one global
> >> >> StateRestoreListener could be restrictive.  But I think it's
> important
> >> to
> >> >> have a "setStateRestoreListener" on KafkaStreams as this allows users
> >> to
> >> >> define an anonymous instance that has access to local scope for
> >> reporting
> >> >> purposes.  This is a similar pattern we use for
> >> >> KafkaStreams.setStateListener.
> >> >>
> >> >> As an alternative, what if we add a method to the
> >> BatchingStateRestoreCallback
> >> >> interface named "getStateStoreListener".   Then in an abstract
> adapter
> >> >> class we return null from getStateStoreListener.   But if users want
> to
> >> >> supply a different StateRestoreListener strategy per callback they
> >> would
> >> >> simply override the method to return an actual instance.
> >> >>
> >> >> WDYT?
> >> >>
> >> >> 2.  I'll make the required updates to pass in the ending offset at
> the
> >> >> start as well as the actual name of the state store.
> >> >>
> >> >> Bill
> >> >>
> >> >>
> >> >> On Wed, Jun 14, 2017 at 3:53 PM, Guozhang Wang 
> >> wrote:
> >> >>
> >> >>> Thanks Bill for the updated wiki. I have a couple of more comments:
> >> >>>
> >> >>> 1. Setting StateRestoreListener on the KafkaStreams granularity may
> >> not be
> >> >>> sufficient, as in the listener callback we do not know which store it is
> >> >>> restoring right now: if the topic is a changelog topic then from the
> >> >>> `TopicPartition` we may be able to infer the state store name, but
> if
> >> the
> >> >>> topic is the source topic read as a KTable then we may not know
> which
> >> >>> store
> >> >>> it is restoring right now; plus forcing users to infer the state
> store
> >> >>> name
> >> >>> from the topic partition name would not be intuitive as well. Plus
> for
> >> >>> different stores the listener may be implemented differently, and
> >> setting
> >> >>> a
> >> >>> global listener would force users to branch on the

Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Garrett Barton
Getting good concurrency in a webapp is more than doable.  Check out these
benchmarks:
https://www.techempower.com/benchmarks/#section=data-r14&hw=ph&test=db
I linked to the single-query one because that's closest to a single
operation like the one you will be doing.

I'd also note that if data delivery does not need to be guaranteed, you could
go faster by switching the web servers over to UDP and using async mode on the
Kafka producers.
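The async-producer suggestion mostly comes down to configuration: KafkaProducer.send() is already asynchronous, and relaxed delivery guarantees allow throughput-leaning settings. A hedged sketch — the config keys are standard Kafka producer options, but the values are illustrative and the broker address is a placeholder, not a recommendation:

```java
import java.util.Properties;

// Throughput-leaning producer settings (a sketch; tune for your own workload).
class ThroughputProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        props.put("acks", "0");             // fire-and-forget when delivery need not be guaranteed
        props.put("linger.ms", "50");       // wait up to 50 ms to fill larger batches
        props.put("batch.size", "262144");  // 256 KB batches
        props.put("compression.type", "lz4");
        return props;
    }
}
```

With these, `new KafkaProducer<>(props).send(record, callback)` reports success or failure through the callback without blocking the web-request thread.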

On Wed, Jun 21, 2017 at 2:23 PM, Tauzell, Dave  wrote:

> I’m not really familiar with Netty so I won’t be of much help.   Maybe try
> posting on a Netty forum to see what they think?
> -Dave
>
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 10:28 AM
> To: Tauzell, Dave
> Cc: us...@kafka.apache.org; senthilec...@apache.org; dev@kafka.apache.org
> Subject: Re: Handling 2 to 3 Million Events before Kafka
>
> So Netty would work for this case? I do have a Netty server and it seems
> I'm not getting the expected results .. here is the git
> https://github.com/senthilec566/netty4-server , is this the right
> implementation?
>
> Cheers,
> Senthil
>
> On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
> I see.
>
> 1.   You don’t want the 100k machines sending directly to kafka.
>
> 2.   You can only have a small number of web servers
>
> People certainly have web-servers handling over 100k concurrent
> connections.  See this for some examples:  https://github.com/smallnest/
> C1000K-Servers .
>
> It seems possible with the right sort of kafka producer tuning.
>
> -Dave
>
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 8:55 AM
> To: Tauzell, Dave
> Cc: us...@kafka.apache.org;
> senthilec...@apache.org;
> dev@kafka.apache.org; Senthil kumar
> Subject: Re: Handling 2 to 3 Million Events before Kafka
>
> Thanks Jeyhun. Yes, an HTTP server would be problematic here w.r.t. network
> and memory.
>
> Hi Dave, the problem is not with Kafka; it's all about how you handle the
> huge volume of data before Kafka. I did a simple test with a 5-node Kafka
> cluster which gives a good result (~950 MB/s), so on the Kafka side I don't
> see a scaling issue ...
>
> All we are trying to work out is how to handle messages from the different
> servers before Kafka. Web servers can send to Kafka fast, but I can still
> handle only 50k events per second, which is too little for my use case; also,
> I can't deploy 20 web servers to handle this load. I'm looking for the best
> candidate in front of Kafka: it should be super fast at receiving everything
> and handing it off to the Kafka producer ...
>
>
> --Senthil
>
> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
> What are your configurations?
>
> - producers
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: us...@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil
> kumar; dev@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve the problem of handling 5 GB/sec of ingestion.
> Kafka is a really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines --> { HTTP Server (Jetty/Netty) } --> Kafka Cluster ..
>
> I see the problem in the HTTP server, which can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be the right choice
> in front of Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
>
>


[GitHub] kafka pull request #3405: KAFKA-5495: Update docs to use `kafka-consumer-gro...

2017-06-21 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/3405

KAFKA-5495: Update docs to use `kafka-consumer-groups.sh` for checking 
consumer offsets

And remove the deprecated `ConsumerOffsetChecker` example.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-5495

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3405


commit bd86f642603538cd0823b695421ba7b7b4862d9d
Author: Vahid Hashemian 
Date:   2017-06-20T20:56:47Z

KAFKA-5495: Update documentation to use `kafka-consumer-groups.sh` as the 
main tool for checking consumer offsets






[GitHub] kafka pull request #3387: MINOR: Update documentation to use `kafka-consumer...

2017-06-21 Thread vahidhashemian
Github user vahidhashemian closed the pull request at:

https://github.com/apache/kafka/pull/3387




[GitHub] kafka pull request #3375: KAFKA-5474: Streams StandbyTask should no checkpoi...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3375




[jira] [Created] (KAFKA-5495) Replace the deprecated 'ConsumerOffsetChecker' in documentation

2017-06-21 Thread Vahid Hashemian (JIRA)
Vahid Hashemian created KAFKA-5495:
--

 Summary: Replace the deprecated 'ConsumerOffsetChecker' in 
documentation
 Key: KAFKA-5495
 URL: https://issues.apache.org/jira/browse/KAFKA-5495
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Reporter: Vahid Hashemian
Assignee: Vahid Hashemian
Priority: Minor
 Fix For: 0.11.1.0


Use {{kafka-consumer-groups.sh}} instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #3404: KAFKA-5476: Implement a system test that creates n...

2017-06-21 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/3404

KAFKA-5476: Implement a system test that creates network partitions

KAFKA-5476: Implement a system test that creates network partitions

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5476

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3404


commit 23b85ccf6c3195be56dbca918dd130de8c68504f
Author: Colin P. Mccabe 
Date:   2017-06-20T21:31:46Z

KAFKA-5484: Refactor kafkatest docker support

commit 5dd1672a906c55bca800927ce235a0b6989a0559
Author: Colin P. Mccabe 
Date:   2017-06-20T17:33:09Z

Add PartitionedProduceConsumeTest






Re: [DISCUSS] KIP-168: Add TotalTopicCount metric per cluster

2017-06-21 Thread Abhishek Mendhekar
Hi Dong,

Thanks for the suggestion!

I think TopicCount sounds reasonable to me and it definitely seems
consistent with the other metric names. I will update the proposal to
reflect this change.

Thanks,
Abhishek

On Wed, Jun 21, 2017 at 2:17 PM, Dong Lin  wrote:

> Hey Abhishek,
>
> I think the metric is useful. Sorry for being late on this. I am wondering
> if TopicCount is a better name than TotalTopicCount, given that we
> currently have metrics with names like OfflinePartitionsCount, LeaderCount,
> PartitionCount, etc.
>
> Thanks,
> Dong
>
> On Fri, Jun 16, 2017 at 9:09 AM, Abhishek Mendhekar <
> abhishek.mendhe...@gmail.com> wrote:
>
> > Hi Kafka Dev,
> >
> > I created KIP-168 to propose adding a metric to emit the total topic count
> > in a cluster. The metric will be emitted by the controller.
> >
> > The KIP can be found here
> > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 168%3A+Add+TotalTopicCount+metric+per+cluster)
> > and the associated JIRA improvement is KAFKA-5461
> > (https://issues.apache.org/jira/browse/KAFKA-5461)
> >
> > Appreciate all the comments.
> >
> > Best,
> >
> > Abhishek
> >
>



-- 
Abhishek Mendhekar
abhishek.mendhe...@gmail.com | 818.263.7030


[jira] [Created] (KAFKA-5494) Idempotent producer should not require max.in.flight.requests.per.connection=1 and acks=all

2017-06-21 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5494:
---

 Summary: Idempotent producer should not require 
max.in.flight.requests.per.connection=1 and acks=all
 Key: KAFKA-5494
 URL: https://issues.apache.org/jira/browse/KAFKA-5494
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.11.0.0
Reporter: Apurva Mehta


Currently, the idempotent producer (and hence transactional producer) requires 
max.in.flight.requests.per.connection=1.

This was done to simplify the implementation on the client and server. With 
some additional work, we can satisfy the idempotence guarantees even with any 
number of in-flight requests. The changes on the client can be summarized as 
follows:
 
# We increment sequence numbers when batches are drained.
# If, for some reason, a batch fails with a retriable error, we know that all 
future batches would fail with an out-of-order sequence exception. 
# As such, the client should treat some OutOfOrderSequence errors as retriable. 
In particular, we should maintain the 'last acked sequence'. If the batch 
succeeding the last acked sequence gets an OutOfOrderSequence, that is a fatal 
error. If a later batch fails with OutOfOrderSequence, it should be re-enqueued.
# With the changes above, the producer queues should become priority queues 
ordered by the sequence numbers. 
# The partition is not ready unless the front of the queue has the next 
expected sequence.

With the changes above, we would get the benefits of multiple inflights in 
normal cases. When there are failures, we automatically constrain to a single 
inflight until we get back in sequence. 
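Points 4 and 5 of the list above can be modeled with a toy per-partition queue: batches are ordered by sequence number, and the partition is "ready" only when the head of the queue carries the next expected sequence, so a re-enqueued retry cannot jump the line. The class below is an illustration of the idea, not the actual producer internals.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Toy model: pending batches (represented by their sequence numbers) are kept
// in sequence order, and draining is gated on the next expected sequence.
class PartitionSendQueue {
    private final PriorityQueue<Integer> pending =
            new PriorityQueue<>(Comparator.naturalOrder());
    private int nextExpectedSequence = 0;

    void enqueue(int sequence) { pending.add(sequence); } // also used for re-enqueued retries

    boolean ready() {
        // Not ready unless the front of the queue has the next expected sequence.
        return !pending.isEmpty() && pending.peek() == nextExpectedSequence;
    }

    int drain() {
        int seq = pending.poll();
        nextExpectedSequence = seq + 1;
        return seq;
    }
}
```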

With multiple inflights, we now have the possibility of getting duplicates for 
batches other than the last appended batch. In order to return the record 
metadata (including offset) of a duplicate inside the log, we would require a 
scan at the tail of the log. This can be optimized by caching the metadata for 
the last 'n' batches. For instance, if the default max.inflight is 5, we could 
cache the record metadata of the last 5 batches, 
and fall back to a scan if the duplicate is not within those 5. 
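The caching idea above can be sketched with a small bounded map: keep metadata for only the last n batches, and treat a miss as "fall back to a log scan". The class and field names are ours, purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded insertion-ordered cache: metadata for batches older than the last
// `capacity` entries is evicted, forcing a log scan (not shown) on lookup.
class RecentBatchMetadataCache {
    private final LinkedHashMap<Integer, Long> offsetsBySequence;

    RecentBatchMetadataCache(final int capacity) {
        offsetsBySequence = new LinkedHashMap<Integer, Long>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
                return size() > capacity; // evict once we exceed the last n batches
            }
        };
    }

    void record(int sequence, long baseOffset) {
        offsetsBySequence.put(sequence, baseOffset);
    }

    Long lookup(int sequence) {
        return offsetsBySequence.get(sequence); // null => fall back to scanning the log
    }
}
```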

* *

The reason to have acks=all is to protect against OutOfOrderSequence exceptions 
in the case where the leader fails before replication happens. In that case, 
the next batch sent by the producer would get an OutOfOrderSequence because the 
new leader would not have the last message. 

This may be OK: for applications which really care about avoiding duplicates, 
they have to handle fatal errors of this sort anyway. In particular, the 
recommendation is to close the producer in the callback on a fatal error and 
then check the tail of the log for the last committed message, and then start 
sending from there. 

By making acks=all, this application logic would just be exercised more 
frequently.





[GitHub] kafka pull request #3398: allow transactions in producer perf script

2017-06-21 Thread tcrayford
Github user tcrayford closed the pull request at:

https://github.com/apache/kafka/pull/3398




[jira] [Created] (KAFKA-5493) Optimize calls to flush for tasks and standby tasks

2017-06-21 Thread Bill Bejeck (JIRA)
Bill Bejeck created KAFKA-5493:
--

 Summary: Optimize calls to flush for tasks and standby tasks
 Key: KAFKA-5493
 URL: https://issues.apache.org/jira/browse/KAFKA-5493
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Bill Bejeck
Assignee: Bill Bejeck
 Fix For: 0.11.1.0


With EOS enabled we don't checkpoint on {{commit}}, so there is no need to call 
{{flush}} when committing _top-level tasks_.  However, for _standby tasks_ we 
still checkpoint and thus still need to flush when committing. We need to develop 
an approach where we can optimize top-level tasks by avoiding the flush on 
commit, while still preserving flush-on-commit for standby tasks.
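The optimization described in this ticket can be sketched as a task that flushes on commit only when it also writes a checkpoint. The class and flag names below are illustrative stand-ins, not the actual Streams internals.

```java
// Sketch: standby tasks checkpoint on commit and therefore flush first;
// EOS top-level tasks skip both the checkpoint and the flush.
class CommittingTask {
    private final boolean checkpointsOnCommit; // false for EOS top-level tasks
    int flushCount = 0;

    CommittingTask(boolean checkpointsOnCommit) {
        this.checkpointsOnCommit = checkpointsOnCommit;
    }

    void commit() {
        if (checkpointsOnCommit) {
            flushCount++; // flush state stores before writing the checkpoint
        }
        // ...commit offsets / the transaction here in either case...
    }
}
```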





[GitHub] kafka pull request #3400: KAFKA-5491: Enable transactions in ProducerPerform...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3400




[jira] [Resolved] (KAFKA-5491) The ProducerPerformance tool should support transactions

2017-06-21 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-5491.

   Resolution: Fixed
Fix Version/s: 0.11.0.0

Issue resolved by pull request 3400
[https://github.com/apache/kafka/pull/3400]

> The ProducerPerformance tool should support transactions
> 
>
> Key: KAFKA-5491
> URL: https://issues.apache.org/jira/browse/KAFKA-5491
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 0.11.0.0
>
>
> We should allow users of the ProducerPerformance tool to run transactional 
> sends.





[GitHub] kafka pull request #3403: MINOR: Turn off caching in demos for more understa...

2017-06-21 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/3403

MINOR: Turn off caching in demos for more understandable outputs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
KMinor-turn-off-caching-in-demo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3403








[GitHub] kafka pull request #3399: KAFKA-5475: Connector config validation should inc...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3399




Re: [DISCUSS] KIP-168: Add TotalTopicCount metric per cluster

2017-06-21 Thread Dong Lin
Hey Abhishek,

I think the metric is useful. Sorry for being late on this. I am wondering
if TopicCount is a better name than TotalTopicCount, given that we
currently have metrics with names OfflinePartitionsCount, LeaderCount,
PartitionCount etc.

Thanks,
Dong

On Fri, Jun 16, 2017 at 9:09 AM, Abhishek Mendhekar <
abhishek.mendhe...@gmail.com> wrote:

> Hi Kafka Dev,
>
> I created KIP-168 to propose adding a metric to emit total topic count
> in a cluster. The metric will be emited by the controller.
>
> The KIP can be found here
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 168%3A+Add+TotalTopicCount+metric+per+cluster)
> and the assciated JIRA improvement is KAFKA-5461
> (https://issues.apache.org/jira/browse/KAFKA-5461)
>
> Appreciate all the comments.
>
> Best,
>
> Abhishek
>


[GitHub] kafka pull request #3373: MINOR: Detail message/batch size implications for ...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3373




[jira] [Created] (KAFKA-5492) LogRecoveryTest.testHWCheckpointWithFailuresSingleLogSegment transient failure

2017-06-21 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-5492:
--

 Summary: 
LogRecoveryTest.testHWCheckpointWithFailuresSingleLogSegment transient failure
 Key: KAFKA-5492
 URL: https://issues.apache.org/jira/browse/KAFKA-5492
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jason Gustafson


{code}
java.lang.AssertionError: Timing out after 3 ms since a new leader that is 
different from 1 was not elected for partition new-topic-0, leader is Some(1)
at kafka.utils.TestUtils$.fail(TestUtils.scala:333)
at 
kafka.utils.TestUtils$.$anonfun$waitUntilLeaderIsElectedOrChanged$8(TestUtils.scala:808)
at scala.Option.getOrElse(Option.scala:121)
at 
kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:798)
at 
kafka.server.LogRecoveryTest.testHWCheckpointWithFailuresSingleLogSegment(LogRecoveryTest.scala:152)
{code}
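The failure comes from a poll-until-condition helper hitting its timeout. The general pattern behind `waitUntilLeaderIsElectedOrChanged` looks roughly like this (a hedged Python sketch of the idea, not Kafka's Scala `TestUtils`; `fetch_leader` in the comment is a hypothetical name):

```python
import time

def wait_until(condition, timeout_ms=30000, poll_ms=100):
    """Poll `condition` until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll_ms / 1000.0)
    raise AssertionError(
        "Timing out after %d ms waiting for condition" % timeout_ms)

# Usage in a leader-election test would look something like:
#   wait_until(lambda: fetch_leader("new-topic", 0) not in (None, old_leader))
```

Transient failures of this shape usually mean the condition genuinely took longer than the timeout on a loaded CI machine, not that the polling logic is wrong.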





[GitHub] kafka pull request #3402: KAFKA-5486: org.apache.kafka logging should go to ...

2017-06-21 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/3402

KAFKA-5486: org.apache.kafka logging should go to server.log



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-5486-org.apache.kafka-logging-server.log

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3402.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3402


commit 4e80435198b0de9ac656096d5d87fd5f77d77eb2
Author: Ismael Juma 
Date:   2017-06-21T20:32:07Z

KAFKA-5486: org.apache.kafka logging should go to server.log






[VOTE] KIP-168: Add TotalTopicCount metric per cluster

2017-06-21 Thread Abhishek Mendhekar
Hi Kafka Dev,

I'd like to start the voting on -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-
168%3A+Add+TotalTopicCount+metric+per+cluster

Discussions will continue on -
http://mail-archives.apache.org/mod_mbox/kafka-dev/201706.mbox/%3CCAMcwe-ugep-UiSn9TkKEMwwTM%3DAzGC4jPro9LnyYRezyZg_NKA%40mail.gmail.com%3E

Thanks,
Abhishek


Re: [DISCUSS] KIP-163: Lower the Minimum Required ACL Permission of OffsetFetch

2017-06-21 Thread Vahid S Hashemian
I appreciate everyone's feedback so far on this KIP.

Before starting a vote, I'd like to also ask for feedback on the 
"Additional Food for Thought" section in the KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-163%3A+Lower+the+Minimum+Required+ACL+Permission+of+OffsetFetch#KIP-163:LowertheMinimumRequiredACLPermissionofOffsetFetch-AdditionalFoodforThought
I just added some more details in that section, which I hope further 
clarifies the suggestion there.

Thanks.
--Vahid



From:   Vahid S Hashemian/Silicon Valley/IBM
To: dev@kafka.apache.org
Cc: "Kafka User" 
Date:   06/08/2017 11:29 AM
Subject:[DISCUSS] KIP-163: Lower the Minimum Required ACL 
Permission of OffsetFetch


Hi all,

I'm resending my earlier note hoping it would spark some conversation this 
time around :)

Thanks.
--Vahid





From:   "Vahid S Hashemian" 
To: dev , "Kafka User" 
Date:   05/30/2017 08:33 AM
Subject:KIP-163: Lower the Minimum Required ACL Permission of 
OffsetFetch



Hi,

I started a new KIP to improve the minimum required ACL permissions of 
some of the APIs: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-163%3A+Lower+the+Minimum+Required+ACL+Permission+of+OffsetFetch

The KIP is to address KAFKA-4585.

Feedback and suggestions are welcome!

Thanks.
--Vahid








[GitHub] kafka pull request #3083: KAFKA-1955: [WIP] Disk based buffer in Producer

2017-06-21 Thread blbradley
Github user blbradley closed the pull request at:

https://github.com/apache/kafka/pull/3083




[jira] [Created] (KAFKA-5491) The ProducerPerformance tool should support transactions

2017-06-21 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5491:
---

 Summary: The ProducerPerformance tool should support transactions
 Key: KAFKA-5491
 URL: https://issues.apache.org/jira/browse/KAFKA-5491
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta


We should allow users of the ProducerPerformance tool to run transactional 
sends.





Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Tom Crayford
That looks better than mine, nice! I think the tooling matters a lot to the
usability of the product we're shipping; being able to test out Kafka's
features on your own hardware/setup is very important to knowing whether it
can work.

On Wed, Jun 21, 2017 at 8:01 PM, Apurva Mehta  wrote:

> Hi Tom,
>
> I actually made modifications to the produce performance tool to do real
> transactions earlier this week as part of our benchmarking (results
> published here: bit.ly/kafka-eos-perf). I just submitted that patch here:
> https://github.com/apache/kafka/pull/3400/files
>
> I think my version is more complete since it runs the full gamut of APIs:
> initTransactions, beginTransaction, commitTransaction. Also, it is the
> version used for our published benchmarks.
>
> I am not sure that this tool is a blocker for the release though, since it
> doesn't really affect the usability of the feature any way.
>
> Thanks,
> Apurva
>
> On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford 
> wrote:
>
> > Hi there,
> >
> > I'm -1 (non-binding) on shipping this RC.
> >
> > Heroku has carried on performance testing with 0.11 RC1. We have updated
> > our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
> > transactional features enabled, we get slightly better performance than
> > 0.10.2.1 with 10.2.1 client libraries.
> >
> > However, we attempted to run a performance test today with transactions,
> > idempotence and consumer read_committed enabled, but couldn't, because
> > enabling transactions requires the producer to call `initTransactions`
> > before starting to send messages, and the producer performance tool
> doesn't
> > allow for that.
> >
> > I'm -1 (non-binding) on shipping this RC in this state, because users
> > expect to be able to use the inbuilt performance testing tools, and
> > preventing them from testing the impact of the new features using the
> > inbuilt tools isn't great. I made a PR for this:
> > https://github.com/apache/kafka/pull/3398 (the change is very small).
> > Happy
> > to make a jira as well, if that makes sense.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > Hi Ismael,
> > >
> > > Thanks for running the release.
> > >
> > > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> > > these checkstyle errors:
> > >
> > > :clients:checkstyleMain
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> > > [ApiExceptionBuilder, BrokerNotAvailableException,
> > > ClusterAuthorizationException, ConcurrentTransactionsException,
> > > ControllerMovedException, CoordinatorLoadInProgressException,
> > > CoordinatorNotAvailableException, CorruptRecordException,
> > > DuplicateSequenceNumberException, GroupAuthorizationException,
> > > IllegalGenerationException, IllegalSaslStateException,
> > > InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> > > InvalidConfigurationException, InvalidFetchSizeException,
> > > InvalidGroupIdException, InvalidPartitionsException,
> > > InvalidPidMappingException, InvalidReplicaAssignmentException,
> > > InvalidReplicationFactorException, InvalidRequestException,
> > > InvalidRequiredAcksException, InvalidSessionTimeoutException,
> > > InvalidTimestampException, InvalidTopicException,
> > > InvalidTxnStateException, InvalidTxnTimeoutException,
> > > LeaderNotAvailableException, NetworkException, NotControllerException,
> > > NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> > > NotEnoughReplicasException, NotLeaderForPartitionException,
> > > OffsetMetadataTooLarge, OffsetOutOfRangeException,
> > > OperationNotAttemptedException, OutOfOrderSequenceException,
> > > PolicyViolationException, ProducerFencedException,
> > > RebalanceInProgressException, RecordBatchTooLargeException,
> > > RecordTooLargeException, ReplicaNotAvailableException,
> > > SecurityDisabledException, TimeoutException,
> TopicAuthorizationException,
> > > TopicExistsException, TransactionCoordinatorFencedException,
> > > TransactionalIdAuthorizationException, UnknownMemberIdException,
> > > UnknownServerException, UnknownTopicOrPartitionException,
> > > UnsupportedForMessageFormatException, UnsupportedSaslMechanismExcept
> ion,
> > > UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > > Class Fan-Out Complexity is 60 (max allowed is 40).
> > > [ClassFanOutComplexity]
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> > > Class Fan-Out Complexity is 43 (max allowed is 40).

Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Apurva Mehta
Hi Tom,

I actually made modifications to the produce performance tool to do real
transactions earlier this week as part of our benchmarking (results
published here: bit.ly/kafka-eos-perf). I just submitted that patch here:
https://github.com/apache/kafka/pull/3400/files

I think my version is more complete since it runs the full gamut of APIs:
initTransactions, beginTransaction, commitTransaction. Also, it is the
version used for our published benchmarks.

I am not sure that this tool is a blocker for the release though, since it
doesn't really affect the usability of the feature any way.

Thanks,
Apurva
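The API sequence described above (one `initTransactions` call, then a `beginTransaction`/`send`/`commitTransaction` cycle per batch) can be modeled with a stand-in producer. This is a hedged Python sketch of the control flow only; `FakeTxnProducer` and `run_perf_loop` are invented names, and the real tool uses the Java `KafkaProducer`:

```python
class FakeTxnProducer:
    """Records the order of transactional API calls (stand-in for the real client)."""

    def __init__(self):
        self.calls = []

    def init_transactions(self):
        self.calls.append("init")

    def begin_transaction(self):
        self.calls.append("begin")

    def send(self, record):
        self.calls.append("send")

    def commit_transaction(self):
        self.calls.append("commit")


def run_perf_loop(producer, records, records_per_txn):
    """Send `records`, committing a transaction every `records_per_txn` sends."""
    producer.init_transactions()
    for i, record in enumerate(records):
        if i % records_per_txn == 0:
            producer.begin_transaction()
        producer.send(record)
        if (i + 1) % records_per_txn == 0 or i == len(records) - 1:
            producer.commit_transaction()

p = FakeTxnProducer()
run_perf_loop(p, list(range(5)), records_per_txn=2)
# Transactions of 2, 2, and 1 records: 1 init, 3 begins, 5 sends, 3 commits.
```

Varying `records_per_txn` is what lets the tool measure transactions of differing durations.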

On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford  wrote:

> Hi there,
>
> I'm -1 (non-binding) on shipping this RC.
>
> Heroku has carried on performance testing with 0.11 RC1. We have updated
> our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
> transactional features enabled, we get slightly better performance than
> 0.10.2.1 with 10.2.1 client libraries.
>
> However, we attempted to run a performance test today with transactions,
> idempotence and consumer read_committed enabled, but couldn't, because
> enabling transactions requires the producer to call `initTransactions`
> before starting to send messages, and the producer performance tool doesn't
> allow for that.
>
> I'm -1 (non-binding) on shipping this RC in this state, because users
> expect to be able to use the inbuilt performance testing tools, and
> preventing them from testing the impact of the new features using the
> inbuilt tools isn't great. I made a PR for this:
> https://github.com/apache/kafka/pull/3398 (the change is very small).
> Happy
> to make a jira as well, if that makes sense.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi Ismael,
> >
> > Thanks for running the release.
> >
> > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> > these checkstyle errors:
> >
> > :clients:checkstyleMain
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> > [ApiExceptionBuilder, BrokerNotAvailableException,
> > ClusterAuthorizationException, ConcurrentTransactionsException,
> > ControllerMovedException, CoordinatorLoadInProgressException,
> > CoordinatorNotAvailableException, CorruptRecordException,
> > DuplicateSequenceNumberException, GroupAuthorizationException,
> > IllegalGenerationException, IllegalSaslStateException,
> > InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> > InvalidConfigurationException, InvalidFetchSizeException,
> > InvalidGroupIdException, InvalidPartitionsException,
> > InvalidPidMappingException, InvalidReplicaAssignmentException,
> > InvalidReplicationFactorException, InvalidRequestException,
> > InvalidRequiredAcksException, InvalidSessionTimeoutException,
> > InvalidTimestampException, InvalidTopicException,
> > InvalidTxnStateException, InvalidTxnTimeoutException,
> > LeaderNotAvailableException, NetworkException, NotControllerException,
> > NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> > NotEnoughReplicasException, NotLeaderForPartitionException,
> > OffsetMetadataTooLarge, OffsetOutOfRangeException,
> > OperationNotAttemptedException, OutOfOrderSequenceException,
> > PolicyViolationException, ProducerFencedException,
> > RebalanceInProgressException, RecordBatchTooLargeException,
> > RecordTooLargeException, ReplicaNotAvailableException,
> > SecurityDisabledException, TimeoutException, TopicAuthorizationException,
> > TopicExistsException, TransactionCoordinatorFencedException,
> > TransactionalIdAuthorizationException, UnknownMemberIdException,
> > UnknownServerException, UnknownTopicOrPartitionException,
> > UnsupportedForMessageFormatException, UnsupportedSaslMechanismException,
> > UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > Class Fan-Out Complexity is 60 (max allowed is 40).
> > [ClassFanOutComplexity]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> > Class Fan-Out Complexity is 43 (max allowed is 40).
> > [ClassFanOutComplexity]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\requests\AbstractResponse.java:26:1:
> > Class Fan-Out Complexity is 42 (max allowed is 40).
> > [ClassFanOutComplexity]
> > :clients:checkstyleMain FAILED
> >
> > I wonder if there is an issue with my VM since I don't get similar errors
> > on Ubuntu or Mac.
> >
> > --Vahid
> >
> >
> >
> >
> > From:   Ismael Juma 
> > To:

[jira] [Created] (KAFKA-5490) Deletion of tombstones during cleaning should consider idempotent message retention

2017-06-21 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-5490:
--

 Summary: Deletion of tombstones during cleaning should consider 
idempotent message retention
 Key: KAFKA-5490
 URL: https://issues.apache.org/jira/browse/KAFKA-5490
 Project: Kafka
  Issue Type: Sub-task
Reporter: Jason Gustafson
Assignee: Jason Gustafson
Priority: Critical
 Fix For: 0.11.0.1


The LogCleaner always preserves the message containing last sequence from a 
given ProducerId when doing a round of cleaning. This is necessary to ensure 
that the producer is not prematurely evicted which would cause an 
OutOfOrderSequenceException. The problem with this approach is that the 
preserved message won't be considered again for cleaning until a new message 
with the same key is written to the topic. Generally this could result in 
accumulation of stale entries in the log, but the bigger problem is that the 
newer entry with the same key could be a tombstone. If we end up deleting this 
tombstone before a new record with the same key is written, then the old entry 
will resurface. For example, suppose the following sequence of writes:

1. ProducerId=1, Key=A, Value=1
2. ProducerId=2, Key=A, Value=null (tombstone)

We will preserve the first entry indefinitely until a new record with Key=A is 
written AND either ProducerId 1 has written a newer record with a larger 
sequence number or ProducerId 1 becomes expired. As long as the tombstone is 
preserved, there is no correctness violation: a consumer reading from the 
beginning will ignore the first entry after reading the tombstone. But it is 
possible that the tombstone entry will be removed from the log before a new 
record with Key=A is written. If that happens, then a consumer reading from the 
beginning would incorrectly observe the overwritten value.
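The retention rule and the resulting hazard can be simulated directly. This is a hedged sketch, not the LogCleaner implementation: one cleaning pass keeps the newest entry per key plus each producer's last entry, and the bug appears if the tombstone is later dropped while the pinned stale entry survives:

```python
def clean(log):
    """One compaction pass: keep the newest entry per key, plus each
    producer's last entry (to preserve its last sequence number)."""
    newest_by_key = {}
    last_by_pid = {}
    for i, (pid, seq, key, value) in enumerate(log):
        newest_by_key[key] = i
        last_by_pid[pid] = i
    keep = set(newest_by_key.values()) | set(last_by_pid.values())
    return [entry for i, entry in enumerate(log) if i in keep]

def read_latest(log, key):
    """What a consumer reading from the beginning ends up with for `key`."""
    value = None
    for pid, seq, k, v in log:
        if k == key:
            value = v
    return value

log = [
    (1, 0, "A", 1),     # ProducerId=1, Key=A, Value=1
    (2, 0, "A", None),  # ProducerId=2, Key=A, tombstone
]
cleaned = clean(log)
# Both entries survive: the tombstone is newest for Key=A, and the stale
# value is pinned because it carries ProducerId 1's last sequence.
# While the tombstone remains, reads are correct (Key=A is deleted); but
# if the tombstone is deleted first, the overwritten value resurfaces.
```

The fix has to make tombstone deletion aware of entries retained only for idempotent-producer bookkeeping.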





[GitHub] kafka pull request #3401: MINOR: explain producer naming within Streams

2017-06-21 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/3401

MINOR: explain producer naming within Streams



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka minor-producer-naming-011

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3401


commit 411d09790875f080e510af18352e4549e2091dce
Author: Matthias J. Sax 
Date:   2017-06-20T00:42:13Z

MINOR: explain producer naming within Streams






[GitHub] kafka pull request #3400: Enable transactions in ProducerPerformance Tool

2017-06-21 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3400

Enable transactions in ProducerPerformance Tool

With this patch, the `ProducerPerformance` tool can create transactions of 
differing durations.

This patch was used to collect the initial set of benchmarks for 
transaction performance, documented here: 
https://docs.google.com/spreadsheets/d/1dHY6M7qCiX-NFvsgvaE0YoVdNq26uA8608XIh_DUpI4/edit#gid=282787170

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
MINOR-add-transaction-size-to-producre-perf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3400.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3400


commit a0ae1f00d84dc3516646366b63e59915796f95a5
Author: Apurva Mehta 
Date:   2017-06-20T00:11:21Z

Enable the performance producer to create transactions of variable
durations.






[jira] [Created] (KAFKA-5489) Failing test: InternalTopicIntegrationTest.shouldCompactTopicsForStateChangelogs

2017-06-21 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-5489:
--

 Summary: Failing test: 
InternalTopicIntegrationTest.shouldCompactTopicsForStateChangelogs
 Key: KAFKA-5489
 URL: https://issues.apache.org/jira/browse/KAFKA-5489
 Project: Kafka
  Issue Type: Bug
  Components: streams, unit tests
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax


Test failed with

{noformat}
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.kafka.streams.integration.InternalTopicIntegrationTest.shouldCompactTopicsForStateChangelogs(InternalTopicIntegrationTest.java:173)
{noformat}






Re: confluence permission request

2017-06-21 Thread Jeremy Hanna
May I also get permission to modify the Kafka Confluence space?  My user is 
jeromatron and I’d like to help with documentation.

> On Jun 21, 2017, at 11:22 AM, Damian Guy  wrote:
> 
> Hi,
> 
> That should be done.
> 
> Thanks,
> Damian
> 
> On Wed, 21 Jun 2017 at 05:42 Kenji Hayashida  wrote:
> 
>> To Kafka Dev Team,
>> 
>> Sorry, forgot sending my ID.
>> My ID is kenjih.
>> 
>> Thanks.
>> 
>> - Kenji Hayashida
>> 
>> 2017-06-21 13:29 GMT+09:00 Kenji Hayashida :
>> 
>>> To Kafka Dev Team,
>>> 
>>> Hi, could you please give me a write permission to the confluence page?
>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>> Kafka+Improvement+Proposals
>>> 
>>> I'm going to write a KIP.
>>> Thanks.
>>> 
>>> - Kenji Hayashida
>>> 
>>> 
>> 
>> 
>> --
>> 
>> ☆---★
>> Kenji Hayashida (林田賢二)
>> MAIL: kenji12...@gmail.com
>> 
>> ☆---★
>> 



RE: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Tauzell, Dave
I’m not really familiar with Netty so I won’t be of much help.   Maybe try 
posting on a Netty forum to see what they think?
-Dave

From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 10:28 AM
To: Tauzell, Dave
Cc: us...@kafka.apache.org; senthilec...@apache.org; dev@kafka.apache.org
Subject: Re: Handling 2 to 3 Million Events before Kafka

So Netty would work for this case? I do have a Netty server, but it seems I'm not 
getting the expected results. Here is the repo: 
https://github.com/senthilec566/netty4-server . Is this the right implementation?

Cheers,
Senthil

On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave 
mailto:dave.tauz...@surescripts.com>> wrote:
I see.

1.   You don’t want the 100k machines sending directly to kafka.

2.   You can only have a small number of web servers

People certainly have web-servers handling over 100k concurrent connections.  
See this for some examples:  https://github.com/smallnest/C1000K-Servers .

It seems possible with the right sort of kafka producer tuning.

-Dave
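Dave's point about producer tuning usually comes down to batching and compression settings. A starting point might look like the following; these are illustrative values to benchmark against your own workload, not a prescription:

```properties
# High-throughput producer tuning (illustrative values)
acks=1
compression.type=lz4
batch.size=262144
linger.ms=10
buffer.memory=134217728
```

Larger `batch.size` plus a small `linger.ms` lets the producer amortize request overhead across many records, which matters far more than raw per-send speed at multi-GB/sec ingestion rates.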

From: SenthilKumar K 
[mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 8:55 AM
To: Tauzell, Dave
Cc: us...@kafka.apache.org; 
senthilec...@apache.org; 
dev@kafka.apache.org; Senthil kumar
Subject: Re: Handling 2 to 3 Million Events before Kafka

Thanks Jeyhun. Yes, the HTTP server would be problematic here w.r.t. network and 
memory.

Hi Dave, the problem is not with Kafka; it's all about how you handle huge data 
before Kafka. I did a simple test with a 5-node Kafka cluster which gives a good 
result (~950 MB/s), so on the Kafka side I don't see a scaling issue.

All we are trying to figure out is how to handle messages from different servers 
before Kafka. Web servers can send to Kafka fast, but I can still handle only 50K 
events per second per instance, which is too little for my use case, and I can't 
deploy 20 web servers to handle this load. I'm looking for the best candidate to 
put in front of Kafka: it should be super fast at receiving everything and handing 
it to the Kafka producer.


--Senthil
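A common pattern for the tier in front of Kafka is a thin receiver that accumulates events in memory and hands them downstream in batches, so the network layer never blocks on the producer. This is a hedged Python sketch of the batching logic only; the class name is invented, and `send_batch` stands in for whatever wraps the Kafka producer:

```python
class BatchingForwarder:
    """Accumulates incoming events and flushes them downstream in batches."""

    def __init__(self, send_batch, max_batch=1000):
        self.send_batch = send_batch  # e.g. a Kafka producer wrapper
        self.max_batch = max_batch
        self.buffer = []

    def on_event(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []

batches = []
fwd = BatchingForwarder(batches.append, max_batch=3)
for e in range(7):
    fwd.on_event(e)
fwd.flush()  # drain the partial final batch
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```

In production this pattern also needs a time-based flush (so a slow trickle of events doesn't sit in the buffer indefinitely), which is exactly what the Kafka producer's own batching with `linger.ms` provides once events reach it.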

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave 
mailto:dave.tauz...@surescripts.com>> wrote:
What are your configurations?

- production
- brokers
- consumers

Is the problem that web servers cannot send to Kafka fast enough or your 
consumers cannot process messages off of kafka fast enough?
What is the average size of these messages?

-Dave

-Original Message-
From: SenthilKumar K 
[mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 7:58 AM
To: us...@kafka.apache.org
Cc: senthilec...@apache.org; Senthil kumar; 
dev@kafka.apache.org
Subject: Handling 2 to 3 Million Events before Kafka

Hi Team, sorry if this question is irrelevant to the Kafka group.

I have been trying to solve the problem of handling 5 GB/sec ingestion. Kafka is a 
really good candidate for us to handle this ingestion rate.


100K machines ---> { HTTP Server (Jetty/Netty) } ---> Kafka Cluster

I see the problem in the HTTP server, which can't handle beyond 50K events per 
instance. I'm thinking some other solution would be the right choice in front of 
Kafka.

Anyone worked on a similar use case with a similar load? Suggestions/Thoughts?

--Senthil




[GitHub] kafka pull request #3394: KAFKA-5475: Connector config validation needs to i...

2017-06-21 Thread kkonstantine
Github user kkonstantine closed the pull request at:

https://github.com/apache/kafka/pull/3394




Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Tom Crayford
Hi there,

I'm -1 (non-binding) on shipping this RC.

Heroku has carried on performance testing with 0.11 RC1. We have updated
our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
transactional features enabled, we get slightly better performance than
0.10.2.1 with 10.2.1 client libraries.

However, we attempted to run a performance test today with transactions,
idempotence and consumer read_committed enabled, but couldn't, because
enabling transactions requires the producer to call `initTransactions`
before starting to send messages, and the producer performance tool doesn't
allow for that.

I'm -1 (non-binding) on shipping this RC in this state, because users
expect to be able to use the inbuilt performance testing tools, and
preventing them from testing the impact of the new features using the
inbuilt tools isn't great. I made a PR for this:
https://github.com/apache/kafka/pull/3398 (the change is very small). Happy
to make a jira as well, if that makes sense.

Thanks

Tom Crayford
Heroku Kafka

On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi Ismael,
>
> Thanks for running the release.
>
> Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> these checkstyle errors:
>
> :clients:checkstyleMain
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\protocol\Errors.java:89:1:
> Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> [ApiExceptionBuilder, BrokerNotAvailableException,
> ClusterAuthorizationException, ConcurrentTransactionsException,
> ControllerMovedException, CoordinatorLoadInProgressException,
> CoordinatorNotAvailableException, CorruptRecordException,
> DuplicateSequenceNumberException, GroupAuthorizationException,
> IllegalGenerationException, IllegalSaslStateException,
> InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> InvalidConfigurationException, InvalidFetchSizeException,
> InvalidGroupIdException, InvalidPartitionsException,
> InvalidPidMappingException, InvalidReplicaAssignmentException,
> InvalidReplicationFactorException, InvalidRequestException,
> InvalidRequiredAcksException, InvalidSessionTimeoutException,
> InvalidTimestampException, InvalidTopicException,
> InvalidTxnStateException, InvalidTxnTimeoutException,
> LeaderNotAvailableException, NetworkException, NotControllerException,
> NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> NotEnoughReplicasException, NotLeaderForPartitionException,
> OffsetMetadataTooLarge, OffsetOutOfRangeException,
> OperationNotAttemptedException, OutOfOrderSequenceException,
> PolicyViolationException, ProducerFencedException,
> RebalanceInProgressException, RecordBatchTooLargeException,
> RecordTooLargeException, ReplicaNotAvailableException,
> SecurityDisabledException, TimeoutException, TopicAuthorizationException,
> TopicExistsException, TransactionCoordinatorFencedException,
> TransactionalIdAuthorizationException, UnknownMemberIdException,
> UnknownServerException, UnknownTopicOrPartitionException,
> UnsupportedForMessageFormatException, UnsupportedSaslMechanismException,
> UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\java\org\apache\kafka\common\protocol\Errors.java:89:1:
> Class Fan-Out Complexity is 60 (max allowed is 40).
> [ClassFanOutComplexity]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> Class Fan-Out Complexity is 43 (max allowed is 40).
> [ClassFanOutComplexity]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\java\org\apache\kafka\common\requests\AbstractResponse.java:26:1:
> Class Fan-Out Complexity is 42 (max allowed is 40).
> [ClassFanOutComplexity]
> :clients:checkstyleMain FAILED
>
> I wonder if there is an issue with my VM since I don't get similar errors
> on Ubuntu or Mac.
>
> --Vahid
>
>
>
>
> From:   Ismael Juma 
> To: dev@kafka.apache.org, Kafka Users ,
> kafka-clients 
> Date:   06/18/2017 03:32 PM
> Subject:[VOTE] 0.11.0.0 RC1
> Sent by:isma...@gmail.com
>
>
>
> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 0.11.0.0.
>
> This is a major version release of Apache Kafka. It includes 32 new KIPs.
> See
> the release notes and release plan (https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.11.0.0)
> for more details. A few
> feature
> highlights:
>
> * Exactly-once delivery and transactional messaging
> * Streams exactly-once semantics
> * Admin client with support for topic, ACLs and config management
> * Record headers
> * Request rate quotas
> * Improved resiliency: replication protocol improvement and
> single-threaded
> controller
> * Richer and more eff

[GitHub] kafka pull request #3399: KAFKA-5475: Connector config validation should inc...

2017-06-21 Thread ewencp
GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/3399

KAFKA-5475: Connector config validation should include fields for defined 
transformation aliases



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka kafka-5475-validation-transformations

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3399.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3399


commit 792a4e03379ff470bc98698455bbe75767739383
Author: Ewen Cheslack-Postava 
Date:   2017-06-21T06:18:41Z

KAFKA-5475: Connector config validation should include fields for defined 
transformation aliases




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3377: KAFKA-5477: Lower retryBackoff for AddPartitionsRe...

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3377




[GitHub] kafka pull request #3398: allow transactions in producer perf script

2017-06-21 Thread tcrayford
GitHub user tcrayford opened a pull request:

https://github.com/apache/kafka/pull/3398

allow transactions in producer perf script

allow the transactional producer to be enabled in `producer-perf.sh`, with 
a new flag `--use-transactions`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heroku/kafka allow_transactions_in_producer_perf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3398


commit 9170e87b2a154e7478af0ede43cc56b01f75bedd
Author: Xavier Léauté 
Date:   2017-05-19T00:02:51Z

KAFKA-5192: add WindowStore range scan (KIP-155)

Implements range scan for keys in windowed and session stores

Modifies caching session and windowed stores to use segmented cache keys.
Cache keys are internally prefixed with their segment id to ensure key 
ordering in the cache matches the ordering in the underlying store for keys 
spread across multiple segments.
This should also result in fewer cache keys getting scanned for queries 
spanning only some segments.

Author: Xavier Léauté 

Reviewers: Damian Guy, Guozhang Wang

Closes #3027 from xvrl/windowstore-range-scan

(cherry picked from commit e28752357705568219315375c666f8e500db9c12)
Signed-off-by: Guozhang Wang 
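The segment-prefixed cache key described in this commit message can be illustrated with a short standalone sketch. The class and method names below are hypothetical, not the actual Streams internals; the point is that prepending the segment id makes byte-wise ordering group keys by segment first:

```java
import java.nio.ByteBuffer;

public class SegmentedCacheKey {
    // Prepend the id of the time segment the record falls into, so that
    // lexicographic byte ordering of cache keys matches segment order
    // and then key order within each segment.
    static byte[] cacheKey(long timestampMs, long segmentIntervalMs, byte[] key) {
        long segmentId = timestampMs / segmentIntervalMs;
        return ByteBuffer.allocate(Long.BYTES + key.length)
                .putLong(segmentId)
                .put(key)
                .array();
    }
}
```

A range query spanning only some segments can then skip cache entries whose 8-byte prefix falls outside the queried segment ids.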

commit b31df613f2664a8dc699b421b392f0aca9d5f83f
Author: Dana Powers 
Date:   2017-05-19T13:03:22Z

KAFKA-3878; Support exponential backoff policy via reconnect.backoff.max 
(KIP-144)

Summary:
- add `reconnect.backoff.max.ms` common client configuration parameter
- if `reconnect.backoff.max.ms` > `reconnect.backoff.ms`, apply an 
exponential backoff policy
- apply +/- 20% random jitter to smooth cluster reconnects

Author: Dana Powers 

Reviewers: Ewen Cheslack-Postava , Roger Hoover 
, Ismael Juma 

Closes #1523 from dpkp/exp_backoff

(cherry picked from commit abe699176babe4f065b67b2b72d20daa0a2e46a1)
Signed-off-by: Ismael Juma 
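The backoff policy summarized above can be sketched as a small standalone Java function. This illustrates the general technique (doubling up to a cap, then ±20% jitter), not the client's actual implementation, and the names are made up:

```java
import java.util.concurrent.ThreadLocalRandom;

public class ExponentialBackoff {
    // Backoff before the given 0-based reconnect attempt: double the base
    // each attempt, cap at maxMs, then apply +/-20% random jitter so that
    // many clients do not reconnect in lock-step.
    static long backoffMs(int attempt, long baseMs, long maxMs) {
        double capped = Math.min((double) maxMs, baseMs * Math.pow(2, attempt));
        double jitter = 1.0 + (ThreadLocalRandom.current().nextDouble() * 0.4 - 0.2);
        return (long) (capped * jitter);
    }
}
```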

commit 64e12d3c641baeb6a4a624b971ba33721d656054
Author: Ewen Cheslack-Postava 
Date:   2017-05-19T18:26:59Z

KAFKA-4714; TimestampConverter transformation (KIP-66)

Author: Ewen Cheslack-Postava 

Reviewers: Konstantine Karantasis , Jason 
Gustafson 

Closes #3065 from ewencp/kafka-3209-timestamp-converter

(cherry picked from commit 61bab2d875ab5e03d0df4f62217346549a4c64c3)
Signed-off-by: Jason Gustafson 

commit 037f63882df854c923bb9de11856318b951ecb32
Author: Matthias J. Sax 
Date:   2017-05-19T23:29:25Z

MINOR: improve descriptions of Streams reset tool options

Author: Matthias J. Sax 

Reviewers: Bill Bejeck, Guozhang Wang

Closes #3107 from mjsax/minor-reset-tool-options

(cherry picked from commit 338857e1c2c2bb0c6011fcab477818102b95cb56)
Signed-off-by: Guozhang Wang 

commit 55330cc2931cc05cb1172cc207ada60154a6250a
Author: Apurva Mehta 
Date:   2017-05-20T01:51:37Z

KAFKA-5269; Retry on unknown topic/partition error in transactional requests

We should retry AddPartitionsToTxnRequest and TxnOffsetCommitRequest when 
receiving an UNKNOWN_TOPIC_OR_PARTITION error.

As described in the JIRA: It turns out that the 
`UNKNOWN_TOPIC_OR_PARTITION` is returned from the request handler in KafkaApis 
for the AddPartitionsToTxn and the TxnOffsetCommitRequest when the broker's 
metadata doesn't contain one or more partitions in the request. This can happen 
for instance when the broker is bounced and has not received the cluster 
metadata yet.

We should retry in these cases, as this is the model followed by the 
consumer when committing offsets, and by the producer with a ProduceRequest.

Author: Apurva Mehta 

Reviewers: Guozhang Wang , Jason Gustafson 


Closes #3094 from 
apurvam/KAFKA-5269-handle-unknown-topic-partition-in-transaction-manager
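The retry behaviour described in this commit message can be illustrated with a generic, standalone sketch. The names below are invented for illustration; the real logic lives in the client's transaction manager:

```java
import java.util.Set;
import java.util.function.Supplier;

public class RetriableCall {
    // Error codes that indicate transient metadata gaps worth retrying.
    static final Set<String> RETRIABLE =
            Set.of("UNKNOWN_TOPIC_OR_PARTITION", "COORDINATOR_LOAD_IN_PROGRESS");

    // Invoke the request until it stops returning a retriable error code
    // or the attempt budget is exhausted; return the final error code.
    static String sendWithRetries(Supplier<String> request, int maxAttempts) {
        String error = request.get();
        for (int attempt = 1; attempt < maxAttempts && RETRIABLE.contains(error); attempt++) {
            error = request.get();
        }
        return error;
    }
}
```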

commit a144c5277dd95d8862ea6d9a497e5a087aa491fe
Author: Ewen Cheslack-Postava 
Date:   2017-05-22T01:04:32Z

HOTFIX: In Connect test with auto topic creation disabled, ensure 
precreated topic is always used

Author: Ewen Cheslack-Postava 

Reviewers: Jason Gustafson 

Closes #3112 from ewencp/hotfix-precreate-topic

(cherry picked from commit d190d89dbc3df90e00f5e3c55507f67c5818504e)
Signed-off-by: Ewen Cheslack-Postava 

commit c0e789bdcc039a177f99bfecf36191f5be566d63
Author: Magnus Edenhill 
Date:   2017-05-22T01:23:12Z

MINOR: Fix race condition in TestVerifiableProducer sanity test

## Fixes race condition in TestVerifiableProducer sanity test:
The test starts a producer, waits for at least 5 acks, and then
logs in to the worker to grep for the producer process to figure
out w

[GitHub] kafka pull request #3397: KAFKA-5413: Port fix to 0.10.2 branch

2017-06-21 Thread kelvinrutt
GitHub user kelvinrutt opened a pull request:

https://github.com/apache/kafka/pull/3397

KAFKA-5413: Port fix to 0.10.2 branch

Port KAFKA-5413 to the 0.10.2 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kelvinrutt/kafka kafka_5413_port

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3397


commit 82c7ce89f230bd08864d46abadbae0a2278c89dc
Author: Kelvin Rutt 
Date:   2017-06-21T17:34:18Z

KAFKA-5413: Port fix to 0.10.2 branch






Re: Contributor

2017-06-21 Thread Maisnam Ns
Can you please add me too (username: niranjanmaisnam) to the contributors
list for this project (Kafka) at issues.apache.org?


Thank you,
niranjan

On Wed, Jun 21, 2017 at 10:08 PM, Tom Bentley  wrote:

> Thanks!
>
> On 21 Jun 2017 4:20 pm, "Damian Guy"  wrote:
>
> > Done - thanks
> >
> > On Wed, 21 Jun 2017 at 12:19 Tom Bentley  wrote:
> >
> > > Please can I also be added? My username is tombentley.
> > >
> > > Thanks
> > >
> > > Tom
> > >
> > > On 21 June 2017 at 12:03, Damian Guy  wrote:
> > >
> > > > Hi Andras,
> > > >
> > > > You should have access now.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Wed, 21 Jun 2017 at 10:45 Andras Beni 
> > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I'd like to contribute to Apache Kafka.
> > > > > Can you please add me (username: andrasbeni) to the contributors
> list
> > > for
> > > > > this project at issues.apache.org?
> > > > >
> > > > > Thank you,
> > > > > Andras
> > > > >
> > > >
> > >
> >
>


Re: Contributor

2017-06-21 Thread Tom Bentley
Thanks!

On 21 Jun 2017 4:20 pm, "Damian Guy"  wrote:

> Done - thanks
>
> On Wed, 21 Jun 2017 at 12:19 Tom Bentley  wrote:
>
> > Please can I also be added? My username is tombentley.
> >
> > Thanks
> >
> > Tom
> >
> > On 21 June 2017 at 12:03, Damian Guy  wrote:
> >
> > > Hi Andras,
> > >
> > > You should have access now.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Wed, 21 Jun 2017 at 10:45 Andras Beni 
> > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to contribute to Apache Kafka.
> > > > Can you please add me (username: andrasbeni) to the contributors list
> > for
> > > > this project at issues.apache.org?
> > > >
> > > > Thank you,
> > > > Andras
> > > >
> > >
> >
>


Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Eno Thereska
To make it clear, it was outlined by Damian; I just copy-pasted what he told me 
in person :)

Eno

> On Jun 21, 2017, at 4:40 PM, Bill Bejeck  wrote:
> 
> +1 for the approach outlined above by Eno.
> 
> On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy  wrote:
> 
>> Thanks Eno.
>> 
>> Yes i agree. We could apply this same approach to most of the operations
>> where we have multiple overloads, i.e., we have a single method for each
>> operation that takes the required parameters and everything else is
>> specified as you have done above.
>> 
>> On Wed, 21 Jun 2017 at 16:24 Eno Thereska  wrote:
>> 
>>> (cc’ing user-list too)
>>> 
>>> Given that we already have StateStoreSuppliers that are configurable
>> using
>>> the fluent-like API, probably it’s worth discussing the other examples
>> with
>>> joins and serdes first since those have many overloads and are in need of
>>> some TLC.
>>> 
>>> So following your example, I guess you’d have something like:
>>> .join()
>>>   .withKeySerdes(…)
>>>   .withValueSerdes(…)
>>>   .withJoinType(“outer”)
>>> 
>>> etc?
>>> 
>>> I like the approach since it still remains declarative and it’d reduce
>> the
>>> number of overloads by quite a bit.
>>> 
>>> Eno
>>> 
 On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
 
 Hi,
 
 I'd like to get a discussion going around some of the API choices we've
 made in the DSL. In particular those that relate to stateful operations
 (though this could expand).
 As it stands we lean heavily on overloaded methods in the API, i.e,
>> there
 are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and
>> i
 feel it is only going to get worse as we add more optional params. In
 particular we've had some requests to be able to turn caching off, or
 change log configs,  on a per operator basis (note this can be done now
>>> if
 you pass in a StateStoreSupplier, but this can be a bit cumbersome).
 
 So this is a bit of an open question. How can we change the DSL
>> overloads
 so that it flows, is simple to use and understand, and is easily
>> extended
 in the future?
 
 One option would be to use a fluent API approach for providing the
>>> optional
 params, so something like this:
 
 groupedStream.count()
  .withStoreName("name")
  .withCachingEnabled(false)
  .withLoggingEnabled(config)
  .table()
 
 
 
 Another option would be to provide a Builder to the count method, so it
 would look something like this:
 groupedStream.count(new
 CountBuilder("storeName").withCachingEnabled(false).build())
 
 Another option is to say: Hey we don't need this, what are you on
>> about!
 
 The above has focussed on state store related overloads, but the same
>>> ideas
 could  be applied to joins etc, where we presently have many join
>> methods
 and many overloads.
 
 Anyway, i look forward to hearing your opinions.
 
 Thanks,
 Damian
>>> 
>>> 
>> 



Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Bill Bejeck
+1 for the approach outlined above by Eno.

On Wed, Jun 21, 2017 at 11:28 AM, Damian Guy  wrote:

> Thanks Eno.
>
> Yes i agree. We could apply this same approach to most of the operations
> where we have multiple overloads, i.e., we have a single method for each
> operation that takes the required parameters and everything else is
> specified as you have done above.
>
> On Wed, 21 Jun 2017 at 16:24 Eno Thereska  wrote:
>
> > (cc’ing user-list too)
> >
> > Given that we already have StateStoreSuppliers that are configurable
> using
> > the fluent-like API, probably it’s worth discussing the other examples
> with
> > joins and serdes first since those have many overloads and are in need of
> > some TLC.
> >
> > So following your example, I guess you’d have something like:
> > .join()
> >.withKeySerdes(…)
> >.withValueSerdes(…)
> >.withJoinType(“outer”)
> >
> > etc?
> >
> > I like the approach since it still remains declarative and it’d reduce
> the
> > number of overloads by quite a bit.
> >
> > Eno
> >
> > > On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> > >
> > > Hi,
> > >
> > > I'd like to get a discussion going around some of the API choices we've
> > > made in the DSL. In particular those that relate to stateful operations
> > > (though this could expand).
> > > As it stands we lean heavily on overloaded methods in the API, i.e,
> there
> > > are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and
> i
> > > feel it is only going to get worse as we add more optional params. In
> > > particular we've had some requests to be able to turn caching off, or
> > > change log configs,  on a per operator basis (note this can be done now
> > if
> > > you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> > >
> > > So this is a bit of an open question. How can we change the DSL
> overloads
> > > so that it flows, is simple to use and understand, and is easily
> extended
> > > in the future?
> > >
> > > One option would be to use a fluent API approach for providing the
> > optional
> > > params, so something like this:
> > >
> > > groupedStream.count()
> > >   .withStoreName("name")
> > >   .withCachingEnabled(false)
> > >   .withLoggingEnabled(config)
> > >   .table()
> > >
> > >
> > >
> > > Another option would be to provide a Builder to the count method, so it
> > > would look something like this:
> > > groupedStream.count(new
> > > CountBuilder("storeName").withCachingEnabled(false).build())
> > >
> > > Another option is to say: Hey we don't need this, what are you on
> about!
> > >
> > > The above has focussed on state store related overloads, but the same
> > ideas
> > > could  be applied to joins etc, where we presently have many join
> methods
> > > and many overloads.
> > >
> > > Anyway, i look forward to hearing your opinions.
> > >
> > > Thanks,
> > > Damian
> >
> >
>


Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Damian Guy
Thanks Eno.

Yes i agree. We could apply this same approach to most of the operations
where we have multiple overloads, i.e., we have a single method for each
operation that takes the required parameters and everything else is
specified as you have done above.

On Wed, 21 Jun 2017 at 16:24 Eno Thereska  wrote:

> (cc’ing user-list too)
>
> Given that we already have StateStoreSuppliers that are configurable using
> the fluent-like API, probably it’s worth discussing the other examples with
> joins and serdes first since those have many overloads and are in need of
> some TLC.
>
> So following your example, I guess you’d have something like:
> .join()
>.withKeySerdes(…)
>.withValueSerdes(…)
>.withJoinType(“outer”)
>
> etc?
>
> I like the approach since it still remains declarative and it’d reduce the
> number of overloads by quite a bit.
>
> Eno
>
> > On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> >
> > Hi,
> >
> > I'd like to get a discussion going around some of the API choices we've
> > made in the DSL. In particular those that relate to stateful operations
> > (though this could expand).
> > As it stands we lean heavily on overloaded methods in the API, i.e, there
> > are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and i
> > feel it is only going to get worse as we add more optional params. In
> > particular we've had some requests to be able to turn caching off, or
> > change log configs,  on a per operator basis (note this can be done now
> if
> > you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> >
> > So this is a bit of an open question. How can we change the DSL overloads
> > so that it flows, is simple to use and understand, and is easily extended
> > in the future?
> >
> > One option would be to use a fluent API approach for providing the
> optional
> > params, so something like this:
> >
> > groupedStream.count()
> >   .withStoreName("name")
> >   .withCachingEnabled(false)
> >   .withLoggingEnabled(config)
> >   .table()
> >
> >
> >
> > Another option would be to provide a Builder to the count method, so it
> > would look something like this:
> > groupedStream.count(new
> > CountBuilder("storeName").withCachingEnabled(false).build())
> >
> > Another option is to say: Hey we don't need this, what are you on about!
> >
> > The above has focussed on state store related overloads, but the same
> ideas
> > could  be applied to joins etc, where we presently have many join methods
> > and many overloads.
> >
> > Anyway, i look forward to hearing your opinions.
> >
> > Thanks,
> > Damian
>
>


Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
So Netty would work for this case? I do have a Netty server, but it seems
I'm not getting the expected results. Here is the git repo:
https://github.com/senthilec566/netty4-server . Is this the right
implementation?

Cheers,
Senthil

On Wed, Jun 21, 2017 at 7:45 PM, Tauzell, Dave  wrote:

> I see.
>
> 1.   You don’t want the 100k machines sending directly to kafka.
>
> 2.   You can only have a small number of web servers
>
>
>
> People certainly have web-servers handling over 100k concurrent
> connections.  See this for some examples: https://github.com/smallnest/C1000K-Servers .
>
>
>
> It seems possible with the right sort of kafka producer tuning.
>
>
>
> -Dave
>
>
>
> *From:* SenthilKumar K [mailto:senthilec...@gmail.com]
> *Sent:* Wednesday, June 21, 2017 8:55 AM
> *To:* Tauzell, Dave
> *Cc:* us...@kafka.apache.org; senthilec...@apache.org;
> dev@kafka.apache.org; Senthil kumar
> *Subject:* Re: Handling 2 to 3 Million Events before Kafka
>
>
>
> Thanks Jeyhun. Yes http server would be problematic here w.r.t network ,
> memory ..
>
>
>
> Hi Dave ,  The problem is not with Kafka , it's all about how do you
> handle huge data before kafka.  I did a simple test with 5 node Kafka
> Cluster which gives good result ( ~950 MB/s ) ..So Kafka side i dont see a
> scaling issue ...
>
>
>
> All we are trying is before kafka how do we handle messages from different
> servers ...  Webservers can send fast to kafka but still i can handle only
> 50k events per second which is less for my use case.. also i can't deploy
> 20 webservers to handle this load. I'm looking for an option what could be
> the best candidate before kafka , it should be super fast in getting all
> and send it to kafka producer ..
>
>
>
>
>
> --Senthil
>
>
>
> On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave <
> dave.tauz...@surescripts.com> wrote:
>
> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: us...@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil kumar; dev@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines --> { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>
>
>


Re: [DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Eno Thereska
(cc’ing user-list too)

Given that we already have StateStoreSuppliers that are configurable using the 
fluent-like API, probably it’s worth discussing the other examples with joins 
and serdes first since those have many overloads and are in need of some TLC.

So following your example, I guess you’d have something like:
.join()
   .withKeySerdes(…)
   .withValueSerdes(…)
   .withJoinType(“outer”)

etc?

I like the approach since it still remains declarative and it’d reduce the 
number of overloads by quite a bit.

Eno

> On Jun 21, 2017, at 3:37 PM, Damian Guy  wrote:
> 
> Hi,
> 
> I'd like to get a discussion going around some of the API choices we've
> made in the DSL. In particular those that relate to stateful operations
> (though this could expand).
> As it stands we lean heavily on overloaded methods in the API, i.e, there
> are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and i
> feel it is only going to get worse as we add more optional params. In
> particular we've had some requests to be able to turn caching off, or
> change log configs,  on a per operator basis (note this can be done now if
> you pass in a StateStoreSupplier, but this can be a bit cumbersome).
> 
> So this is a bit of an open question. How can we change the DSL overloads
> so that it flows, is simple to use and understand, and is easily extended
> in the future?
> 
> One option would be to use a fluent API approach for providing the optional
> params, so something like this:
> 
> groupedStream.count()
>   .withStoreName("name")
>   .withCachingEnabled(false)
>   .withLoggingEnabled(config)
>   .table()
> 
> 
> 
> Another option would be to provide a Builder to the count method, so it
> would look something like this:
> groupedStream.count(new
> CountBuilder("storeName").withCachingEnabled(false).build())
> 
> Another option is to say: Hey we don't need this, what are you on about!
> 
> The above has focussed on state store related overloads, but the same ideas
> could  be applied to joins etc, where we presently have many join methods
> and many overloads.
> 
> Anyway, i look forward to hearing your opinions.
> 
> Thanks,
> Damian



Re: confluence permission request

2017-06-21 Thread Damian Guy
Hi,

That should be done.

Thanks,
Damian

On Wed, 21 Jun 2017 at 05:42 Kenji Hayashida  wrote:

> To Kafka Dev Team,
>
> Sorry, forgot sending my ID.
> My ID is kenjih.
>
> Thanks.
>
> - Kenji Hayashida
>
> 2017-06-21 13:29 GMT+09:00 Kenji Hayashida :
>
> > To Kafka Dev Team,
> >
> > Hi, could you please give me a write permission to the confluence page?
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
> > I'm going to write a KIP.
> > Thanks.
> >
> > - Kenji Hayashida
> >
> >
>
>
> --
>
> ☆---★
> 林田賢二
> MAIL: kenji12...@gmail.com
>
> ☆---★
>


Re: Contributor

2017-06-21 Thread Damian Guy
Done - thanks

On Wed, 21 Jun 2017 at 12:19 Tom Bentley  wrote:

> Please can I also be added? My username is tombentley.
>
> Thanks
>
> Tom
>
> On 21 June 2017 at 12:03, Damian Guy  wrote:
>
> > Hi Andras,
> >
> > You should have access now.
> >
> > Thanks,
> > Damian
> >
> > On Wed, 21 Jun 2017 at 10:45 Andras Beni 
> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to contribute to Apache Kafka.
> > > Can you please add me (username: andrasbeni) to the contributors list
> for
> > > this project at issues.apache.org?
> > >
> > > Thank you,
> > > Andras
> > >
> >
>


[DISCUSS] Streams DSL/StateStore Refactoring

2017-06-21 Thread Damian Guy
Hi,

I'd like to get a discussion going around some of the API choices we've
made in the DSL. In particular those that relate to stateful operations
(though this could expand).
As it stands we lean heavily on overloaded methods in the API, i.e., there
are 9 overloads for KGroupedStream.count(..)! It is becoming noisy and I
feel it is only going to get worse as we add more optional params. In
particular we've had some requests to be able to turn caching off, or
change log configs,  on a per operator basis (note this can be done now if
you pass in a StateStoreSupplier, but this can be a bit cumbersome).

So this is a bit of an open question. How can we change the DSL overloads
so that it flows, is simple to use and understand, and is easily extended
in the future?

One option would be to use a fluent API approach for providing the optional
params, so something like this:

groupedStream.count()
   .withStoreName("name")
   .withCachingEnabled(false)
   .withLoggingEnabled(config)
   .table()



Another option would be to provide a Builder to the count method, so it
would look something like this:
groupedStream.count(new
CountBuilder("storeName").withCachingEnabled(false).build())

Another option is to say: Hey we don't need this, what are you on about!

The above has focussed on state store related overloads, but the same ideas
could be applied to joins etc., where we presently have many join methods
and many overloads.

Anyway, i look forward to hearing your opinions.

Thanks,
Damian
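As a concrete illustration of the first option above, here is a minimal, standalone sketch of how such a fluent options object could be wired up in plain Java. None of these class or method names exist in the Streams API; they only demonstrate the chaining mechanics:

```java
// Hypothetical options holder for count(); each with* method returns
// `this` so calls can be chained fluently.
public class CountOptions {
    private String storeName = "count-store";
    private boolean cachingEnabled = true;
    private boolean loggingEnabled = true;

    public CountOptions withStoreName(String name) { this.storeName = name; return this; }
    public CountOptions withCachingEnabled(boolean enabled) { this.cachingEnabled = enabled; return this; }
    public CountOptions withLoggingEnabled(boolean enabled) { this.loggingEnabled = enabled; return this; }

    public String storeName() { return storeName; }
    public boolean cachingEnabled() { return cachingEnabled; }
    public boolean loggingEnabled() { return loggingEnabled; }
}
```

A single `count(CountOptions)` overload could then absorb the nine existing variants, and each new optional parameter becomes a new `with*` method rather than another overload.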


Consumer Lag Metric in Kafka

2017-06-21 Thread Madhav Ancha (BLOOMBERG/ 919 3RD A)
Hi,

When producerOffset is used in calculating the ConsumerLag/MaxLag metric for a 
consumer, is the producerOffset

a) the partition offset that is visible to the clients at the leader, or
b) the partition offset that is still waiting to be replicated at the leader?

Thanks
Madhav.
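For reference, lag-style metrics are conventionally computed against the offset exposed to clients (the high watermark), since offsets beyond it are not yet fully replicated. A trivial standalone sketch of that arithmetic, illustrative only and not the actual metric code:

```java
public class ConsumerLag {
    // Lag is the distance from the consumer's position to the highest
    // offset visible to clients (the high watermark), never negative.
    static long lag(long highWatermark, long consumerPosition) {
        return Math.max(0L, highWatermark - consumerPosition);
    }
}
```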

RE: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Tauzell, Dave
I see.

1.   You don’t want the 100k machines sending directly to kafka.

2.   You can only have a small number of web servers

People certainly have web-servers handling over 100k concurrent connections.  
See this for some examples:  https://github.com/smallnest/C1000K-Servers .

It seems possible with the right sort of kafka producer tuning.

-Dave
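To make "producer tuning" concrete, here is a hedged sketch of throughput-oriented producer settings expressed as plain Java properties. The config keys are standard producer settings, but the values are starting-point guesses, not recommendations for this particular workload:

```java
import java.util.Properties;

public class ProducerTuning {
    static Properties highThroughputProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Bigger batches plus a short linger amortize network round trips.
        props.put("batch.size", "262144");       // 256 KB per-partition batch
        props.put("linger.ms", "10");            // wait up to 10 ms to fill a batch
        // Compression trades CPU for network and disk throughput.
        props.put("compression.type", "lz4");
        // Room for in-flight batches across many partitions.
        props.put("buffer.memory", "134217728"); // 128 MB
        props.put("acks", "1");                  // durability/throughput trade-off
        return props;
    }
}
```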

From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 8:55 AM
To: Tauzell, Dave
Cc: us...@kafka.apache.org; senthilec...@apache.org; dev@kafka.apache.org; 
Senthil kumar
Subject: Re: Handling 2 to 3 Million Events before Kafka

Thanks Jeyhun. Yes http server would be problematic here w.r.t network , memory 
..

Hi Dave ,  The problem is not with Kafka , it's all about how do you handle 
huge data before kafka.  I did a simple test with 5 node Kafka Cluster which 
gives good result ( ~950 MB/s ) ..So Kafka side i dont see a scaling issue ...

All we are trying is before kafka how do we handle messages from different 
servers ...  Webservers can send fast to kafka but still i can handle only 50k 
events per second which is less for my use case.. also i can't deploy 20 
webservers to handle this load. I'm looking for an option what could be the 
best candidate before kafka , it should be super fast in getting all and send 
it to kafka producer ..


--Senthil

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave 
mailto:dave.tauz...@surescripts.com>> wrote:
What are your configurations?

- production
- brokers
- consumers

Is the problem that web servers cannot send to Kafka fast enough or your 
consumers cannot process messages off of kafka fast enough?
What is the average size of these messages?

-Dave

-Original Message-
From: SenthilKumar K 
[mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 7:58 AM
To: us...@kafka.apache.org
Cc: senthilec...@apache.org; Senthil kumar; 
dev@kafka.apache.org
Subject: Handling 2 to 3 Million Events before Kafka

Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...

I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka is 
really good candidate for us to handle this ingestion rate ..


100K machines --> { Http Server (Jetty/Netty) } --> Kafka Cluster..

I see the problem in Http Server where it can't handle beyond 50K events per 
instance ..  I'm thinking some other solution would be right choice before 
Kafka ..

Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?

--Senthil



Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
Thanks Jeyhun. Yes, an HTTP server would be problematic here w.r.t. network
and memory.

Hi Dave, the problem is not with Kafka; it's about how to handle the huge
volume of data before it reaches Kafka. I did a simple test with a 5-node
Kafka cluster which gave good results (~950 MB/s), so on the Kafka side I
don't see a scaling issue.

What we are trying to work out is how to handle messages from the different
servers before Kafka. Web servers can send to Kafka fast, but each instance
can handle only 50k events per second, which is too little for my use case,
and I can't deploy 20 web servers for this load. I'm looking for the best
candidate to put in front of Kafka; it should be very fast at receiving
everything and handing it off to a Kafka producer.


--Senthil
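One lever worth checking on the web-server tier is producer batching: a single shared producer per JVM with throughput-oriented settings goes a long way. A sketch follows; the keys are real Kafka producer configs, but the hosts and values are illustrative assumptions, not tuned recommendations:

```java
import java.util.Properties;

public class ThroughputProducerConfig {
    // Real Kafka producer config keys; the values are illustrative starting
    // points for a throughput-oriented workload, not tuned recommendations.
    static Properties throughputProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder hosts
        p.setProperty("acks", "1");                  // leader-only acks: more throughput, less durability
        p.setProperty("batch.size", "262144");       // bigger per-partition batches (bytes)
        p.setProperty("linger.ms", "20");            // wait briefly so batches fill up
        p.setProperty("compression.type", "lz4");    // cheap CPU for large wire savings
        p.setProperty("buffer.memory", "134217728"); // room for in-flight batches
        return p;
    }

    public static void main(String[] args) {
        throughputProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A larger `batch.size` plus a non-zero `linger.ms` lets the producer ship fewer, bigger, compressed requests, which tends to matter far more per instance than raw request rate.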

On Wed, Jun 21, 2017 at 6:53 PM, Tauzell, Dave  wrote:

> What are your configurations?
>
> - production
> - brokers
> - consumers
>
> Is the problem that web servers cannot send to Kafka fast enough or your
> consumers cannot process messages off of kafka fast enough?
> What is the average size of these messages?
>
> -Dave
>
> -Original Message-
> From: SenthilKumar K [mailto:senthilec...@gmail.com]
> Sent: Wednesday, June 21, 2017 7:58 AM
> To: us...@kafka.apache.org
> Cc: senthilec...@apache.org; Senthil kumar; dev@kafka.apache.org
> Subject: Handling 2 to 3 Million Events before Kafka
>
> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>


[GitHub] kafka pull request #3396: KAFKA-4931: stop script fails due 4096 ps output l...

2017-06-21 Thread tombentley
GitHub user tombentley opened a pull request:

https://github.com/apache/kafka/pull/3396

KAFKA-4931: stop script fails due 4096 ps output limit

This also fixes KAFKA-4389 and KAFKA-4297, which were exactly the same
issue but for kafka-server-stop.sh.
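For background, on some platforms `ps` truncates the reported command line at 4096 characters, so with a long classpath the class name the stop scripts grep for can sit past the cutoff and the classic `ps | grep` pipeline finds no PID. Matching on the full command line, e.g. with `pgrep -f`, avoids that. A small demonstration of the approach, not the exact patch:

```shell
# The classic pattern in the stop scripts looks roughly like this, and fails
# when the class name lies beyond the point where ps truncates the arguments:
#   PIDS=$(ps ax | grep -i 'kafka\.Kafka' | grep java | grep -v grep | awk '{print $1}')
# pgrep -f matches against the full, untruncated command line instead:
sleep 300 &
DEMO_PID=$!
FOUND=$(pgrep -f 'sleep 300')
kill "$DEMO_PID"
echo "$FOUND" | grep -q "$DEMO_PID" && echo matched
```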

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tombentley/kafka KAFKA-4297+4389+4931

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3396


commit 1f9c7d4f98b183b9c73079c6906914628b07c56c
Author: Tom Bentley 
Date:   2017-06-21T13:21:56Z

KAFKA-4931: stop script fails due 4096 ps output limit

This also fixes KAFKA-4389 and KAFKA-4297, which were exactly the same
issue but for kafka-server-stop.sh.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5488) KStream.branch should not return a Array of streams we have to access by known index

2017-06-21 Thread Marcel Silberhorn (JIRA)
Marcel Silberhorn created KAFKA-5488:


 Summary: KStream.branch should not return a Array of streams we 
have to access by known index
 Key: KAFKA-5488
 URL: https://issues.apache.org/jira/browse/KAFKA-5488
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.2.1
Reporter: Marcel Silberhorn


some details and thoughts about:

https://gitlab.com/childno.de/apache_kafka/snippets/1665655

long story short: it's a mess to get a {{KStream<>[]}} out of 
{{KStream<>.branch(Predicate<>...)}}. It breaks the fluent API and produces 
code that is hard to maintain, since you have to know the right index for 
each unnamed branched stream.

Quick idea: something like {{void branch(final KeyValue<Predicate<K, V>, 
Consumer<KStream<K, V>>>... branchPredicatesAndHandlers);}} where the code 
handling each branch can be written nested, right where it belongs
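The nested shape suggested above can be sketched without the Streams API at all. The toy model below (plain `java.util.function`, list-backed stand-ins for streams, illustrative names only) shows how attaching each handler next to its predicate removes the index bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class BranchSketch {
    // Pairs a predicate with the handler for its sub-stream, mirroring the
    // KeyValue<Predicate<..>, Consumer<KStream<..>>> shape from the ticket.
    static final class Branch<T> {
        final Predicate<T> predicate;
        final Consumer<List<T>> handler;
        Branch(Predicate<T> predicate, Consumer<List<T>> handler) {
            this.predicate = predicate;
            this.handler = handler;
        }
    }

    // Each record goes to the first branch whose predicate matches, like
    // KStream.branch today, but the handler is attached in place, so there
    // is no indexing into an unnamed array of streams.
    @SafeVarargs
    static <T> void branch(List<T> records, Branch<T>... branches) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < branches.length; i++) buckets.add(new ArrayList<>());
        for (T r : records)
            for (int i = 0; i < branches.length; i++)
                if (branches[i].predicate.test(r)) { buckets.get(i).add(r); break; }
        for (int i = 0; i < branches.length; i++) branches[i].handler.accept(buckets.get(i));
    }

    public static void main(String[] args) {
        branch(List.of(1, 2, 3, 4, 5),
            new Branch<>(n -> n % 2 == 0, evens -> System.out.println("evens: " + evens)),
            new Branch<>(n -> true,       odds  -> System.out.println("odds: "  + odds)));
        // prints: evens: [2, 4]  then  odds: [1, 3, 5]
    }
}
```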



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


RE: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Tauzell, Dave
What are your configurations?

- production
- brokers
- consumers

Is the problem that web servers cannot send to Kafka fast enough or your 
consumers cannot process messages off of kafka fast enough?
What is the average size of these messages?

-Dave

-Original Message-
From: SenthilKumar K [mailto:senthilec...@gmail.com]
Sent: Wednesday, June 21, 2017 7:58 AM
To: us...@kafka.apache.org
Cc: senthilec...@apache.org; Senthil kumar; dev@kafka.apache.org
Subject: Handling 2 to 3 Million Events before Kafka

Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...

I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka is 
really good candidate for us to handle this ingestion rate ..


100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..

I see the problem in Http Server where it can't handle beyond 50K events per 
instance ..  I'm thinking some other solution would be right choice before 
Kafka ..

Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?

--Senthil


Re: Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread Jeyhun Karimov
Hi,

With Kafka you can increase overall throughput by adding nodes to the
cluster.
I had a similar issue, where we needed to ingest vast amounts of data into a
streaming system.
In our case, Kafka was a bottleneck because of disk I/O. To solve it, we
implemented a (simple) distributed pub-sub system in C which keeps data in
memory. You should also take into account your network bandwidth and the
(upper-bound) capacity of your processing engine or HTTP server.


Cheers,
Jeyhun


On Wed, Jun 21, 2017 at 2:58 PM SenthilKumar K 
wrote:

> Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...
>
> I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
> is really good candidate for us to handle this ingestion rate ..
>
>
> 100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..
>
> I see the problem in Http Server where it can't handle beyond 50K events
> per instance ..  I'm thinking some other solution would be right choice
> before Kafka ..
>
> Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?
>
> --Senthil
>
-- 
-Cheers

Jeyhun


Handling 2 to 3 Million Events before Kafka

2017-06-21 Thread SenthilKumar K
Hi Team ,   Sorry if this question is irrelevant to Kafka Group ...

I have been trying to solve problem of handling 5 GB/sec ingestion. Kafka
is really good candidate for us to handle this ingestion rate ..


100K machines > { Http Server (Jetty/Netty) } --> Kafka Cluster..

I see the problem in Http Server where it can't handle beyond 50K events
per instance ..  I'm thinking some other solution would be right choice
before Kafka ..

Anyone worked on similar use case and similar load ? Suggestions/Thoughts ?

--Senthil


[jira] [Created] (KAFKA-5487) Rolling upgrade test for streams

2017-06-21 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-5487:
---

 Summary: Rolling upgrade test for streams
 Key: KAFKA-5487
 URL: https://issues.apache.org/jira/browse/KAFKA-5487
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.11.0.0
Reporter: Eno Thereska
Assignee: Eno Thereska
 Fix For: 0.11.0.1


We need to do a basic rolling upgrade test for streams, similar to the 
tests/kafkatest/tests/core/upgrade_test.py test for Kafka core. Basically we 
need to test the ability of a streams app to use a different JAR from a 
different version.





Re: Contributor

2017-06-21 Thread Tom Bentley
Please can I also be added? My username is tombentley.

Thanks

Tom

On 21 June 2017 at 12:03, Damian Guy  wrote:

> Hi Andras,
>
> You should have access now.
>
> Thanks,
> Damian
>
> On Wed, 21 Jun 2017 at 10:45 Andras Beni  wrote:
>
> > Hi All,
> >
> > I'd like to contribute to Apache Kafka.
> > Can you please add me (username: andrasbeni) to the contributors list for
> > this project at issues.apache.org?
> >
> > Thank you,
> > Andras
> >
>


[GitHub] kafka pull request #3395: KAFKA-3575: Use console consumer access topic that...

2017-06-21 Thread tombentley
GitHub user tombentley opened a pull request:

https://github.com/apache/kafka/pull/3395

KAFKA-3575: Use console consumer access topic that does not exist, ca…

…n not use "Control + C" to exit process

A finally block is not guaranteed to execute in the event of Ctrl+C 
happening
while in the try or catch blocks. Decrementing the latch in the finally 
block
therefore made the shutdown hook hang waiting for something that would
never happen and the JVM couldn't exit while it was running the shutdown 
hook.

Replacing the latch with an atomic flag to say whether we've run the cleanup
code allows us to either run it from the shutdown hook, or the finally 
block.
It should thus definitely run once. When run from the shutdown hook the main
thread would no longer be running, so it should be threadsafe.

The contribution is my original work and I license the work to the project 
under the project's open source license.
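The atomic-flag pattern described above boils down to a few lines. A sketch, not the actual console-consumer code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownSketch {
    static final AtomicBoolean cleaned = new AtomicBoolean(false);

    // The pattern from the patch description: an atomic flag instead of a
    // latch, so cleanup runs exactly once whether we get here from the
    // finally block (normal exit) or from the shutdown hook (Ctrl+C).
    static boolean cleanup() {
        if (cleaned.compareAndSet(false, true)) {
            System.out.println("cleanup ran");
            return true;
        }
        return false;  // another path already cleaned up
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(ShutdownSketch::cleanup));
        try {
            System.out.println("consuming...");
        } finally {
            cleanup();  // on Ctrl+C this line may never run; the hook covers it
        }
    }
}
```

Unlike a latch, the compare-and-set cannot block, so the shutdown hook can never hang waiting on the main thread.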

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tombentley/kafka KAFKA-3575

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3395


commit 66b1ff6d37d0eab36e00f1251d0463a7b13cd11d
Author: Tom Bentley 
Date:   2017-06-21T11:11:27Z

KAFKA-3575: Use console consumer access topic that does not exist, can not 
use "Control + C" to exit process

A finally block is not guaranteed to execute in the event of Ctrl+C 
happening
while in the try or catch blocks. Decrementing the latch in the finally 
block
therefore made the shutdown hook hang waiting for something that would
never happen and the JVM couldn't exit while it was running the shutdown 
hook.

Replacing the latch with an atomic flag to say whether we've run the cleanup
code allows us to either run it from the shutdown hook, or the finally 
block.
It should thus definitely run once. When run from the shutdown hook the main
thread would no longer be running, so it should be threadsafe.






Re: Contributor

2017-06-21 Thread Damian Guy
Hi Andras,

You should have access now.

Thanks,
Damian

On Wed, 21 Jun 2017 at 10:45 Andras Beni  wrote:

> Hi All,
>
> I'd like to contribute to Apache Kafka.
> Can you please add me (username: andrasbeni) to the contributors list for
> this project at issues.apache.org?
>
> Thank you,
> Andras
>


[GitHub] kafka pull request #3289: MINOR: add Yahoo benchmark to nightly runs

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3289




[GitHub] kafka pull request #3394: KAFKA-5475: Connector config validation needs to i...

2017-06-21 Thread kkonstantine
GitHub user kkonstantine opened a pull request:

https://github.com/apache/kafka/pull/3394

KAFKA-5475: Connector config validation needs to include transformation types



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkonstantine/kafka 
KAFKA-5475-Connector-config-validation-REST-API-endpoint-not-including-fields-for-transformations

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3394


commit 10c4ef7b8785119f42eb48dac237fe017c93b753
Author: Konstantine Karantasis 
Date:   2017-06-21T10:36:39Z

KAFKA-5475: Connector config validation needs to include transformation 
types.






[GitHub] kafka pull request #3393: KAFKA-5319: Add a tool to balance replicas and lea...

2017-06-21 Thread MarkTcMA
GitHub user MarkTcMA opened a pull request:

https://github.com/apache/kafka/pull/3393

KAFKA-5319: Add a tool to balance replicas and leaders of cluster



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MarkTcMA/kafka KAFKA-5319

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3393


commit 0229e10da3d552ae1bb44e9829671a7ddcd0a3b1
Author: Ma Tianchi 
Date:   2017-06-21T10:34:03Z

To deal with KAFKA-5319






[GitHub] kafka pull request #3392: KAFKA-5319:Add a tool to balance the cluster

2017-06-21 Thread MarkTcMA
Github user MarkTcMA closed the pull request at:

https://github.com/apache/kafka/pull/3392




[GitHub] kafka pull request #3392: KAFKA-5319:Add a tool to balance the cluster

2017-06-21 Thread MarkTcMA
GitHub user MarkTcMA opened a pull request:

https://github.com/apache/kafka/pull/3392

KAFKA-5319:Add a tool to balance the cluster

This is the code for 
[KIP-166](https://cwiki.apache.org/confluence/display/KAFKA/KIP-166+-+Add+a+tool+to+make+amounts+of+replicas+and+leaders+on+brokers+balancedl)
 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MarkTcMA/kafka KIP-166

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3392.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3392


commit 7946b2f35c0e3811ba7d706fc9c761d73ece47bb
Author: Damian Guy 
Date:   2017-01-16T19:40:47Z

MINOR: Remove unused constructor param from ProcessorStateManager

Remove applicationId parameter as it is no longer used.

Author: Damian Guy 

Reviewers: Guozhang Wang 

Closes #2385 from dguy/minor-remove-unused-param

commit 621dff22e79dc64b9a8748186dd985774044f91a
Author: Rajini Sivaram 
Date:   2017-01-17T11:16:29Z

KAFKA-4363; Documentation for sasl.jaas.config property

Author: Rajini Sivaram 

Reviewers: Ismael Juma 

Closes #2316 from rajinisivaram/KAFKA-4363

commit e3f4cdd0e249f78a7f4e8f064533bcd15eb11cbf
Author: Rajini Sivaram 
Date:   2017-01-17T12:55:07Z

KAFKA-4590; SASL/SCRAM system tests

Runs sanity test and one replication test using SASL/SCRAM.

Author: Rajini Sivaram 

Reviewers: Ewen Cheslack-Postava , Ismael Juma 


Closes #2355 from rajinisivaram/KAFKA-4590

commit 2b19ad9d8c47fb0f78a6e90d2f5711df6110bf1f
Author: Rajini Sivaram 
Date:   2017-01-17T18:42:55Z

KAFKA-4580; Use sasl.jaas.config for some system tests

Switched console_consumer, verifiable_consumer and verifiable_producer to 
use new sasl.jaas_config property instead of static JAAS configuration file 
when used with SASL_PLAINTEXT.

Author: Rajini Sivaram 

Reviewers: Ewen Cheslack-Postava , Ismael Juma 


Closes #2323 from rajinisivaram/KAFKA-4580

(cherry picked from commit 3f6c4f63c9c17424cf717ca76c74554bcf3b2e9a)
Signed-off-by: Ismael Juma 

commit 60d759a227087079e6fd270c68fd9e38441cb34a
Author: Jason Gustafson 
Date:   2017-01-17T18:42:05Z

MINOR: Some cleanups and additional testing for KIP-88

Author: Jason Gustafson 

Reviewers: Vahid Hashemian , Ismael Juma 


Closes #2383 from hachikuji/minor-cleanup-kip-88

commit c9b9acf6a8b542c2d0d825c17a4a20cf3fa5
Author: Damian Guy 
Date:   2017-01-17T20:33:11Z

KAFKA-4588: Wait for topics to be created in 
QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable

After debugging this i can see the times that it fails there is a race 
between when the topic is actually created/ready on the broker and when the 
assignment happens. When it fails `StreamPartitionAssignor.assign(..)` gets 
called with a `Cluster` with no topics. Hence the test hangs as no tasks get 
assigned. To fix this I added a `waitForTopics` method to 
`EmbeddedKafkaCluster`. This will wait until the topics have been created.

Author: Damian Guy 

Reviewers: Matthias J. Sax, Guozhang Wang

Closes #2371 from dguy/integration-test-fix

(cherry picked from commit 825f225bc5706b16af8ec44ca47ee1452c11e6f3)
Signed-off-by: Guozhang Wang 

commit 6f72a5a53c444278187fa6be58031168bcaffb26
Author: Damian Guy 
Date:   2017-01-17T22:13:46Z

KAFKA-3452 Follow-up: Refactoring StateStore hierarchies

This is a follow up of https://github.com/apache/kafka/pull/2166 - 
refactoring the store hierarchies as requested

Author: Damian Guy 

Reviewers: Guozhang Wang 

Closes #2360 from dguy/state-store-refactor

(cherry picked from commit 73b7ae0019d387407375f3865e263225c986a6ce)
Signed-off-by: Guozhang Wang 

commit eb62e5695506ae13bd37102c3c08e8a067eca0c8
Author: Ismael Juma 
Date:   2017-01-18T02:43:10Z

KAFKA-4591; Create Topic Policy follow-up

1. Added javadoc to public classes
2. Removed `s` from config name for consistency with interface name
3. The policy interface now implements Configurable and AutoCloseable as 
per the KIP
4. Use `null` instead of `-1` in `RequestMetadata`
5. Perform all broker validation before invoking the policy
6. Add tests

Author: Ismael Juma 

Reviewers: Jason Gustafson 

Closes #2388 from ijuma/create-topic-policy-docs-and-config-name-change

(cherry picked from commit fd6d7bcf335166a524dc9a29a50c96af8f1c1c02)
Signed-off-by: Ismael Juma 

commit e38794e020951adec5a5d0bbfe42c57294bf67bd
Author: Guozhang Wang 
Date:   2017-01-18T04:29:55Z

KAFKA-3502; move RocksDB options construction to init()

In RocksDBStore, option

[jira] [Created] (KAFKA-5486) org.apache.kafka logging should go to server.log

2017-06-21 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-5486:
--

 Summary: org.apache.kafka logging should go to server.log
 Key: KAFKA-5486
 URL: https://issues.apache.org/jira/browse/KAFKA-5486
 Project: Kafka
  Issue Type: Bug
Reporter: Ismael Juma
Assignee: Ismael Juma
Priority: Critical


The broker uses a number of classes from the org.apache.kafka package and 
logging from those classes should go to server.log. It currently goes to stdout 
only.
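Concretely, the fix implied here is a logger entry of the following kind. This is a sketch that assumes the appender names in the stock config/log4j.properties:

```properties
# Route client-library (org.apache.kafka.*) logging to server.log via the
# broker's existing file appender instead of stdout only.
log4j.logger.org.apache.kafka=INFO, kafkaAppender
log4j.additivity.org.apache.kafka=false
```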





Contributor

2017-06-21 Thread Andras Beni
Hi All,

I'd like to contribute to Apache Kafka.
Can you please add me (username: andrasbeni) to the contributors list for
this project at issues.apache.org?

Thank you,
Andras


Re: [DISCUSS]: KIP-161: streams record processing exception handlers

2017-06-21 Thread Eno Thereska
Thanks Guozhang,

I’ve updated the KIP and hopefully addressed all the comments so far. In the 
process I also changed the name of the KIP to reflect its scope better: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
 


Any other feedback appreciated, otherwise I’ll start the vote soon.

Thanks
Eno
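To make the discussion concrete, a log-and-continue deserialization handler in the spirit of the KIP could look like the sketch below. The interface, enum, and method names here are assumptions for illustration, since the KIP interfaces are still under discussion:

```java
import java.util.Map;

public class LogAndContinueSketch {
    // Hypothetical shapes, not the final KIP-161 API.
    enum Response { CONTINUE, FAIL }

    interface DeserializationExceptionHandler {
        Response handle(byte[] key, byte[] value, Exception exception);
        default void configure(Map<String, ?> configs) { }
    }

    // Log-and-continue: record the poison pill and keep processing, instead
    // of failing the whole task on a single corrupt record.
    static class LogAndContinue implements DeserializationExceptionHandler {
        long skipped = 0;
        public Response handle(byte[] key, byte[] value, Exception exception) {
            skipped++;
            System.err.println("Skipping corrupt record: " + exception.getMessage());
            return Response.CONTINUE;
        }
    }

    public static void main(String[] args) {
        LogAndContinue h = new LogAndContinue();
        Response r = h.handle(null, new byte[]{1}, new RuntimeException("bad payload"));
        System.out.println(r + " after " + h.skipped + " skipped");
        // prints: CONTINUE after 1 skipped
    }
}
```

A fail-fast variant would simply return `FAIL`; a "bad queue" variant could forward the raw bytes to a dead-letter topic before returning `CONTINUE`.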

> On Jun 12, 2017, at 6:28 AM, Guozhang Wang  wrote:
> 
> Eno, Thanks for bringing this proposal up and sorry for getting late on
> this. Here are my two cents:
> 
> 1. First some meta comments regarding "fail fast" v.s. "making progress". I
> agree that in general we should better "enforce user to do the right thing"
> in system design, but we also need to keep in mind that Kafka is a
> multi-tenant system, i.e. from a Streams app's pov you probably would not
> control the whole streaming processing pipeline end-to-end. E.g. Your input
> data may not be controlled by yourself; it could be written by another app,
> or another team in your company, or even a different organization, and if
> an error happens maybe you cannot fix "to do the right thing" just by
> yourself in time. In such an environment I think it is important to leave
> the door open to let users be more resilient. So I find the current
> proposal which does leave the door open for either fail-fast or make
> progress quite reasonable.
> 
> 2. On the other hand, if the question is whether we should provide a
> built-in "send to bad queue" handler from the library, I think that might
> be an overkill: with some tweaks (see my detailed comments below) on the
> API we can allow users to implement such handlers pretty easily. In fact, I
> feel even "LogAndThresholdExceptionHandler" is not necessary as a built-in
> handler, as it would then require users to specify the threshold via
> configs, etc. I think letting people provide such "eco-libraries" may be
> better.
> 
> 3. Regarding the CRC error: today we validate CRC on both the broker end
> upon receiving produce requests and on consumer end upon receiving fetch
> responses; and if the CRC validation fails in the former case it would not
> be appended to the broker logs. So if we do see a CRC failure on the
> consumer side it has to be that either we have a flipped bit on the broker
> disks or over the wire. For the first case it is fatal while for the second
> it is retriable. Unfortunately we cannot tell which case it is when seeing
> CRC validation failures. But in either case, just skipping and making
> progress seems not a good choice here, and hence I would personally exclude
> these errors from the general serde errors to NOT leave the door open of
> making progress.
> 
> Currently such errors are thrown as KafkaException that wraps an
> InvalidRecordException, which may be too general and we could consider just
> throwing the InvalidRecordException directly. But that could be an
> orthogonal discussion if we agrees that CRC failures should not be
> considered in this KIP.
> 
> 
> 
> Now some detailed comments:
> 
> 4. Could we consider adding the processor context in the handle() function
> as well? This context will be wrapping as the source node that is about to
> process the record. This could expose more info like which task / source
> node sees this error, which timestamp of the message, etc, and also can
> allow users to implement their handlers by exposing some metrics, by
> calling context.forward() to implement the "send to bad queue" behavior etc.
> 
> 5. Could you add the string name of
> StreamsConfig.DEFAULT_RECORD_EXCEPTION_HANDLER as well in the KIP?
> Personally I find "default" prefix a bit misleading since we do not allow
> users to override it per-node yet. But I'm okay either way as I can see we
> may extend it in the future and probably would like to not rename the
> config again. Also from the experience of `default partitioner` and
> `default timestamp extractor` we may also make sure that the passed in
> object can be either a string "class name" or a class object?
> 
> 
> Guozhang
> 
> 
> On Wed, Jun 7, 2017 at 2:16 PM, Jan Filipiak 
> wrote:
> 
>> Hi Eno,
>> 
>> On 07.06.2017 22:49, Eno Thereska wrote:
>> 
>>> Comments inline:
>>> 
>>> On 5 Jun 2017, at 18:19, Jan Filipiak  wrote:
 
 Hi
 
 just my few thoughts
 
 On 05.06.2017 11:44, Eno Thereska wrote:
 
> Hi there,
> 
> Sorry for the late reply, I was out this past week. Looks like good
> progress was made with the discussions either way. Let me recap a couple 
> of
> points I saw into one big reply:
> 
> 1. Jan mentioned CRC errors. I think this is a good point. As these
> happen in Kafka, before Kafka Streams gets a chance to inspect anything,
> I'd like to hear the opinion of more Kafka folks like Ismael or Jason on
> this one. Currently the documentation is not great 

[GitHub] kafka pull request #3378: MINOR: explain producer naming within Streams

2017-06-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3378

