Re: Question regarding buffer.memory, max.request.size and send.buffer.bytes

2017-05-23 Thread Milind Vaidya
I am looking for producer tuning, as mentioned in the mail; all the
properties are related to the producer config.

This is where the property is mentioned:
https://kafka.apache.org/0100/documentation.html#producerconfigs

The consumer in this case is KafkaSpout from Apache Storm.





On Tue, May 23, 2017 at 3:28 PM, Mohammed Manna  wrote:

>  This could be for various reasons:
>
> 1) Your consumer.properties settings - if you have not been acknowledging
> automatically, you need to provide a sufficient polling time and commit in
> sync/async.
> 2) You are not consuming the messages the way you think.
>
> I don't know how you got this buffer.memory property. Doesn't sound right,
> could you kindly check this again? Also, could you please provide a snippet
> of your Consumer and how you are reading from the stream?
>
> By default, the buffer is about 10% of message.max.bytes. Perhaps you
> are looking for producer tuning using the following:
>
> batch.size
> message.max.bytes
> send.buffer.bytes
> Cloudera and Confluent.io have some nice articles on Kafka. Have a read
> through this:
> https://www.cloudera.com/documentation/kafka/latest/topics/kafka_performance.html
>
>
>
> On 23 May 2017 at 20:09, Milind Vaidya  wrote:
>
> > I have set the producer properties as follows (0.10.0.0):
> >
> > "linger.ms" : "500",
> > "batch.size" : "1000",
> > "buffer.memory" : "1",
> > "send.buffer.bytes" : "512000"
> >
> > and the default
> >
> > max.request.size = 1048576
> >
> > If records are sent faster than they can be delivered, they will be
> > buffered. Now, with buffer.memory set to 1 byte, what happens if a record
> > is larger than this, say 11629 bytes? What is the minimum value of
> > buffer.memory in terms of the other params? Should it be at least equal to
> > send.buffer.bytes or max.request.size, or is it better left at the
> > default, which is 33554432?
> >
> > I am trying to debug some events not reaching the consumer, so I am
> > wondering if this could be the reason.
> >
>


Re: Question regarding buffer.memory, max.request.size and send.buffer.bytes

2017-05-23 Thread Mohammed Manna
 This could be for various reasons:

1) Your consumer.properties settings - if you have not been acknowledging
automatically, you need to provide a sufficient polling time and commit in
sync/async.
2) You are not consuming the messages the way you think.

I don't know how you got this buffer.memory property. Doesn't sound right,
could you kindly check this again? Also, could you please provide a snippet
of your Consumer and how you are reading from the stream?

By default, the buffer is about 10% of message.max.bytes. Perhaps you
are looking for producer tuning using the following:

batch.size
message.max.bytes
send.buffer.bytes
Cloudera and Confluent.io have some nice articles on Kafka. Have a read
through this:
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_performance.html



On 23 May 2017 at 20:09, Milind Vaidya  wrote:

> I have set the producer properties as follows (0.10.0.0):
>
> "linger.ms" : "500",
> "batch.size" : "1000",
> "buffer.memory" : "1",
> "send.buffer.bytes" : "512000"
>
> and the default
>
> max.request.size = 1048576
>
> If records are sent faster than they can be delivered, they will be
> buffered. Now, with buffer.memory set to 1 byte, what happens if a record
> is larger than this, say 11629 bytes? What is the minimum value of
> buffer.memory in terms of the other params? Should it be at least equal to
> send.buffer.bytes or max.request.size, or is it better left at the
> default, which is 33554432?
>
> I am trying to debug some events not reaching the consumer, so I am
> wondering if this could be the reason.
>


Re: Kafka Streams: "subscribed topics are not assigned to any members in the group"

2017-05-23 Thread Matthias J. Sax
Hi Dmitry,

this sounds like a bug to me. Can you share some more details? What is
your program structure? How many partitions do you have per topic? How
many threads/instances do you run?

When does the issue occur exactly?


-Matthias

On 5/23/17 12:26 PM, Dmitry Minkovsky wrote:
> Certain elements of my streams app stopped working and I noticed that my
> logs contain:
> 
> [2017-05-23 15:23:09,274] WARN
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:361) The
> following subscribed topics are not assigned to any members in the group
> user-service : [user-service-KSTREAM-KEY-SELECT-38-repartition,
> user-service-KSTREAM-KEY-SELECT-000130-repartition, users,
> user-service-logins-by-user-repartition]
> 
> These are precisely the affected topics. Does anyone know what this log is
> related to, and why this might have happened?
> 
> I am working on isolating the problem, but if anyone has seen this before,
> some help/guidance would be much appreciated.
> 
> Thank you!
> 





RE: MBean for records-lag-max metric always 1.0 or -Infinity

2017-05-23 Thread Benson Chen
We just tried upgrading the client jar to the 0.10.2 version and it works!  Also, I was 
mistaken about the prior version: we were running the 0.10.0.1 KafkaServer but 
using the 0.10.1 client jar.  With the 0.10.2 client jar working against the 0.10.0.1 
KafkaServer, the records-lag-max metric is now reporting correctly!
Anyway, it may be useful to document the client/broker compatibility of this very 
useful metric for other users.  Thanks.
From: Benson Chen
Sent: Tuesday, May 23, 2017 3:11 PM
To: users@kafka.apache.org
Cc: Hongjie Xin 
Subject: MBean for records-lag-max metric always 1.0 or -Infinity

Hello,

We're researching autoscaling our microservices consuming from a Kafka Topic 
based on the records-lag-max metric.  For the following MBean object:

 MBeans TAB: kafka.consumer-> consumer-fetcher-manager-metric -> consumer-1 -> 
Attributes -> records-lag-max

In JConsole, we noticed that we consistently get either 1.0 or -Infinity.  We 
also implemented code in our Java application to invoke the metrics() method of 
KafkaConsumer to fetch the metrics after a poll of records from the Topic:

Map<MetricName, ? extends Metric> metrics()  // Get the metrics kept by the consumer
Printing out this record also shows 1.0 or -Infinity.
In our test we have a Kafka Producer sending in about 50 messages/s while a 
Kafka Consumer is consuming about 20 messages/s.  We know that the Kafka 
Consumer is definitely lagging behind because it continues to consume messages 
well after shutting down the Kafka Producer.
Is this a known bug to get the correct value for records-lag-max or do we need 
to tune the configuration settings somehow?
We're running 0.10.1.1.
Thanks for your help.


MBean for records-lag-max metric always 1.0 or -Infinity

2017-05-23 Thread Benson Chen
Hello,

We're researching autoscaling our microservices consuming from a Kafka Topic 
based on the records-lag-max metric.  For the following MBean object:

 MBeans TAB: kafka.consumer-> consumer-fetcher-manager-metric -> consumer-1 -> 
Attributes -> records-lag-max

In JConsole, we noticed that we consistently get either 1.0 or -Infinity.  We 
also implemented code in our Java application to invoke the metrics() method of 
KafkaConsumer to fetch the metrics after a poll of records from the Topic:

Map<MetricName, ? extends Metric> metrics()  // Get the metrics kept by the consumer
Printing out this record also shows 1.0 or -Infinity.
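
For reference, a minimal sketch of that lookup after poll(), assuming the plain Java 
KafkaConsumer API; the broker address, group id and topic name below are placeholders, 
not taken from this thread:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class RecordsLagMaxCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lag-check");                 // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));          // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(1000);
            System.out.println("polled " + records.count() + " records");

            // Scan the consumer's metrics for records-lag-max after the poll.
            for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
                if ("records-lag-max".equals(entry.getKey().name())) {
                    System.out.println(entry.getKey().group() + " records-lag-max = "
                            + entry.getValue().value());
                }
            }
        }
    }
}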
In our test we have a Kafka Producer sending in about 50 messages/s while a 
Kafka Consumer is consuming about 20 messages/s.  We know that the Kafka 
Consumer is definitely lagging behind because it continues to consume messages 
well after shutting down the Kafka Producer.
Is this a known bug to get the correct value for records-lag-max or do we need 
to tune the configuration settings somehow?
We're running 0.10.1.1.
Thanks for your help.


Kafka Streams: "subscribed topics are not assigned to any members in the group"

2017-05-23 Thread Dmitry Minkovsky
Certain elements of my streams app stopped working and I noticed that my
logs contain:

[2017-05-23 15:23:09,274] WARN
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:361) The
following subscribed topics are not assigned to any members in the group
user-service : [user-service-KSTREAM-KEY-SELECT-38-repartition,
user-service-KSTREAM-KEY-SELECT-000130-repartition, users,
user-service-logins-by-user-repartition]

These are precisely the affected topics. Does anyone know what this log is
related to, and why this might have happened?

I am working on isolating the problem, but if anyone has seen this before,
some help/guidance would be much appreciated.

Thank you!


Re: Server Property num.partitions and Specification of Topic Partitions on Console

2017-05-23 Thread Girish Aher
The server property of num.partitions is the default value used when
auto.create.topics.enable=true.
If you are creating topics using a utility like kafka-topics, then what you
provide with that utility is what is used for topic creation irrespective
of whether it is less than or more than the server property of
num.partitions.

On Tue, May 23, 2017 at 9:41 AM, Mohammed Manna  wrote:

> Hello,
>
> I wanted to understand the relationship between the number of partitions in
> default specification (server.properties) and topic specification. If I do
> the following:
>
>
> kafka-topics --create --zookeeper localhost:2181 --replication-factor 3
> --partitions 3 --topic myTopic
>
> It will actually *override* the default partition number (num.partitions)
> provided in the respective server.properties file (if the server.properties
> file has a smaller value). Otherwise, it will throw some kind of error
> since the minimum number of partitions (num.partitions) is greater than the
> topic specification (--partitions).
>
> Is that correct?
>
> Kindest Regards,
>


Question regarding buffer.memory, max.request.size and send.buffer.bytes

2017-05-23 Thread Milind Vaidya
I have set the producer properties as follows (0.10.0.0):

"linger.ms" : "500",
"batch.size" : "1000",
"buffer.memory" : "1",
"send.buffer.bytes" : "512000"

and the default

max.request.size = 1048576

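For reference, a minimal sketch of the same settings on the 0.10.x Java producer; the 
broker address, topic, key and value below are placeholders, not taken from this thread:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerTuningExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The settings under discussion
        props.put(ProducerConfig.LINGER_MS_CONFIG, "500");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "1000");
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "1");    // total buffer for unsent records, in bytes
        props.put(ProducerConfig.SEND_BUFFER_CONFIG, "512000"); // TCP send buffer (send.buffer.bytes)
        // max.request.size left at its default of 1048576
        // note: a 1-byte buffer.memory is almost certainly too small to hold any real record

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder record
        }
    }
}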

If records are sent faster than they can be delivered, they will be
buffered. Now, with buffer.memory set to 1 byte, what happens if a record
is larger than this, say 11629 bytes? What is the minimum value of
buffer.memory in terms of the other params? Should it be at least equal to
send.buffer.bytes or max.request.size, or is it better left at the
default, which is 33554432?

I am trying to debug some events not reaching the consumer, so I am
wondering if this could be the reason.


Re: Partition assignment with multiple topics

2017-05-23 Thread Mike Gould
Hi
No joins - they're all separate data flows.
Having separate stream instances for subsets of topics would probably work.
It doesn't seem as clean.  It's also slightly more tricky to distribute the
load across separate processes: we'd have to allocate only one stream thread
to each topic in each process, or the first process to start grabs all the
load for the whole system.
Are there any details about the number of topics & partitions we can scale
to on example hardware?
Thanks

On Tue, 23 May 2017 at 09:00, Michal Borowiecki <
michal.borowie...@openbet.com> wrote:

> Hi Mike,
>
> Are you performing any operations (e.g. joins) across all topics?
>
> If so, I'd think increasing the number of partitions is indeed the way to
> go. Partition is the unit of parallelism per topic and all topics are bound
> together in your app in that case.
>
> If not, your other option is to break up your application into a number of
> KafkaStreams instances, each dealing with a subset of topics.
> Hope that helps.
> Michał
>
>
> On 23/05/17 08:47, Mike Gould wrote:
>
> Hi
> We have a couple of hundred topics - each carrying a similar but distinct
> message type but to keep the total partition count down each only has 3
> partitions.
>
> If I start Kafka-streams consuming all topics only 3 threads ever get
> assigned any partitions.
>
> I think the first thread to start gets the first partition of each topic,
> and so on until the 3rd thread, after that all the partitions are assigned
> - any further threads are just left idle.
>
> Is there any way to make the partition assignment smarter and either add a
> random element that moves partitions when further consumers start or
> considers all the partitions of subscribed topics together when assigning
> them?
>
> Our only alternative is creating many more partitions for each topic - and
> we're worried about how far this will scale.
>
> Thank you
> Mike G
>
>
>
>
> --
>  Michal Borowiecki
> Senior Software Engineer L4
> T: +44 208 742 1600 / +44 203 249 8448
> E: michal.borowie...@openbet.com
> W: www.openbet.com
> OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK
> 
>
-- 
 - MikeG
http://en.wikipedia.org/wiki/Common_misconceptions



Re: Kafka Authorization and ACLs Broken

2017-05-23 Thread Raghav
Darshan,

I have not yet successfully gotten the ACLs to work in Kafka. I am still
looking for help. I will update this email thread if I find a solution. In
case you get it working, please let me know.

Thanks.

R

On Tue, May 23, 2017 at 8:49 AM, Darshan Purandare <
purandare.dars...@gmail.com> wrote:

> Raghav
>
> I saw a few posts of yours around Kafka ACLs and the problems. I have seen
> similar issues where the writer has not been able to write to any topic. I have
> seen "leader not available" and sometimes "unknown topic or partition", and
> "topic_authorization_failed" error.
>
> Let me know if you find a valid config that works.
>
> Thanks.
>
>
>
> On Tue, May 23, 2017 at 8:44 AM, Raghav  wrote:
>
>> Hello Kafka Users
>>
>> I am a new Kafka user and trying to make Kafka SSL work with Authorization
>> and ACLs. I followed posts from Kafka and Confluent docs exactly to the
>> point but my producer cannot write to kafka broker. I get
>> "LEADER_NOT_FOUND" errors. And even Consumer throws the same errors.
>>
>> Can someone please share their config which worked with ACLs.
>>
>> Here is my config. Please help.
>>
>> server.properties config
>> 
>> 
>> broker.id=0
>> auto.create.topics.enable=true
>> delete.topic.enable=true
>>
>> listeners=PLAINTEXT://kafka1.example.com:9092,SSL://kafka1.example.com:9093
>> host.name=kafka1.example.com
>>
>>
>>
>> ssl.keystore.location=/var/private/kafka1.keystore.jks
>> ssl.keystore.password=12345678
>> ssl.key.password=12345678
>>
>> ssl.truststore.location=/var/private/kafka1.truststore.jks
>> ssl.truststore.password=12345678
>>
>> ssl.client.auth=required
>> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
>> ssl.keystore.type=JKS
>> ssl.truststore.type=JKS
>>
>> authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
>> 
>> 
>>
>>
>>
>> Here is producer Config(producer.properties)
>> 
>> 
>> security.protocol=SSL
>> ssl.truststore.location=/var/private/kafka2.truststore.jks
>> ssl.truststore.password=12345678
>>
>> ssl.keystore.location=/var/private/kafka2.keystore.jks
>> ssl.keystore.password=12345678
>> ssl.key.password=12345678
>>
>> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
>> ssl.truststore.type=JKS
>> ssl.keystore.type=JKS
>>
>> 
>> 
>>
>>
>> Raqhav
>>
>
>


-- 
Raghav


Server Property num.partitions and Specification of Topic Partitions on Console

2017-05-23 Thread Mohammed Manna
Hello,

I wanted to understand the relationship between the number of partitions in
default specification (server.properties) and topic specification. If I do
the following:


kafka-topics --create --zookeeper localhost:2181 --replication-factor 3
--partitions 3 --topic myTopic

It will actually *override* the default partition number (num.partitions)
provided in the respective server.properties file (if the server.properties
file has a smaller value). Otherwise, it will throw some kind of error
since the minimum number of partitions (num.partitions) is greater than the
topic specification (--partitions).

Is that correct?

Kindest Regards,


Re: Streams error handling

2017-05-23 Thread Mike Gould
That's great for the value but not the key

On Thu, 13 Apr 2017 at 18:27, Sachin Mittal  wrote:

> We are also catching the exception in the serde and returning null, and then
> filtering out null values downstream so that they are not included.
>
> Thanks
> Sachin
>
>
> On Thu, Apr 13, 2017 at 9:13 PM, Mike Gould  wrote:
>
> > Great to know I've not gone off in the wrong direction
> > Thanks
> >
> > On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax 
> > wrote:
> >
> > > Mike,
> > >
> > > thanks for your feedback. You are absolutely right that Streams API
> does
> > > not have great support for this atm. And it's very valuable that you
> > > report this (you are not the first person). It helps us prioritizing :)
> > >
> > > For now, there is no better solution as the one you described in your
> > > email, but its on our roadmap to improve the API -- and its priority
> got
> > > just increase by your request.
> > >
> > > I am sorry, that I can't give you a better answer right now :(
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 4/13/17 8:16 AM, Mike Gould wrote:
> > > > Hi
> > > > Are there any better error handling options for Kafka streams in
> java.
> > > >
> > > > Any errors in the serdes will break the stream.  The suggested
> > > > implementation is to use the byte[] serde and do the deserialisation
> > in a
> > > > map operation.  However this isn't ideal either as there's no great
> way
> > > to
> > > > handle exceptions.
> > > > My current tactics are to use flatMap in place of map everywhere and
> > > > return emptySet on error. Unfortunately this means the error has to be
> > > > handled directly in the function where it happened and can only be
> > > > handled as a side effect.
> > > >
> > > > It seems to me that this could be done better. Maybe the *Mapper
> > > > interfaces could allow specific checked exceptions. These could be
> > > > handled by specific downstream KStream.mapException() steps which might
> > > > e.g. put an error response on another stream branch.
> > > > Alternatively, could it be made easier to return something like an
> > > > Either from the Mappers, with the addition of a few extra mapError or
> > > > mapLeft/mapRight methods on KStream?
> > > >
> > > > Unless there's a better error handling pattern which I've entirely
> > > missed?
> > > >
> > > > Thanks
> > > > MIkeG
> > > >
> > >
> > > --
> >  - MikeG
> > http://en.wikipedia.org/wiki/Common_misconceptions
> > 
> >
>
-- 
 - MikeG
http://en.wikipedia.org/wiki/Common_misconceptions
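
As an illustration of the flatMap tactic described in the quoted thread, here is a 
minimal sketch. It assumes the newer StreamsBuilder API rather than the 0.10.x API the 
thread is about, and the topic names and integer parsing below are placeholders:

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class FlatMapErrorHandlingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "flatmap-error-demo");  // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker

        StreamsBuilder builder = new StreamsBuilder();

        // Consume raw bytes so the serde itself can never throw.
        KStream<String, byte[]> raw =
                builder.stream("events-raw", Consumed.with(Serdes.String(), Serdes.ByteArray()));

        // Deserialize inside flatMapValues; on failure return an empty collection,
        // which drops the bad record ("return emptySet on error").
        KStream<String, Integer> parsed = raw.flatMapValues(value -> {
            try {
                return Collections.singletonList(
                        Integer.parseInt(new String(value, StandardCharsets.UTF_8)));
            } catch (Exception e) {
                // The error can only be handled here, as a side effect (e.g. log it).
                return Collections.<Integer>emptyList();
            }
        });

        parsed.to("events-parsed", Produced.with(Serdes.String(), Serdes.Integer()));

        new KafkaStreams(builder.build(), props).start();
    }
}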



Re: Streams error handling

2017-05-23 Thread Mike Gould
Hi, please open one for me. Thank you.

On Thu, 13 Apr 2017 at 17:04, Eno Thereska  wrote:

> Hi Mike,
>
> Thank you. Could you open a JIRA to capture this specific problem (a
> copy-paste would suffice)? Alternatively we can open it, up to you.
>
> Thanks
> Eno
> > On 13 Apr 2017, at 08:43, Mike Gould  wrote:
> >
> > Great to know I've not gone off in the wrong direction
> > Thanks
> >
> > On Thu, 13 Apr 2017 at 16:34, Matthias J. Sax 
> wrote:
> >
> >> Mike,
> >>
> >> thanks for your feedback. You are absolutely right that Streams API does
> >> not have great support for this atm. And it's very valuable that you
> >> report this (you are not the first person). It helps us prioritizing :)
> >>
> >> For now, there is no better solution as the one you described in your
> >> email, but its on our roadmap to improve the API -- and its priority got
> >> just increase by your request.
> >>
> >> I am sorry, that I can't give you a better answer right now :(
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 4/13/17 8:16 AM, Mike Gould wrote:
> >>> Hi
> >>> Are there any better error handling options for Kafka streams in java.
> >>>
> >>> Any errors in the serdes will break the stream.  The suggested
> >>> implementation is to use the byte[] serde and do the deserialisation
> in a
> >>> map operation.  However this isn't ideal either as there's no great way
> >> to
> >>> handle exceptions.
> >>> My current tactics are to use flatMap in place of map everywhere and
> >>> return emptySet on error. Unfortunately this means the error has to be handled
> >>> directly in the function where it happened and can only be handled as a
> >>> side effect.
> >>>
> >>> It seems to me that this could be done better. Maybe the *Mapper
> >> interfaces
> >>> could allow specific checked exceptions. These could be handled by
> >> specific
> >>> downstream KStream.mapException() steps which might e.g. put an error
> >>> response on another stream branch.
> >>> Alternatively, could it be made easier to return something like an
> >>> Either from the Mappers, with the addition of a few extra mapError or
> >>> mapLeft/mapRight methods on KStream?
> >>>
> >>> Unless there's a better error handling pattern which I've entirely
> >> missed?
> >>>
> >>> Thanks
> >>> MIkeG
> >>>
> >>
> >> --
> > - MikeG
> > http://en.wikipedia.org/wiki/Common_misconceptions
> > 
>
> --
 - MikeG
http://en.wikipedia.org/wiki/Common_misconceptions



Re: Kafka Authorization and ACLs Broken

2017-05-23 Thread Darshan Purandare
Raghav

I saw a few posts of yours around Kafka ACLs and the problems. I have seen
similar issues where the writer has not been able to write to any topic. I have
seen "leader not available" and sometimes "unknown topic or partition", and
"topic_authorization_failed" error.

Let me know if you find a valid config that works.

Thanks.



On Tue, May 23, 2017 at 8:44 AM, Raghav  wrote:

> Hello Kafka Users
>
> I am a new Kafka user and trying to make Kafka SSL work with Authorization
> and ACLs. I followed posts from Kafka and Confluent docs exactly to the
> point but my producer cannot write to kafka broker. I get
> "LEADER_NOT_FOUND" errors. And even Consumer throws the same errors.
>
> Can someone please share their config which worked with ACLs.
>
> Here is my config. Please help.
>
> server.properties config
> 
> 
> broker.id=0
> auto.create.topics.enable=true
> delete.topic.enable=true
>
> listeners=PLAINTEXT://kafka1.example.com:9092,SSL://kafka1.example.com:9093
> host.name=kafka1.example.com
>
>
> ssl.keystore.location=/var/private/kafka1.keystore.jks
> ssl.keystore.password=12345678
> ssl.key.password=12345678
>
> ssl.truststore.location=/var/private/kafka1.truststore.jks
> ssl.truststore.password=12345678
>
> ssl.client.auth=required
> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
> ssl.keystore.type=JKS
> ssl.truststore.type=JKS
>
> authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
> 
> 
>
>
>
> Here is producer Config(producer.properties)
> 
> 
> security.protocol=SSL
> ssl.truststore.location=/var/private/kafka2.truststore.jks
> ssl.truststore.password=12345678
>
> ssl.keystore.location=/var/private/kafka2.keystore.jks
> ssl.keystore.password=12345678
> ssl.key.password=12345678
>
> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
> ssl.truststore.type=JKS
> ssl.keystore.type=JKS
>
> 
> 
>
>
> Raqhav
>


Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread João Peixoto
It seems like you're trying to use the partitioning mechanism as a routing
mechanism, which afaik is not really its objective.

It may work, but it is definitely not the best approach imo.
1. You're throwing away the parallelism capabilities of Kafka. You'll have
a single "queue" per customer. At that point you could just not use Kafka and
have a different REST endpoint for each customer, routed through headers
for example.
2. Repartitioning is a cumbersome affair. If your customer pool increases
past your projections, you'll need to shut everyone down while you change
the number of partitions.


On Tue, May 23, 2017 at 8:38 AM Tom Crayford  wrote:

> That might be ok. If that's the case, you can probably just "precreate" all
> the partitions for them upfront and avoid any worry about having to futz
> with consumers.
>
> On Tue, May 23, 2017 at 4:33 PM, David Espinosa  wrote:
>
> > Thanks for the answer Tom,
> > Indeed I will not have more than 10 or 20 customer per cluster, so that's
> > also the maximum number of partitions possible per topic.
> > Still a bad idea?
> >
> > 2017-05-23 16:48 GMT+02:00 Tom Crayford :
> >
> > > Hi there,
> > >
> > > I don't know about the consumer, but I'd *strongly* recommend not
> > designing
> > > your application around this. Kafka has severe and notable stability
> > > concerns with large numbers of partitions, and requiring "one partition
> > per
> > > customer" is going to be limiting, unless you only ever expect to have
> > > *very* small customer numbers (hundreds at most, ever). Instead, use a
> > hash
> > > function and a key, as recommended to land customers on the same
> > partition.
> > >
> > > Thanks
> > >
> > > Tom Crayford
> > > Heroku Kafka
> > >
> > > On Tue, May 23, 2017 at 9:46 AM, David Espinosa 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > In order to keep separated (physically) the data from different
> > customers
> > > > in our application, we are using a custom partitioner to drive
> messages
> > > to
> > > > a concrete partition of a topic. We know that we are loosing
> > parallelism
> > > > per topic this way, but our requirements regarding multitenancy are
> > > higher
> > > > than our throughput requirements.
> > > >
> > > > So, in order to increase the number of customers working on a
> cluster,
> > we
> > > > are increasing the number of partitions dinamically per topic as the
> > new
> > > > customer arrives using kafka AdminUtilities.
> > > > Our problem arrives when using the new kafka consumer and a new
> > partition
> > > > is added into the topic, as this consumer doesn't get updated with
> the
> > > "new
> > > > partition" and therefore messages driven into that new partition
> never
> > > > arrives to this consumer unless we reload the consumer itself. What
> was
> > > > surprising was to check that using the old consumer (configured to
> deal
> > > > with Zookeeper), a consumer does get messages from a new added
> > partition.
> > > >
> > > > Is there a way to emulate the old consumer behaviour when new
> > partitions
> > > > are added in the new consumer?
> > > >
> > > > Thanks in advance,
> > > > David
> > > >
> > >
> >
>


Kafka Authorization and ACLs Broken

2017-05-23 Thread Raghav
Hello Kafka Users

I am a new Kafka user and am trying to make Kafka SSL work with Authorization
and ACLs. I followed posts from the Kafka and Confluent docs exactly to the
point, but my producer cannot write to the Kafka broker. I get
"LEADER_NOT_FOUND" errors, and even the consumer throws the same errors.

Can someone please share their config which worked with ACLs.

Here is my config. Please help.

server.properties config


broker.id=0
auto.create.topics.enable=true
delete.topic.enable=true

listeners=PLAINTEXT://kafka1.example.com:9092,SSL://kafka1.example.com:9093
host.name=kafka1.example.com


ssl.keystore.location=/var/private/kafka1.keystore.jks
ssl.keystore.password=12345678
ssl.key.password=12345678

ssl.truststore.location=/var/private/kafka1.truststore.jks
ssl.truststore.password=12345678

ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.keystore.type=JKS
ssl.truststore.type=JKS

authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer





Here is producer Config(producer.properties)


security.protocol=SSL
ssl.truststore.location=/var/private/kafka2.truststore.jks
ssl.truststore.password=12345678

ssl.keystore.location=/var/private/kafka2.keystore.jks
ssl.keystore.password=12345678
ssl.key.password=12345678

ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.truststore.type=JKS
ssl.keystore.type=JKS





Raqhav


Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread Tom Crayford
That might be ok. If that's the case, you can probably just "precreate" all
the partitions for them upfront and avoid any worry about having to futz
with consumers.

On Tue, May 23, 2017 at 4:33 PM, David Espinosa  wrote:

> Thanks for the answer Tom,
> Indeed I will not have more than 10 or 20 customer per cluster, so that's
> also the maximum number of partitions possible per topic.
> Still a bad idea?
>
> 2017-05-23 16:48 GMT+02:00 Tom Crayford :
>
> > Hi there,
> >
> > I don't know about the consumer, but I'd *strongly* recommend not
> designing
> > your application around this. Kafka has severe and notable stability
> > concerns with large numbers of partitions, and requiring "one partition
> per
> > customer" is going to be limiting, unless you only ever expect to have
> > *very* small customer numbers (hundreds at most, ever). Instead, use a
> hash
> > function and a key, as recommended to land customers on the same
> partition.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 23, 2017 at 9:46 AM, David Espinosa 
> wrote:
> >
> > > Hi,
> > >
> > > In order to keep separated (physically) the data from different
> customers
> > > in our application, we are using a custom partitioner to drive messages
> > to
> > > a concrete partition of a topic. We know that we are losing
> parallelism
> > > per topic this way, but our requirements regarding multitenancy are
> > higher
> > > than our throughput requirements.
> > >
> > > So, in order to increase the number of customers working on a cluster,
> we
> > > are increasing the number of partitions dynamically per topic as the
> new
> > > customer arrives using kafka AdminUtilities.
> > > Our problem arrives when using the new kafka consumer and a new
> partition
> > > is added into the topic, as this consumer doesn't get updated with the
> > "new
> > > partition" and therefore messages driven into that new partition never
> > > arrives to this consumer unless we reload the consumer itself. What was
> > > surprising was to check that using the old consumer (configured to deal
> > > with Zookeeper), a consumer does get messages from a new added
> partition.
> > >
> > > Is there a way to emulate the old consumer behaviour when new
> partitions
> > > are added in the new consumer?
> > >
> > > Thanks in advance,
> > > David
> > >
> >
>


Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread David Espinosa
Thanks for the answer Tom,
Indeed I will not have more than 10 or 20 customer per cluster, so that's
also the maximum number of partitions possible per topic.
Still a bad idea?

2017-05-23 16:48 GMT+02:00 Tom Crayford :

> Hi there,
>
> I don't know about the consumer, but I'd *strongly* recommend not designing
> your application around this. Kafka has severe and notable stability
> concerns with large numbers of partitions, and requiring "one partition per
> customer" is going to be limiting, unless you only ever expect to have
> *very* small customer numbers (hundreds at most, ever). Instead, use a hash
> function and a key, as recommended to land customers on the same partition.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Tue, May 23, 2017 at 9:46 AM, David Espinosa  wrote:
>
> > Hi,
> >
> > In order to keep separated (physically) the data from different customers
> > in our application, we are using a custom partitioner to drive messages
> to
> > a concrete partition of a topic. We know that we are losing parallelism
> > per topic this way, but our requirements regarding multitenancy are
> higher
> > than our throughput requirements.
> >
> > So, in order to increase the number of customers working on a cluster, we
> > are increasing the number of partitions dinamically per topic as the new
> > customer arrives using kafka AdminUtilities.
> > Our problem arrives when using the new kafka consumer and a new partition
> > is added into the topic, as this consumer doesn't get updated with the
> "new
> > partition" and therefore messages driven into that new partition never
> > arrives to this consumer unless we reload the consumer itself. What was
> > surprising was to check that using the old consumer (configured to deal
> > with Zookeeper), a consumer does get messages from a new added partition.
> >
> > Is there a way to emulate the old consumer behaviour when new partitions
> > are added in the new consumer?
> >
> > Thanks in advance,
> > David
> >
>


Re: APACHE LICENSES

2017-05-23 Thread Mathieu Fenniak
Hi Lis,

Yes, they are free software.  The full terms of the licenses are
available here: https://github.com/apache/kafka/blob/trunk/LICENSE and
here: https://github.com/apache/zookeeper/blob/master/LICENSE.txt

Mathieu


On Tue, May 23, 2017 at 5:54 AM, LISBETH SANTAMARIA GUTIERREZ
 wrote:
> Good morning,
>
> Are the Kafka & ZooKeeper licenses free software?
>
>
>
> Kind regards
>
>
> *BBVA*
> *Lis Santamaría Gutiérrez*
> *CIB Engineering -  CTO -  Technical Systems & Solutions*
> E-mail: lisbeth.santamaria.contrac...@bbva.com
> 
>
> *LAS TABLAS III  - Isabel de Colbrand 4 - pl 1 – 28050 Madrid*
>


APACHE LICENSES

2017-05-23 Thread LISBETH SANTAMARIA GUTIERREZ
Good morning,

Are the Kafka & ZooKeeper licenses free software?



Kind regards


*BBVA*
*Lis Santamaría Gutiérrez*
*CIB Engineering -  CTO -  Technical Systems & Solutions*
E-mail: lisbeth.santamaria.contrac...@bbva.com


*LAS TABLAS III  - Isabel de Colbrand 4 - pl 1 – 28050 Madrid*



Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread Tom Crayford
Hi there,

I don't know about the consumer, but I'd *strongly* recommend not designing
your application around this. Kafka has severe and notable stability
concerns with large numbers of partitions, and requiring "one partition per
customer" is going to be limiting, unless you only ever expect to have
*very* small customer numbers (hundreds at most, ever). Instead, use a hash
function and a key, as recommended to land customers on the same partition.
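
(For illustration, a minimal sketch of that keyed approach: the customer id is used as
the record key, so the default partitioner hashes every message for a customer onto the
same partition of a shared topic. The broker address, topic and customer id below are
placeholders, not taken from this thread.)

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CustomerKeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String customerId = "customer-42"; // placeholder customer id used as the key
            // The default partitioner hashes the key, so every record for this
            // customer lands on the same partition of the shared topic.
            producer.send(new ProducerRecord<>("events", customerId, "some payload"));
        }
    }
}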

Thanks

Tom Crayford
Heroku Kafka

On Tue, May 23, 2017 at 9:46 AM, David Espinosa  wrote:

> Hi,
>
> In order to keep separated (physically) the data from different customers
> in our application, we are using a custom partitioner to drive messages to
> a concrete partition of a topic. We know that we are losing parallelism
> per topic this way, but our requirements regarding multitenancy are higher
> than our throughput requirements.
>
> So, in order to increase the number of customers working on a cluster, we
> are increasing the number of partitions dynamically per topic as the new
> customer arrives using kafka AdminUtilities.
> Our problem arrives when using the new kafka consumer and a new partition
> is added into the topic, as this consumer doesn't get updated with the "new
> partition" and therefore messages driven into that new partition never
> arrives to this consumer unless we reload the consumer itself. What was
> surprising was to check that using the old consumer (configured to deal
> with Zookeeper), a consumer does get messages from a new added partition.
>
> Is there a way to emulate the old consumer behaviour when new partitions
> are added in the new consumer?
>
> Thanks in advance,
> David
>


Re: Partition assignment with multiple topics

2017-05-23 Thread Michal Borowiecki

Hi Mike,

Are you performing any operations (e.g. joins) across all topics?

If so, I'd think increasing the number of partitions is indeed the way 
to go. Partition is the unit of parallelism per topic and all topics are 
bound together in your app in that case.


If not, your other option is to break up your application into a number 
of KafkaStreams instances, each dealing with a subset of topics.
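
A minimal sketch of that option, assuming the newer StreamsBuilder API; the application 
ids, topic names and pass-through processing below are placeholders:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopicSubsetApps {

    // Builds one independent KafkaStreams instance for a subset of topics.
    static KafkaStreams buildApp(String appId, String... topics) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, appId);               // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        StreamsBuilder builder = new StreamsBuilder();
        for (String topic : topics) {
            builder.stream(topic, Consumed.with(Serdes.String(), Serdes.String()))
                   .to(topic + "-out", Produced.with(Serdes.String(), Serdes.String())); // placeholder processing
        }
        return new KafkaStreams(builder.build(), props);
    }

    public static void main(String[] args) {
        // Each instance has its own application id (and hence consumer group),
        // so the partitions of each topic subset are assigned independently.
        buildApp("app-subset-1", "topic-a", "topic-b").start();
        buildApp("app-subset-2", "topic-c", "topic-d").start();
    }
}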


Hope that helps.
Michał

On 23/05/17 08:47, Mike Gould wrote:

Hi
We have a couple of hundred topics - each carrying a similar but distinct
message type but to keep the total partition count down each only has 3
partitions.

If I start Kafka-streams consuming all topics only 3 threads ever get
assigned any partitions.

I think the first thread to start gets the first partition of each topic,
and so on until the 3rd thread, after that all the partitions are assigned
- any further threads are just left idle.

Is there any way to make the partition assignment smarter and either add a
random element that moves partitions when further consumers start or
considers all the partitions of subscribed topics together when assigning
them?

Our only alternative is creating many more partitions for each topic - and
we're worried about how far this will scale.

Thank you
Mike G




--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600 / +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com
OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK








Partition assignment with multiple topics

2017-05-23 Thread Mike Gould
Hi
We have a couple of hundred topics - each carrying a similar but distinct
message type but to keep the total partition count down each only has 3
partitions.

If I start Kafka-streams consuming all topics only 3 threads ever get
assigned any partitions.

I think the first thread to start gets the first partition of each topic,
and so on until the 3rd thread, after that all the partitions are assigned
- any further threads are just left idle.

Is there any way to make the partition assignment smarter and either add a
random element that moves partitions when further consumers start or
considers all the partitions of subscribed topics together when assigning
them?

Our only alternative is creating many more partitions for each topic - and
we're worried about how far this will scale.

Thank you
Mike G


-- 
 - MikeG
http://en.wikipedia.org/wiki/Common_misconceptions



How to scripting Zookeeper - Kafka startup procedure as step by step

2017-05-23 Thread Vincent Dautremont
Hi,
Working on a Kafka project, I'm trying to set up integration tests using Docker, 
with ZooKeeper and Kafka clusters, my client program(s) and some kafkacat 
clients on a Docker network.

To set up this working context I need to script each action, and I guess I have a 
beginner problem with ZooKeeper and Kafka startup.


If I do:
/zookeeper-server-start.sh ../config/zookeeper.properties &
/kafka-server-start.sh ../config/server.properties &
kafkacat -P -c 10 -b my.broker -t myTopic < myMessages.txt

then Kafka sometimes fails to start because the ZooKeeper boot process isn't 
completely finished yet, and kafkacat always fails to produce because the Kafka 
broker startup process isn't completely finished either.

Is there a way to start these as a service (via a parameter?) that gives back 
control of the shell only when the startup process is completed and 
ZooKeeper/Kafka is ready to use?

Alternatively, do you think that looping on a TCP connection attempt until it is 
accepted, on the ZooKeeper / Kafka client serving port, would be a good way to 
consider that these two processes are ready to serve clients?

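A minimal sketch of such a check in Java (hosts, ports and timeouts below are 
placeholders; note that an accepted TCP connection only shows the port is open, it does 
not guarantee the broker has fully finished its startup):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Polls host:port until a TCP connection is accepted or the timeout expires.
public class WaitForPort {
    public static boolean waitForPort(String host, int port, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true;               // port is accepting connections
            } catch (IOException e) {
                Thread.sleep(500);         // not ready yet, retry
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // e.g. wait for ZooKeeper, then for the Kafka broker (placeholder hosts/ports)
        System.out.println("zookeeper ready: " + waitForPort("localhost", 2181, 60_000));
        System.out.println("kafka ready:     " + waitForPort("localhost", 9092, 60_000));
    }
}
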
 Thank you.


Re: Kafka broker startup issue

2017-05-23 Thread dhiraj prajapati
Thanks for pointing this out. There was a broker instance of version
0.10.1.0 running.

On May 23, 2017 11:34 AM, "Ewen Cheslack-Postava"  wrote:

> Version 2 of UpdateMetadataRequest does not exist in version 0.9.0.1. This
> suggests that you have a broker with a newer version of Kafka running
> against the same ZK broker. Do you have any other versions running? Or is
> it possible this is a shared ZK cluster and you're not using a namespace
> within ZK for each cluster?
>
> -Ewen
>
> On Mon, May 22, 2017 at 12:33 AM, dhiraj prajapati 
> wrote:
>
> > Hi,
> > I am getting the below exception while starting kafka broker 0.9.0.1:
> >
> > kafka.common.KafkaException: Version 2 is invalid for
> > UpdateMetadataRequest. Valid versions are 0 or 1.
> > at
> > kafka.api.UpdateMetadataRequest$.readFrom(UpdateMetadataRequest.scala:
> 58)
> > at kafka.api.RequestKeys$$anonfun$7.apply(RequestKeys.scala:54)
> > at kafka.api.RequestKeys$$anonfun$7.apply(RequestKeys.scala:54)
> > at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:66)
> > at kafka.network.Processor$$anonfun$run$11.apply(
> > SocketServer.scala:426)
> > at kafka.network.Processor$$anonfun$run$11.apply(
> > SocketServer.scala:421)
> > at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> > at scala.collection.IterableLike$class.foreach(IterableLike.
> scala:72)
> > at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> > at kafka.network.Processor.run(SocketServer.scala:421)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > What could be the issue?
> >
> > Regards,
> > Dhiraj
> >
>


Re: Kafka broker startup issue

2017-05-23 Thread Ewen Cheslack-Postava
Version 2 of UpdateMetadataRequest does not exist in version 0.9.0.1. This
suggests that you have a broker with a newer version of Kafka running
against the same ZK broker. Do you have any other versions running? Or is
it possible this is a shared ZK cluster and you're not using a namespace
within ZK for each cluster?

-Ewen

On Mon, May 22, 2017 at 12:33 AM, dhiraj prajapati 
wrote:

> Hi,
> I am getting the below exception while starting kafka broker 0.9.0.1:
>
> kafka.common.KafkaException: Version 2 is invalid for
> UpdateMetadataRequest. Valid versions are 0 or 1.
> at
> kafka.api.UpdateMetadataRequest$.readFrom(UpdateMetadataRequest.scala:58)
> at kafka.api.RequestKeys$$anonfun$7.apply(RequestKeys.scala:54)
> at kafka.api.RequestKeys$$anonfun$7.apply(RequestKeys.scala:54)
> at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:66)
> at kafka.network.Processor$$anonfun$run$11.apply(
> SocketServer.scala:426)
> at kafka.network.Processor$$anonfun$run$11.apply(
> SocketServer.scala:421)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:421)
> at java.lang.Thread.run(Thread.java:745)
>
> What could be the issue?
>
> Regards,
> Dhiraj
>