Re: Kafka avro producer

2019-03-27 Thread Gerard Klijs
Not really possible, as the producer assumes you are using the Schema
Registry. You can use Avro for the (de)serialisation in some other way, but
then you need to create (de)serializers that fit that approach.
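A minimal sketch of the producer half of such a custom (de)serializer, assuming plain Avro binary encoding and that the consumer already knows the writer schema; the class name and error handling are illustrative, not something from this thread:

import java.io.ByteArrayOutputStream;
import java.util.Map;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical serializer that writes plain Avro binary without a schema registry.
// The consumer must already know the writer schema (compiled in or shipped out of
// band), otherwise it cannot decode the bytes.
public class RegistryFreeAvroSerializer implements Serializer<GenericRecord> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, GenericRecord record) {
        if (record == null) {
            return null;
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
            encoder.flush();
            return out.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException("Avro serialization failed for topic " + topic, e);
        }
    }

    @Override
    public void close() { }
}

A matching deserializer would implement org.apache.kafka.common.serialization.Deserializer and decode the bytes with a GenericDatumReader built from the same schema.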

On Wed 27 Mar 2019 at 17:33, lsroudi abdel wrote:

> It depends on your use case: you could push your schema with the first
> message and get it on the other side
>
> On Wed 27 Mar 2019 at 17:15, KARISHMA MALIK wrote:
>
> > On Wed 27 Mar 2019, 11:57 AM KARISHMA MALIK wrote:
> >
> > > Hi Team
> > > Is there any possible method to use the Apache Kafka Avro producer without
> > > using the Schema Registry?
> > >
> > > Thanks
> > > Karishma
> > > M:7447426338
> > >
> >
>


Re: Stickers

2019-03-12 Thread Gerard Klijs
There's already a pretty active Kafka meetup group, 'Kafka Utrecht', which
has had meetups in Amsterdam and Rotterdam in the past.

On Tue 12 Mar 2019 at 06:37, Antoine Laffez <antoine.laf...@lunatech.nl> wrote:

> Hi!
> The workshop is an internal workshop given by a certified trainer who is an employee.
> It is 2 sessions of 10 people so 20 stickers will be awesome.
> Regarding a meetup, it could be a good idea indeed, but I have to check with the
> trainer to find good speakers.
> Thanks a lot Ale, they will be happy as they really like the design of
> the Kafka logo.
>
> Wish you the best,
>
> Antoine Laffez
> Marketing Manager
> +31(0)641542468 
>
>
>
> > On 12 Mar 2019, at 00:25, Ale Murray  wrote:
> >
> > Hi Antoine,
> >
> > I can definitely send you some.. how many are you after?
> >
> > Also.. would you be interested in hosting a Kafka Meetup at your offices
> for the community at any point? Thanks!
> >
> >
> > Ale Murray
> > Global Community Manager | Confluent
> > Skype: ale_amurray | @ale_amurray
> > Follow us: Twitter | blog | Slack | Meetups
> >
> >> On Mon, 11 Mar 2019 at 13:14, Antoine Laffez <
> antoine.laf...@lunatech.nl> wrote:
> >> Hi, I am Antoine, Marketing Manager at Lunatech labs.
> >>
> >> We organised Kafka courses and training, and I was wondering if it is possible
> >> to have some Kafka stickers to give to the attendees?
> >>
> >> Our address is: Lunatech Labs, Baan 74 3011 CD Rotterdam, The
> Netherlands.
> >>
> >> Thank you very much!
> >>
> >> Met vriendelijke groet,
> >> Kind regards,
> >>
> >>
> >>
> >>
> >> Antoine Laffez
> >> Marketing Manager
> >> +31(0)6.41.54.24.68 NL
> >>
> >>
> >>
>


Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

2017-01-30 Thread Gerard Klijs
Not really, as you can update the schema and have multiple of them at the
same time. By default each schema has to be backwards compatible, so you do
have to exclude the specific topic you use with different schemas. With
every write, the id of the schema used is also written, so when you
deserialise the messages, you know which schema to use for which message.
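As an illustration of the above (not from the original thread), a producer using the Confluent KafkaAvroSerializer can send records with different schemas to one topic; the serializer registers each schema under the topic's value subject and writes the schema id in front of every record, which is how the consumer later knows which schema each message was written with. Broker address, registry URL and the two schemas below are placeholders:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MixedSchemaProducer {
    private static final Schema ORDER_SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
    private static final Schema PAYMENT_SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Payment\",\"fields\":[{\"name\":\"amount\",\"type\":\"double\"}]}");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");           // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // The Confluent Avro serializer registers the schema and writes its id
        // in front of every record.
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");  // placeholder

        GenericRecord order = new GenericData.Record(ORDER_SCHEMA);
        order.put("id", "order-1");
        GenericRecord payment = new GenericData.Record(PAYMENT_SCHEMA);
        payment.put("amount", 12.34);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // Both records go to the same topic; the embedded schema id tells the
            // consumer which schema each one was written with.
            producer.send(new ProducerRecord<>("events", "k1", order));
            producer.send(new ProducerRecord<>("events", "k2", payment));
        }
    }
}

Note that with the default backward-compatibility setting, registering the second, unrelated schema on the same subject would be rejected, which is why the compatibility for that subject has to be relaxed, as mentioned above.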

On Sun 29 Jan 2017 at 17:35, Mike Cargal wrote:

> I was just looking into using KafkaAvroSerializer to produce records to a
> Kafka topic.  We are interested because the wire format has a reference to
> the schema, so we don't have to look up schema information independently.
>
> We plan to keep a single topic that contain records using many different
> schemas (it’s important to maintain the ordering of these records).
>
> In looking at the code, it appears that it registers the schema with the
> registry with a topic+”-topic” subject.  This would seem to imply an
> assumption that a topic has a single schema associated with it (not many
> schemas that can vary from record to record).
>
> Am I understanding this correctly?  It seems like a surprising constraint.


Re: Kafka ACL's with SSL Protocol is not working

2016-12-15 Thread Gerard Klijs
Most likely something went wrong creating the keystores, causing the SSL
handshake to fail. It's important to have a valid chain, from the
certificate in the truststore, possibly via intermediates, to the
keystore.
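For reference, a minimal sketch (not from the original thread) of the client-side SSL settings a Java producer would need to verify the handshake against such a broker; the broker address, file locations, passwords and topic are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SslClientCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093");  // placeholder
        props.put("security.protocol", "SSL");
        // The truststore must contain the CA (and any intermediates) that signed
        // the broker certificate, otherwise the handshake fails.
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        // The keystore is only needed when the broker requires client authentication,
        // which is what provides the principal the ACLs are checked against.
        props.put("ssl.keystore.location", "/etc/kafka/client.keystore.jks");
        props.put("ssl.keystore.password", "changeit");
        props.put("ssl.key.password", "changeit");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
            producer.flush();
        }
    }
}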

On Fri, Dec 16, 2016, 00:32 Raghu B  wrote:

Thanks Derar & Kiran, your suggestions are very useful.

I enabled Log4J debug mode and found that my client is trying to connect to
the Kafka server as *User:ANONYMOUS*. It is really strange.


I added a new super.users entry with the name *User:ANONYMOUS*, and then I am able to
send and receive the messages without any issues.

And now the question is: how can I set my username from ANONYMOUS to something like
*User:"CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"*,
which comes from the SSL cert/keystore?

Please help me with your inputs.

Thanks in Advance,
Raghu

On Thu, Dec 15, 2016 at 5:29 AM, kiran kumar  wrote:

> I have just noticed that I am using the user which is not configured in
the
> kafka server jaas config file..
>
>
>
> On Thu, Dec 15, 2016 at 6:38 PM, kiran kumar 
> wrote:
>
> > Hi Raghu,
> >
> > I am also facing the same issue but with the SASL_PLAINTEXT protocol.
> >
> > after enabling debugging I see that authentication is being completed. I
> > don't see any debug logs being generated for authorization part (I might
> be
> > missing something).
> >
> > you can also set the log level to debug in properties and see whats
going
> > on.
> >
> > Thanks,
> > Kiran
> >
> > On Thu, Dec 15, 2016 at 7:09 AM, Derar Alassi 
> > wrote:
> >
> >> Make sure that the principal ID is exactly what Kafka sees. Guessing what
> >> the principal ID is by using keytool or openssl is not going to help from
> >> my experience. The best is to add some logging to output the SSL client ID
> >> in org.apache.kafka.common.network.SslTransportLayer.peerPrincipal().
> >> The p.getName() is what you are looking at.
> >>
> >> Instead of adding it to the super user list in your server props file,
> add
> >> ACLs to that user using the kafka-acls.sh in the bin directory.
> >>
> >>
> >>
> >> On Wed, Dec 14, 2016 at 3:57 PM, Raghu B  wrote:
> >>
> >> > Thanks Shrikant for your reply, but I did consumer part also and more
> >> over
> >> > I am not facing this issue only with consumer, I am getting this
> errors
> >> > with producer as well as consumer
> >> >
> >> > On Wed, Dec 14, 2016 at 3:53 PM, Shrikant Patel 
> >> wrote:
> >> >
> >> > > You need to execute kafka-acls.sh with --consumer to enable
> >> consumption
> >> > > from kafka.
> >> > >
> >> > > _
> >> > > Shrikant Patel  |  817.367.4302
> >> > > Enterprise Architecture Team
> >> > > PDX-NHIN
> >> > >
> >> > > -Original Message-
> >> > > From: Raghu B [mailto:raghu98...@gmail.com]
> >> > > Sent: Wednesday, December 14, 2016 5:42 PM
> >> > > To: secur...@kafka.apache.org
> >> > > Subject: Kafka ACL's with SSL Protocol is not working
> >> > >
> >> > > Hi All,
> >> > >
> >> > > I am trying to enable ACL's in my Kafka cluster with along with SSL
> >> > > Protocol.
> >> > >
> >> > > I tried with each and every parameters but no luck, so I need help
> to
> >> > > enable the SSL(without Kerberos) and I am attaching all the
> >> configuration
> >> > > details in this.
> >> > >
> >> > > Kindly Help me.
> >> > >
> >> > >
> >> > > I tested SSL without ACL, it worked fine
> >> > > (listeners=SSL://10.247.195.122:9093)
> >> > >
> >> > > This is my Kafka server properties file:
> >> > >
> >> > > # ACL SETTINGS #
> >> > > auto.create.topics.enable=true
> >> > > authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
> >> > > security.inter.broker.protocol=SSL
> >> > > #allow.everyone.if.no.acl.found=true
> >> > > #principal.builder.class=CustomizedPrincipalBuilderClass
> >> > > #super.users=User:"CN=writeuser,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown"
> >> > > #super.users=User:Raghu;User:Admin
> >> > > #offsets.storage=kafka
> >> > > #dual.commit.enabled=true
> >> > > listeners=SSL://10.247.195.122:9093
> >> > > #listeners=PLAINTEXT://10.247.195.122:9092
> >> > > #listeners=PLAINTEXT://10.247.195.122:9092,SSL://10.247.195.122:9093
> >> > > #advertised.listeners=PLAINTEXT://10.247.195.122:9092
> >> > >
> >> > > ssl.keystore.location=/home/raghu/kafka/security/server.keystore.jks
> >> 

Re: Restrict who can change ACLs

2016-10-04 Thread Gerard Klijs
You could limit the access to ZooKeeper, with Kerberos or with a firewall,
for example by only allowing connections to ZooKeeper from the cluster itself;
this way you need access to those machines to be able to set ACLs. The
Create permission is used for creating topics I think; there is no ACL to
limit setting ACLs.

On Tue, Oct 4, 2016 at 4:17 PM Shrikant Patel  wrote:

> Hi All,
>
> How can I restrict who can modify ACLs for kafka cluster? Anyone can use
> kafka-acls cli to modify the acl.
>
> I added a superuser and thought that when we run kafka-acls, it
> validates that only the spatel user can run this command. So what prevents a user
> on the n\w from trying to modify ACLs?
>
> authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
> super.users=User:CN=spatel-lt.nhsrx.com,OU=arch,O=pdx inc,L=fort
> worth,ST=tx,C=us
>
> Current ACLs for resource `Cluster:kafka-cluster`:
> User:CN=spatel-lt,OU=arch,O=pdx inc,L=fort worth,ST=tx,C=us has
> Allow permission for operations: Create from hosts: *
>
> Am I missing anything???
>
> Thanks in advance,
> Shri
> __
> Shrikant Patel   |   PDX-NHIN
> Enterprise Architecture Team
> Asserting the Role of Pharmacy in Healthcare  www.pdxinc.com
> main 817.246.6760 | ext 4302
> 101 Jim Wright Freeway South, Suite 200, Fort Worth, Texas 76108-2202
>
>
> P Please consider the environment before printing this email.
>
> This e-mail and its contents (to include attachments) are the property of
> National Health Systems, Inc., its subsidiaries and affiliates, including
> but not limited to Rx.com Community Healthcare Network, Inc. and its
> subsidiaries, and may contain confidential and proprietary or privileged
> information. If you are not the intended recipient of this e-mail, you are
> hereby notified that any unauthorized disclosure, copying, or distribution
> of this e-mail or of its attachments, or the taking of any unauthorized
> action based on information contained herein is strictly prohibited.
> Unauthorized use of information contained herein may subject you to civil
> and criminal prosecution and penalties. If you are not the intended
> recipient, please immediately notify the sender by telephone at
> 800-433-5719 or return e-mail and permanently delete the original e-mail.
>


Re: Is Kafka 8 user compatible with kafka 10 broker?

2016-10-03 Thread Gerard Klijs
I think, but don't know for sure, that it doesn't matter for consumers, since
the messages you read are still 'old' images. I would expect errors when you
use an old producer, and/or when consuming the records from the old producer.

On Mon, Oct 3, 2016 at 7:09 AM Nikhil Goyal  wrote:

> Hi guys,
>
> I created a new cluster with kafka 10 broker. I have the following settings
> in server properties file:
>
> inter.broker.protocol.version=0.10.0.0
> message.format.version=0.10.0
> log.message.format.version=0.10.0
>
> When I try to read from old console consumer it still succeeds. Online
> documents say old consumer is not compatible as long as message format
> version is not set to 0.8.0. So I wanted to ask if I made a mistake
> somewhere in the server_properties or if this is the expected behavior?
>
> Thanks
> Nikhil
>


Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-23 Thread Gerard Klijs
I haven't tried it myself, nor very likely will in the near future, but
since it's also distributed I guess that with a large enough cluster you
will be able to handle any load. Some of the things Kafka might be better at
are more connectors available, a better at-least-once guarantee, and better
monitoring options. I really don't know, but if latency is really important
Pulsar might be better; they used Kafka before at Yahoo and maybe still do
for some stuff, recent work on https://github.com/yahoo/kafka-manager seems
to suggest so.
Alternatively you could configure a Kafka topic/producer/consumer to limit
latency, and that may also be enough to get a low enough latency. It would
certainly be interesting to compare the two, with the same hardware, and
with high load.
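To make the last suggestion concrete, a small sketch (values are illustrative, not benchmarked) of producer and consumer settings that keep latency low:

import java.util.Properties;

public class LowLatencySettings {
    /** Producer side: send each record right away instead of batching. */
    public static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");  // placeholder
        p.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        p.put("linger.ms", "0");  // do not wait to fill a batch
        p.put("acks", "1");       // only wait for the leader to acknowledge
        return p;
    }

    /** Consumer side: let poll() return as soon as any data is available. */
    public static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");  // placeholder
        p.put("group.id", "low-latency-app");          // placeholder
        p.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("fetch.min.bytes", "1");     // don't wait for a minimum amount of data
        p.put("fetch.max.wait.ms", "10");  // cap how long the broker may hold a fetch
        return p;
    }
}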

On Thu, Sep 22, 2016 at 6:01 PM kant kodali <kanth...@gmail.com> wrote:

> @Gerard Thanks for this. It looks good any benchmarks on this throughput
> wise?
>
> On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs gerard.kl...@dizzit.com
> wrote:
> We have a simple application producing 1 msg/sec, and did nothing to
> optimise the performance and have about a 10 msec delay between consumer
> and producer. When low latency is important, maybe pulsar is a better fit,
> https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .
>
> On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman <mikfree...@gmail.com>
> wrote:
>
> > Thanks for sharing Radek, great article.
> >
> > Michael
> >
> > > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski <ra...@gruchalski.com>
> > wrote:
> > >
> > > Please read this article:
> > >
> > https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> > >
> > > –
> > > Best regards,
> > > Radek Gruchalski
> > > ra...@gruchalski.com
> > >
> > > On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com)
> > wrote:
> > >
> > > Still it should be possible to implement using reactive streams right.
> > > Could you please enlighten me on what are the some major differences you see
> > > between a commit log and a message queue? I see them being different only in the
> > > implementation but not functionality wise so I would be glad to hear your
> > > thoughts.
> > >
> > > On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
> > > wrote:
> > > Kafka is not a queue. It’s a distributed commit log.
> > >
> > > –
> > > Best regards,
> > > Radek Gruchalski
> > > ra...@gruchalski.com
> > >
> > > On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)
> > > wrote:
> > >
> > > Hmm...Looks like Kafka is written in Scala. There is this thing called
> > > reactive streams where a slow consumer can apply back pressure if they are consuming
> > > slow. Even with Java this is possible with a Library called RxJava and these
> > > ideas will be incorporated in Java 9 as well.
> > > I still don't see why they would pick poll just to solve this one problem and
> > > compensating on others. Poll just don't sound realtime. I heard from some people
> > > that they would set poll to 100ms. Well 1) that is a lot of time. 2) Financial
> > > applications requires micro second latency. Kafka from what I understand looks
> > > like has a very high latency and here is the article.

Re: why did Kafka choose pull instead of push for a consumer ?

2016-09-22 Thread Gerard Klijs
We have a simple application producing 1 msg/sec, and did nothing to
optimise the performance and have about a 10 msec delay between consumer
and producer. When low latency is important, maybe pulsar is a better fit,
https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .

On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman 
wrote:

> Thanks for sharing Radek, great article.
>
> Michael
>
> > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski 
> wrote:
> >
> > Please read this article:
> >
> https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> >
> > –
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> >
> >
> > On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com)
> wrote:
> >
> > Still it should be possible to implement using reactive streams right.
> > Could you please enlighten me on what are the some major differences you
> > see
> > between a commit log and a message queue? I see them being different only
> > in the
> > implementation but not functionality wise so I would be glad to hear your
> > thoughts.
> >
> > On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
> > wrote:
> > Kafka is not a queue. It’s a distributed commit log.
> >
> > –
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> >
> > On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)
> > wrote:
> >
> > Hmm...Looks like Kafka is written in Scala. There is this thing called
> > reactive streams where a slow consumer can apply back pressure if they are consuming
> > slow. Even with Java this is possible with a Library called RxJava and these
> > ideas will be incorporated in Java 9 as well.
> > I still don't see why they would pick poll just to solve this one problem and
> > compensating on others. Poll just don't sound realtime. I heard from some people
> > that they would set poll to 100ms. Well 1) that is a lot of time. 2) Financial
> > applications requires micro second latency. Kafka from what I understand looks
> > like has a very high latency and here is the article.
> > http://bravenewgeek.com/dissecting-message-queues/ I usually don't go by
> > articles but I ran my own experiments on different queues and my numbers are
> > very close to this article so I would say whoever wrote this article has done a
> > good Job. 3) poll does generate unnecessary traffic in case if the data isn't
> > available.
> > Finally still not sure why they would pick poll() ? or do they plan on
> > introducing reactive streams? Thanks, kant
> >
> > On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
> > wrote:
> > I'm only guessing here regarding if this is the reason:
> >
> > Pull is much more sensible when a lot of data is pushed through. It allows
> > consumers consuming at their own pace, slow consumers do not slow the complete
> > system down.
> >
> > --
> > Best regards,
> > Rad
> >
> > On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <kanth...@gmail.com>
> > wrote:
> >
> > why did Kafka choose pull instead of push for a consumer? push sounds like it
> > is more realtime to me than poll and also wouldn't poll just keeps polling even
> > when they are no messages in the broker causing more traffic? please enlighten
> > me
>


Re: Slow machine disrupting the cluster

2016-09-21 Thread Gerard Klijs
It turned out it was an over-provisioned VM. It was eventually solved by
moving the VM to another cluster. It was also not just a little slow, but
something in the order of 100 times slower. We are now looking for some
metrics to watch and alert on in case it gets slow.

On Fri, Sep 16, 2016 at 4:41 PM David Garcia <dav...@spiceworks.com> wrote:

> To remediate, you could start another broker, rebalance, and then shut
> down the busted broker.  But, you really should put some monitoring on your
> system (to help diagnose the actual problem).  Datadog has a pretty good
> set of articles for using jmx to do this:
> https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/
>
> There are lots of jmx metrics gathering tools too…such as jmxtrans:
> https://github.com/jmxtrans/jmxtrans
>
> 
> confluent also offers tooling (such as command center) to help with
> monitoring.
> 
>
> As far as mirror maker goes, you can play with the consumer/producer
> timeout settings to make sure the process waits long enough for a slow
> machine.
>
> -David
>
> On 9/16/16, 7:11 AM, "Gerard Klijs" <gerard.kl...@dizzit.com> wrote:
>
> We just had an interesting issue; luckily this was only on our test
> cluster. For some reason one of the machines in the cluster became really
> slow. Because it was still alive, it still was the leader for some
> topic-partitions. Our mirror maker reads and writes to multiple
> topic-partitions on each thread. When committing the offsets this will
> fail for the topic-partitions located on the slow machine, because the
> consumers have timed out. The data for these topic-partitions will be sent
> over and over, causing a flood of duplicate messages.
> What would be the best way to prevent this in the future? Is there some
> way the broker could notice it's performing poorly and shuts itself off,
> for example?
>
>
>


Slow machine disrupting the cluster

2016-09-16 Thread Gerard Klijs
We just had an interesting issue; luckily this was only on our test cluster.
For some reason one of the machines in the cluster became really slow.
Because it was still alive, it still was the leader for some
topic-partitions. Our mirror maker reads and writes to multiple
topic-partitions on each thread. When committing the offsets this will fail
for the topic-partitions located on the slow machine, because the consumers
have timed out. The data for these topic-partitions will be sent over and
over, causing a flood of duplicate messages.
What would be the best way to prevent this in the future? Is there some way
the broker could notice it's performing poorly and shuts itself off, for example?


Re: v0.10 MirrorMaker producer cannot send v0.8 message from v0.10 broker

2016-09-16 Thread Gerard Klijs
This is a known bug, I think it was fixed in the 0.10.0.1 release. You
could alternatively use a custom message handler for the mirror maker, and
then produce without a timestamp when the timestamp is -1 in the
consumed message.
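A rough sketch of that second suggestion; this is not the actual MirrorMaker handler interface (whose exact signature depends on the Kafka version), only the logic a custom handler registered via MirrorMaker's --message.handler option would need, namely passing a null timestamp when the consumed record reports -1 so the producer assigns one itself:

import org.apache.kafka.clients.producer.ProducerRecord;

public class TimestampSafeRecords {
    // A record without a timestamp is reported as -1 by the consumer.
    private static final long NO_TIMESTAMP = -1L;

    /**
     * Build the record to mirror, leaving the timestamp out when the consumed
     * record carries none (-1), so the 0.10 producer does not reject it.
     */
    public static ProducerRecord<byte[], byte[]> toMirrorRecord(
            String topic, Long consumedTimestamp, byte[] key, byte[] value) {
        Long timestamp = (consumedTimestamp == null || consumedTimestamp == NO_TIMESTAMP)
                ? null            // let the producer assign the current time
                : consumedTimestamp;
        return new ProducerRecord<>(topic, null, timestamp, key, value);
    }
}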

On Thu, Sep 15, 2016 at 9:48 AM Samuel Zhou  wrote:

> Hi,
>
> I have a pipeline that publish message with v0.8 client, the message goes
> to v0.10 broker first then mirror maker will consume it and publish it to
> another v0.10 brokers. But I got the following message from MM log:
>
> java.lang.IllegalArgumentException: Invalid timestamp -1
>
> at
>
> org.apache.kafka.clients.producer.ProducerRecord.(ProducerRecord.java:60)
>
> at
>
> kafka.tools.MirrorMaker$defaultMirrorMakerMessageHandler$.handle(MirrorMaker.scala:678)
>
> at
> kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:414)
>
> Above error makes MM dead. I am not sure if KAFKA-3188 covers the test. Or
> are there any parameters for MM that can fix above error?
>
> Thanks!
>
> Samuel
>


Re: Producer request latency increase after client 0.10 upgrade

2016-09-02 Thread Gerard Klijs
With a linger of 5 seconds, 2-3 seconds would make sense when the load is
smaller. Are you sure the measurements with 0.8.2.1 were done with the same
load, and/or that linger worked correctly there?

On Fri, Sep 2, 2016 at 1:12 AM Yifan Ying  wrote:

> We tried to upgrade the Kafka clients dependency from 0.8.2.1 to 0.10.0.0.
> As I understand, the producer client doesn't have major changes in 0.10, so
> we kept the same producer config in the upgrade:
>
> retries 3
> retry.backoff.ms 5000
> timeout.ms 1
> block.on.buffer.full false
> linger.ms 5000
> metadata.fetch.timeout.ms 3
>
> However, after the upgrade, the avg producer request latency increased by
> 5×, from a few hundreds milliseconds to 2-3 seconds. Anyone has seen this
> issue? Is there any change in 0.9/0.10 that would cause this?
>
> Thanks!
>
> --
> Yifan
>


Re: Offsets getting lost if no messages sent for a long time

2016-08-23 Thread Gerard Klijs
I don't know the answer to the second question. If you don't use (many)
auto-generated ids for the consumer group you should be OK, since it's a
compacted topic after all; you might want to check that the compaction is on.
We set offsets.retention.minutes to a week without a problem.
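On the second question, a minimal sketch (an assumption, not something confirmed in this thread) of how a consumer could periodically re-commit its current position during a dry period, so the offsets keep being rewritten to __consumer_offsets:

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetRefresher {
    /**
     * Re-commit the consumer's current position for every assigned partition,
     * which writes the offsets to __consumer_offsets again.
     */
    public static void refreshOffsets(KafkaConsumer<?, ?> consumer) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            offsets.put(tp, new OffsetAndMetadata(consumer.position(tp)));
        }
        if (!offsets.isEmpty()) {
            consumer.commitSync(offsets);
        }
    }
}

Per the explanation quoted below, retention is about when the offsets were last written, so rewriting them this way should keep them from being cleaned out.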

On Tue, Aug 23, 2016 at 12:21 PM Michael Freeman 
wrote:

> Might be easier to handle duplicate messages as opposed to handling long
> periods of time without messages.
>
> Michael
>
> > On 22 Aug 2016, at 15:55, Misra, Rahul 
> wrote:
> >
> > Hi,
> >
> > Can anybody provide any guidance on the following:
> >
> > 1. Given a limited set of groups and consumers, will increasing
> 'offsets.retention.minutes' to a high value (say 30 days) cause the
> __consumer_offsets topic to bloat unnecessarily or will compaction ensure
> that the entries for each key remain limited (which would mean that having
> a high 'offsets.retention.minutes' value is not a problem. I would prefer
> this option).
> >
> > 2. If the consumer calls commitSync() with latest already committed
> offsets (which have been committed already but no messages have been
> received for a long time after that), will it make an entry to the
> __consumer_offsets topic and ensure that the offsets are retained even with
> a small 'offsets.retention.minutes'? In our application the dry period
> (period without a new message is not well defined in advance).
> >
> >
> > Regards,
> > Rahul Misra
> >
> >
> > -Original Message-
> > From: Misra, Rahul [mailto:rahul.mi...@altisource.com]
> > Sent: Sunday, August 21, 2016 12:46 AM
> > To: Ian Wrigley; users@kafka.apache.org
> > Subject: RE: Offsets getting lost if no messages sent for a long time
> >
> > Hi Ian,
> >
> > Thanks for the quick response. Your answer clears things up.
> > I have some follow up questions though:
> >
> > 1. Given a limited set of groups and consumers, will increasing
> 'offsets.retention.minutes' to a high value (say 30 days) cause the
> __consumer_offsets topic to bloat unnecessarily or will compaction ensure
> that the entries for each key remain limited (which would mean that having
> a high 'offsets.retention.minutes' value is not a problem. I would prefer
> this option).
> >
> > 2. If the consumer calls commitSync() with latest already committed
> offsets (which have been committed already but no messages have been
> received for a long time after that), will it make an entry to the
> __consumer_offsets topic and ensure that the offsets are retained even with
> a small 'offsets.retention.minutes'? In our application the dry period
> (period without a new message is not well defined in advance).
> >
> >
> > Regards,
> > Rahul Misra
> >
> >
> >
> >
> >
> > -Original Message-
> > From: Ian Wrigley [mailto:i...@confluent.io]
> > Sent: Sunday, August 21, 2016 12:01 AM
> > To: users@kafka.apache.org
> > Subject: Re: Offsets getting lost if no messages sent for a long time
> >
> > Since nothing was written to the __consumer_offsets topic for more than
> its configured retention period (offsets.retention.minutes, by default 1440
> minutes, or one day), the offset info will be removed. Retention period is
> all about when the last offset was written, not the last time a Consumer
> looked at a topic.
> >
> > You can increase the value of offsets.retention.minutes to ensure that
> offset info isn’t cleaned out before more messages are written to a topic
> and read by the Consumer (and hence the Consumer updates its offset info in
> __consumer_offsets).
> >
> > Ian.
> >
> > ---
> > Ian Wrigley
> > Director, Education Services
> > Confluent, Inc
> >
> >> On Aug 20, 2016, at 11:36 AM, Misra, Rahul 
> wrote:
> >>
> >> Hi,
> >>
> >> I have observed the following scenario (the consumer here has
> 'enable.auto.commit=false' and offsets are committed using commitSync() if
> any messages are received):
> >>
> >> 1.  Start a consumer (with a specific group.Id) and send some
> messages to its subscribed topic.
> >>
> >> 2.  The consumer consumes the messages and the group+consumer has
> an entry in the __commit_offsets with the latest offsets for this group and
> consumer.
> >>
> >> 3.  The consumer will keep polling the topic but don't send any
> more messages to the topic for a long time (longer than one day. The
> consumer keeps polling the topic in a while loop). The default duration for
> which a group's entry is retained in the offsets topic is 1 day.
> >>
> >> 4.  Now stop the consumer. (there is no other consumer for this
> group)
> >>
> >> 5.  Send some more messages to the topic.
> >>
> >> 6.  Start the consumer (with the same group and consumer id as
> earlier).
> >>
> >> 7.  The consumer does not pick up the new messages sent in step 5
> as it has lost the committed offsets and starts with the 'latest' offsets.
> >>
> >> Is this an expected behavior? Or do I have something wrong in the

Re: Same partition number of different Kafka topcs

2016-07-29 Thread Gerard Klijs
The default partitioner will take the key, make a hash from it, and do a
modulo operation to determine the partition it goes to. Some things which
might cause it to end up different for different topics:
- the partition numbers are not the same (you already checked)
- the key is not exactly the same, for example one might have a space after the
id
- the other topic is configured to use another partitioner
- the serialiser for the key is different for both topics, since the hash
is created based on the bytes of the serialised key
- all the topics use another partitioner (for example round robin)
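For reference, a small sketch of how the default partitioner derives the partition (murmur2 over the serialized key bytes, modulo the partition count), which also shows why a different serializer or a stray space in the key changes the outcome; the 6-partition count is just an example:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
    /** Mirror of what the default partitioner does for a non-null key. */
    public static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // as the StringSerializer would
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // "42" and "42 " (trailing space) serialize to different bytes,
        // so they can land on different partitions.
        System.out.println(partitionFor("42", 6));
        System.out.println(partitionFor("42 ", 6));
    }
}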

On Thu, Jul 28, 2016 at 9:11 PM Jack Huang  wrote:

> Hi all,
>
> I have an application where I need to join events from two different
> topics. Every event is identified by an id, which is used as the key for
> the topic partition. After doing some experiment, I observed that events
> will go into different partitions even if the number of partitions for both
> topics are the same. I can't find any documentation on this point though.
> Does anyone know if this is indeed the case?
>
>
> Thanks,
> Jack
>


Re: SSD or not for Kafka brokers?

2016-07-29 Thread Gerard Klijs
As I understood it, it won't really have any advantage over using HDD, since
most things will work from memory anyway. You might want to use
SSD for ZooKeeper though.

On Fri, Jul 29, 2016 at 12:19 AM Kessiler Rodrigues 
wrote:

> Hi guys,
>
> Should I use SSD for my brokers or not?
>
> What are the pros and cons?
>
> Thank you!
>


Re: Jars in Kafka 0.10

2016-07-29 Thread Gerard Klijs
No, if you don't use Streams you don't need them. If you have no clients
(so also no mirror maker) running on the same machine you also don't need
the client jar, and if you run ZooKeeper separately you don't need those jars either.

On Fri, Jul 29, 2016 at 4:22 PM Bhuvaneswaran Gopalasami <
bhuvanragha...@gmail.com> wrote:

> I have recently started looking into Kafka I noticed the number of Jars in
> Kafka 0.10 has increased when compared to 0.8.2. Do we really need all
> those libraries to run Kafka ?
>
> Thanks,
> Bhuvanes
>


Re: Mirror maker higher offset in the mirror.

2016-07-25 Thread Gerard Klijs
Things like consumer rebalances on the cluster you copy from, and brokers
going down on the cluster you're writing to, can cause duplication. The
default settings are set to prevent data loss, making data duplication more
likely to happen in case of error.
You could possibly make a simple consumer to check for duplicates,
depending on your data.

On Mon, Jul 25, 2016 at 8:38 AM Sathyakumar Seshachalam <
sathyakumar_seshacha...@trimble.com> wrote:

> Am trying to mirror from a production Kafka cluster to a DR cluster.
>
> However the offsets between topics (retrieved with GetOffsetShell
> <
> https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-GetOffsetShell
> >)
> on these two clusters do not always match.
>
> While a lesser offset in the "mirrored" cluster is understandable, assuming
> the mirror maker is still catching up, what is the theoretical possibility of
> the offset being higher in the mirrored cluster? (This is what is happening
> in my case.)
>
> Thanks,
> Sathya
>


Re: Question about a kafka use case : sequence once a partition is added

2016-07-19 Thread Gerard Klijs
You can't; you only get a guarantee on the order for each partition, not
over partitions. Adding partitions will possibly make it a lot worse, since
items with the same key will land in other partitions. For example, with two
partitions these will be about the hashes in each partition:
partition-0: 0,2,4,6,8
partition-1: 1,3,5,7,9
With three partitions:
partition-0: 0,3,6,9
partition-1: 1,4,7
partition-2: 2,5,8
So items with a key which hashes to 2 will move from partition-0 to
partition-2. I think if you really need to be able to guarantee order, you
will need to add some sequential id to the messages, and buffer them when
reading, but this will have all sorts of drawbacks, like losing the
messages in the buffer in case of error (or you must make the committed offset
dependent on the buffer).

On Mon, Jul 18, 2016 at 9:19 PM Fumo, Vincent 
wrote:

> I want to use Kafka for notifications of changes to data in a
> dataservice/database. For each object that changes, a kafka message will be
> sent. This is easy and we've got that working no problem.
>
> Here is my use case : I want to be able to fire up a process that will
>
> 1) determine the current location of the kafka topic (right now we use 2
> partitions so that would be the offset for each partition)
> 2) do a long running process that will copy data from the database
> 3) once the process is over, put the location back into a kafka consumer
> and start processing notifications in sequence
>
> This isn't very hard either but there is a problem that we'd face if
> during step (2) partitions are added to the topic (say by our operations
> team).
>
> I know we can set up a ConsumerRebalanceListener but I don't think that
> will help because we'd need to back to a time when we had our original
> number of partitions and then we'd need to know exactly when to start
> reading from the new partition(s).
>
> for example
>
> start : 2 partitions (0,1) at offsets p0,100 and p1,100
>
> 1) we store the offsets and partitions : p0,100 and p1,100
> 2) we run the db ingest
> 3) messages are posted to p0 and p1
> 4) OPS team adds p2 and our ConsumerRebalanceListener would be notified
> 5) we are done and we set our consumer to p0,100 and p1,100 (and p2,0
> thanks to the ConsumerRebalanceListener)
>
> how would we guarantee the order of messages received from our consumer
> across all 3 partitions?
>
>
>


Re: Kafka Consumer Group Id bug?

2016-07-13 Thread Gerard Klijs
Are you sure the topic itself has indeed 1 partition?
If so, the only partition should be assigned to either one of the consumers until some
error/rebalance occurs; does this indeed happen (a lot)?

On Wed, Jul 13, 2016 at 7:19 AM BYEONG-GI KIM  wrote:

> Hello.
>
> I'm not sure whether it's a bug or not, but here is a something wrong;
>
> I set 2 consumer apps that have same consumer group id, and partition has
> been set to 1 on my Kafka Broker.
>
> In theory, the messages on the Kafka Broker must be consumed either one,
> which means it must not be consumed at both of them. But the messages were
> sometimes consumed to both of them.
>
> I found the definition of the partition at the Kafka Website:
> *"Kafka only provides a total order over messages within a partition, not
> between different partitions in a topic. Per-partition ordering combined
> with the ability to partition data by key is sufficient for most
> applications. However, if you require a total order over messages this can
> be achieved with a topic that has only one partition, though this will mean
> only one consumer process per consumer group."*
>
> What's wrong with my setting?
>
> Regards
>
> KIM
>


Re: Kafka Consumer for Real-Time Application?

2016-07-12 Thread Gerard Klijs
Another option is to set enable.auto.commit=false and never commit the
offset; it should always go back to the latest offset that way.
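A minimal sketch of that option, combined with auto.offset.reset=latest so every (re)start begins at the end of the log; group id, topic and servers are placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LatestOnlyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "realtime-app");              // placeholder
        props.put("enable.auto.commit", "false");  // never commit offsets...
        props.put("auto.offset.reset", "latest");  // ...so every (re)start begins at the end
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));  // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    }
}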

On Tue, Jul 12, 2016 at 1:34 PM Michael Noll  wrote:

> To explain what you're seeing:  After you have run a consumer application
> once, it will have stored its latest consumer offsets in Kafka.  Upon
> restart -- e.g. after a brief shutdown time as you mentioned -- the
> consumer application will continue processing from the point of these
> stored consumer offsets.
>
> The auto.offset.reset setting that Snehal mentioned above takes effect if
> and only if there are *no* consumer offsets stored in Kafka yet (i.e. the
> typical situation where auto.offset.reset does take effect is if you are
> starting a consumer application for the very first time).  This means that
> setting auto.offset.reset=latest won't be sufficient to solve your problem.
>
> To solve your problem you also need to do one of the following, in addition
> to setting auto.offset.reset=latest:
>
> 1. Delete the consumer offsets / the group (= group.id) of your consumer
> application and start fresh.  Kafka's `kafka-consumer-groups.sh` command
> allows you to delete the stored consumer offsets / the group (if you are
> using the Confluent Platform, the command is called `kafka-consumer-group`,
> i.e. it does not have the `.sh` suffix).  This is the approach that I would
> recommend.
>
> 2. Alternatively, as a crude workaround, you could also change the
> group.id
> setting of your consumer application whenever you restart it.  Changing the
> group.id is, in this case, a workaround to starting the processing "from
> scratch", because using a new, never-used-before group.id implies that
> there are no stored consumer offsets in Kafka from previous runs.
>
> For your convenience I copy-pasted the help display of
> `kafka-consumer-groups.sh` below.  If your consumer application uses
> Kafka's "new" consumer client, you must set the `--bootstrap-server` CLI
> option.  If you are using the old consumer client, you must set the
> `--zookeeper` CLI option.
>
> Hope this helps,
> Michael
>
>
> $ ./kafka-consumer-groups
> List all consumer groups, describe a consumer group, or delete consumer
> group info.
> Option                                  Description
> ------                                  -----------
> --bootstrap-server <server to connect   REQUIRED (only when using new-
>   to>                                     consumer): The server to connect to.
> --command-config <command config        Property file containing configs to be
>   property file>                          passed to Admin Client and Consumer.
> --delete                                Pass in groups to delete topic
>                                           partition offsets and ownership
>                                           information over the entire consumer
>                                           group. For instance --group g1 --group g2
>                                         Pass in groups with a single topic to
>                                           just delete the given topic's
>                                           partition offsets and ownership
>                                           information for the given consumer
>                                           groups. For instance --group g1 --group g2 --topic t1
>                                         Pass in just a topic to delete the
>                                           given topic's partition offsets and
>                                           ownership information for every
>                                           consumer group. For instance --topic t1
>                                         WARNING: Group deletion only works for
>                                           old ZK-based consumer groups, and
>                                           one has to use it carefully to only
>                                           delete groups that are not active.
> --describe                              Describe consumer group and list
>                                           offset lag related to given group.
> --group <consumer group>                The consumer group we wish to act on.
> --list                                  List all consumer groups.
> --new-consumer                          Use new consumer.
> --topic <topic>                         The topic whose consumer group
>                                           information should be deleted.
> --zookeeper <urls>                      REQUIRED (unless new-consumer is
>                                           used): The connection string for the
>                                           zookeeper connection in the form
>                                           host:port. Multiple URLS can be
>                                           given to allow fail-over.
>
>
>
> On Tue, Jul 12, 2016 at 3:40 AM, BYEONG-GI KIM  wrote:
>
> > Thank you for 

Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-11 Thread Gerard Klijs
You could set auto.commit.interval.ms to a lower value; in your example
it is 10 seconds, which can be a lot of messages. I don't really see how it
could be prevented any further, since offsets can only be committed by a
consumer for the partitions it is assigned to. I do believe there is some
work in progress in which the assignment of partitions to consumers is
somewhat sticky.
In that case, when a consumer has been assigned the same partitions after
the rebalance as it had before, it should not be necessary to
consume the same data again in those partitions.
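Not something the 0.8.2.1 high-level consumer offers, but for completeness: with the newer (0.9+) consumer the window for duplicates can be narrowed further by disabling auto-commit and committing synchronously whenever partitions are about to be revoked, roughly along these lines (group, topic and processing logic are placeholders):

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class CommitOnRevoke {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("group.id", "my-group");                  // placeholder
        props.put("enable.auto.commit", "false");           // commit manually instead
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Commit what has been processed so far before the partitions move,
                // so the next owner starts right after the last processed record.
                consumer.commitSync();
            }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                process(record);
            }
            consumer.commitSync();  // commit after each processed batch
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the application's processing logic
    }
}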

On Mon, Jul 11, 2016 at 3:18 PM Michael Luban  wrote:

> Using the 0.8.2.1 client.
>
> Is it possible to statistically minimize the possibility of duplication in
> this scenario or has this behavior been corrected in a later client
> version?  Or is the test flawed?
>
> https://gist.github.com/mluban/03a5c0d9221182e6ddbc37189c4d3eb0
>


Re: Mirror maker setup - multi node

2016-06-28 Thread Gerard Klijs
With 3 nodes, I assume you mean 3 clusters? If I understand correctly, say
you have 3 clusters, A, B, and C, you simultaneously:
- want to copy from A and B to C, to get an aggregation in C
- want to copy from A and C to B, to get a fail-back aggregation in B.
Now what will happen when a message is produced in cluster A?
- it will be copied to both C and B.
- the copy will cause a new copy in C and B,
etc.
There are several ways out of this, depending on your use case. It's pretty
easy to change the behaviour of the mirror maker, for example to copy to
$topic-aggregation instead of $topic, and to not copy when the topic name
ends with "aggregation".

On Tue, Jun 28, 2016 at 10:15 AM cs user  wrote:

> Hi All,
>
> So I understand I can run the following to aggregate topics from two
> different clusters into a mirror:
>
> bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config
> sourceCluster1Consumer.config --consumer.config
> sourceCluster2Consumer.config --num.streams 2 --producer.config
> targetClusterProducer.config --whitelist=".*"
>
> Lets say my kafka mirror cluster consists of 3 nodes, can the above process
> be started on each of the 3 nodes, so that in the event it fails on one the
> other 2 will keep going?
>
> Or should only one of the nodes attempt to perform the aggregation?
>
> Thanks!
>


Re: SSL support for command line tools

2016-06-23 Thread Gerard Klijs
That particular tool doesn't seem to support SSL, at least not the 0.10
version.

On Thu, Jun 23, 2016 at 9:17 AM Radu Radutiu <rradu...@gmail.com> wrote:

> I have read the documentation and I can connect the consumer and producer
> successfully with SSL. However I have trouble running other scripts like
>
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list
> {brokerUrl} —topic {topicName} --time -2
>
> if the broker is configured with SSL only.
>
> Regards,
> Radu
>
> On 23 June 2016 at 01:46, Harsha <ka...@harsha.io> wrote:
>
> > Radu,
> >  Please follow the instructions here
> >  http://kafka.apache.org/documentation.html#security_ssl . At
> >  the end of the SSL section we've an example for produce and
> >  consumer command line tools to pass in ssl configs.
> >
> > Thanks,
> > Harsha
> >
> > On Wed, Jun 22, 2016, at 07:40 AM, Gerard Klijs wrote:
> > > To elaborate:
> > > We start the process with --command-config /some/folder/ssl.properties
> > > the
> > > file we include in the image, and contains the ssl properties it needs,
> > > which is a subset of the properties (those specific for ssl) the client
> > > uses. In this case the certificate is accessed in a data container,
> > > having
> > > access to the same certificate as the broker (so we don't need to set
> > > acl's
> > > to use the tool).
> > >
> > > On Wed, Jun 22, 2016 at 2:47 PM Gerard Klijs <gerard.kl...@dizzit.com>
> > > wrote:
> > >
> > > > You need to pass the correct options, similar to how you would do to
> a
> > > > client. We use the consumer-groups in a docker container, in an
> > environment
> > > > which is now only SSL (since the schema registry now supports it).
> > > >
> > > > On Wed, Jun 22, 2016 at 2:47 PM Radu Radutiu <rradu...@gmail.com>
> > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Is is possible to configure the command line tools like
> > > >> kafka-consumer-groups.sh , kafka-topics.sh and all other command
> that
> > are
> > > >> not a consumer or producer to connect to a SSL only kafka cluster ?
> > > >>
> > > >> Regards,
> > > >> Radu
> > > >>
> > > >
> >
>


Re: Leader crash and data loss

2016-06-23 Thread Gerard Klijs
If your producer has acks set to 0, or if retries is set to 0 in the
properties, it will be lost; otherwise it will most likely be retried and sent
to the new leader.

On Thu, Jun 23, 2016 at 2:53 AM Saeed Ansari  wrote:

> Hi,
> I searched a lot for my question and I did not find a good answer may
> someone help me in this group?
>
> When leader broker for a partition fails, ZK elects a new leader and this
> may take seconds. What happens to data published to that broker during
> election?
> How Kafka handles messages to failed broker?
>
>
> Thank you,
> Saeed
>


Re: SSL support for command line tools

2016-06-22 Thread Gerard Klijs
To elaborate:
We start the process with --command-config /some/folder/ssl.properties, a
file we include in the image that contains the SSL properties it needs,
which is a subset of the properties (those specific to SSL) the client
uses. In this case the certificate is accessed in a data container, having
access to the same certificate as the broker (so we don't need to set ACLs
to use the tool).

On Wed, Jun 22, 2016 at 2:47 PM Gerard Klijs <gerard.kl...@dizzit.com>
wrote:

> You need to pass the correct options, similar to how you would do to a
> client. We use the consumer-groups in a docker container, in an environment
> which is now only SSL (since the schema registry now supports it).
>
> On Wed, Jun 22, 2016 at 2:47 PM Radu Radutiu <rradu...@gmail.com> wrote:
>
>> Hi,
>>
>> Is is possible to configure the command line tools like
>> kafka-consumer-groups.sh , kafka-topics.sh and all other command that are
>> not a consumer or producer to connect to a SSL only kafka cluster ?
>>
>> Regards,
>> Radu
>>
>


Re: SSL support for command line tools

2016-06-22 Thread Gerard Klijs
You need to pass the correct options, similar to how you would do to a
client. We use the consumer-groups in a docker container, in an environment
which is now only SSL (since the schema registry now supports it).

On Wed, Jun 22, 2016 at 2:47 PM Radu Radutiu  wrote:

> Hi,
>
> Is is possible to configure the command line tools like
> kafka-consumer-groups.sh , kafka-topics.sh and all other command that are
> not a consumer or producer to connect to a SSL only kafka cluster ?
>
> Regards,
> Radu
>


Re: Zookeeper offsets in new consumer

2016-06-21 Thread Gerard Klijs
No, why would you want to store the offsets in zookeeper? One of the
improvements is to not depend on zookeeper for the offsets. And there is
tooling to get the offsets (although the consumer group must exist).

On Mon, Jun 20, 2016 at 10:57 PM Bryan Baugher  wrote:

> Hi everyone,
>
> With the new Kafka consumer[1] is it possible to use zookeeper based offset
> storage?
>
> Bryan
>
> [1] -
>
> http://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
>


Re: Kafka logs on a Docker volume

2016-06-17 Thread Gerard Klijs
What do you mean by a *docker volume*? It's best to use a data container
and use the volumes in your broker container; this way you can destroy the
broker container without affecting the data. The data container itself
needs to be configured depending on the host. For example, when the host is
running SELinux, the volumes need some specific rights:
http://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/


On Fri, Jun 17, 2016 at 1:25 PM OGrandeDiEnne 
wrote:

> Hello people,
>
> I'm running one single kafka broker from within a docker container. The
> folder where kafka writes the logs is mounted as *docker volume* on my
> system.
>
> As soon as I try to create a topic I get this error
>
> [2016-06-15 10:22:53,602] ERROR [KafkaApi-0] Error when handling request
>
> {controller_id=0,controller_epoch=1,partition_states=[{topic=mytopic,partition=0,controller_epoch=1,leader=0,leader_epoch=0,isr=[0],zk_version=0,replicas=[0]}],live_leaders=[{id=0,host=kafkadocker,port=9092}]}
> (kafka.server.KafkaApis)
> *java.io.IOException: Invalid argument*
> at sun.nio.ch.FileChannelImpl.map0(Native Method)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:926)
> *at kafka.log.OffsetIndex.(OffsetIndex.scala:75)*
> at kafka.log.LogSegment.(LogSegment.scala:58)
> at kafka.log.Log.loadSegments(Log.scala:233)
> at kafka.log.Log.(Log.scala:101)
> at kafka.log.LogManager.createLog(LogManager.scala:363)
> at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:96)
> at
>
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:176)
> at
>
> kafka.cluster.Partition$$anonfun$4$$anonfun$apply$2.apply(Partition.scala:176)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:176)
> at kafka.cluster.Partition$$anonfun$4.apply(Partition.scala:170)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
> at kafka.cluster.Partition.makeLeader(Partition.scala:170)
> at
>
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:699)
> at
>
> kafka.server.ReplicaManager$$anonfun$makeLeaders$4.apply(ReplicaManager.scala:698)
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:698)
> at
>
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:644)
> at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:144)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:80)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:745)
>
>
> The error is an IOException, so it looks like the broker has trouble trying
> to access the log file.
> Looks like Kafka assumes a feature of the underlying filesystem, which is
> not present.
>
> I do not get any error if I keep the kafka log-files inside the docker
> container.
>
> Have you seen the issue before ?
>
> Thanks.
>
> *Valerio*
>


Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-17 Thread Gerard Klijs
+1 we already use java 8

On Fri, Jun 17, 2016 at 11:07 AM Jaikiran Pai 
wrote:

> +1 for Java 8. Our eco-system which uses Kafka and many other open
> source projects are now fully on Java 8 since a year or more.
>
> -Jaikiran
> On Friday 17 June 2016 02:15 AM, Ismael Juma wrote:
> > Hi all,
> >
> > I would like to start a discussion on making Java 8 a minimum requirement
> > for Kafka's next feature release (let's say Kafka 0.10.1.0 for now). This
> > is the first discussion on the topic so the idea is to understand how
> > people feel about it. If people feel it's too soon, then we can pick up
> the
> > conversation again after Kafka 0.10.1.0. If the feedback is mostly
> > positive, I will start a vote thread.
> >
> > Let's start with some dates. Java 7 hasn't received public updates since
> > April 2015[1], Java 8 was released in March 2014[2] and Java 9 is
> scheduled
> > to be released in March 2017[3].
> >
> > The first argument for dropping support for Java 7 is that the last
> public
> > release by Oracle contains a large number of known security
> > vulnerabilities. The effectiveness of Kafka's security features is
> reduced
> > if the underlying runtime is not itself secure.
> >
> > The second argument for moving to Java 8 is that it adds a number of
> > compelling features:
> >
> > * Lambda expressions and method references (particularly useful for the
> > Kafka Streams DSL)
> > * Default methods (very useful for maintaining compatibility when adding
> > methods to interfaces)
> > * java.util.stream (helpful for making collection transformations more
> > concise)
> > * Lots of improvements to java.util.concurrent (CompletableFuture,
> > DoubleAdder, DoubleAccumulator, StampedLock, LongAdder, LongAccumulator)
> > * Other nice things: SplittableRandom, Optional (and many others I have
> not
> > mentioned)
> >
> > The third argument is that it will simplify our testing matrix, we won't
> > have to test with Java 7 any longer (this is particularly useful for
> system
> > tests that take hours to run). It will also make it easier to support
> Scala
> > 2.12, which requires Java 8.
> >
> > The fourth argument is that many other open-source projects have taken
> the
> > leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop 3[7],
> > Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android will
> > support Java 8 in the next version (although it will take a while before
> > most phones will use that version sadly). This reduces (but does not
> > eliminate) the chance that we would be the first project that would
> cause a
> > user to consider a Java upgrade.
> >
> > The main argument for not making the change is that a reasonable number
> of
> > users may still be using Java 7 by the time Kafka 0.10.1.0 is released.
> > More specifically, we care about the subset who would be able to upgrade
> to
> > Kafka 0.10.1.0, but would not be able to upgrade the Java version. It
> would
> > be great if we could quantify this in some way.
> >
> > What do you think?
> >
> > Ismael
> >
> > [1] https://java.com/en/download/faq/java_7.xml
> > [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> > [3] http://openjdk.java.net/projects/jdk9/
> > [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> > [5] https://lucene.apache.org/#highlights-of-this-lucene-release-include
> > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> > [7] https://issues.apache.org/jira/browse/HADOOP-11858
> > [8] https://webtide.com/jetty-9-3-features/
> > [9] http://markmail.org/message/l7s276y3xkga2eqf
> > [10]
> >
> https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> > [11] http://markmail.org/message/l7s276y3xkga2eqf
> >
>
>


Re: Message loss with kafka 0.8.2.2

2016-06-17 Thread Gerard Klijs
You could try setting acks to -1, so the produce only succeeds once the
other in-sync brokers have also received the message. Another thing you could
try is setting unclean.leader.election.enable to false (this is a setting on
the broker).
I think what's happening now is that the message in your example is sent to
two different brokers, because one of them is not sending the record to the
actual leader. Since you have set acks to one, you won't see any error
in the producer, because it succeeded in sending it to the broker. You most
likely will see some error on the broker, because it is not the leader.
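A sketch of the producer side of that suggestion (unclean.leader.election.enable=false is a broker setting and goes in server.properties); servers and values are illustrative:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class SaferProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for the in-sync replicas instead of only the leader ("all" in newer clients).
        props.put("acks", "-1");
        // Allow the client to retry against the new leader after a leader change.
        props.put("retries", "3");
        return new KafkaProducer<>(props);
    }
}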

On Fri, Jun 17, 2016 at 5:19 AM Gulia, Vikram  wrote:

> Hi Users, I am facing message loss while using kafka v 0.8.2.2. Please see
> details below and help me if you can.
>
> Issue: 2 messages produced to same partition one by one – Kafka producer
> returns same offset back which means message produced earlier is lost.<
> http://stackoverflow.com/questions/37732088/2-messages-produced-to-same-partition-one-by-one-message-1-overridden-by-next
> >
>
> Details:
> I have a unique problem which is happening like 50-100 times a day with
> message volume of more than 2 millions per day on the topic.I am using
> Kafka producer API 0.8.2.2 and I have 12 brokers (v 0.8.2.2) running in
> prod with replication of 4. I have a topic with 60 partitions and I am
> calculating partition for all my messages and providing the value in the
> ProducerRecord itself. Now, the issue -
>
> Application creates 'ProducerRecord' using -
>
> new ProducerRecord(topic, 30, null, message1);
> providing topic, value message1 and partition 30. Then application call
> the send method and future is returned -
>
> // null is for callback
> Future future = producer.send(producerRecord, null);
> Now, app prints the offset and partition value by calling get on Future
> and then getting values from RecordMetadata - this is what i get -
>
> Kafka Response : partition 30, offset 3416092
> Now, the app produce the next message - message2 to same partition -
>
> new ProducerRecord(topic, 30, null, message2);
> and kafka response -
>
> Kafka Response : partition 30, offset 3416092
> I received the same offset again, and if I pull message from the offset of
> partition 30 using simple consumer, it ends up being the message2 which
> essentially mean i lost the message1.
>
> Currently, the messages are produced using 10 threads each having its own
> instance of kafka producer (Earlier threads shared 1 Kafka producer but it
> was performing slow and we still had message loss).
> I am using all default properties for producer except a few mentioned
> below, the message (String payload) size can be a few kbs to a 500 kbs. I
> am using acks value of 1.
>
> value.serializer: org.apache.kafka.common.serialization.StringSerializer
> key.serializer: org.apache.kafka.common.serialization.StringSerializer
> bootstrap.servers: {SERVER VIP ENDPOINT}
> acks: 1
> batch.size: 204800
> linger.ms: 10
> send.buffer.bytes: 1048576
> max.request.size: 1000
>
> What am i doing wrong here? Is there something I can look into or any
> producer property or server property I can tweak to make sure i don't lose
> any messages. I need some help here soon as I am losing some critical
> messages in production which is not good at all because as there is no
> exception given by Kafka Producer its even hard to find out the message
> lost unless downstream process reports it.
>
> Thank you,
> Vikram Gulia
>


Re: KafkaNet client, Avro and HDFS connector

2016-06-13 Thread Gerard Klijs
That's correct, that's what needs to be done. But usually you let the
serializer do that for you; if there is no way to set a serializer, you could
do it this way yourself, and the message can then be read by a regular Kafka
Avro consumer.
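
For reference, the framing the registry-aware (de)serializers expect is just a
magic byte and the schema id in front of the Avro bytes; a small sketch (the
schemaId is whatever the registry returned when you registered the schema):

import java.nio.ByteBuffer;

public final class WireFormat {
    // <magic byte 0> <4-byte big-endian schema id> <avro binary payload>
    public static byte[] frame(int schemaId, byte[] avroBytes) {
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + avroBytes.length);
        buffer.put((byte) 0);     // magic byte
        buffer.putInt(schemaId);  // schema id, big-endian (ByteBuffer default)
        buffer.put(avroBytes);    // the serialized Avro record itself
        return buffer.array();
    }
}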

On Mon, Jun 13, 2016 at 5:45 PM Tauzell, Dave 
wrote:

> Hello,
>
> I have a .Net client (Microsoft KafkaNet) putting serialized Avro messages
> onto Kafka.  I started up the Kafka Connect HDFS connector but it fails
> with a "No magic byte!" error.  After some research it appears that I need
> to do the following:
>
>
> 1.   Register my avro schema with the Schema Registry
>
> 2.   When adding my message I need to prepend my data with:
>
> a.   A null byte (byte 0)
>
> b.  32 bit schema id
>
> Does that seem right?  I don't think the KafkaNet library will do this for
> me.
>
> -Dave
>
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |
> dave.tauz...@surescripts.com
> Connect with us: Twitter I LinkedIn<
> https://www.linkedin.com/company/surescripts-llc> I Facebook<
> https://www.facebook.com/Surescripts> I YouTube<
> http://www.youtube.com/SurescriptsTV>
>
>
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>


Re: Questions on Kafka Security

2016-06-09 Thread Gerard Klijs
If you can put the ACLs in a file, and there will be few or no changes, you
might be best off writing your own Authorizer implementation. If you use a
shared file system to store the config, you would even be able to change it
easily, and it will be the same across the cluster.

On Thu, Jun 9, 2016 at 5:54 AM Harsha  wrote:

> 1) Can the ACLs be specified statically in a config file of sorts? Or is
> bin/kafka-acl.sh or a similar kafka client API the only way to specify
> the
> ACLs?
>
> kafka-acls.sh executes simpleAClAuthorizer and the only way it accepts
> acls is via command-line params.
>
>
> 2) I notice that bin/kafka-acl.sh takes an argument to specify
> zookeeper,
> but doesn't seem to have a mechanism to specify any other authentication
> constructs. Does that mean anyone can point to my zookeeper instance and
> add/remove the ACLs?
>
> simpleAClAuthorizer uses zookeeper as ACL storage.  Remember in kerberos
> secure mode we highly recommend to turn on zookeeper.set.acl . This will
> put "sasl:principal_name" acls on zookeeper nodes. Here principal_name
> is the broker's principal.
> So one has to login with that principal name to make changes to any of
> the zookeeper nodes.
> Only the users who has access to the broker's keytab can modify
> zookeeper nodes.
>
> 3) I'd like to use SSL certificates for Authentication and ACLs, but
> don't
> wont to use encryption over the wire because of latency concerns
> mentioned
> here: https://issues.apache.org/jira/browse/KAFKA-2561
> Is that supported? Any instructions?
>
> openSSL is not supported yet.  Also dropping the encryption in SSL
> channel is not possible yet.
> Any reason for not use kerberos for this since we support non-encrypted
> channel for kerberos.
>
>
> Thanks,
> harsha
>
>
> On Wed, Jun 8, 2016, at 02:06 PM, Samir Shah wrote:
> > Hello,
> >
> > Few questions on Kafka Security.
> >
> > 1) Can the ACLs be specified statically in a config file of sorts? Or is
> > bin/kafka-acl.sh or a similar kafka client API the only way to specify
> > the
> > ACLs?
> >
> > 2) I notice that bin/kafka-acl.sh takes an argument to specify zookeeper,
> > but doesn't seem to have a mechanism to specify any other authentication
> > constructs. Does that mean anyone can point to my zookeeper instance and
> > add/remove the ACLs?
> >
> > 3) I'd like to use SSL certificates for Authentication and ACLs, but
> > don't
> > wont to use encryption over the wire because of latency concerns
> > mentioned
> > here: https://issues.apache.org/jira/browse/KAFKA-2561
> > Is that supported? Any instructions?
> >
> > Thanks in advance.
> > - Samir
>


Re: MockConsumer and MockProducer code examples

2016-06-08 Thread Gerard Klijs
I don't know about 'normal'; we use the MockProducer with autoComplete set
to false, and use a response thread to simulate the produce behaviour like this:

private final class ResponseThread extends Thread {

    public void run() {
        try {
            Thread.sleep(responseTime);
        } catch (InterruptedException e) {
            LOG.info("error simulating response time of the broker");
        }
        if (errorNow()) {
            ((MockProducer) getProducer()).errorNext(new TimeoutException());
        } else {
            ((MockProducer) getProducer()).completeNext();
        }
    }
}

where errorNow() is a function which, depending on a parameter, has a chance
of giving back an error.
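
For completeness, the producer itself is constructed with autoComplete off; a
sketch against the 0.10 clients (the serializers are whatever you use in the
real code):

import org.apache.kafka.clients.producer.MockProducer;
import org.apache.kafka.common.serialization.StringSerializer;

MockProducer<String, String> producer = new MockProducer<>(
        false,                   // autoComplete off, so completeNext()/errorNext() decide the outcome
        new StringSerializer(),  // key serializer
        new StringSerializer()); // value serializer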


On Wed, Jun 8, 2016 at 9:10 AM Steve Crane  wrote:

> Anyone got any pointers to simple examples of their use?
>
> Through trial and error I have managed to queue a message to a mock
> consumer by:
>
> Initialisation:
>
> TopicPartition partition = new TopicPartition(topicName, 0);
> consumer.rebalance(singletonList(partition));
> consumer.seek(partition, 0);
>
> Queuing a message:
>
> consumer.addRecord(new ConsumerRecord<>(topicName, 0, 0, key, obj));
>
> Is this the recommended way of doing things? Advice welcome!
>
> -
>
> The information contained in this e-mail and in any attachments is
> confidential and is designated solely for the attention of the intended
> recipient(s). If you are not an intended recipient, you must not use,
> disclose, copy, distribute or retain this e-mail or any part thereof. If
> you have received this e-mail in error, please notify the sender by return
> e-mail and delete all copies of this e-mail from your computer system(s).
> Please direct any additional queries to: communicati...@s3group.com.
> Thank You. Silicon and Software Systems Limited (S3 Group). Registered in
> Ireland no. 378073. Registered Office: South County Business Park,
> Leopardstown, Dublin 18.
>


Re: Kafka take too long to update the client with metadata when a broker is gone

2016-06-03 Thread Gerard Klijs
I assume you use a replication factor of 3 for the topics? When I ran some
tests with producers/consumers in a dockerized setup, there were only a few
failures before the producer switched to the correct new broker again. I
don't know the exact time, but it seemed like a few seconds at most; this was
with 0.9.0.0.

On Fri, Jun 3, 2016 at 9:00 AM safique ahemad  wrote:

> Hi Steve,
>
> There is no way to access that from public side so I won't be able to do
> that. Sorry for that.
> But the step is quite simple. The only difference is that we have deployed
> Kafka cluster using mesos url.
>
> 1) launch 3 Kafka broker cluster and create a topic with multiple
> partitions at least 3 so that one partition land at least on a broker.
> 2) launch consumer/producer client.
> 3) kill a broker
> 4) just observe the behavior of producer client
>
>
>
> On Thu, Jun 2, 2016 at 8:15 PM, Steve Tian 
> wrote:
>
> > I see.  I'm not sure if this is a known issue.  Do you mind share the
> > brokers/topics setup and the steps to reproduce this issue?
> >
> > Cheers, Steve
> >
> > On Fri, Jun 3, 2016, 9:45 AM safique ahemad 
> wrote:
> >
> > > you got it right...
> > >
> > > But DialTimeout is not a concern here. Client try fetching metadata
> from
> > > Kafka brokers but Kafka give them stale metadata near 30-40 sec.
> > > It try to fetch 3-4 time in between until it get updated metadata.
> > > This is completely different problem than
> > > https://github.com/Shopify/sarama/issues/661
> > >
> > >
> > >
> > > On Thu, Jun 2, 2016 at 6:05 PM, Steve Tian 
> > > wrote:
> > >
> > > > So you are coming from https://github.com/Shopify/sarama/issues/661
> ,
> > > > right?   I'm not sure if anything from broker side can help but looks
> > > like
> > > > you already found DialTimeout on client side can help?
> > > >
> > > > Cheers, Steve
> > > >
> > > > On Fri, Jun 3, 2016, 8:33 AM safique ahemad 
> > > wrote:
> > > >
> > > > > kafka version:0.9.0.0
> > > > > go sarama client version: 1.8
> > > > >
> > > > > On Thu, Jun 2, 2016 at 5:14 PM, Steve Tian <
> steve.cs.t...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Client version?
> > > > > >
> > > > > > On Fri, Jun 3, 2016, 4:44 AM safique ahemad <
> saf.jnu...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > We are using Kafka broker cluster in our data center.
> > > > > > > Recently, It is realized that when a Kafka broker goes down
> then
> > > > client
> > > > > > try
> > > > > > > to refresh the metadata but it get stale metadata upto near 30
> > > > seconds.
> > > > > > >
> > > > > > > After near 30-35 seconds, updated metadata is obtained by
> client.
> > > > This
> > > > > is
> > > > > > > really a large time for the client continuously gets send
> failure
> > > for
> > > > > so
> > > > > > > long.
> > > > > > >
> > > > > > > Kindly, reply if any configuration may help here or something
> > else
> > > or
> > > > > > > required.
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Regards,
> > > > > > > Safique Ahemad
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Regards,
> > > > > Safique Ahemad
> > > > > GlobalLogic | Leaders in software R services
> > > > > P :+91 120 4342000-2990 | M:+91 9953533367
> > > > > www.globallogic.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Safique Ahemad
> > > GlobalLogic | Leaders in software R services
> > > P :+91 120 4342000-2990 | M:+91 9953533367
> > > www.globallogic.com
> > >
> >
>
>
>
> --
>
> Regards,
> Safique Ahemad
> GlobalLogic | Leaders in software R services
> P :+91 120 4342000-2990 | M:+91 9953533367
> www.globallogic.com
>


Re: Best monitoring tool for Kafka in production

2016-06-02 Thread Gerard Klijs
Not that I have anything against paying for monitoring, or against
Confluent, but you will need your consumers to be using Kafka 0.10 if you
want to make the most out of the Confluent solution. We are currently using
Zabbix; it's free, and it has complete functionality in one product. It can
be a bit verbose, though, and the graphs don't look very fancy.

On Thu, Jun 2, 2016 at 1:16 PM Michael Noll  wrote:

> Hafsa,
>
> since you specifically asked about non-free Kafka monitoring options as
> well:  As of version 3.0.0, the Confluent Platform provides a commercial
> monitoring tool for Kafka called Confluent Control Center.  (Disclaimer: I
> work for Confluent.)
>
> Quoting from the product page at
> http://www.confluent.io/product/control-center:
>
> "Know where your messages are at every step between source and destination.
> Identify slow brokers, delivery failures, and sleuth the truth out of
> unexpected latency in your network. Confluent Control Center delivers
> end-to-end stream monitoring. Unlike other monitoring tools, this one is
> purpose-built for your Kafka environment. Instead of identifying the
> throughput in your data center or other “clocks and cables” types of
> monitors, it tracks messages."
>
> Best wishes,
> Michael
>
>
>
>
> On Wed, May 25, 2016 at 12:42 PM, Hafsa Asif 
> wrote:
>
> > Hello,
> >
> > What is the best monitoring tool for Kafka in production, preferable free
> > tool? If there is no free tool, then please mention non-free efficient
> > monitoring tools also.
> >
> > We are feeling so much problem without monitoring tool. Sometimes brokers
> > goes down or consumer is not working, we are not informed.
> >
> > Best Regards,
> > Hafsa
> >
>


Re: MirrorMaker and identical replicas

2016-06-01 Thread Gerard Klijs
No, you can't, because:
- because of producer failures some messages may be duplicated.
- you're not sure the cluster you're copying from hasn't already removed some
data.

We try to solve the same problem, and are probably going to solve it by
copying the timestamps with the mirror maker, and on the switch letting the
consumer go near the end and check with the timestamp where it needs to
start processing again.
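
Roughly what we have in mind for the consumer side after the switch (just a
sketch; the rewind offset is a guess that has to be 'early enough', and
record.timestamp() needs the 0.10 message format):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;

public final class TimestampResume {
    // Returns the first offset whose timestamp is at or after the timestamp the consumer
    // had stored before the switch, so the caller can seek there and resume its poll loop.
    public static long findResumeOffset(KafkaConsumer<byte[], byte[]> consumer, TopicPartition tp,
                                        long storedTimestamp, long rewindOffset) {
        consumer.assign(Collections.singletonList(tp));
        consumer.seek(tp, rewindOffset); // an offset known to be before the stored timestamp (placeholder)
        while (true) {
            for (ConsumerRecord<byte[], byte[]> record : consumer.poll(1000)) {
                if (record.timestamp() >= storedTimestamp) {
                    return record.offset();
                }
            }
        }
    }
}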

On Thu, Jun 2, 2016 at 5:10 AM Dave Cahill  wrote:

> Hi,
>
> I've read up a little on MirrorMaker (e.g. the wiki [1] and KIP-3 [2]), but
> haven't yet found a definitive answer to the following question.
>
> Let's assume I am producing a certain topic to a Kafka cluster in
> datacenter A.
>
> I set up MirrorMaker in datacenter B and C to mirror the topic from
> datacenter A.
>
> Can I assume that the mirrored data in datacenter B and C are exactly the
> same, including the same offsets? For example, let's say a consumer in
> datacenter B dies, and I know the offset up to which it has read. Can a
> consumer in datacenter C take over exactly where B left off by reading from
> its own copy starting at the same offset? If MirrorMaker has logic to
> prevent dupes (something like what's described in [3]) and lost messages,
> it seems like this should work.
>
> Please let me know if my terminology is imprecise and I'll try to clarify!
>
> Thanks,
> Dave.
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
> [2]
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-3+-+Mirror+Maker+Enhancement
> [3]
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka
> ?
>


Re: Kafka encryption

2016-06-01 Thread Gerard Klijs
You could add a header to every message with information on whether it's
encrypted or not; then you don't have to encrypt all the messages, or you
only do it for some topics.
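
A sketch of what I mean, wrapping an existing serializer (the actual
encryption function is left to the caller here, it's just a placeholder):

import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class SelectiveEncryptionSerializer<T> implements Serializer<T> {
    private static final byte PLAIN = 0;
    private static final byte ENCRYPTED = 1;

    private final Serializer<T> inner;           // whatever serializer you already use
    private final Set<String> encryptedTopics;   // only these topics get encrypted
    private final UnaryOperator<byte[]> encrypt; // placeholder for the real encryption

    public SelectiveEncryptionSerializer(Serializer<T> inner, Set<String> encryptedTopics,
                                         UnaryOperator<byte[]> encrypt) {
        this.inner = inner;
        this.encryptedTopics = encryptedTopics;
        this.encrypt = encrypt;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        inner.configure(configs, isKey);
    }

    @Override
    public byte[] serialize(String topic, T data) {
        byte[] payload = inner.serialize(topic, data);
        boolean doEncrypt = encryptedTopics.contains(topic);
        byte[] body = doEncrypt ? encrypt.apply(payload) : payload;
        byte[] result = new byte[body.length + 1];
        result[0] = doEncrypt ? ENCRYPTED : PLAIN; // the header byte the deserializer checks first
        System.arraycopy(body, 0, result, 1, body.length);
        return result;
    }

    @Override
    public void close() {
        inner.close();
    }
}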

On Thu, Jun 2, 2016 at 6:36 AM Bruno Rassaerts <bruno.rassae...@novazone.be>
wrote:

> It works indeed but encrypting individual messages really influences the
> batch compression done by Kafka.
> Performance drops to about 1/3 of what it is without (even if we prepare
> the encrypted samples upfront).
> In the end what we going for is only encrypting what we really really need
> to encrypt, not every message systematically.
>
> > On 31 May 2016, at 13:00, Gerard Klijs <gerard.kl...@dizzit.com> wrote:
> >
> > If you want system administrators not being able to see the data, the
> only
> > option is encryption, with only the clients sharing the key (or whatever
> is
> > used to (de)crypt the data). Like the example from eugene. I don't know
> the
> > kind of messages you have, but you could always wrap something around any
> > (de)serializer your currently using.
> >
> > On Tue, May 31, 2016 at 12:21 PM Bruno Rassaerts <
> > bruno.rassae...@novazone.be> wrote:
> >
> >> I’ve asked the same question in the past, and disk encryption was
> >> suggested as a solution as well.
> >> However, as far as I know, disk encryption will not prevent your data to
> >> be stolen when the machine is compromised.
> >> What we are looking for is even an additional barrier, so that even
> system
> >> administrators do not have access to the data.
> >> Any suggestions ?
> >>
> >>> On 24 May 2016, at 14:40, Tom Crayford <tcrayf...@heroku.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> There's no encryption at rest. It's recommended to use filesystem
> >>> encryption, or encryption of each individual message before producing
> it
> >>> for this.
> >>>
> >>> Only the new producer and consumers have SSL support.
> >>>
> >>> Thanks
> >>>
> >>> Tom Crayford
> >>> Heroku Kafka
> >>>
> >>> On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
> >>> snehalata.nag...@harbingergroup.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> Thanks for quick reply.
> >>>>
> >>>> Do you mean If I see messages in kafka, those will not be readable?
> >>>>
> >>>> And also, we are using new producer but old consumer , does old
> consumer
> >>>> have ssl support?
> >>>>
> >>>> As mentioned in document, its not there.
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Snehalata
> >>>>
> >>>> - Original Message -
> >>>> From: "Mudit Kumar" <mudit.ku...@askme.in>
> >>>> To: users@kafka.apache.org
> >>>> Sent: Tuesday, May 24, 2016 3:53:26 PM
> >>>> Subject: Re: Kafka encryption
> >>>>
> >>>> Yes,it does that.What specifically you are looking for?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 5/24/16, 3:52 PM, "Snehalata Nagaje" <
> >>>> snehalata.nag...@harbingergroup.com> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>>
> >>>>> We have requirement of encryption in kafka.
> >>>>>
> >>>>> As per docs, we can configure kafka with ssl, for secured
> >> communication.
> >>>>>
> >>>>> But does kafka also stores data in encrypted format?
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Snehalata
> >>>>
> >>
> >>
>
>


Re: Is kafka message timestamp preserved in mirror maker

2016-06-01 Thread Gerard Klijs
Although I think it should have been an included option, it's very easy to
create and use your own message handler with the mirror maker. You can
simply copy the timestamp and type from the ConsumerRecord to the
ProducerRecord.
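
Something along these lines (only a sketch; the exact handler trait name and
signature should be checked against the MirrorMaker.scala of the version you
run, and the class is plugged in with --message.handler):

import kafka.consumer.BaseConsumerRecord;
import kafka.tools.MirrorMaker;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Collections;
import java.util.List;

// copies topic, key, value and the original timestamp into the record for the target cluster
public class TimestampPreservingHandler implements MirrorMaker.MirrorMakerMessageHandler {
    @Override
    public List<ProducerRecord<byte[], byte[]>> handle(BaseConsumerRecord record) {
        return Collections.singletonList(new ProducerRecord<byte[], byte[]>(
                record.topic(), null, record.timestamp(), record.key(), record.value()));
    }
}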

On Wed, Jun 1, 2016 at 5:48 PM Gwen Shapira  wrote:

> The intent was definitely as you described, but I think we forgot to
> actually modify the code accordingly.
>
> Do you mind opening a JIRA on the issue?
>
> Gwen
>
> On Wed, Jun 1, 2016 at 4:13 PM, tao xiao  wrote:
>
> > Hi,
> >
> > As per the description in KIP-32 the timestamp of Kafka message is
> > unchanged mirrored from one cluster to another if createTime is used.
> But I
> > tested with mirror maker in Kafka-0.10 this doesn't seem the case. The
> > timestamp of the same message is different in source and target. I
> checked
> > the latest code from trunk the timestamp field is not set in the producer
> > record that is sent to target.
> >
> >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/MirrorMaker.scala#L678
> >
> > Is this a bug or preserving timestamp is not a built-in feature?
> >
>


Re: Broker replication error “Not authorized to access topics: [Topic authorization failed.] ”

2016-06-01 Thread Gerard Klijs
Not necessarily "admin" any name is ok, we use the CN stored in the
keystore, but we don't use sasl, and that's how the brokers communicate to
each other. You need some way of allowing them to communicate.

On Wed, Jun 1, 2016 at 10:33 AM Rajini Sivaram <rajinisiva...@googlemail.com>
wrote:

> The server configuration in
>
> http://stackoverflow.com/questions/37536259/broker-replication-error-not-authorized-to-access-topics-topic-authorization
>  specifies security.inter.broker.protocol=PLAINTEXT. This would result in
> the principal "anonymous" to be used for inter-broker communication. Looks
> like you are expecting to use the username "admin" for the broker, so you
> should set security.inter.broker.protocol=SASL_PLAINTEXT. There is also a
> missing entry in the KafkaServer section of jaas.conf. You need to add
> user_admin="welcome1".
>
> Hope that helps.
>
> On Wed, Jun 1, 2016 at 7:23 AM, Gerard Klijs <gerard.kl...@dizzit.com>
> wrote:
>
> > What do you have configured, do you have the brokers set as super users,
> > with the right certificate?
> >
> > On Wed, Jun 1, 2016 at 6:43 AM 换个头像 <guoxu1...@foxmail.com> wrote:
> >
> > > Hi Kafka Experts,
> > >
> > >
> > > I setup a secured kafka cluster(slal-plain authentication). But when I
> > try
> > > to add ACLs for some existing topics, all three brokers  output errors
> > like
> > > "Not authorized to access topics: [Topic authorization failed.]".
> > >
> > >
> > > I checked my configuration several times according to official
> > > document(security section), but still not able to figure out why this
> > error
> > > caused.
> > >
> > >
> > > Please help.
> > >
> > >
> > > Broker replication error “Not authorized to access topics: [Topic
> > > authorization failed.] ”
> > >
> > >
> >
> http://stackoverflow.com/questions/37536259/broker-replication-error-not-authorized-to-access-topics-topic-authorization
> > >
> > >
> > > Regards
> > > Shawn
> >
>
>
>
> --
> Regards,
>
> Rajini
>


Re: SSL certificate CN validation against FQDN in v0.9

2016-06-01 Thread Gerard Klijs
We use almost the same properties (the same if you account for defaults),
and have not seen any check of whether the FQDN matches the CN, as it's all
working without matching names. It seems the requirement only applies if
you use SASL_SSL as the security protocol, which from your config you don't
seem to do (just SSL).
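
As far as I know the hostname check in the Java clients is a separate setting
that is empty (so disabled) by default; if you do want it, it would be
something like this in the client SSL config:

ssl.endpoint.identification.algorithm=HTTPS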

On Wed, Jun 1, 2016 at 9:19 AM Phi Primed  wrote:

> I using Kafka v 0.9 with TLS enabled, including client auth.
>
> In
>
> http://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption
> ,
> it is mentioned that "We need to generate a key and certificate for each
> broker and client in the cluster. The common name (CN) of the broker
> certificate must match the fully qualified domain name (FQDN) of the server
> as the client compares the CN with the DNS domain name to ensure that it is
> connecting to the desired broker (instead of a malicious one)."
>
> 1) Is there a specific additional configuration parameter to enable this or
> does it always happen if the other TLS/SSL parameters are set (as e.g.
> shown below) ?
>
> 2) Is it possible to make the broker(s) carry out the same check against
> client certificates if SSL client auth is enabled ?
>
> Regards,
> Phi
>
>
> listeners=SSL://:9093
>
> authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
>
> super.users=User:CN=broker1.mydomain.com,OU=ABC,O=XYZ,L=SFO,ST=CA,C=US
>
> ssl.keystore.location=/opt/ssl/kafka.server.keystore.jks
> ssl.keystore.password=test1234
> ssl.key.password=test1234
> ssl.truststore.location=/opt/ssl/kafka.server.truststore.jks
> ssl.truststore.password=test1234
>
> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
> ssl.keystore.type=JKS
> ssl.truststore.type=JKS
>
> security.inter.broker.protocol=SSL
>
> ssl.client.auth=required
>


Re: Scalability of Kafka Consumer 0.9.0.1

2016-06-01 Thread Gerard Klijs
If I understand it correctly, each consumer should have its 'own' thread,
and should not be accessed from other threads. But you could
(dynamically) create enough threads to cover all the partitions, so each
consumer only reads from one partition. You could also let all those
consumers write to some thread-safe object if you need to combine the results.
In your linked example the consumers each just do their part, which solves
the multi-threaded issue, but when you want to combine data from different
consumer threads it becomes more tricky.
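
A rough sketch of the one-consumer-per-thread idea (topic, group and the
shared queue are just placeholders):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;

public class ConsumerLoop implements Runnable {
    private final KafkaConsumer<String, String> consumer; // owned by this thread only
    private final BlockingQueue<String> results;          // the thread-safe object shared between threads

    public ConsumerLoop(Properties config, String topic, BlockingQueue<String> results) {
        this.consumer = new KafkaConsumer<>(config);
        this.consumer.subscribe(Collections.singletonList(topic));
        this.results = results;
    }

    @Override
    public void run() {
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    results.offer(record.value()); // only the shared collection is touched by other threads
                }
            }
        } catch (WakeupException e) {
            // shutdown() was called from another thread
        } finally {
            consumer.close(); // close from the same thread that polls
        }
    }

    // the only consumer method that may safely be called from another thread
    public void shutdown() {
        consumer.wakeup();
    }
}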

On Wed, Jun 1, 2016 at 2:57 AM BYEONG-GI KIM  wrote:

> Hello.
>
> I've implemented a Kafka Consumer Application which consume large number of
> monitoring data from Kafka Broker and analyze those data accordingly.
>
> I referred to a guide,
>
> http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client
> ,
> since I thought the app needs to implement multi-threading for Kafka
> Consumer per Topic. Actually, A topic is assigned to each open-source
> monitoring software, e.g., Nagios, Collectd, etc., in order to distinguish
> those because each of these uses its own message format, such as JSON,
> String, and so on.
>
> There was, however, an Exception even though my source code for the Kafka
> Consumer are mostly copied and pasted from the guide;
> *java.util.ConcurrentModificationException:
> KafkaConsumer is not safe for multi-threaded access*
>
> First Question. Could the implementation in the guide really prevent the
> Exception?
>
> And second Question is, could the KafkaConsumer support such huge amount of
> data with one thread? The KafkaConsumer seems not thread-safe, and it can
> subscribe multi-topics at once. Do I need to change the implementation from
> the multi-threaded to one-thread and subscribing multi-topics?... I'm just
> wonder whether a KafkaConsumer is able to stand the bunch of data without
> performance degradation.
>
> Thanks in advance!
>
> Best regards
>
> KIM
>


Re: Broker replication error “Not authorized to access topics: [Topic authorization failed.] ”

2016-06-01 Thread Gerard Klijs
What do you have configured? Do you have the brokers set as super users,
with the right certificate?

On Wed, Jun 1, 2016 at 6:43 AM 换个头像  wrote:

> Hi Kafka Experts,
>
>
> I setup a secured kafka cluster(slal-plain authentication). But when I try
> to add ACLs for some existing topics, all three brokers  output errors like
> "Not authorized to access topics: [Topic authorization failed.]".
>
>
> I checked my configuration several times according to official
> document(security section), but still not able to figure out why this error
> caused.
>
>
> Please help.
>
>
> Broker replication error “Not authorized to access topics: [Topic
> authorization failed.] ”
>
> http://stackoverflow.com/questions/37536259/broker-replication-error-not-authorized-to-access-topics-topic-authorization
>
>
> Regards
> Shawn


Re: Kafka encryption

2016-05-31 Thread Gerard Klijs
If you want system administrators to not be able to see the data, the only
option is encryption, with only the clients sharing the key (or whatever is
used to (de)crypt the data), like the example from Eugene. I don't know the
kind of messages you have, but you could always wrap something around
whatever (de)serializer you're currently using.
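
As a rough example (not production-grade key handling, just to show the
wrapping; the shared key bytes come from wherever you keep your secrets):

import org.apache.kafka.common.serialization.Serializer;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.util.Map;

public class EncryptingSerializer<T> implements Serializer<T> {
    private final Serializer<T> inner; // the serializer you already use
    private final SecretKeySpec key;   // AES key shared only by the clients

    public EncryptingSerializer(Serializer<T> inner, byte[] sharedKeyBytes) {
        this.inner = inner;
        this.key = new SecretKeySpec(sharedKeyBytes, "AES"); // 16/24/32 bytes
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        inner.configure(configs, isKey);
    }

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            byte[] plain = inner.serialize(topic, data);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key); // a random IV is generated here
            byte[] iv = cipher.getIV();            // prepended so the consumer can decrypt
            byte[] encrypted = cipher.doFinal(plain);
            return ByteBuffer.allocate(iv.length + encrypted.length).put(iv).put(encrypted).array();
        } catch (Exception e) {
            throw new RuntimeException("Encrypting message for topic " + topic + " failed", e);
        }
    }

    @Override
    public void close() {
        inner.close();
    }
}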

On Tue, May 31, 2016 at 12:21 PM Bruno Rassaerts <
bruno.rassae...@novazone.be> wrote:

> I’ve asked the same question in the past, and disk encryption was
> suggested as a solution as well.
> However, as far as I know, disk encryption will not prevent your data to
> be stolen when the machine is compromised.
> What we are looking for is even an additional barrier, so that even system
> administrators do not have access to the data.
> Any suggestions ?
>
> > On 24 May 2016, at 14:40, Tom Crayford  wrote:
> >
> > Hi,
> >
> > There's no encryption at rest. It's recommended to use filesystem
> > encryption, or encryption of each individual message before producing it
> > for this.
> >
> > Only the new producer and consumers have SSL support.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
> > snehalata.nag...@harbingergroup.com> wrote:
> >
> >>
> >>
> >> Thanks for quick reply.
> >>
> >> Do you mean If I see messages in kafka, those will not be readable?
> >>
> >> And also, we are using new producer but old consumer , does old consumer
> >> have ssl support?
> >>
> >> As mentioned in document, its not there.
> >>
> >>
> >> Thanks,
> >> Snehalata
> >>
> >> - Original Message -
> >> From: "Mudit Kumar" 
> >> To: users@kafka.apache.org
> >> Sent: Tuesday, May 24, 2016 3:53:26 PM
> >> Subject: Re: Kafka encryption
> >>
> >> Yes,it does that.What specifically you are looking for?
> >>
> >>
> >>
> >>
> >> On 5/24/16, 3:52 PM, "Snehalata Nagaje" <
> >> snehalata.nag...@harbingergroup.com> wrote:
> >>
> >>> Hi All,
> >>>
> >>>
> >>> We have requirement of encryption in kafka.
> >>>
> >>> As per docs, we can configure kafka with ssl, for secured
> communication.
> >>>
> >>> But does kafka also stores data in encrypted format?
> >>>
> >>>
> >>> Thanks,
> >>> Snehalata
> >>
>
>


Re: kafka.tools.ConsumerOffsetChecker fails for one topic

2016-05-31 Thread Gerard Klijs
It might be that there never was, or currently isn't, a consumer with the
group jopgroup consuming from the twitter topic. I have only used it for the
new consumer (offsets stored in the broker), and then the group needs to be
'active' in order to get the offsets.

On Mon, May 30, 2016 at 2:37 PM Diego Woitasen  wrote:

> Hi,
>   I have these topic in Kafka:
>
> # /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181
> --list__consumer_offsets
> facebook
> twitter
>
> # bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group jopgroup
> --zookeeper localhost:2181 --topic twitter
> Exiting due to: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /consumers/jopgroup/offsets/twitter/0.
>
> Same command for "facebook" topic works fine.
>
> More info:
>
> # /opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --describe
> --topic twitter
> Topic:twitter   PartitionCount:1ReplicationFactor:1 Configs:
> Topic: twitter  Partition: 0Leader: 1   Replicas: 1
> Isr: 1
>
> Thanks in advance!
> --
>
> Diego Woitasen
> http://flugel.it
> Infrastructure Developers
>


Re: Kafka encryption

2016-05-24 Thread Gerard Klijs
For both old and new consumers/producers you can make your own
(de)serializer to do some encryption, maybe that could be an option?

On Tue, May 24, 2016 at 2:40 PM Tom Crayford  wrote:

> Hi,
>
> There's no encryption at rest. It's recommended to use filesystem
> encryption, or encryption of each individual message before producing it
> for this.
>
> Only the new producer and consumers have SSL support.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
> snehalata.nag...@harbingergroup.com> wrote:
>
> >
> >
> > Thanks for quick reply.
> >
> > Do you mean If I see messages in kafka, those will not be readable?
> >
> > And also, we are using new producer but old consumer , does old consumer
> > have ssl support?
> >
> > As mentioned in document, its not there.
> >
> >
> > Thanks,
> > Snehalata
> >
> > - Original Message -
> > From: "Mudit Kumar" 
> > To: users@kafka.apache.org
> > Sent: Tuesday, May 24, 2016 3:53:26 PM
> > Subject: Re: Kafka encryption
> >
> > Yes,it does that.What specifically you are looking for?
> >
> >
> >
> >
> > On 5/24/16, 3:52 PM, "Snehalata Nagaje" <
> > snehalata.nag...@harbingergroup.com> wrote:
> >
> > >Hi All,
> > >
> > >
> > >We have requirement of encryption in kafka.
> > >
> > >As per docs, we can configure kafka with ssl, for secured communication.
> > >
> > >But does kafka also stores data in encrypted format?
> > >
> > >
> > >Thanks,
> > >Snehalata
> >
>


Re: kafka 0.8.2 broker behaviour

2016-05-23 Thread Gerard Klijs
Are you sure the consumers are always up? When they are behind, they could
generate a lot of traffic in a small amount of time.

On Mon, May 23, 2016 at 9:11 AM Anishek Agarwal  wrote:

> additionally all the read / writes are happening via storm topologies.
>
> On Mon, May 23, 2016 at 12:17 PM, Anishek Agarwal 
> wrote:
>
> > Hello,
> >
> > we are using 4 kafka machines in production with 4 topics and each topic
> > either 16/32 partitions and replication factor of 2. each machine has 3
> > disks for kafka logs.
> >
> > we see a strange behaviour where we see high disk usage spikes on one of
> > the disks on all machines. it varies over time, with different disks
> > showing spikes over time. is this normal .. i thought the partition per
> > topic is distributed per disk and hence unless traffic is such that its
> > hitting one partition this should not happen.
> >
> > any suggestions on how to verify what is happening will be helpful.
> >
> > Thanks
> > anishek
> >
> >
>


Re: OffSet checker.

2016-05-18 Thread Gerard Klijs
I was just trying to do the same thing, but at first it did not seem to support
SSL (or at least not in combination with ACLs). I got similar errors as in
https://issues.apache.org/jira/browse/KAFKA-3151, but when I used
--command-config /home/kafka/consumer.properties to supply the SSL properties
it worked. It does seem to work only as long as there are active client
connections.

This was my output:
kafka@2b28607f562f:/usr/bin$ ./kafka-consumer-groups --new-consumer --group
mirrormaker --bootstrap-server 192.168.99.100:9093 --describe
--command-config /home/kafka/consumer.properties
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
mirrormaker, local.payments.accountentry, 0, 26583507, 26583507, 0,
mirrormaker-0_172.17.0.1/172.17.0.1
mirrormaker, local.beb.time, 0, unknown, 0, unknown,
mirrormaker-0_172.17.0.1/172.17.0.1
mirrormaker, local.distribution.alert, 0, unknown, 0, unknown,
mirrormaker-0_172.17.0.1/172.17.0.1
mirrormaker, local.general.example, 0, unknown, 0, unknown,
mirrormaker-0_172.17.0.1/172.17.0.1

On Wed, May 18, 2016 at 2:36 PM Christian Posta 
wrote:

> Maybe give it a try with the kafka-consumer-groups.sh tool and the
> --new-consumer flag:
>
>
>
> On Wed, May 18, 2016 at 3:40 AM, Sathyakumar Seshachalam <
> sathyakumar_seshacha...@trimble.com> wrote:
>
> > The command line tool that comes with Kafka one seems to check offsets in
> > zookeeper. This does not seem to be related to this issue<
> > https://issues.apache.org/jira/browse/KAFKA-1951> - As I am sure the
> > offsets are stored in Kafka for burrow seem to retrieve it properly.
> > Is there any easier way to check the current offset for a given consumer
> > group in Kafka – considering that different consumers within the group
> > commit to either kafka or Zk ?
> >
> > Regards
> > Sathya,
> >
>
>
>
> --
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io
>


Re: Migrating Kafka from old VMs to new VMs in a different Cluster

2016-05-12 Thread Gerard Klijs
Depends on your use case, but I guess something like this:
- Install everything fresh on the new VMs
- Start a mirror maker on the new VMs to copy data from the old ones (see the sketch below)
- Be sure it's working right
- Shut down the old VMs and start using the new ones

The last step is the trickiest and depends a lot on the setup, like how the
clients connect to Kafka and whether you can just shut the clients down and
bring them up again. There might also be an issue with consumers getting all
the data again.
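
The mirror maker step could look roughly like this on one of the new VMs (file
names and the whitelist are placeholders; the consumer config points at the old
cluster, the producer config at the new one):

kafka-mirror-maker.sh \
  --consumer.config old-cluster-consumer.properties \
  --producer.config new-cluster-producer.properties \
  --whitelist ".*" \
  --num.streams 2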

On Wed, May 11, 2016 at 9:24 PM Abhinav Damarapati 
wrote:

> Hello Everyone,
>
>
> We have Kafka brokers, Zookeepers and Mirror-makers running on old Virtual
> Machines. We need to migrate all of this to brand new VMs on a different
> DataCenter and bring down the old VMs. Is this possible? If so, please
> suggest a way to do it.
>
>
> Best,
>
> Abhinav
>


Re: Failing between mirrored clusters

2016-05-11 Thread Gerard Klijs
I don't think it's possible, since the offsets of the two clusters can be
different, so you don't know if it will work correctly. When I accidentally
used the mirror maker on the __consumer_offsets topic it also gave some
errors, so I don't know if it's technically possible. A possible future
solution would be to use the timestamp, store it in the consumer itself, and
then on a reconnect use that value to go to the correct location.

On Wed, May 11, 2016 at 4:20 PM Ben Stopford  wrote:

> Hi
>
> I’m looking at failing-over from one cluster to another, connected via
> mirror maker, where the __consumer_offsets topic is also mirrored.
>
> In theory this should allow consumers to be restarted to point at the
> secondary cluster such that they resume from the same offset they reached
> in the primary cluster. Retries in MM will cause the offsets to diverge
> somewhat, which would in turn cause some reprocessing of messages on
> failover, but this should be a better option than resorting to the
> earliest/latest offset.
>
> Does anyone have experience doing this?
>
> Thanks
>
> Ben


Re: Backing up Kafka data and using it later?

2016-05-11 Thread Gerard Klijs
You could create a Docker image with a Kafka installation and start a mirror
maker in it; you could set the retention time for it to infinite and mount the
data volume. With that data you could always restart the container and mirror
it to somewhere else. Not sure that's what you want, but it's an option to
save data for use at some other place/time.
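
The 'infinite retention' part is just configuration on that broker (or per
topic), for example:

log.retention.ms=-1     # no time-based deletion
# or per topic: retention.ms=-1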

On Wed, May 11, 2016 at 12:33 AM Alex Loddengaard  wrote:

> You may find this interesting, although I don't believe it's exactly what
> you're looking for:
>
> https://github.com/pinterest/secor
>
> I'm not sure how stable and commonly used it is.
>
> Additionally, I see a lot of users use MirrorMaker for a "backup," where
> MirrorMaker copies all topics from one Kafka cluster to another "backup"
> cluster. I put "backup" in quotes because this architecture doesn't support
> snapshotting like a traditional backup would. I realize this doesn't
> address your specific use case, but thought you may find it interesting
> regardless.
>
> Sorry I'm a little late to the thread, too.
>
> Alex
>
> On Thu, May 5, 2016 at 7:05 AM, Rad Gruchalski 
> wrote:
>
> > John,
> >
> > I’m not as expert expert in Kafka but I would assume so.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com (mailto:ra...@gruchalski.com) (mailto:
> > ra...@gruchalski.com)
> > de.linkedin.com/in/radgruchalski/ (
> > http://de.linkedin.com/in/radgruchalski/)
> >
> > Confidentiality:
> > This communication is intended for the above-named person and may be
> > confidential and/or legally privileged.
> > If it has come to you in error you must take no action based on it, nor
> > must you copy or show it to anyone; please delete/destroy and inform the
> > sender immediately.
> >
> >
> >
> > On Thursday, 5 May 2016 at 01:46, John Bickerstaff wrote:
> >
> > > Thanks - does that mean that the only way to safely back up Kafka is to
> > > have replication?
> > >
> > > (I have done this partially - I can get the entire topic on the command
> > > line, after completely recreating the server, but my code that is
> > intended
> > > to do the same thing just hangs)
> > >
> > > On Wed, May 4, 2016 at 3:18 PM, Rad Gruchalski  > (mailto:ra...@gruchalski.com)> wrote:
> > >
> > > > John,
> > > >
> > > > I believe you mean something along the lines of:
> > > > http://markmail.org/message/f7xb5okr3ujkplk4
> > > > I don’t think something like this has been done.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Best regards,
> > > > Radek Gruchalski
> > > > ra...@gruchalski.com (mailto:ra...@gruchalski.com) (mailto:
> > > > ra...@gruchalski.com (mailto:ra...@gruchalski.com))
> > > > de.linkedin.com/in/radgruchalski/ (
> > http://de.linkedin.com/in/radgruchalski/) (
> > > > http://de.linkedin.com/in/radgruchalski/)
> > > >
> > > > Confidentiality:
> > > > This communication is intended for the above-named person and may be
> > > > confidential and/or legally privileged.
> > > > If it has come to you in error you must take no action based on it,
> nor
> > > > must you copy or show it to anyone; please delete/destroy and inform
> > the
> > > > sender immediately.
> > > >
> > > >
> > > >
> > > > On Wednesday, 4 May 2016 at 23:04, John Bickerstaff wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have what is probably an edge use case. I'd like to back up a
> > single
> > > > > Kafka instance such that I can recreate a new server, drop Kafka
> in,
> > drop
> > > > > the data in, start Kafka -- and have all my data ready to go again
> > for
> > > > > consumers.
> > > > >
> > > > > Is such a thing done? Does anyone have any experience trying this?
> > > > >
> > > > > I have, and I've run into some problems which suggest there's a
> > setting
> > > > or
> > > > > some other thing I'm unaware of...
> > > > >
> > > > > If you like, don't think of it as a backup problem so much as a
> > "cloning"
> > > > > problem. I want to clone a new Kafka machine without actually
> > cloning it
> > > > >
> > > >
> > > > -
> > > > > I.E. the data is somewhere else (log and index files) although
> > Zookeeper
> > > >
> > > > is
> > > > > up and running just fine.
> > > > >
> > > > > Thanks
> >
> >
>


Re: Kafka 9 version offset storage mechanism changes

2016-05-09 Thread Gerard Klijs
Both are possible, but the 'new' consumer stores the offsets in the
__consumer_offsets topic on the brokers.

On Tue, May 10, 2016 at 7:07 AM Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
>
> Hi All,
>
>
> As per kafka 9 version, where does kafka store committed offset?
>
> is it in zookeeper or kafka broker?
>
> Also there is option to use offset storage outside kafka, does it mean ,
> kafka will not depend on zookeepr for offset.
>
> Thanks,
> snehalata
>


Re: How to define multiple serializers in kafka?

2016-05-03 Thread Gerard Klijs
Then you're probably best off using the Confluent schema registry; you can
then use the io.confluent.kafka.serializers.KafkaAvroDeserializer for the
client, with KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG="true", to
get back the object, deserialized with the same version of the schema the
object was sent with.
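
The consumer config then looks roughly like this (the registry URL is a
placeholder, and you keep your usual bootstrap.servers/group.id settings):

import java.util.Properties;

Properties props = new Properties();
props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://schemaregistry:8081"); // placeholder
props.put("specific.avro.reader", "true"); // the SPECIFIC_AVRO_READER_CONFIG mentioned above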

On Tue, May 3, 2016 at 12:06 PM Ratha v <vijayara...@gmail.com> wrote:

> I plan to use different topics for each type of object. (number of object
> types= number of topics)..
> So, I need deserializers/serializers= topics = number of objects.
>
> What would be the better way to achieve this?
>
> On 3 May 2016 at 18:20, Gerard Klijs <gerard.kl...@dizzit.com> wrote:
>
> > If you put them in one topic, you will need one
> > 'master' serializer/deserializers which can handle all the formats.
> > I don't know how you would like to use Avro schemas, the confluent schema
> > registry is by default configured to handle one schema at a time for one
> > topic, but you could configure it to use multiple non-compatible schema's
> > in one topic. Each object will be saved with a schema id, making it
> > possible to get back the original object.
> >
> > On Tue, May 3, 2016 at 1:52 AM Ratha v <vijayara...@gmail.com> wrote:
> >
> > > What is the best way for this? Do we need to have common
> > > serializer/deserializer for all type of the objects we publish? OR
> > seperate
> > > for each objects?
> > > If we have seperate serializer/deserializers, then how can I configure
> > > kafka?
> > > Or Is it recommended to use Avro schemas?
> > >
> > > Thanks
> > >
> > > On 2 May 2016 at 18:43, Gerard Klijs <gerard.kl...@dizzit.com> wrote:
> > >
> > > > I think by design it would be better to put different kind of
> messages
> > > in a
> > > > different topic. But if you would want to mix you can make your own
> > > > serializer/deserializer you could append a 'magic byte' to the byes
> you
> > > get
> > > > after you serialize, to be able to deserialize using the correct
> > methods.
> > > > The custom serializer would always return an Object, which you could
> > cast
> > > > when needed in the poll loop of the consumer. I think this is de
> > > > cleanest/best way, but maybe someone has a different idea?
> > > >
> > > > On Mon, May 2, 2016 at 7:54 AM Ratha v <vijayara...@gmail.com>
> wrote:
> > > >
> > > > > Hi all;
> > > > >
> > > > > Say, I publish and consume different type of java objects.For each
> I
> > > have
> > > > > to define own serializer implementations. How can we provide all
> > > > > implementations in the kafka consumer/producer properties file
> under
> > > the
> > > > > "serializer.class" property?
> > > > >
> > > > >
> > > > > --
> > > > > -Ratha
> > > > > http://vvratha.blogspot.com/
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Ratha
> > > http://vvratha.blogspot.com/
> > >
> >
>
>
>
> --
> -Ratha
> http://vvratha.blogspot.com/
>


Re: How to define multiple serializers in kafka?

2016-05-03 Thread Gerard Klijs
If you put them in one topic, you will need one
'master' serializer/deserializer which can handle all the formats.
I don't know how you would like to use Avro schemas; the Confluent schema
registry is by default configured to handle one schema at a time for one
topic, but you could configure it to allow multiple non-compatible schemas
in one topic. Each object will be saved with a schema id, making it
possible to get back the original object.

On Tue, May 3, 2016 at 1:52 AM Ratha v <vijayara...@gmail.com> wrote:

> What is the best way for this? Do we need to have common
> serializer/deserializer for all type of the objects we publish? OR seperate
> for each objects?
> If we have seperate serializer/deserializers, then how can I configure
> kafka?
> Or Is it recommended to use Avro schemas?
>
> Thanks
>
> On 2 May 2016 at 18:43, Gerard Klijs <gerard.kl...@dizzit.com> wrote:
>
> > I think by design it would be better to put different kind of messages
> in a
> > different topic. But if you would want to mix you can make your own
> > serializer/deserializer you could append a 'magic byte' to the byes you
> get
> > after you serialize, to be able to deserialize using the correct methods.
> > The custom serializer would always return an Object, which you could cast
> > when needed in the poll loop of the consumer. I think this is de
> > cleanest/best way, but maybe someone has a different idea?
> >
> > On Mon, May 2, 2016 at 7:54 AM Ratha v <vijayara...@gmail.com> wrote:
> >
> > > Hi all;
> > >
> > > Say, I publish and consume different type of java objects.For each I
> have
> > > to define own serializer implementations. How can we provide all
> > > implementations in the kafka consumer/producer properties file under
> the
> > > "serializer.class" property?
> > >
> > >
> > > --
> > > -Ratha
> > > http://vvratha.blogspot.com/
> > >
> >
>
>
>
> --
> -Ratha
> http://vvratha.blogspot.com/
>


Re: kafka 0.9 offset unknown after cleanup

2016-05-03 Thread Gerard Klijs
Looks like it. You need to make sure the offsets topic is using compaction,
and that the broker is set to enable compaction.
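
Concretely, that means on the brokers something like:

log.cleaner.enable=true   # the log cleaner has to run for compaction to happen
# and __consumer_offsets itself should have cleanup.policy=compact (the default for that topic)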

On Tue, May 3, 2016 at 9:56 AM Jun MA  wrote:

> Hi,
>
> I’m using 0.9.0.1 new-consumer api. I noticed that after kafka cleans up
> all old log segments(reach delete.retention time), I got unknown offset.
>
> bin/kafka-consumer-groups.sh --bootstrap-server server:9092 --new-consumer
> --group testGroup --describe
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> testGroup, test, 0, unknown, 49, unknown, consumer-1_/10.32.241.2
> testGroup, test, 1, unknown, 61, unknown, consumer-1_/10.32.241.2
>
> In this situation, I cannot consume anything using new-consumer java
> driver if I disable auto-commit.
> I think this happens because new-consumer driver stores offset in broker
> as a topic(not in zookeeper), and after reaching delete.retention time, it
> got deleted and becomes unknown. And since I disabled auto-commit, it can
> never know where it is, then it cannot consume anything.
>
> Is this what happened here? What should I do in this situation?
>
> Thanks,
> Jun


Re: How to define multiple serializers in kafka?

2016-05-02 Thread Gerard Klijs
I think by design it would be better to put different kinds of messages in
different topics. But if you want to mix them, you can make your own
serializer/deserializer: you could add a 'magic byte' to the bytes you get
after you serialize, to be able to deserialize using the correct method.
The custom deserializer would always return an Object, which you could cast
when needed in the poll loop of the consumer. I think this is the
cleanest/best way, but maybe someone has a different idea?

On Mon, May 2, 2016 at 7:54 AM Ratha v  wrote:

> Hi all;
>
> Say, I publish and consume different type of java objects.For each I have
> to define own serializer implementations. How can we provide all
> implementations in the kafka consumer/producer properties file under the
> "serializer.class" property?
>
>
> --
> -Ratha
> http://vvratha.blogspot.com/
>


Re: Filter plugins in Kafka

2016-04-29 Thread Gerard Klijs
Using Kafka Streams is one way; I used Camel with Kafka before, which also
has a nice way of using filters.
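
A rough sketch of a filter with the 0.10 Streams API (topic names and the
predicate are placeholders for whatever you actually want to keep):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

import java.util.Properties;

public class FilterApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> source = builder.stream("raw-topic");          // placeholder topic
        source.filter((key, value) -> value != null && value.contains("keep")) // the filter itself
              .to("filtered-topic");                                           // placeholder topic

        new KafkaStreams(builder, props).start();
    }
}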

On Fri, Apr 29, 2016 at 1:51 PM Subramanian Karunanithi 
wrote:

> Hi,
>
> When a stream of data passes through Kafka, wanted to apply the filter and
> then let that message pass through to partitions.
>
> Regards,
> Subramanian. K
> On Apr 26, 2016 12:33, "Marko Bonaći"  wrote:
>
> > Instantly reminded me of Streams API, where you can use Java8 streams
> > semantics (filter being one of them) to do the first thing in Gouzhang's
> > response (filter messages from one topic into another - I assume that's
> > what you were looking for).
> >
> > Marko Bonaći
> > Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> > Solr & Elasticsearch Support
> > Sematext  | Contact
> > 
> >
> > On Tue, Apr 26, 2016 at 6:22 PM, Guozhang Wang 
> wrote:
> >
> > > Hi Subramanian,
> > >
> > > Could you elaborate a bit more on "filtering"? Do you want to read raw
> > data
> > > from Kafka, and send the filtered data back to Kafka as a separate
> topic,
> > > or do you want to read raw data from an external service and send the
> > > filtered data into Kafka?
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Tue, Apr 26, 2016 at 8:07 AM, Subramanian Karunanithi <
> > > subub...@gmail.com
> > > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Do we have any plugin which is available, which can be used as a
> > > filtering
> > > > mechanism on the data it's working on?
> > > >
> > > > Regards,
> > > > Subramanian. K
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


Re: Kafka Monitoring using JMX Mbeans

2016-04-25 Thread Gerard Klijs
We used a dockerized zabbix, one of the advantages of zabbix is that it
has, jmx readout, creation of items, graphs, alerts in one product. Also
how long to keep the history can be set for each item. The interface is not
very intuitive though.

On Mon, Apr 25, 2016 at 10:14 AM Mudit Kumar  wrote:

> Hi,
>
> Have anyone setup any monitoring using Mbeans ?What kind of command line
> tools been used?
>
> Thanks,
> Mudit


Re: ClientId and groups recommendation

2016-04-19 Thread Gerard Klijs
As far as I know the clientId is only used for logging, so you could set it
to whatever is most useful in the logging. You might for example want to
use the IP as the id, so when you get errors you know where to look.

On Tue, Apr 19, 2016 at 6:51 PM Rick Rineholt  wrote:

> Hi,
> If I have multiple consumers in a consumer group for load handling for the
> same application is there any recommendation if the clientId should all be
> unique for each?   It's the same application.   Each will have it's own
> consumer memberId  given on the join group so they can always be
> distinguished by that.
>
> Thanks
>


Re: Console consumer group id question

2016-04-14 Thread Gerard Klijs
The --property option can only be used to set "the properties to initialize the
message formatter." You have several options: you could use different
properties files, with only the group.id being different. Another option is
to use a .properties template with an empty group.id= line, and with a small
script first set the group.id, create a new .properties file, and then
call kafka-console-consumer.sh. If your server is always the same, you
would only have to pass the topic and the group to your script.
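
Something like this (untested sketch; the server address and file names are
placeholders):

#!/bin/bash
# usage: ./console-consume.sh <topic> <group>
TOPIC=$1
GROUP=$2
sed "s/^group\.id=.*/group.id=${GROUP}/" consumer.properties.template > /tmp/consumer-${GROUP}.properties
kafka-console-consumer.sh --new-consumer \
  --bootstrap-server localhost:9092 \
  --consumer.config /tmp/consumer-${GROUP}.properties \
  --topic "${TOPIC}"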

On Wed, Apr 13, 2016 at 5:20 PM Greg Hill  wrote:

> So, I know I can put group.id in the consumer.config file, but I would
> like to reuse the same config file for multiple groups in testing.  I
> *thought* this would work:
>
> kafka-console-consumer.sh --bootstrap-server  --new-consumer
> --consumer.config  --topic  --property
> group.id=
>
> That doesn't produce an error, but it also ignores the group.id property
> and generates a group id, which doesn't have access to the topic I'm
> testing against.
>
> So, *should* that work?  If not, why not?  Maybe it's just a bug?
>
> Also, why is --bootstrap-server required when I have it set in the config
> file?  The console producer has a similar issue that --broker-list is
> required despite bootstrap.servers being in the config file.
>
> Thanks in advance.
>
> Greg
>


Re: KafkaProducer Retries in .9.0.1

2016-04-06 Thread Gerard Klijs
Is it an option to set up a cluster and kill the leader? That's the way we
checked retries, and whether we would lose messages that way.
Sending to Kafka happens in two parts: some serialization etc. before an
attempt is made to actually send the binary message, and then the actual sending.
I'm not sure, but I assume checking the size is part of the first step.

On Thu, Apr 7, 2016, 05:15 christopher palm  wrote:

> Hi Thanks for the suggestion.
> I lowered the broker message.max.bytes to be smaller than my payload so
> that I now receive an
> org.apache.kafka.common.errors.RecordTooLargeException
> :
>
> I still don't see the retries happening, the default back off is 100ms, and
> my producer loops for a few seconds, long enough to trigger the retry.
>
> Is there something else I need to set?
>
> I have tried this with a sync and async producer both with same results
>
> Thanks,
>
> Chris
>
> On Wed, Apr 6, 2016 at 12:01 AM, Manikumar Reddy <
> manikumar.re...@gmail.com>
> wrote:
>
> > Hi,
> >
> >  Producer message size validation checks ("buffer.memory",
> > "max.request.size" )  happens before
> >  batching and sending messages.  Retry mechanism is applicable for broker
> > side errors and network errors.
> > Try changing "message.max.bytes" broker config property for simulating
> > broker side error.
> >
> >
> >
> >
> >
> >
> > On Wed, Apr 6, 2016 at 9:53 AM, christopher palm 
> wrote:
> >
> > > Hi All,
> > >
> > > I am working with the KafkaProducer using the properties below,
> > > so that the producer keeps trying to send upon failure on Kafka .9.0.1.
> > > I am forcing a failure by setting my buffersize smaller than my
> > > payload,which causes the expected exception below.
> > >
> > > I don't see the producer retry to send on receiving this failure.
> > >
> > > Am I  missing something in the configuration to allow the producer to
> > retry
> > > on failed sends?
> > >
> > > Thanks,
> > > Chris
> > >
> > > .java.util.concurrent.ExecutionException:
> > > org.apache.kafka.common.errors.RecordTooLargeException: The message is
> > 8027
> > > bytes when serialized which is larger than the total memory buffer you
> > have
> > > configured with the buffer.memory configuration.
> > >
> > >  props.put("bootstrap.servers", bootStrapServers);
> > >
> > > props.put("acks", "all");
> > >
> > > props.put("retries", 3);//Try for 3 strikes
> > >
> > > props.put("batch.size", batchSize);//Need to see if this number should
> > > increase under load
> > >
> > > props.put("linger.ms", 1);//After 1 ms fire the batch even if the
> batch
> > > isn't full.
> > >
> > > props.put("buffer.memory", buffMemorySize);
> > >
> > > props.put("max.block.ms",500);
> > >
> > > props.put("max.in.flight.requests.per.connection", 1);
> > >
> > > props.put("key.serializer",
> > > "org.apache.kafka.common.serialization.StringSerializer");
> > >
> > > props.put("value.serializer",
> > > "org.apache.kafka.common.serialization.ByteArraySerializer");
> > >
> >
>


Re: Does kafka version 0.9.0x use zookeeper?

2016-04-03 Thread Gerard Klijs
Yes, the brokers still use ZooKeeper, but the new consumer only talks to the
brokers, not to ZooKeeper directly.

On Mon, Apr 4, 2016, 07:10 Ratha v  wrote:

> I'm not seeing such parameter as an input for consumer.
>
> Does version 0.9.x use zookeeper?
>
> --
> -Ratha
> http://vvratha.blogspot.com/
>


Re: dumping JMX data

2016-03-31 Thread Gerard Klijs
I don't know if adding it to Kafka is a good thing. I assume you need some
Java opts settings for it to work, and with other solutions these would be
different. It could be enabled with an option, of course; then it's not in
the way if you use something else. We use Zabbix, which is a single tool
that can be used to read in JMX data, store the data for a certain time
(configurable for each item), and create triggers and graphs for those.

To see and copy JMX items we use Oracle Java Mission Control; it has a
tab with info on each JMX item, which can be copied to the clipboard.
tab with info on each jmx item, which can be copied to clipboard.

On Fri, Apr 1, 2016, 02:03 Sean Clemmer  wrote:

> Another +1 for Jolokia. We've got a pretty cool setup here that deploys
> Jolokia alongside Kafka, and we wrote a small Sensu plugin to grab all the
> stats from Jolokia's JSON API and reformat them for Graphite.
>
> On Thu, Mar 31, 2016 at 4:36 PM, craig w  wrote:
>
> > Including jolokia would be great, I've used for kafka and it worked well.
> > On Mar 31, 2016 6:54 PM, "Christian Posta" 
> > wrote:
> >
> > > What if we added something like this to Kafka? https://jolokia.org
> > > I've added a JIRA to do that, just haven't gotten to it yet. Will soon
> > > though, especially if it'd be useful for others.
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-3377
> > >
> > > On Thu, Mar 31, 2016 at 2:55 PM, David Sidlo 
> > wrote:
> > >
> > > > The Kafka JmxTool works fine although it is not user friendly, in
> that
> > > you
> > > > cannot perform a query of the Kafka Server mbeans to determine
> content
> > > and
> > > > to determine the path-string that you need to place into the
> > -object-name
> > > > option.
> > > >
> > > > Here's how I solved the problem...
> > > >
> > > > First, make sure that Kafka is running with jms options enabled...
> > > >
> > > > -  The following opens up the jxm port with no authentication
> > > (for
> > > > testing)...
> > > > -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=
> > > > -Dcom.sun.management.jmxremote.ssl=false
> > > > -Dcom.sun.management.jmxremote.authenticate=false
> > > >
> > > > Second, get jstatd running on the same server so that you can use
> > > VisualVM
> > > > to look into what is going on inside.
> > > >
> > > > Then, use VisualVM along with its jmx-plugin to view the mbeans and
> > > > contents.
> > > >
> > > > When using VisualVM, you will first connect to jstatd, then you have
> to
> > > > right click on the host to request a JMX connection, where you will
> > get a
> > > > dialog, where you have to add the jmx port number to the host name
> > (after
> > > > the colon).
> > > >
> > > > When you view the mbeans, you will see the path to the given
> attribute
> > if
> > > > you hover over it with the mouse pointer, but only for a moment...
> The
> > > > string can be quite long and... Unfortunately, you don't have the
> > option
> > > to
> > > > capture that string into the cut-buffer via the interface. I used a
> > > screen
> > > > capture utility to capture the string and typed it into the terminal.
> > > >
> > > > So here is what my first working query looked like...
> > > >
> > > > /opt/kafka/kafka/bin/kafka-run-class.sh kafka.tools.JmxTool
> > --object-name
> > > >
> > >
> >
> "kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchRequestRateAndTimeMs,clientId=ReplicaFetcherThread-0-5,brokerHost=
> > > > hostname05.cluster.com,brokerPort=9092" --jmx-url
> > > > service:jmx:rmi:///jndi/rmi://`hostname`:/jmxrmi
> > > >
> > > > That command will output all of the attributes for the given
> > object-name,
> > > > and it's a long object-name.
> > > > With some experimentation, I found that you can use wild-cards on
> > > portions
> > > > of the object-name that were clearly dynamic, such as the host-name
> or
> > > the
> > > > thread-name. So you can get the attribute for all similar objects by
> > > using
> > > > the following query...
> > > >
> > > > /opt/kafka/kafka/bin/kafka-run-class.sh kafka.tools.JmxTool
> > --object-name
> > > >
> > >
> >
> "kafka.consumer:type=FetchRequestAndResponseMetrics,name=FetchRequestRateAndTimeMs,clientId=ReplicaFetcherThread*,brokerHost=hostname*.
> > > > cluster.com,brokerPort=*" --jmx-url
> > > > service:jmx:rmi:///jndi/rmi://`hostname`:/jmxrmi
> > > >
> > > > There may be simpler tools to use, but I really do like the GUI
> > goodness
> > > > in VisualVM, (and it is free).
> > > >
> > > > I hope that helps.
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > *Christian Posta*
> > > twitter: @christianposta
> > > http://www.christianposta.com/blog
> > > http://fabric8.io
> > >
> >
>


Re: Java API for kafka-acls.sh

2016-03-31 Thread Gerard Klijs
You could check what it does, and do that instead of relying on the script.
It runs the kafka.admin.AclCommand class with some properties, and sets
some jvm settings.
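
For reference, a rough sketch of doing that from Java. This is not a public
API: it needs the kafka core jar (and its Scala runtime) on the classpath,
the tool may print and exit on bad arguments, and the flag names below just
mirror the command quoted underneath, so adjust them to whatever your
kafka-acls.sh version accepts. The zookeeper host is a placeholder.

public class AddAclFromJava {

    public static void main(String[] args) {
        // same arguments the kafka-acls.sh example below passes on the command line
        kafka.admin.AclCommand.main(new String[] {
                "--add",
                "--allow-principals", "user:ctadmin",
                "--operation", "ALL",
                "--topic", "marchTesting",
                "--authorizer-properties", "zookeeper.connect=zkhost:2181"
        });
    }
}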

On Thu, Mar 31, 2016 at 4:36 PM Kalpesh Jadhav <
kalpesh.jad...@citiustech.com> wrote:

> Hi,
>
> Is there any java api available to give access to kafka topic??
>
> As we does through kafka-acls.sh.
> Just wanted to run below command through java api.
>
> kafka-acls.sh --add --allow-principals user:ctadmin --operation ALL
> --topic marchTesting --authorizer-properties
> zookeeper.connect={hostname}:2181
>
> 
> Kalpesh Jadhav
> Sr. Software Engineer | Development
> CitiusTech Inc.
> www.citiustech.com
>
>
>
>
>
>
>
>


Re: Question about 'key'

2016-03-30 Thread Gerard Klijs
If you don't specify the partition, and do have a key, then the default
behaviour is to use a hash on the key to determine the partition. This is to
make sure that messages with the same key end up on the same partition, which
helps to ensure ordering relative to the key/partition. Also, when using
compaction instead of delete as the cleanup policy, the newest message for
each key is kept. This is also used for the internal __consumer_offsets topic.
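
A minimal sketch of what that looks like on the producer side, assuming the
0.9 Java producer; broker, topic and key values are placeholders. With no
partition given, the default partitioner hashes the key, so both records
below land on the same partition.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedProducerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // same key, no explicit partition: both end up on the same partition,
            // so their relative order is preserved for consumers
            producer.send(new ProducerRecord<>("events", "origin:foo", "payload-1"));
            producer.send(new ProducerRecord<>("events", "origin:foo", "payload-2"));
        }
    }
}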

On Thu, Mar 31, 2016 at 4:32 AM Sharninder  wrote:

> The documentation says that the only purpose of the "key" is to decide the
> partition the data ends up in. The consumer doesn't decide that. I'll have
> to look at the documentation but I'm not entirely sure if the consumers
> have access to this key. The producer does. You can override the default
> partitioner class and write one that understands and interprets
> your definition of the key to place data in a specific partition. By
> default, I believe data is distributed using a round robin partitioner.
>
>
>
> On Thu, Mar 31, 2016 at 2:58 AM, Marcelo Oikawa <
> marcelo.oik...@webradar.com
> > wrote:
>
> > Hi, list.
> >
> > We're working on a project that uses Kafka and we notice that for every
> > message we have a key (or null). I searched for more info about the key
> > itself and the documentation says that it is only used to decide the
> > partition where the message is placed.
> >
> > Is there a problem if we use keys with the application semantics
> > (metadata)? For instance, we can use the key "origin:foo;target:boo" and
> > the consumers may use the key info to make decisions. But, a lot of
> > messages may use the same key and it may produce unbalanced partitions,
> is
> > that right?
> >
> > Does anyone know more about the key and your role inside kafka?
> >
> > []s
> >
>
>
>
> --
> --
> Sharninder
>


Re: Security with SSL and not Kerberos?

2016-03-23 Thread Gerard Klijs
The super user is indeed for the broker to be able to do all the things it
needs to do. For consumers and producers you can set the correct rights
with the acl tool. http://kafka.apache.org/documentation.html#security_authz

On Tue, Mar 22, 2016 at 8:28 PM christopher palm  wrote:

> Hi Ismael,
>
> Ok I got the basic authentication/ACL authorization for SSL working with
> the principal  Kafka.example.com
>
> If that principal isn't in the server.properties as a super user, I was
> seeing errors on broker startup.
>
> In order to add new principals, the server.properties has to be updated and
> that principal user added to the super user group?
>
> How do I run the kafka producer/consumer as a different principal other
> than Kafka.example.com?
>
> Thanks,
> Chris
>
> On Mon, Mar 21, 2016 at 6:54 PM, Ismael Juma  wrote:
>
> > Hi Gopal,
> >
> > As you suspected, you have to set the appropriate ACLs for it to work.
> The
> > following will make the producer work:
> >
> > kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 \
> > --add --allow-principal
> > "User:CN=kafka.example.com
> ,OU=Client,O=Confluent,L=London,ST=London,C=GB"
> > \
> > --producer --topic securing-kafka
> >
> > The following will make the consumer work:
> >
> > kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 \
> > --add --allow-principal
> > "User:CN=kafka.example.com
> ,OU=Client,O=Confluent,L=London,ST=London,C=GB"
> > \
> > --consumer --topic securing-kafka --group securing-kafka-group
> >
> > Enabling the authorizer log is a good way to figure out the principal if
> > you don't know it.
> >
> > Hope this helps,
> > Ismael
> >
> > On Mon, Mar 21, 2016 at 10:27 PM, Raghavan, Gopal <
> gopal.ragha...@here.com
> > >
> > wrote:
> >
> > > >Hi Christopher,
> > >
> > > >On Mon, Mar 21, 2016 at 3:53 PM, christopher palm 
> > > wrote:
> > >
> > > >> Does Kafka support SSL authentication and ACL authorization without
> > > >> Kerberos?
> > > >>
> > >
> > > >Yes. The following branch modifies the blog example slightly to only
> > allow
> > > >SSL authentication.
> > >
> > > >https://github.com/confluentinc/securing-kafka-blog/tree/ssl-only
> > >
> > > >If so, can different clients have their own SSL certificate on the
> same
> > > >> broker?
> > > >>
> > >
> > > >Yes.
> > >
> > >
> > >
> > > I tried the “ssl-only” branch but am getting the following error:
> > >
> > > [vagrant@kafka ~]$ kafka-console-producer --broker-list
> > > kafka.example.com:9093 --topic securing-kafka --producer.config
> > > /etc/kafka/producer_ssl.properties
> > >
> > >
> > >
> > >
> > > test
> > >
> > >
> > >
> > >
> > > [2016-03-21 22:08:46,744] WARN Error while fetching metadata with
> > > correlation id 0 : {securing-kafka=TOPIC_AUTHORIZATION_FAILED}
> > > (org.apache.kafka.clients.NetworkClient)
> > >
> > >
> > >
> > >
> > > [2016-03-21 22:08:46,748] ERROR Error when sending message to topic
> > > securing-kafka with key: null, value: 4 bytes with error: Not
> authorized
> > to
> > > access topics: [securing-kafka]
> > > (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> > >
> > >
> > >
> > >
> > > I did not set topic level ACL, since I do not know the Principal name
> to
> > > use for --allow-principal parameter of kafka-acls
> > >
> > >
> > > Any suggestions ?
> > >
> > >
> > > >In reading the following security article, it seems that Kerberos is
> an
> > > >> option but not required if SSL is used.
> > > >>
> > >
> > > >That's right.
> > >
> > > >Ismael
> > >
> >
>


Re: How to publish/consume java bean objects to Kafka 2.11 version?

2016-03-22 Thread Gerard Klijs
If I'm reading this right, your question is more about how to successfully
(de)serialise Java objects? You might want to take a look at the Confluent
Avro schema registry. Using Avro schemas you can easily store messages in
a Java object generated from the schema. This way the messages will also be a
lot smaller, which helps performance. And you don't have to maintain your
own (de)serialiser.
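
A minimal sketch of a producer using the Confluent Avro serializer, assuming
a schema registry running at a placeholder URL; the inline schema and topic
name are made up for illustration. With a class generated from the schema (a
SpecificRecord) you would send that instance instead of the GenericRecord.

import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");

        // a tiny example schema; normally this would live in an .avsc file
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"FileObj\",\"fields\":"
                        + "[{\"name\":\"name\",\"type\":\"string\"}]}");
        GenericRecord value = new GenericData.Record(schema);
        value.put("name", "example.txt");

        try (Producer<String, Object> producer = new KafkaProducer<>(props)) {
            // the serializer registers the schema and prefixes the payload with its id
            producer.send(new ProducerRecord<>("files", "example.txt", value));
        }
    }
}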

On Tue, Mar 22, 2016 at 3:38 AM Ratha v  wrote:

> Hi all;
> Im a newbie to kafka. Im trying to publish my java object to kafka topic an
> try to consume.
> I see there are some API changes in the latest version of the kafka. can
> anybody point some samples for how to publish and consume java objects? I
> have written my own data serializer, but could not publish that to a topic.
> Any guide/samples would be appreciate..
>
>
> *Customserilaizer*
>
>
>
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.io.ObjectInput;
> import java.io.ObjectInputStream;
> import java.io.ObjectOutput;
> import java.io.ObjectOutputStream;
>
> import kafka.serializer.Decoder;
> import kafka.serializer.Encoder;
>
> public class CustomSerializer implements Encoder<FileObj>, Decoder<FileObj> {
>
>     @Override
>     public byte[] toBytes(FileObj file) {
>         try {
>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>             ObjectOutput out = null;
>             byte[] rawFileBytes;
>             try {
>                 out = new ObjectOutputStream(bos);
>                 out.writeObject(file);
>                 rawFileBytes = bos.toByteArray();
>             } finally {
>                 try {
>                     if (out != null) {
>                         out.close();
>                         bos.close();
>                     }
>                 } catch (Exception ex) {
>                     ex.getLocalizedMessage();
>                 }
>             }
>             return rawFileBytes;
>         } catch (IOException e) {
>             e.printStackTrace();
>             return null;
>         }
>     }
>
>     @Override
>     public FileObj fromBytes(byte[] fileContent) {
>         ByteArrayInputStream bis = new ByteArrayInputStream(fileContent);
>         ObjectInput in = null;
>         Object obj = null;
>         try {
>             in = new ObjectInputStream(bis);
>             obj = in.readObject();
>         } catch (IOException e) {
>             e.printStackTrace();
>         } catch (ClassNotFoundException e) {
>             e.printStackTrace();
>         } finally {
>             try {
>                 bis.close();
>                 if (in != null) {
>                     in.close();
>                 }
>             } catch (IOException ex) {
>                 // ignore
>             }
>         }
>         return (FileObj) obj;
>     }
> }
>
>
>
> -Ratha
>
> http://vvratha.blogspot.com/
>


Re: Security with SSL and not Kerberos?

2016-03-22 Thread Gerard Klijs
I only have experience with option 1. In this case it's simple: you provide
the location of the keystore in the properties, so you can use multiple
certificates for multiple clients. If you like, these could even be in the
same application.
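
A minimal sketch of the SSL part of such a client configuration (paths and
passwords are placeholders); each client, or each principal within one
application, simply points at its own keystore:

import java.util.Properties;

public class SslClientProps {

    // builds the SSL part of a producer/consumer configuration;
    // every client (principal) gets its own keystore file
    public static Properties sslProps(String keystorePath, String keystorePassword) {
        Properties props = new Properties();
        props.put("security.protocol", "SSL");
        props.put("ssl.keystore.location", keystorePath);
        props.put("ssl.keystore.password", keystorePassword);
        props.put("ssl.key.password", keystorePassword);
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "truststore-secret");
        return props;
    }
}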

On Tue, Mar 22, 2016 at 3:13 AM Raghavan, Gopal 
wrote:

> Hi Ismael,
>
> Thanks for clarifying this with the example.
> I tried it and it worked as you have described below !
>
> I have a follow up question:
>
> Producer (PR) and Consumer (CO) are running on two different Clients and
> talking to broker (BR)
>
> Goal: Multiple Principals (P1 .. Pn) should be able to access PR in order
> to write/read to some topics (T1 .. Tn).
>
> I have a mapping like Px -> Tx.
>
> For example:
> Principal P1 has “Write” permission on topics T1, T2
> Principal P2 has “Write” permission on topics T2, T9
>
>
> If I do kafka-acls --list, it should look like:
> Current ACLs for resource `Topic:T1`:
> User:P1 has Allow permission for operations: Write from hosts: *
> Current ACLs for resource `Topic:T2:
> User:P1 has Allow permission for operations: Write from hosts: *
> User:P2 has Allow permission for operations: Write from hosts: *
> Current ACLs for resource `Topic:T9:
> User:P2 has Allow permission for operations: Write from hosts: *
>
>
>
>
>
>
> How should I build such topic level access control per principal.
>
> Option 1: One Certificate per Principal
> * Truststore of a client stores all the certificates corresponding to each
> Principal
> * But, how will the producer client dynamically pick a certificate based
> on the principal.
>
> Option 2: One Kerberos user per Principal
> * In long-running process (useKeyTab=true and not useTicketCache=true),
> the keyTab and principal is very specific to one user (Principal).
> * How to make this more dynamic to support users getting authenticated
> over a REST API ?
>
> If you have any examples or pointers that would be great.
>
> Thanks,
> —
> Gopal
>
>
>
>
>
>
>
> On 3/21/16, 7:54 PM, "isma...@gmail.com on behalf of Ismael Juma" <
> isma...@gmail.com on behalf of ism...@juma.me.uk> wrote:
>
> >Hi Gopal,
> >
> >As you suspected, you have to set the appropriate ACLs for it to work. The
> >following will make the producer work:
> >
> >kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 \
> >--add --allow-principal
> >"User:CN=kafka.example.com,OU=Client,O=Confluent,L=London,ST=London,C=GB"
> >\
> >--producer --topic securing-kafka
> >
> >The following will make the consumer work:
> >
> >kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 \
> >--add --allow-principal
> >"User:CN=kafka.example.com,OU=Client,O=Confluent,L=London,ST=London,C=GB"
> >\
> >--consumer --topic securing-kafka --group securing-kafka-group
> >
> >Enabling the authorizer log is a good way to figure out the principal if
> >you don't know it.
> >
> >Hope this helps,
> >Ismael
> >
> >On Mon, Mar 21, 2016 at 10:27 PM, Raghavan, Gopal <
> gopal.ragha...@here.com>
> >wrote:
> >
> >> >Hi Christopher,
> >>
> >> >On Mon, Mar 21, 2016 at 3:53 PM, christopher palm 
> >> wrote:
> >>
> >> >> Does Kafka support SSL authentication and ACL authorization without
> >> >> Kerberos?
> >> >>
> >>
> >> >Yes. The following branch modifies the blog example slightly to only
> allow
> >> >SSL authentication.
> >>
> >> >https://github.com/confluentinc/securing-kafka-blog/tree/ssl-only
> >>
> >> >If so, can different clients have their own SSL certificate on the same
> >> >> broker?
> >> >>
> >>
> >> >Yes.
> >>
> >>
> >>
> >> I tried the “ssl-only” branch but am getting the following error:
> >>
> >> [vagrant@kafka ~]$ kafka-console-producer --broker-list
> >> kafka.example.com:9093 --topic securing-kafka --producer.config
> >> /etc/kafka/producer_ssl.properties
> >>
> >>
> >>
> >>
> >> test
> >>
> >>
> >>
> >>
> >> [2016-03-21 22:08:46,744] WARN Error while fetching metadata with
> >> correlation id 0 : {securing-kafka=TOPIC_AUTHORIZATION_FAILED}
> >> (org.apache.kafka.clients.NetworkClient)
> >>
> >>
> >>
> >>
> >> [2016-03-21 22:08:46,748] ERROR Error when sending message to topic
> >> securing-kafka with key: null, value: 4 bytes with error: Not
> authorized to
> >> access topics: [securing-kafka]
> >> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> >>
> >>
> >>
> >>
> >> I did not set topic level ACL, since I do not know the Principal name to
> >> use for --allow-principal parameter of kafka-acls
> >>
> >>
> >> Any suggestions ?
> >>
> >>
> >> >In reading the following security article, it seems that Kerberos is an
> >> >> option but not required if SSL is used.
> >> >>
> >>
> >> >That's right.
> >>
> >> >Ismael
> >>
>


Re: Kafka LTS release

2016-03-22 Thread Gerard Klijs
There is also an IBM alternative,
https://console.ng.bluemix.net/catalog/services/message-hub, and there may
be others as well.

On Mon, Mar 21, 2016 at 8:40 PM Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:

> @Sam
> Not always. Not all LTS customers are cloudera/... customers. Do they
> actually backport the changes? If yes, an opensource LTS will actually help
> them.
>
> On Mon, Mar 21, 2016 at 7:47 PM, Sam Pegler <
> sam.peg...@infectiousmedia.com>
> wrote:
>
> > I would assume (maybe incorrectly) that users who were after a LTS style
> > release would instead be going for one of the commercial versions.
> > Clouderas for example is
> > https://cloudera.com/products/apache-hadoop/apache-kafka.html, they'll
> > then
> > manage patches and provide support for you?
> >
> > Sam Pegler
> >
> > Site Reliability Engineer
> >
> > T. +44(0) 07 562 867 486
> >
> > [image: Infectious Media]
> > 3-7 Herbal Hill / London / EC1R 5EJ
> > www.infectiousmedia.com
> >   [image: Infectious Media] <http://www.infectiousmedia.com/>
> > [image: Facebook] <http://www.facebook.com/infectiousmedia> [image:
> > Twitter]
> > <https://twitter.com/infectiousmedia> [image: LinkedIn]
> > <http://www.linkedin.com/company/infectious-media-ltd> [image: Youtube]
> > <http://www.youtube.com/user/InfectiousMediaLtd>
> >
> >
> > This email and any attachments are confidential and may also be
> privileged.
> > If you
> > are not the intended recipient, please notify the sender immediately, and
> > do not
> > disclose the contents to another person, use it for any purpose, or
> store,
> > or copy
> > the information in any medium. Please also destroy and delete the message
> > from
> > your computer.
> >
> >
> > On 21 March 2016 at 12:51, Achanta Vamsi Subhash <
> > achanta.va...@flipkart.com
> > > wrote:
> >
> > > Gerard,
> > >
> > > I think many people use Kafka just like any other stable software. The
> > > producer and consumer apis are mostly fixed now and many companies
> across
> > > the world are using it on production for critical use-cases. I think it
> > is
> > > already *expected *to work as per the theory and any bugs need to be
> > > patched. As there is no one patching the older releases and the
> companies
> > > refusing to upgrade due to the way enterprises work, can we somehow
> start
> > > towards an LTS release by treating 0.10.0.0 as the LTS release to start
> > > with?
> > >
> > > On Mon, Mar 21, 2016 at 4:49 PM, Gerard Klijs <gerard.kl...@dizzit.com
> >
> > > wrote:
> > >
> > > > I think Kafka at the moment is not mature enough to support a LTS
> > > release.
> > > > I think it will take a lot of effort to 'guarantee' a back-port will
> be
> > > > more safe to use in production then the new release. For example,
> when
> > > you
> > > > will manage the release of 0.9.0.2, with the fixes from 0.10.0.0, you
> > > need
> > > > to make sure all the 0.9.0.1 clients still work with it, and you
> don't
> > > > introduce new bugs by the partial merge.
> > > > I do think once there will be a 1.0.0.0 release it would be great to
> > > have a
> > > > lts release.
> > > >
> > > > On Mon, Mar 21, 2016 at 11:54 AM Achanta Vamsi Subhash <
> > > > achanta.va...@flipkart.com> wrote:
> > > >
> > > > > *bump*
> > > > >
> > > > > Any opinions on this?
> > > > >
> > > > > On Mon, Mar 14, 2016 at 4:37 PM, Achanta Vamsi Subhash <
> > > > > achanta.va...@flipkart.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > We find that there are many releases of Kafka and not all the
> bugs
> > > are
> > > > > > back ported to the older releases. Can we have a LTS (Long Term
> > > > Support)
> > > > > > release which can be supported for 2 years with all the bugs
> > > > back-ported?
> > > > > >
> > > > > > This will be very helpful as during the last 2-3 releases, we
> often
> > > > have
> > > > > > the cases where the api of producers/consumers changes and the
> bugs
> > > are
> > > > > > only fixed in the newer components. Also, many people who are on
> > the
> > > > > older
> > > > > > versions treat the latest release of that series is the stable
> one
> > > and
> > > > > end
> > > > > > up with bugs in production.
> > > > > >
> > > > > > ​I can volunteer for the release management of the LTS release
> but
> > > as a
> > > > > > community, can we follow the rigour of back-porting the bug-fixes
> > to
> > > > the
> > > > > > LTS branch?​
> > > > > >
> > > > > > --
> > > > > > Regards
> > > > > > Vamsi Subhash
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards
> > > > > Vamsi Subhash
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> > >
> >
>
>
>
> --
> Regards
> Vamsi Subhash
>


Re: Would Kafka streams be a good choice for a collaborative web app?

2016-03-21 Thread Gerard Klijs
Hi Mark,

I don't think it would be a good solution, with the latencies to and from
the server you're running it on in mind. This is less of a problem if your app
is mainly used in one region.

I recently went to a Firebase event, and it seems a lot more fitting. It
also allows the user to see their own changes in real time, provides
several authentication options, and has servers world-wide.

On Mon, Mar 21, 2016 at 7:53 AM Mark van Leeuwen  wrote:

> Hi,
>
> I'm soon to begin design and dev of a collaborative web app where
> changes made by one user should appear to other users in near real time.
>
> I'm new to Kafka, but having read a bit about Kafka streams I'm
> wondering if it would be a good solution. Change events produced by one
> user would be published to multiple consumer clients over a websocket,
> each having their own offset.
>
> Would this be viable?
>
> Are there any considerations I should be aware of?
>
> Thanks,
> Mark
>
>


Re: Kafka LTS release

2016-03-21 Thread Gerard Klijs
If everyone agrees about the producer and consumer APIs being fixed
enough, yes, but then we should not call it 0.10.0.0 but 1.0.0.0 in my
opinion. With streams just making their introduction, I don't know if we are
there yet.

On Mon, Mar 21, 2016 at 1:51 PM Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:

> Gerard,
>
> I think many people use Kafka just like any other stable software. The
> producer and consumer apis are mostly fixed now and many companies across
> the world are using it on production for critical use-cases. I think it is
> already *expected *to work as per the theory and any bugs need to be
> patched. As there is no one patching the older releases and the companies
> refusing to upgrade due to the way enterprises work, can we somehow start
> towards an LTS release by treating 0.10.0.0 as the LTS release to start
> with?
>
> On Mon, Mar 21, 2016 at 4:49 PM, Gerard Klijs <gerard.kl...@dizzit.com>
> wrote:
>
> > I think Kafka at the moment is not mature enough to support a LTS
> release.
> > I think it will take a lot of effort to 'guarantee' a back-port will be
> > more safe to use in production then the new release. For example, when
> you
> > will manage the release of 0.9.0.2, with the fixes from 0.10.0.0, you
> need
> > to make sure all the 0.9.0.1 clients still work with it, and you don't
> > introduce new bugs by the partial merge.
> > I do think once there will be a 1.0.0.0 release it would be great to
> have a
> > lts release.
> >
> > On Mon, Mar 21, 2016 at 11:54 AM Achanta Vamsi Subhash <
> > achanta.va...@flipkart.com> wrote:
> >
> > > *bump*
> > >
> > > Any opinions on this?
> > >
> > > On Mon, Mar 14, 2016 at 4:37 PM, Achanta Vamsi Subhash <
> > > achanta.va...@flipkart.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We find that there are many releases of Kafka and not all the bugs
> are
> > > > back ported to the older releases. Can we have a LTS (Long Term
> > Support)
> > > > release which can be supported for 2 years with all the bugs
> > back-ported?
> > > >
> > > > This will be very helpful as during the last 2-3 releases, we often
> > have
> > > > the cases where the api of producers/consumers changes and the bugs
> are
> > > > only fixed in the newer components. Also, many people who are on the
> > > older
> > > > versions treat the latest release of that series is the stable one
> and
> > > end
> > > > up with bugs in production.
> > > >
> > > > ​I can volunteer for the release management of the LTS release but
> as a
> > > > community, can we follow the rigour of back-porting the bug-fixes to
> > the
> > > > LTS branch?​
> > > >
> > > > --
> > > > Regards
> > > > Vamsi Subhash
> > > >
> > >
> > >
> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> > >
> >
>
>
>
> --
> Regards
> Vamsi Subhash
>


Re: Kafka LTS release

2016-03-21 Thread Gerard Klijs
I think Kafka at the moment is not mature enough to support an LTS release.
I think it will take a lot of effort to 'guarantee' a back-port will be
safer to use in production than the new release. For example, when you
manage the release of 0.9.0.2, with the fixes from 0.10.0.0, you need
to make sure all the 0.9.0.1 clients still work with it, and you don't
introduce new bugs by the partial merge.
I do think once there is a 1.0.0.0 release it would be great to have an
LTS release.

On Mon, Mar 21, 2016 at 11:54 AM Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:

> *bump*
>
> Any opinions on this?
>
> On Mon, Mar 14, 2016 at 4:37 PM, Achanta Vamsi Subhash <
> achanta.va...@flipkart.com> wrote:
>
> > Hi all,
> >
> > We find that there are many releases of Kafka and not all the bugs are
> > back ported to the older releases. Can we have a LTS (Long Term Support)
> > release which can be supported for 2 years with all the bugs back-ported?
> >
> > This will be very helpful as during the last 2-3 releases, we often have
> > the cases where the api of producers/consumers changes and the bugs are
> > only fixed in the newer components. Also, many people who are on the
> older
> > versions treat the latest release of that series is the stable one and
> end
> > up with bugs in production.
> >
> > ​I can volunteer for the release management of the LTS release but as a
> > community, can we follow the rigour of back-porting the bug-fixes to the
> > LTS branch?​
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>
>
>
> --
> Regards
> Vamsi Subhash
>


Re: How to get message count per topic?

2016-03-15 Thread Gerard Klijs
In addition, because we also use ACLs, creating a lot of topics is
cumbersome. So in one of our tests I added a UUID to the message so I know
which message was produced for a certain test.

On Mon, Mar 14, 2016 at 11:15 PM Stevo Slavić  wrote:

> See
>
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
>
> Using metadata api one can get topic partitions and for each partition
> which broker is lead. Using offset api one can get partition size. Both
> apis are low level and to use them directly you would use SimpleConsumer.
>
> To achieve same you can also use new KafkaConsumer - instantiate one
> belonging to unique group, subscribe to topic(s), poll, seek to beggining,
> ask for position, then seek to end and ask for position.
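
A minimal sketch of that approach (unique group, seek to beginning and end,
read the position), assuming the 0.9 consumer API where seekToBeginning and
seekToEnd take TopicPartition varargs; broker and topic names are
placeholders. Note it counts the messages currently retained, not everything
ever written.

import java.util.Collections;
import java.util.Properties;
import java.util.UUID;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class TopicMessageCount {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "count-" + UUID.randomUUID()); // unique group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        long total = 0;
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            for (PartitionInfo info : consumer.partitionsFor("my-topic")) {
                TopicPartition tp = new TopicPartition(info.topic(), info.partition());
                consumer.assign(Collections.singletonList(tp));
                consumer.seekToBeginning(tp);
                long earliest = consumer.position(tp);
                consumer.seekToEnd(tp);
                long latest = consumer.position(tp);
                total += latest - earliest; // messages retained in this partition
            }
        }
        System.out.println("messages retained in topic: " + total);
    }
}
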
>
> On Mon, Mar 14, 2016, 21:20 Grant Overby (groverby) 
> wrote:
>
> > What is the most direct way to get a message count per topic or per
> > partition?
> >
> > For context, this is to enable testing. We'd like to confirm with Kafka
> > that a certain number of messages have been written or that the number of
> > messages we processed is equal to the number received by Kafka.
> > [
> >
> http://www.cisco.com/web/europe/images/email/signature/est2014/logo_06.png?ct=1398192119726
> > ]
> >
> > Grant Overby
> > Software Engineer
> > Cisco.com
> > grove...@cisco.com
> > Mobile: 865 724 4910
> >
> >
> >
> >
> >
> >
> > [http://www.cisco.com/assets/swa/img/thinkbeforeyouprint.gif] Think
> > before you print.
> >
> > This email may contain confidential and privileged material for the sole
> > use of the intended recipient. Any review, use, distribution or
> disclosure
> > by others is strictly prohibited. If you are not the intended recipient
> (or
> > authorized to receive for the recipient), please contact the sender by
> > reply email and delete all copies of this message.
> >
> > Please click here<
> > http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for
> > Company Registration Information.
> >
> >
> >
> >
> >
>


Re: Consuming previous messages and from different group.id

2016-03-14 Thread Gerard Klijs
Hi, if you use a new group for a consumer, the auto.offset.reset value will
determine whether it will start at the beginning (with value earliest) or
at the end (with value latest). A separate offset is kept for each group,
so two consumers belonging to two different groups, when started before the
producer, will both consume each message produced.
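
A minimal sketch of such a consumer; run it once with group id "group-a" and
once with "group-b" and both instances will receive every message, including
those produced before they started (broker and topic names are placeholders).
Note that auto.offset.reset only applies while the group has no committed
offset yet.

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupedConsumer {

    public static void consume(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);             // each group keeps its own offsets
        props.put("auto.offset.reset", "earliest"); // start from the oldest retained message
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("example-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("[%s] %s%n", groupId, record.value());
                }
            }
        }
    }
}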

On Mon, Mar 14, 2016 at 2:37 AM I PVP  wrote:

> Hi everyone,
>
> Are there any specific configurations/properties that need to be set at
> the consumer or at the broker to allow:
>
> 1) The same message  M1 to be delivered to two consumers that "belong"to
> different consumer group.idS ?
>
> 2) A consumer to receive messages that were sent to broker before the
> consumer was started?
>
> Kafka version : 0.9.0.1
> Following the example available at :
>
> http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> Any help will be appreciated.
>
> Thanks
>
> --
> IPVP
>
>


Re: Kafka topics with infinite retention?

2016-03-14 Thread Gerard Klijs
You might find what you want by looking at how Kafka is used by Samza:
http://samza.apache.org/

On Mon, Mar 14, 2016 at 10:34 AM Daniel Schierbeck
 wrote:

> Partitions being limited by disk size is no different from e.g. a SQL
> store. This would not be used for extremely high throughput. If,
> eventually, there was a good case for not requiring that an entire
> partition must be stored on a single machine, it would be possible to use
> the log segments for distribution.
>
> On Mon, Mar 14, 2016 at 9:29 AM Giidox  wrote:
>
> > I would like to read an answer to this question as well. This is a
> similar
> > architecture as I am planning. Dealing with secondary data store for old
> > messages would indeed make things complicated.
> >
> > Clark Haskins wrote that the partition size is limited by machines
> > capacity (I assume disk space):
> >
> https://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3ce7b3c4a4-bb72-43f2-8848-9e09d0dcb...@kafka.guru%3E
> > <
> https://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3ce7b3c4a4-bb72-43f2-8848-9e09d0dcb...@kafka.guru%3E>
> <
> >
> https://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3ce7b3c4a4-bb72-43f2-8848-9e09d0dcb...@kafka.guru%3E
> > <
> https://mail-archives.apache.org/mod_mbox/kafka-users/201504.mbox/%3ce7b3c4a4-bb72-43f2-8848-9e09d0dcb...@kafka.guru%3E>>.
> So in theory one
> > could grow a single partition to terabytes-scale. But don’t take my word
> > for it, as I have not tried it.
> >
> > Cheers, Giidox
> >
> >
> >
> > > On 09 Mar 2016, at 15:10, Daniel Schierbeck  >
> > wrote:
> > >
> > > I'm considering an architecture where Kafka acts as the primary
> > datastore,
> > > with infinite retention of messages. The messages in this case will be
> > > domain events that must not be lost. Different downstream consumers
> would
> > > ingest the events and build up various views on them, e.g. aggregated
> > > stats, indexes by various properties, full text search, etc.
> > >
> > > The important bit is that I'd like to avoid having a separate datastore
> > for
> > > long-term archival of events, since:
> > >
> > > 1) I want to make it easy to spin up new materialized views based on
> past
> > > events, and only having to deal with Kafka is simpler.
> > > 2) Instead of having some sort of two-phased import process where I
> need
> > to
> > > first import historical data and then do a switchover to the Kafka
> > topics,
> > > I'd rather just start from offset 0 in the Kafka topics.
> > > 3) I'd like to be able to use standard tooling where possible, and most
> > > tools for ingesting events into e.g. Spark Streaming would be difficult
> > to
> > > use unless all the data was in Kafka.
> > >
> > > I'd like to know if anyone here has tried this use case. Based on the
> > > presentations by Jay Kreps and Martin Kleppmann I would expect that
> > someone
> > > had actually implemented some of the ideas they're been pushing. I'd
> also
> > > like to know what sort of problems Kafka would pose for long-term
> > storage –
> > > would I need special storage nodes, or would replication be sufficient
> to
> > > ensure durability?
> > >
> > > Daniel Schierbeck
> > > Senior Staff Engineer, Zendesk
> >
> >
>


Re: KafkaConsumer#poll not returning records for all partitions of topic in single call

2016-03-10 Thread Gerard Klijs
I noticed a similar effect with a test tool, which checked if the order the
records were produced in was the same as the order in which they were
consumed. Using only one partition it works fine, but using multiple
partitions the order gets messed up. If I'm right this is by design, but I
would like to hear some feedback about this. Because messages with the same
key end up in the same partition, if you have multiple partitions only
the order within a partition is the same as the order they were produced
in. But when consuming from multiple partitions the order could be
different.

If this is true, it would be interesting to know what you should do when you
have a topic where the order needs to be kept the same, and which needs to be
consumed by more than one consumer at a time.

On Fri, Mar 11, 2016 at 5:50 AM Ewen Cheslack-Postava 
wrote:

> You definitely *might* see data from multiple partitions, and that won't be
> uncommon once you start processing data. However, there is no guarantee.
>
> In practice, it may be unlikely to see data for both partitions on the
> first call to poll() for a simple reason: poll() will return as soon as any
> data for any partition is available. Unless things are timed just right,
> you're probably making requests to different brokers for data in the
> different partitions. These requests won't be perfectly aligned -- one of
> them will get a response first and the poll() will be able to return with
> some data. Since only the one response will have been received, only one
> partition will get data.
>
> After the first poll, you probably spend some time processing that data
> before you call poll again. However, another request has been sent out to
> the broker that returned data faster and the other request also gets
> returned. So on the next poll, you might be more likely to see data from
> both partitions.
>
> So you're right: there's no hard guarantee, and you shouldn't write your
> consumer code to assume that data will be returned for all partitions. (And
> you can't assume that anyway; what if no new data had been published to one
> of the partitions?). However, many times you will see data from multiple
> partitions.
>
> -Ewen
>
> On Thu, Mar 10, 2016 at 11:21 AM, Shrijeet Paliwal <
> shrijeet.pali...@gmail.com> wrote:
>
> > Version: 0.9.0.1
> >
> > I have a test which creates two partitions in a topic, writes data to
> both
> > partitions. Then a single consumer subscribes to the topic, verifies that
> > it has got the assignment of both partitions in that topic & finally
> issues
> > a poll. The firs poll always comes back with records of only one
> partition.
> > I need to poll one more time to get records for the second partition. The
> > poll timeout has no effect on this.
> >
> > Unless I've misunderstood the contract - the first poll *could* have
> > returned records for the both the partitions. After-all poll
> > returns ConsumerRecords, which is a map of topic_partitions -->
> > records
> >
> > I acknowledge that API does not make any hard guarantees that align with
> my
> > expectation but  looks like API was crafted to support multiple
> partitions
> > & topics in single call. Is there an implementation detail which
> restricts
> > this? Is there a configuration which is controlling what gets fetched?
> >
> > --
> > Shrijeet
> >
>
>
>
> --
> Thanks,
> Ewen
>


Re: Kafka Streams

2016-03-10 Thread Gerard Klijs
Nice read. We just started using Kafka, and have multiple cases which need
some kind of stream processing. So we will most likely start testing/using
it as soon as it is released, adding stream processing containers to
our Docker landscape.

On Fri, Mar 11, 2016 at 2:42 AM Jay Kreps  wrote:

> Hey David,
>
> Yeah I think the similarity to Spark (and Flink and RxJava) is the stream
> api style in the DSL. That is totally the way to go for stream processing.
> We tried really hard to make that work early on when we were doing Samza,
> but we really didn't understand the whole iterator/observable distinction
> and the experiment wasn't very successful. We ended up doing a process()
> callback in Samza which I think is just much much less readable. One of the
> nice things about Kafka Streams is I think we really got this right. The
> API is split into two layers--a kind of infrastructure layer which is based
> on modeling data flow DAGs, in some sense all stream processing boils down
> to this, though it is not necessarily the most readable way to express it.
> This layer is documented here (
>
> http://docs.confluent.io/2.1.0-alpha1/streams/developer-guide.html#streams-developer-guide-processor-api
> ).
> Then on top of that you can layer any kind of DSL or language you like. The
> KStreams layer is our take on a readable DSL.
>
> As for RxJava, it is super cool. We looked at it a little bit as a
> potential alternative language versus doing a custom DSL in KStreams. There
> is enough that is unique to distributed stream processing, including the
> whole table/stream distinction, the details of the partitioning model and
> when data is committed, etc that we felt trying to glue something on top
> would end up being a bit limiting. That said, I think there is
> reactive-streams integration for Kafka, though I have no experience with
> it:
>   https://github.com/akka/reactive-kafka
>
> Cheers,
>
> -Jay
>
> On Thu, Mar 10, 2016 at 3:26 PM, David Buschman 
> wrote:
>
> > Very interesting, looks a lot like many operations from Spark were
> brought
> > across.
> >
> > Any plans to integrate with the reactive-stream protocol for
> > interoperability with libraries akka-stream and RxJava?
> >
> > Thanks,
> > DaVe.
> >
> > David Buschman
> > d...@timeli.io
> >
> >
> >
> > > On Mar 10, 2016, at 2:26 PM, Jay Kreps  wrote:
> > >
> > > Hey all,
> > >
> > > Lot's of people have probably seen the ongoing work on Kafka Streams
> > > happening. There is no real way to design a system like this in a
> vacuum,
> > > so we put up a blog, some snapshot docs, and something you can download
> > and
> > > use easily to get feedback:
> > >
> > >
> >
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
> > >
> > > We'd love comments or thoughts from anyone...
> > >
> > > -Jay
> >
> >
>


Re: Mirror maker Configs 0.9.0

2016-03-09 Thread Gerard Klijs
What do you see in the logs?
It could be it goes wrong because you have the bootstrap.servers property
which is not supported for the old consumer.

On Wed, Mar 9, 2016 at 11:05 AM Gerard Klijs <gerard.kl...@dizzit.com>
wrote:

> Don't know the actual question, it matters what you want to do.
> Just watch out trying to copy every topic using a new consumer, cause then
> internal topics are copied, leading to errors.
> Here is a temple start script we used:
>
> #!/usr/bin/env bash
> export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.local.only=false 
> -Djava.rmi.server.hostname= 
> -Dcom.sun.management.jmxremote.rmi.port="
> export JMX_PORT=
> /usr/bin/kafka-mirror-maker --consumer.config $HOME_DIR/consumer.properties 
> --producer.config $HOME_DIR/producer.properties --whitelist='' 1>> 
> $LOG_DIR/mirror-maker.log 2>> $LOG_DIR/mirror-maker.log
>
> Both the consumer and producer configs have sensible defaults, these are out 
> consumer.properties template:
>
> #Consumer template to be used with the mirror maker
> zookeeper.connect=
> group.id=mirrormaker
> auto.offset.reset=smallest
> #next property is not available in new consumer
> exclude.internal.topics=true
>
> *And a producer.properties template:*
>
> #Producer template to be used with the mirror maker
> bootstrap.servers=
> client.id=mirrormaker
>
> Because the internal topics can't be excluded in the new consumer yet, we use 
> the old consumer.
>
> Hope this helps.
>
>
> On Wed, Mar 9, 2016 at 10:57 AM prabhu v <prabhuvrajp...@gmail.com> wrote:
>
>> Hi Experts,
>>
>> I am trying to mirror
>>
>>
>>
>>
>> --
>> Regards,
>>
>> Prabhu.V
>>
>


Re: Mirror maker Configs 0.9.0

2016-03-09 Thread Gerard Klijs
Don't know the actual question; it matters what you want to do.
Just watch out when trying to copy every topic using a new consumer, because
then internal topics are copied, leading to errors.
Here is a template start script we used:

#!/usr/bin/env bash
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false
-Djava.rmi.server.hostname=
-Dcom.sun.management.jmxremote.rmi.port="
export JMX_PORT=
/usr/bin/kafka-mirror-maker --consumer.config
$HOME_DIR/consumer.properties --producer.config
$HOME_DIR/producer.properties --whitelist='' 1>>
$LOG_DIR/mirror-maker.log 2>> $LOG_DIR/mirror-maker.log

Both the consumer and producer configs have sensible defaults; this is
our consumer.properties template:

#Consumer template to be used with the mirror maker
zookeeper.connect=
group.id=mirrormaker
auto.offset.reset=smallest
#next property is not available in new consumer
exclude.internal.topics=true

*And a producer.properties template:*

#Producer template to be used with the mirror maker
bootstrap.servers=
client.id=mirrormaker

Because the internal topics can't be excluded in the new consumer yet,
we use the old consumer.

Hope this helps.


On Wed, Mar 9, 2016 at 10:57 AM prabhu v  wrote:

> Hi Experts,
>
> I am trying to mirror
>
>
>
>
> --
> Regards,
>
> Prabhu.V
>


Re: Creating new consumers after data has been discarded

2016-02-24 Thread Gerard Klijs
Hi Ted,

Maybe it's useful to take a look at Samza, http://samza.apache.org/; they
use Kafka in a way which sounds similar to how you want to use it. As I
recall from a YouTube conference talk, the creator of Samza also mentioned
never deleting the events. These things are of course very dependent on your
use case; some events aren't worth keeping around for long.

On Wed, Feb 24, 2016 at 9:08 AM Ted Swerve  wrote:

> Hello,
>
> One of the big attractions of Kafka for me was the ability to write new
> consumers of topics that would then be able to connect to a topic and
> replay all the previous events.
>
> However, most of the time, Kafka appears to be used with a retention period
> - presumably in such cases, the events have been warehoused into HDFS
> or something similar.
>
> So my question is - how do people typically approach the scenario where a
> new piece of code needs to process all events in a topic from "day one",
> but has to source some of them from e.g HDFS and then connect to the
> real-time Kafka topic?  Are there any wrinkles with such an approach?
>
> Thanks,
> Ted
>


Re: Decreasing number of partitions INCREASED write throughput--thoughts?

2016-02-20 Thread Gerard Klijs
Interesting,

Do you have any way to look into the amount and size of the actual traffic
being sent? I assume that because there are more partitions, the actual
sending of the messages is split up into more requests, leading to more overhead.

On Sat, Feb 20, 2016 at 12:57 PM John Yost  wrote:

> Hi Everyone,
>
> I discovered yesterday completely by accident. When I went from writing to
> 10 topics each with 10 partitions to 10 topics each with 2 partitions--all
> other config params are the same--my write throughput almost doubled!  I
> was not expecting this as I've always thought--and, admittedly, I am still
> coming up to speed on Kafka--that increasing parallelism via increasing the
> number of partitions and/or topics would increase write throughput. I was
> actually experimenting going from 10 partitions per topic to 20 partitions
> in an effort to increase throughput but stumbled upon 10X2.
>
> Important notes:
>
> 1. Replication factor is 1
> 2. async producer
> 3. request.required.acks is 1
>
> Any ideas?
>
> --John
>


Re: Stalling behaviour with 0.9 console consumer

2016-01-12 Thread Gerard Klijs
Hi Suyog,
It's working as intended. You could set the property fetch.min.bytes to a
small value to get fewer messages in each batch. Setting it to zero will
probably mean you get one object with each batch; at least that was the case
when I tried, but I was producing and consuming at the same time.
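
A minimal sketch of the relevant new-consumer settings (values are just
illustrative); fetch.min.bytes controls how much data the broker waits for,
and fetch.max.wait.ms caps how long it may hold a fetch open, which is what
makes delivery look batchy:

import java.util.Properties;

public class LowLatencyConsumerProps {

    public static Properties props() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "console-test");
        props.put("fetch.min.bytes", "1");      // answer fetches as soon as any data is available
        props.put("fetch.max.wait.ms", "100");  // don't hold an empty fetch open for long
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}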

On Tue, Jan 12, 2016 at 3:47 AM Suyog Rao  wrote:

> Hi, I started with a clean install of 0.9 Kafka broker and populated a test
> topic with 1 million messages. I then used the console consumer to read
> from beginning offset. Using --new-consumer reads the messages, but it
> stalls after every x number of messages or so, and then continues again. It
> is very batchy in its behaviour. If I go back to the old consumer, I am
> able to stream the messages continuously. Am I missing a timeout setting or
> something?
>
> I created my own consumer in Java and call poll(0) in a loop, but I still
> get the same behaviour. This is on Mac OS X (yosemite) with java version
> "1.8.0_65".
>
> Any ideas?
>
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic
> apache_logs --from-beginning --new-consumer
>
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic
> apache_logs --from-beginning -zookeeper localhost:2181
>


Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Gerard Klijs
Thanks Tao, it worked.
I also played around with my test setting, trying to replicate your results,
using default settings. But as long as the poll timeout is set to 100ms or
larger, the only time-outs I get are near the start and near the end (when
indeed there is nothing to consume). This is with a producer putting out
1000 messages a second. Maybe the load of the producer you're using is not
constant? Maybe you could run a test with the
org.apache.kafka.tools.ProducerPerformance class to see if it makes a
difference?

On Tue, Dec 1, 2015 at 11:35 AM tao xiao <xiaotao...@gmail.com> wrote:

> Gerard,
>
> In your case I think you can set fetch.min.bytes=1 so that the server will
> answer the fetch request as soon as a single byte of data is available
> instead of accumulating enough messages.
>
> But in my case is I have plenty of messages in broker and I am sure the
> size of total message are much larger than the default setting which is
> 1024 bytes but still the consumer doesn't return messages for every poll.
>
>
> On Tue, 1 Dec 2015 at 18:29 Gerard Klijs <gerard.kl...@dizzit.com> wrote:
>
> > I was experimenting with the timeout setting, but as long as messages are
> > produced and the consumer(s) keep polling I saw little difference. I did
> > see for example that when producing only 1 message a second, still it
> > sometimes wait to get three messages. So I also would like to know if
> there
> > is a faster way.
> >
> > On Tue, Dec 1, 2015 at 10:35 AM tao xiao <xiaotao...@gmail.com> wrote:
> >
> > > Hi team,
> > >
> > > I am using the new consumer with broker version 0.9.0. I notice that
> > > poll(time) occasionally returns 0 message even though I have enough
> > > messages in broker. The rate of returning 0 message is quite high like
> 4
> > > out of 5 polls return 0 message. It doesn't help by increasing the poll
> > > timeout from 300ms to 1 second. are there any configurations that I can
> > > tune to fetch  data as quickly as possible?
> > >
> > > Both consumer and broker configs are default
> > >
> >
>


Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Gerard Klijs
Another possible reason which comes to my mind is that you have multiple
consumer threads, but not the partitions/brokers to support them. When I'm
running my tool on multiple threads I get a lot of time-outs. When I only
use one consumer thread I get them only at the start and the end.

On Wed, Dec 2, 2015 at 5:43 AM Jason Gustafson <ja...@confluent.io> wrote:

> There is some initial overhead before data can be fetched. For example, the
> group has to be joined and topic metadata has to be fetched. Do you see
> unexpected empty fetches beyond the first 10 polls?
>
> Thanks,
> Jason
>
> On Tue, Dec 1, 2015 at 7:43 PM, tao xiao <xiaotao...@gmail.com> wrote:
>
> > Hi Jason,
> >
> > You are correct. I initially produced 1 messages in Kafka before I
> > started up my consumer with auto.offset.reset=earliest. But like I said
> the
> > majority number of first 10 polls returned 0 message and the lag remained
> > above 0 which means I still have enough messages to consume.  BTW I
> commit
> > offset manually so the lag should accurately reflect how many messages
> > remaining.
> >
> > I will turn on debug logging and test again.
> >
> > On Wed, 2 Dec 2015 at 07:17 Jason Gustafson <ja...@confluent.io> wrote:
> >
> > > Hey Tao, other than high latency between the brokers and the consumer,
> > I'm
> > > not sure what would cause this. Can you turn on debug logging and run
> > > again? I'm looking for any connection problems or metadata/fetch
> request
> > > errors. And I have to ask a dumb question, how do you know that more
> > > messages are available? Are you monitoring the consumer's lag?
> > >
> > > -Jason
> > >
> > > On Tue, Dec 1, 2015 at 10:07 AM, Gerard Klijs <gerard.kl...@dizzit.com
> >
> > > wrote:
> > >
> > > > Thanks Tao, it worked.
> > > > I also played around with my test setting trying to replicate your
> > > results,
> > > > using default settings. But als long as the poll timeout is set to
> > 100ms
> > > or
> > > > larger the only time-out I get are near the start and near the end
> > (when
> > > > indeed there is nothing to consume). This is with a producer putting
> > out
> > > > 1000 messages a second. Maybe the load of the producer your using is
> > not
> > > > constant? Maybe you could run a test with the
> > > > org.apache.kafka.tools.ProducerPerformance class to see if it makes a
> > > > difference?
> > > >
> > > > On Tue, Dec 1, 2015 at 11:35 AM tao xiao <xiaotao...@gmail.com>
> wrote:
> > > >
> > > > > Gerard,
> > > > >
> > > > > In your case I think you can set fetch.min.bytes=1 so that the
> server
> > > > will
> > > > > answer the fetch request as soon as a single byte of data is
> > available
> > > > > instead of accumulating enough messages.
> > > > >
> > > > > But in my case is I have plenty of messages in broker and I am sure
> > the
> > > > > size of total message are much larger than the default setting
> which
> > is
> > > > > 1024 bytes but still the consumer doesn't return messages for every
> > > poll.
> > > > >
> > > > >
> > > > > On Tue, 1 Dec 2015 at 18:29 Gerard Klijs <gerard.kl...@dizzit.com>
> > > > wrote:
> > > > >
> > > > > > I was experimenting with the timeout setting, but as long as
> > messages
> > > > are
> > > > > > produced and the consumer(s) keep polling I saw little
> difference.
> > I
> > > > did
> > > > > > see for example that when producing only 1 message a second,
> still
> > it
> > > > > > sometimes wait to get three messages. So I also would like to
> know
> > if
> > > > > there
> > > > > > is a faster way.
> > > > > >
> > > > > > On Tue, Dec 1, 2015 at 10:35 AM tao xiao <xiaotao...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi team,
> > > > > > >
> > > > > > > I am using the new consumer with broker version 0.9.0. I notice
> > > that
> > > > > > > poll(time) occasionally returns 0 message even though I have
> > enough
> > > > > > > messages in broker. The rate of returning 0 message is quite
> high
> > > > like
> > > > > 4
> > > > > > > out of 5 polls return 0 message. It doesn't help by increasing
> > the
> > > > poll
> > > > > > > timeout from 300ms to 1 second. are there any configurations
> > that I
> > > > can
> > > > > > > tune to fetch  data as quickly as possible?
> > > > > > >
> > > > > > > Both consumer and broker configs are default
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Gerard Klijs
I was experimenting with the timeout setting, but as long as messages are
produced and the consumer(s) keep polling I saw little difference. I did
see, for example, that when producing only 1 message a second, it still
sometimes waits to get three messages. So I would also like to know if there
is a faster way.

On Tue, Dec 1, 2015 at 10:35 AM tao xiao  wrote:

> Hi team,
>
> I am using the new consumer with broker version 0.9.0. I notice that
> poll(time) occasionally returns 0 message even though I have enough
> messages in broker. The rate of returning 0 message is quite high like 4
> out of 5 polls return 0 message. It doesn't help by increasing the poll
> timeout from 300ms to 1 second. are there any configurations that I can
> tune to fetch  data as quickly as possible?
>
> Both consumer and broker configs are default
>


Re: connection time out

2015-11-30 Thread Gerard Klijs
I just ran into almost the same problem. In my case it was solved by setting
'advertised.host.name' to the correct value in the server properties.
The hostname you enter here should be resolvable from the cluster you're
running the test from.

On Mon, Nov 30, 2015 at 3:40 AM Yuheng Du  wrote:

> Also, I can see the topic "speedx2" being created in the broker, but not
> message data is coming through.
>
> On Sun, Nov 29, 2015 at 7:00 PM, Yuheng Du 
> wrote:
>
> > Hi guys,
> >
> > I was running a single node broker in a cluster. And when I run the
> > producer in another cluster, I got connection time out error.
> >
> > I can ping into port 9092 and other ports on the broker machine from the
> > producer. I just can't publish any messages. The command I used to run
> the
> > producer is:
> >
> > bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> > speedx2 50 100 -1 acks=1 bootstrap.servers=130.127.xxx.xxx:9092
> > buffer.memory=67184 batch.size=8196
> >
> > Can anyone suggest what the problem might be?
> >
> >
> > Thank you!
> >
> >
> > best,
> >
> > Yuheng
> >
>


500 ms delay using new consumer and schema registry.

2015-11-28 Thread Gerard Klijs
Hi all,
I'm running a little test, with zookeeper, Kafka and the schema registry
all running locally, using the new consumer and the 2.0.0-snapshot version
of the registry, which has a decoder giving back instances of the
schema object.
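
A minimal sketch of the consumer side of such a setup, assuming the Confluent
Avro deserializer and its specific.avro.reader flag; broker, registry URL and
topic are placeholders. For a consistent delay of roughly this size, the
consumer's fetch.max.wait.ms (500 ms by default) is one setting worth checking.

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AvroSpecificConsumerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "avro-latency-test");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("specific.avro.reader", "true"); // return instances of the generated schema class

        try (KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("avro-test"));
            ConsumerRecords<String, Object> records = consumer.poll(1000);
            for (ConsumerRecord<String, Object> record : records) {
                System.out.println(record.value()); // a SpecificRecord instance
            }
        }
    }
}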

It's all working fine, but I see a consistent delay of at most around 500 ms.
I'm just wondering if anyone knows what might be the cause. The delay is
measured from creating the record to receiving the object.

For anyone who wants to try the same thing: I ran into some problems until I
generated the Java class from a schema using Avro, instead of using Avro to
generate a schema from a class. This could just have been caused by the
default constructor not being available in the Java class I used.


Nullpointer using 'old' producer with 0.9.0 when node fails

2015-11-13 Thread Gerard Klijs
I don't think it's a big problem, but I just ran into an issue playing
around with vagrant. I was using the 0.9.0 github branch to run kafka, and
used vagrant to (by default) bring up one zookeeper and 3 broker instances.
Then I created two topics like:

./bin/kafka-topics.sh --create --zookeeper 192.168.50.1:2181
--replication-factor 3 --partitions 1 --topic transactions

After this I ran a demo project, built with the 0.8.2.1 sources. All worked
well. But when bringing down a broker, I got a nullpointer,
because org.apache.kafka.common.PartitionInfo no longer had a leader
object, causing a nullpointer at line 75. This nullpointer is already
resolved by revision 0699ff2ce60abb466cab5315977a224f1a70a4da, introducing
the new consumer. Also, it only affects the producer. But it might be
nice to know this can happen, for people looking to upgrade their server
while using old clients.