Re: What's the difference between kafka_2.11 and kafka-client?

2016-06-23 Thread Grant Henke
kafka_2.11 contains the Kafka server code and the old Scala clients. kafka-clients
contains the new Java clients.
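
For reference, the Maven coordinates look roughly like this (the version is just
an example; the _2.11 suffix is the Scala version the broker jar was built
against):

org.apache.kafka:kafka_2.11:0.10.0.0     (broker + old Scala clients)
org.apache.kafka:kafka-clients:0.10.0.0  (new Java producer/consumer clients)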

Thanks,
Grant

On Thu, Jun 23, 2016 at 9:25 PM, BYEONG-GI KIM  wrote:

> Hello.
>
> I wonder what the difference is between kafka_2.11 and kafka-client on
> Maven Repo.
>
> Thank you in advance!
>
> Best regards
>
> KIM
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


What's the difference between kafka_2.11 and kafka-client?

2016-06-23 Thread BYEONG-GI KIM
Hello.

I wonder what the difference is between kafka_2.11 and kafka-client on
Maven Repo.

Thank you in advance!

Best regards

KIM


Nagios Healthcheck for Kafka Consumer Groups (using Burrow)

2016-06-23 Thread Jason J. W. Williams
In case it helps anyone else, we open-sourced our Nagios health check for
monitoring consumer group health using Burrow:

https://github.com/williamsjj/kafka_health

-J


Re: Kafka streams for out of order density aggregation

2016-06-23 Thread Guozhang Wang
Hello Ryan,

At the DSL layer there is currently no support for record-based windows; we are
discussing adding such support in the future, probably session windows first and
then other window types.

At the Processor API layer, you can definitely implement this "record window"
feature yourself by keeping the most recent 10 records in a state store: upon
each new incoming record, delete the oldest record, insert the new one, and
re-compute the count. For your specific case you do not need to re-scan all 10
records; the new count is simply

new count = old count + (new record > 100 ? 1 : 0) - (evicted record > 100 ? 1 : 0)
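
A rough sketch of that idea in Java (illustrative only; class and threshold are
made up, and a production version would keep the buffer in a state store rather
than an in-memory deque so it survives restarts):

import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class LastTenDensityProcessor implements Processor<String, Long> {

    private static final int WINDOW_SIZE = 10;
    private static final long THRESHOLD = 100L;

    private final Deque<Long> lastValues = new ArrayDeque<>(WINDOW_SIZE);
    private int countAboveThreshold = 0;
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, Long value) {
        if (lastValues.size() == WINDOW_SIZE) {
            long evicted = lastValues.removeFirst();      // drop the oldest record
            if (evicted > THRESHOLD) {
                countAboveThreshold--;
            }
        }
        lastValues.addLast(value);                        // insert the new record
        if (value > THRESHOLD) {
            countAboveThreshold++;                        // incremental update, no re-scan
        }
        context.forward(key, countAboveThreshold);        // emit the current "density"
    }

    @Override
    public void punctuate(long timestamp) {
        // nothing scheduled
    }

    @Override
    public void close() {
    }
}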


Guozhang


On Thu, Jun 23, 2016 at 1:55 PM, Ryan Thompson 
wrote:

> Hello,
>
> Say I have a stream and want to determine whether or not a given "density"
> of records matches a given condition.  For example, let's say I want to know
> how many of the last 10 records have a numerical value greater than 100.
>
> Does the kafka streams DSL (or processor API) provide a way to do this type
> of aggregation in a way that supports out of order messages?
>
> I can't use a time window based aggregation here because my window is based
> on a quantity of records (in this case, the last 10) rather than time.
> However, I still want to know about the last 10 records regardless of what
> order they arrive.
>
> Thanks,
> Ryan
>



-- 
-- Guozhang


Kafka Broker ID disappears from /brokers/ids

2016-06-23 Thread Chaitra Ranganna
Hi,

Kafka version used: 0.8.2.1
Zookeeper version: 3.4.6
We have a scenario where the Kafka broker's entry in the ZooKeeper path
/brokers/ids just disappears.
We see that the ZooKeeper connection is active and there is no network issue.
The ZooKeeper connection timeout is set to 6000ms in server.properties.
As a result, the broker is not participating in the cluster.

The broker keeps printing the logs below:
[2016-06-21 21:59:58,863] INFO Partition [applog,19] on broker 3: Shrinking ISR 
for partition [applog,19] from 3,5 to 3 (kafka.cluster.Partition)
[2016-06-21 21:59:58,865] INFO Partition [applog,19] on broker 3: Cached 
zkVersion [475] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,866] INFO Partition [vztetappstdout,14] on broker 3: 
Shrinking ISR for partition [vztetappstdout,14] from 3,2 to 3 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,868] INFO Partition [vztetappstdout,14] on broker 3: 
Cached zkVersion [265] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,868] INFO Partition [appstderr,14] on broker 3: Shrinking 
ISR for partition [appstderr,14] from 3,5 to 3 (kafka.cluster.Partition)
[2016-06-21 21:59:58,870] INFO Partition [appstderr,14] on broker 3: Cached 
zkVersion [473] not equal to that in zookeeper, skip updating ISR 
(kafka.cluster.Partition)
[2016-06-21 21:59:58,870] INFO Partition [vztetappstdout,2] on broker 3: 
Shrinking ISR for partition [vztetappstdout

Please help me with this issue

Kafka streams for out of order density aggregation

2016-06-23 Thread Ryan Thompson
Hello,

Say I have a stream and want to determine whether or not a given "density"
of records matches a given condition.  For example, let's say I want to know
how many of the last 10 records have a numerical value greater than 100.

Does the kafka streams DSL (or processor API) provide a way to do this type
of aggregation in a way that supports out of order messages?

I can't use a time window based aggregation here because my window is based
on a quantity of records (in this case, the last 10) rather than time.
However, I still want to know about the last 10 records regardless of what
order they arrive.

Thanks,
Ryan


RE: Failed to deserialize data to Avro:

2016-06-23 Thread Tauzell, Dave
That should work then.  I would take some messages off the queue and verify 
that they have the correct magic byte (byte 0).
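
A quick sketch of such a check with the Java consumer and byte-array
deserializers (topic name, servers and group id below are made up); in the
Confluent wire format byte 0 should be 0 and the next 4 bytes the schema
registry id:

import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MagicByteCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "magic-byte-check");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-avro-topic"));
            ConsumerRecords<byte[], byte[]> records = consumer.poll(5000);
            for (ConsumerRecord<byte[], byte[]> record : records) {
                byte[] value = record.value();
                byte magic = value[0];                                 // expected: 0
                int schemaId = ByteBuffer.wrap(value, 1, 4).getInt();  // schema registry id
                System.out.printf("offset=%d magic=%d schemaId=%d%n",
                        record.offset(), magic, schemaId);
            }
        }
    }
}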

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Sungwook Yoon [mailto:sy...@maprtech.com]
Sent: Thursday, June 23, 2016 2:32 PM
To: users@kafka.apache.org
Subject: Re: Failed to deserialize data to Avro:

So, I am just consuming from already existing Kafka queue and topics.
According to our internal documentation, when we put event data into Kafka 
queue, each message has

1 magic byte
4 bytes of Schema ID
Then Avro serialized data

And we have our Schema Registry server running.

Sungwook


On Thu, Jun 23, 2016 at 12:25 PM, Tauzell, Dave < dave.tauz...@surescripts.com> 
wrote:

> How are you putting data onto the Topic?  The HdfsSink expects that
> you used the KafkaAvroSerializer (
> http://docs.confluent.io/1.0/schema-registry/docs/serializer-formatter
> .html) which prepends a null byte and schema registry id to the front
> of the serialized avro data.  If you just put avro onto the topic you
> will see this error.
>
> -Dave
>
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
>
>
> -Original Message-
> From: Sungwook Yoon [mailto:sy...@maprtech.com]
> Sent: Thursday, June 23, 2016 1:46 PM
> To: users@kafka.apache.org
> Subject: Failed to deserialize data to Avro:
>
> Hi,
>
> I am testing kafka connect and got this error,
>
> Exception in thread "WorkerSinkTask-local-file-sink-0"
> org.apache.kafka.connect.errors.DataException: Failed to deserialize
> data to Avro:
> at
>
> io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:109)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
> at
>
> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThrea
> d.java:82) Caused by:
> org.apache.kafka.common.errors.SerializationException: Error
> deserializing Avro message for id -1 Caused by:
> org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
>
>
> I am using
> Kafka Connect 2.0.0
> Kafka 0.9
> with Schema Registry Running for Avro Schema Serving.
>
> Why am I getting this error?
> I believe the schema registry server is running fine, since I checked
> its running through other means.
>
>
> Any insight would be helpful.
>
> Thanks,
>
> Sungwook
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of
> the individual or entity to whom they are addressed. If you have
> received this e-mail in error, please notify the sender by reply
> e-mail immediately and destroy all copies of the e-mail and any attachments.
>
This e-mail and any files transmitted with it are confidential, may contain 
sensitive information, and are intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this e-mail in error, 
please notify the sender by reply e-mail immediately and destroy all copies of 
the e-mail and any attachments.


Re: Dealing with diverse consumption speeds

2016-06-23 Thread Marcos Juarez
I'm also interested in knowing if other people have run into this problem
of different consumption speeds across consumers, and how they've dealt
with it.  I've run into this in 0.7, 0.8, both beta and release, and now
0.9.0.1.  It doesn't seem to be partition-specific, but consumer-specific.
In our case, all 24 partitions in one consumer are lagging by ~50M offsets,
while all 24 partitions in another consumer are fully caught up.  The box
has plenty of capacity available, so the consumers were never starved of
CPU or RAM.  The offsets for the group id keep changing on all partitions,
so none of them are stuck, but the current rate of consumption seems to be
only as fast as the non-lagging consumer.  It's almost as if all lagging
consumers are mimicking the consumption rate of the fully caught up
consumer, and won't go any faster.

What I'm thinking right now to mitigate the issue is adding some context to
consumers (they all live within a single app), so that we'll be able to
pause consumption if lag becomes too high, and let the other consumers
catch up.
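
For what it's worth, a rough sketch of the pause/resume mechanics I have in mind
(the topic name and the lag signal are placeholders; note that pause()/resume()
take a Collection of partitions on 0.10+ clients and varargs on 0.9, and that
you must keep calling poll() while paused so the consumer stays in the group):

import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PausingConsumerLoop {

    public static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("my-topic"));
        boolean paused = false;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            if (!paused && shouldPause()) {
                consumer.pause(consumer.assignment());   // stop fetching, but keep calling
                paused = true;                           // poll() so membership stays alive
            } else if (paused && !shouldPause()) {
                consumer.resume(consumer.assignment());
                paused = false;
            }
            for (ConsumerRecord<String, String> record : records) {
                // process(record);
            }
        }
    }

    private static boolean shouldPause() {
        // placeholder: decide based on whatever lag signal we use
        // (this consumer's lag, the other consumers' lag, Burrow, ...)
        return false;
    }
}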

Any thoughts/suggestions on that?

Thanks,

Marcos Juarez

On Sun, Jan 24, 2016 at 6:10 AM, Jens Rantil  wrote:

> Hi,
>
> How are you dealing with a slow consumer in Kafka? In the best of world,
> each consumer will have the exact same specs and the exact same workload.
> But unfortunately that's rarely true: Virtual machines share hardware with
> other VMs, some Kafka tasks takes longer to process, some partition keys
> occasionally make the Kafka cluster unbalanced etc.
>
> On a larger perspective, maybe it would be nice if a consumer group would
> occasionally rebalance consumers based on lag.
>
> Cheers,
> Jens
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
> Facebook  Linkedin
> <
> http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary
> >
>  Twitter 
>


Re: Failed to deserialize data to Avro:

2016-06-23 Thread Sungwook Yoon
So, I am just consuming from an already existing Kafka queue and topics.
According to our internal documentation, when we put event data into the Kafka
queue, each message has

1 magic byte
4 bytes of Schema ID
Then Avro serialized data

And we have our Schema Registry server running.

Sungwook


On Thu, Jun 23, 2016 at 12:25 PM, Tauzell, Dave <
dave.tauz...@surescripts.com> wrote:

> How are you putting data onto the Topic?  The HdfsSink expects that you
> used the KafkaAvroSerializer (
> http://docs.confluent.io/1.0/schema-registry/docs/serializer-formatter.html)
> which prepends a null byte and schema registry id to the front of the
> serialized avro data.  If you just put avro onto the topic you will see
> this error.
>
> -Dave
>
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
>
>
> -Original Message-
> From: Sungwook Yoon [mailto:sy...@maprtech.com]
> Sent: Thursday, June 23, 2016 1:46 PM
> To: users@kafka.apache.org
> Subject: Failed to deserialize data to Avro:
>
> Hi,
>
> I am testing kafka connect and got this error,
>
> Exception in thread "WorkerSinkTask-local-file-sink-0"
> org.apache.kafka.connect.errors.DataException: Failed to deserialize data
> to Avro:
> at
>
> io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:109)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
> at
>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
> at
>
> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
> Caused by: org.apache.kafka.common.errors.SerializationException: Error
> deserializing Avro message for id -1 Caused by:
> org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
>
>
> I am using
> Kafka Connect 2.0.0
> Kafka 0.9
> with Schema Registry Running for Avro Schema Serving.
>
> Why am I getting this error?
> I believe the schema registry server is running fine, since I checked its
> running through other means.
>
>
> Any insight would be helpful.
>
> Thanks,
>
> Sungwook
> This e-mail and any files transmitted with it are confidential, may
> contain sensitive information, and are intended solely for the use of the
> individual or entity to whom they are addressed. If you have received this
> e-mail in error, please notify the sender by reply e-mail immediately and
> destroy all copies of the e-mail and any attachments.
>


RE: Failed to deserialize data to Avro:

2016-06-23 Thread Tauzell, Dave
How are you putting data onto the Topic?  The HdfsSink expects that you used 
the KafkaAvroSerializer 
(http://docs.confluent.io/1.0/schema-registry/docs/serializer-formatter.html) 
which prepends a null byte and schema registry id to the front of the 
serialized avro data.  If you just put avro onto the topic you will see this 
error.
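
For reference, a minimal sketch of a producer configured that way (broker and
registry URLs, topic name and the record itself are placeholders); the
serializer then writes the magic byte and schema id for you:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    public static void send(Object key, Object avroRecord) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaProducer<Object, Object> producer = new KafkaProducer<>(props)) {
            // the value can be a GenericRecord or a primitive; the serializer registers or
            // looks up the schema and prepends the magic byte plus the 4-byte schema id
            producer.send(new ProducerRecord<>("my-avro-topic", key, avroRecord));
        }
    }
}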

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-Original Message-
From: Sungwook Yoon [mailto:sy...@maprtech.com]
Sent: Thursday, June 23, 2016 1:46 PM
To: users@kafka.apache.org
Subject: Failed to deserialize data to Avro:

Hi,

I am testing kafka connect and got this error,

Exception in thread "WorkerSinkTask-local-file-sink-0"
org.apache.kafka.connect.errors.DataException: Failed to deserialize data to 
Avro:
at
io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:109)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
at
org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
at
org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
at
org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
Caused by: org.apache.kafka.common.errors.SerializationException: Error 
deserializing Avro message for id -1 Caused by: 
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!


I am using
Kafka Connect 2.0.0
Kafka 0.9
with Schema Registry Running for Avro Schema Serving.

Why am I getting this error?
I believe the schema registry server is running fine, since I checked its 
running through other means.


Any insight would be helpful.

Thanks,

Sungwook
This e-mail and any files transmitted with it are confidential, may contain 
sensitive information, and are intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this e-mail in error, 
please notify the sender by reply e-mail immediately and destroy all copies of 
the e-mail and any attachments.


Failed to deserialize data to Avro:

2016-06-23 Thread Sungwook Yoon
Hi,

I am testing Kafka Connect and got this error:

Exception in thread "WorkerSinkTask-local-file-sink-0"
org.apache.kafka.connect.errors.DataException: Failed to deserialize data
to Avro:
at
io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:109)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
at
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
at
org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
at
org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
at
org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
Caused by: org.apache.kafka.common.errors.SerializationException: Error
deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown
magic byte!


I am using
Kafka Connect 2.0.0
Kafka 0.9
with Schema Registry Running for Avro Schema Serving.

Why am I getting this error?
I believe the schema registry server is running fine, since I checked that it
is running through other means.


Any insight would be helpful.

Thanks,

Sungwook


Re: Expired messages in kafka topic

2016-06-23 Thread Krish
Gwen,
I have selected priority 'minor' and component 'core', and have assigned no
labels.
Jira link: https://issues.apache.org/jira/browse/KAFKA-3895.

I have also added a question to the JIRA issue, along with a rough approach
that I have in mind.
It would be great if you could have a look and provide comments.

I am still setting up the dev env for kafka; will update as I progress.
Thanks.



--
κρισhναν

On Thu, Jun 23, 2016 at 9:30 PM, Gwen Shapira  wrote:

> Thats a pretty cool feature, if anyone feels like opening a JIRA :)
>
> On Thu, Jun 23, 2016 at 8:46 AM, Christian Posta
>  wrote:
> > Sounds like something a traditional message broker (ie, ActiveMQ) would
> be
> > able to do with a TTL setting and expiry. Expired messages get moved to a
> > DLQ.
> >
> > On Thu, Jun 23, 2016 at 2:45 AM, Krish 
> wrote:
> >
> >> Hi,
> >> I am trying to design a real-time application where message timeout can
> be
> >> as low as a minute or two (message can get stale real-fast).
> >>
> >> In the rare chance that the consumers lag too far behind in processing
> >> messages from the broker, is there a concept of expired message queue in
> >> Kafka?
> >>
> >> I would like to know if a message has expired and then park it in some
> >> topic till as such time that a service can dequeue, process it and/or
> >> investigate it.
> >>
> >> Thanks.
> >>
> >> Best,
> >> Krish
> >>
> >
> >
> >
> > --
> > *Christian Posta*
> > twitter: @christianposta
> > http://www.christianposta.com/blog
> > http://fabric8.io
>


Re: is kafka the right choice

2016-06-23 Thread Philippe Derome
See the Keyhole Software blog, particularly John Boardman's presentation of a
sample app with a responsive web client that uses WebSockets to connect to an
embedded Netty web server, which itself uses producer and consumer clients
against a Kafka infrastructure (@johnwboardman). At first look, it seems like
a valid approach. Behind the web server are services that are Kafka apps
interacting with external web APIs.

Anecdotally, quite a few companies post jobs with Kafka playing a role in a
microservices-architecture solution.

I'll now let experts speak...
On 23 Jun 2016 11:47 a.m., "Pranay Suresh"  wrote:

> Hey Kafka experts,
>
> After having read Jay Kreps awesome Kafka reading(
>
> https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> )
> I have a doubt.
>
> For communication between browsers (lets say collaborative editing, chat
> etc.) is Kafka the right choice ? Especially given that Kafka consumers are
> designed to pull , rather than a callback style push. For low latency
> possibly ephemeral data/events is Kafka a good choice ? Can I have a
> browser open a socket into a webserver and each request initiate a consumer
> to consume from kafka (by polling?) OR is Kafka designed and/or meant to be
> used for a separate usecase ?
>
> Any feedback is appreciated. Let the bashing begin!
>
> Many Thanks,
> pranay
>


Re: Expired messages in kafka topic

2016-06-23 Thread Gwen Shapira
That's a pretty cool feature, if anyone feels like opening a JIRA :)

On Thu, Jun 23, 2016 at 8:46 AM, Christian Posta
 wrote:
> Sounds like something a traditional message broker (ie, ActiveMQ) would be
> able to do with a TTL setting and expiry. Expired messages get moved to a
> DLQ.
>
> On Thu, Jun 23, 2016 at 2:45 AM, Krish  wrote:
>
>> Hi,
>> I am trying to design a real-time application where message timeout can be
>> as low as a minute or two (message can get stale real-fast).
>>
>> In the rare chance that the consumers lag too far behind in processing
>> messages from the broker, is there a concept of expired message queue in
>> Kafka?
>>
>> I would like to know if a message has expired and then park it in some
>> topic till as such time that a service can dequeue, process it and/or
>> investigate it.
>>
>> Thanks.
>>
>> Best,
>> Krish
>>
>
>
>
> --
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io


Re: Expired messages in kafka topic

2016-06-23 Thread Krish
Tom,
When you say this:
"Deletion can happen at different times on the different replicas of the
log, and to different messages. Whilst a consumer will only be reading from
the lead broker for any log at any one time, the leader can and will change
to handle broker failure."

basically it means that a dead-letter-queue-like mechanism can receive the same
message multiple times, and anyone draining the queue needs to take care of
duplicates. Right so far?




--
κρισhναν

On Thu, Jun 23, 2016 at 9:21 PM, Krish  wrote:

> Well, we are already using Kafka and would like to get this feature.
> How hard can it be to hack it and use a custom kafka!? ;)
>
> Let me look up the source code (never have checked it) and see what can be
> done.
> Thanks Tom and Christian, for helping me decide fast.
>
>
>
> --
> κρισhναν
>
> On Thu, Jun 23, 2016 at 9:16 PM, Christian Posta <
> christian.po...@gmail.com> wrote:
>
>> Sounds like something a traditional message broker (ie, ActiveMQ) would
>> be able to do with a TTL setting and expiry. Expired messages get moved to
>> a DLQ.
>>
>> On Thu, Jun 23, 2016 at 2:45 AM, Krish  wrote:
>>
>>> Hi,
>>> I am trying to design a real-time application where message timeout can
>>> be
>>> as low as a minute or two (message can get stale real-fast).
>>>
>>> In the rare chance that the consumers lag too far behind in processing
>>> messages from the broker, is there a concept of expired message queue in
>>> Kafka?
>>>
>>> I would like to know if a message has expired and then park it in some
>>> topic till as such time that a service can dequeue, process it and/or
>>> investigate it.
>>>
>>> Thanks.
>>>
>>> Best,
>>> Krish
>>>
>>
>>
>>
>> --
>> *Christian Posta*
>> twitter: @christianposta
>> http://www.christianposta.com/blog
>> http://fabric8.io
>>
>>
>


Re: Expired messages in kafka topic

2016-06-23 Thread Krish
Well, we are already using Kafka and would like to get this feature.
How hard can it be to hack it and use a custom kafka!? ;)

Let me look up the source code (never have checked it) and see what can be
done.
Thanks Tom and Christian, for helping me decide fast.



--
κρισhναν

On Thu, Jun 23, 2016 at 9:16 PM, Christian Posta 
wrote:

> Sounds like something a traditional message broker (ie, ActiveMQ) would be
> able to do with a TTL setting and expiry. Expired messages get moved to a
> DLQ.
>
> On Thu, Jun 23, 2016 at 2:45 AM, Krish  wrote:
>
>> Hi,
>> I am trying to design a real-time application where message timeout can be
>> as low as a minute or two (message can get stale real-fast).
>>
>> In the rare chance that the consumers lag too far behind in processing
>> messages from the broker, is there a concept of expired message queue in
>> Kafka?
>>
>> I would like to know if a message has expired and then park it in some
>> topic till as such time that a service can dequeue, process it and/or
>> investigate it.
>>
>> Thanks.
>>
>> Best,
>> Krish
>>
>
>
>
> --
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io
>
>


is kafka the right choice

2016-06-23 Thread Pranay Suresh
Hey Kafka experts,

After reading Jay Kreps' awesome Kafka article (
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)
I have a question.

For communication between browsers (let's say collaborative editing, chat,
etc.), is Kafka the right choice? Especially given that Kafka consumers are
designed to pull, rather than receive a callback-style push. For low-latency,
possibly ephemeral data/events, is Kafka a good choice? Can I have a browser
open a socket to a webserver and have each request initiate a consumer that
consumes from Kafka (by polling?), OR is Kafka designed and/or meant to be
used for a separate use case?

Any feedback is appreciated. Let the bashing begin!

Many Thanks,
pranay


Re: Expired messages in kafka topic

2016-06-23 Thread Christian Posta
Sounds like something a traditional message broker (ie, ActiveMQ) would be
able to do with a TTL setting and expiry. Expired messages get moved to a
DLQ.

On Thu, Jun 23, 2016 at 2:45 AM, Krish  wrote:

> Hi,
> I am trying to design a real-time application where message timeout can be
> as low as a minute or two (message can get stale real-fast).
>
> In the rare chance that the consumers lag too far behind in processing
> messages from the broker, is there a concept of expired message queue in
> Kafka?
>
> I would like to know if a message has expired and then park it in some
> topic till as such time that a service can dequeue, process it and/or
> investigate it.
>
> Thanks.
>
> Best,
> Krish
>



-- 
*Christian Posta*
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io


Re: Expired messages in kafka topic

2016-06-23 Thread Tom Crayford
No, there's no control over that. The right way to do this is to keep up
with the head of the topic and decide on "old" yourself in the consumer.

Deletion can happen at different times on the different replicas of the
log, and to different messages. Whilst a consumer will only be reading from
the lead broker for any log at any one time, the leader can and will change
to handle broker failure.

On Thu, Jun 23, 2016 at 4:37 PM, Krish  wrote:

> Thanks Tom.
> Is there any way a consumer can be triggered when the message is about to
> be deleted by Kafka?
>
>
>
> --
> κρισhναν
>
> On Thu, Jun 23, 2016 at 6:16 PM, Tom Crayford 
> wrote:
>
>> Hi,
>>
>> A pretty reasonable thing to do here would be to have a consumer that
>> moved "old" events to another topic.
>>
>> Kafka has no concept of an expired queue, the only thing it can do once a
>> message is aged out is delete it. The deletion is done in bulk and
>> typically is set to 24h or even higher (LinkedIn use 4 days, the default is
>> 7 days).
>>
>> Thanks
>>
>> Tom Crayford
>> Heroku Kafka
>>
>> On Thu, Jun 23, 2016 at 10:45 AM, Krish 
>> wrote:
>>
>>> Hi,
>>> I am trying to design a real-time application where message timeout can
>>> be
>>> as low as a minute or two (message can get stale real-fast).
>>>
>>> In the rare chance that the consumers lag too far behind in processing
>>> messages from the broker, is there a concept of expired message queue in
>>> Kafka?
>>>
>>> I would like to know if a message has expired and then park it in some
>>> topic till as such time that a service can dequeue, process it and/or
>>> investigate it.
>>>
>>> Thanks.
>>>
>>> Best,
>>> Krish
>>>
>>
>>
>


Re: Expired messages in kafka topic

2016-06-23 Thread Krish
Thanks Tom.
Is there any way a consumer can be triggered when the message is about to
be deleted by Kafka?



--
κρισhναν

On Thu, Jun 23, 2016 at 6:16 PM, Tom Crayford  wrote:

> Hi,
>
> A pretty reasonable thing to do here would be to have a consumer that
> moved "old" events to another topic.
>
> Kafka has no concept of an expired queue, the only thing it can do once a
> message is aged out is delete it. The deletion is done in bulk and
> typically is set to 24h or even higher (LinkedIn use 4 days, the default is
> 7 days).
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Thu, Jun 23, 2016 at 10:45 AM, Krish  wrote:
>
>> Hi,
>> I am trying to design a real-time application where message timeout can be
>> as low as a minute or two (message can get stale real-fast).
>>
>> In the rare chance that the consumers lag too far behind in processing
>> messages from the broker, is there a concept of expired message queue in
>> Kafka?
>>
>> I would like to know if a message has expired and then park it in some
>> topic till as such time that a service can dequeue, process it and/or
>> investigate it.
>>
>> Thanks.
>>
>> Best,
>> Krish
>>
>
>


Re: Expired messages in kafka topic

2016-06-23 Thread Tom Crayford
Hi,

A pretty reasonable thing to do here would be to have a consumer that moved
"old" events to another topic.

Kafka has no concept of an expired queue, the only thing it can do once a
message is aged out is delete it. The deletion is done in bulk and
typically is set to 24h or even higher (LinkedIn use 4 days, the default is
7 days).
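
A rough sketch of what that "move old events" consumer could look like (topic
names and the two-minute threshold are made up, and record.timestamp() only
exists on 0.10+; on older clients you would embed a timestamp in the payload
instead):

import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExpiredMessageRouter {

    private static final long MAX_AGE_MS = 2 * 60 * 1000L;  // "stale after two minutes"

    public static void run(KafkaConsumer<String, String> consumer,
                           KafkaProducer<String, String> producer) {
        consumer.subscribe(Collections.singletonList("events"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(500);
            for (ConsumerRecord<String, String> record : records) {
                long age = System.currentTimeMillis() - record.timestamp();
                if (age > MAX_AGE_MS) {
                    // park stale messages on a side topic for later inspection
                    producer.send(new ProducerRecord<>("events-expired",
                            record.key(), record.value()));
                } else {
                    // process(record);
                }
            }
        }
    }
}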

Thanks

Tom Crayford
Heroku Kafka

On Thu, Jun 23, 2016 at 10:45 AM, Krish  wrote:

> Hi,
> I am trying to design a real-time application where message timeout can be
> as low as a minute or two (message can get stale real-fast).
>
> In the rare chance that the consumers lag too far behind in processing
> messages from the broker, is there a concept of expired message queue in
> Kafka?
>
> I would like to know if a message has expired and then park it in some
> topic till as such time that a service can dequeue, process it and/or
> investigate it.
>
> Thanks.
>
> Best,
> Krish
>


Re: SSL support for command line tools

2016-06-23 Thread Gerard Klijs
That particular tool doesn't seem to support SSL, at least not in the 0.10
version.
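
For the tools that do take a client config (e.g. the console consumer/producer,
or kafka-consumer-groups.sh via --command-config), the properties file we pass
looks roughly like the following; paths and passwords are placeholders:

security.protocol=SSL
ssl.truststore.location=/some/folder/truststore.jks
ssl.truststore.password=changeit
# only needed if the broker requires client authentication:
ssl.keystore.location=/some/folder/keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit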

On Thu, Jun 23, 2016 at 9:17 AM Radu Radutiu  wrote:

> I have read the documentation and I can connect the consumer and producer
> successfully with SSL. However I have trouble running other scripts like
>
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list
> {brokerUrl} --topic {topicName} --time -2
>
> if the broker is configured with SSL only.
>
> Regards,
> Radu
>
> On 23 June 2016 at 01:46, Harsha  wrote:
>
> > Radu,
> >  Please follow the instructions here
> >  http://kafka.apache.org/documentation.html#security_ssl . At
> >  the end of the SSL section we've an example for produce and
> >  consumer command line tools to pass in ssl configs.
> >
> > Thanks,
> > Harsha
> >
> > On Wed, Jun 22, 2016, at 07:40 AM, Gerard Klijs wrote:
> > > To eleborate:
> > > We start the process with --command-config /some/folder/ssl.properties
> > > the
> > > file we include in the image, and contains the ssl properties it needs,
> > > which is a subset of the properties (those specific for ssl) the client
> > > uses. In this case the certificate is accessed in a data container,
> > > having
> > > access to the same certificate as the broker (so we don't need to set
> > > acl's
> > > to use the tool).
> > >
> > > On Wed, Jun 22, 2016 at 2:47 PM Gerard Klijs 
> > > wrote:
> > >
> > > > You need to pass the correct options, similar to how you would do to
> a
> > > > client. We use the consumer-groups in a docker container, in an
> > environment
> > > > witch is now only SSL (since the schema registry now supports it).
> > > >
> > > > On Wed, Jun 22, 2016 at 2:47 PM Radu Radutiu 
> > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Is is possible to configure the command line tools like
> > > >> kafka-consumer-groups.sh , kafka-topics.sh and all other command
> that
> > are
> > > >> not a consumer or producer to connect to a SSL only kafka cluster ?
> > > >>
> > > >> Regards,
> > > >> Radu
> > > >>
> > > >
> >
>


Expired messages in kafka topic

2016-06-23 Thread Krish
Hi,
I am trying to design a real-time application where message timeout can be
as low as a minute or two (message can get stale real-fast).

In the rare chance that the consumers lag too far behind in processing
messages from the broker, is there a concept of expired message queue in
Kafka?

I would like to know if a message has expired and then park it in some
topic till as such time that a service can dequeue, process it and/or
investigate it.

Thanks.

Best,
Krish


Re: Kafka broker crash

2016-06-23 Thread Radu Radutiu
/tmp is not a good location for storing files. It will get cleaned up
periodically, depending on your Linux distribution.
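
The broker's data directory is set with log.dirs in server.properties; pointing
it at a persistent location (the path below is just an example) avoids this:

log.dirs=/var/lib/kafka/data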

Radu

On 22 June 2016 at 19:33, Misra, Rahul  wrote:

> Hi Madhukar,
>
> Thanks for your quick response. The path is "/tmp/kafka-logs/". But the
> servers have not been restarted any time lately. The uptime for all the 3
> servers is almost 67 days.
>
> Regards,
> Rahul Misra
>
>
> -Original Message-
> From: Madhukar Bharti [mailto:bhartimadhu...@gmail.com]
> Sent: Wednesday, June 22, 2016 8:37 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka broker crash
>
> Hi Rahul,
>
> Whether the path is  "/tmp/kafka-logs/" or "/temp/kafka-logs" ?
>
> Mostly if path is set to "/tmp/" then in case machine restart it may
> delete the files. So it is throwing FileNotFoundException.
> you can change the file location to some other path and restart all broker.
> This might fix the issue.
>
> Regrads,
> Madhukar
>
> On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul 
> wrote:
>
> > Hi,
> >
> > I'm facing a strange issue in my Kafka cluster. Could anybody please
> > help me with it. The issue is as follows:
> >
> > We have a 3 node kafka cluster. We installed the zookeeper separately
> > and have pointed the brokers to it. The zookeeper is also 3 node, but
> > for our POC setup, the zookeeper nodes are on the same machines as the
> > Kafka brokers.
> >
> > While receiving messages from an existing topic using a new groupId, 2
> > of the brokers crashed with same FATAL errors:
> >
> > 
> > < [server 2 logs] >>>
> >
> > [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
> > pocTestNew11 generation 1 (kafka.coordinator.Gro
> > upCoordinator)
> > [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment
> > received from leader for group pocTestNew11 for genera tion 1
> > (kafka.coordinator.GroupCoordinator)
> > [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting
> > due to unrecoverable I/O error while handling p roduce request:
> > (kafka.server.ReplicaManager)
> > kafka.common.KafkaStorageException: I/O exception in append to log
> > '__consumer_offsets-4'
> > at kafka.log.Log.append(Log.scala:318)
> > at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
> > at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
> > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> > at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
> > at
> > kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
> > at
> >
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
> > at
> >
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
> > at
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> > at
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> > at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
> > at
> > scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> > at
> scala.collection.AbstractTraversable.map(Traversable.scala:104)
> > at
> > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
> > at
> > kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
> > at
> >
> kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
> > at
> >
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
> > at
> >
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
> > at scala.Option.foreach(Option.scala:257)
> > at
> >
> kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
> > at
> > kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
> > at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
> > at
> > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.FileNotFoundException:
> > /tmp/kafka-logs/__consumer_offsets-4/.index (No
> > such file or directory)
> > at java.io.RandomAccessFile.open0(Native Method)
> > at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> > at java.io.RandomAccessFile.(RandomAccessFile.java:243)
> > at
> > kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
> > at
> > kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
> > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> > at 

Re: SSL support for command line tools

2016-06-23 Thread Radu Radutiu
I have read the documentation and I can connect the consumer and producer
successfully with SSL. However I have trouble running other scripts like

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list
{brokerUrl} --topic {topicName} --time -2

if the broker is configured with SSL only.

Regards,
Radu

On 23 June 2016 at 01:46, Harsha  wrote:

> Radu,
>  Please follow the instructions here
>  http://kafka.apache.org/documentation.html#security_ssl . At
>  the end of the SSL section we've an example for produce and
>  consumer command line tools to pass in ssl configs.
>
> Thanks,
> Harsha
>
> On Wed, Jun 22, 2016, at 07:40 AM, Gerard Klijs wrote:
> > To eleborate:
> > We start the process with --command-config /some/folder/ssl.properties
> > the
> > file we include in the image, and contains the ssl properties it needs,
> > which is a subset of the properties (those specific for ssl) the client
> > uses. In this case the certificate is accessed in a data container,
> > having
> > access to the same certificate as the broker (so we don't need to set
> > acl's
> > to use the tool).
> >
> > On Wed, Jun 22, 2016 at 2:47 PM Gerard Klijs 
> > wrote:
> >
> > > You need to pass the correct options, similar to how you would do to a
> > > client. We use the consumer-groups in a docker container, in an
> environment
> > > witch is now only SSL (since the schema registry now supports it).
> > >
> > > On Wed, Jun 22, 2016 at 2:47 PM Radu Radutiu 
> wrote:
> > >
> > >> Hi,
> > >>
> > >> Is is possible to configure the command line tools like
> > >> kafka-consumer-groups.sh , kafka-topics.sh and all other command that
> are
> > >> not a consumer or producer to connect to a SSL only kafka cluster ?
> > >>
> > >> Regards,
> > >> Radu
> > >>
> > >
>


Re: Leader crash and data loss

2016-06-23 Thread Gerard Klijs
If your producer has acks set to 0, or retries set to 0, in the properties,
the message will be lost; otherwise it will most likely be retried and sent to
the new leader.
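
For reference, the relevant producer settings look like this (example values
for the durable end of the spectrum):

# wait for the in-sync replicas to acknowledge; acks=0 is fire-and-forget
acks=all
# retry transient errors such as a leader change
retries=3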

On Thu, Jun 23, 2016 at 2:53 AM Saeed Ansari  wrote:

> Hi,
> I searched a lot for an answer to my question and did not find a good one;
> could someone in this group help me?
>
> When the leader broker for a partition fails, ZK elects a new leader, and this
> may take seconds. What happens to data published to that broker during the
> election?
> How does Kafka handle messages sent to the failed broker?
>
>
> Thank you,
> Saeed
>