Dealing with large messages

2015-10-05 Thread Pradeep Gollakota
Fellow Kafkaers,

We have a pretty heavyweight legacy event logging system for batch
processing. We're now sending the events into Kafka for realtime
analytics. But we have some pretty large messages (> 40 MB).

I'm wondering if any of you have use cases where you have to send large
messages to Kafka and how you're dealing with them.

Thanks,
Pradeep


Re: Dealing with large messages

2015-10-05 Thread Rahul Jain
In addition to the config changes mentioned in that post, you may also have
to change the producer config if you are using the new producer.

Specifically, *max.request.size* and *request.timeout.ms* have to be
increased to allow the producer to send large messages.
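
Something like the following, for example: a minimal sketch against the
0.8.2 Java producer (values are illustrative, not recommendations, and the
topic name is hypothetical). Note the broker and consumers need matching
changes too: message.max.bytes and replica.fetch.max.bytes on the broker,
and fetch.message.max.bytes on 0.8 consumers.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LargeMessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Must be larger than the biggest message you expect to send.
        props.put("max.request.size", String.valueOf(64 * 1024 * 1024));
        // Allow more time for large requests to complete.
        props.put("request.timeout.ms", "120000");

        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        byte[] largePayload = new byte[40 * 1024 * 1024]; // stand-in payload
        producer.send(new ProducerRecord<byte[], byte[]>("events", largePayload));
        producer.close();
    }
}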


On 6 Oct 2015 02:02, "James Cheng"  wrote:

> Here’s an article that Gwen wrote earlier this year on handling large
> messages in Kafka.
>
> http://ingest.tips/2015/01/21/handling-large-messages-kafka/
>
> -James
>
> > On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota  wrote:
> >
> > Fellow Kafkaers,
> >
> > We have a pretty heavyweight legacy event logging system for batch
> > processing. We're now sending the events into Kafka for realtime
> > analytics. But we have some pretty large messages (> 40 MB).
> >
> > I'm wondering if any of you have use cases where you have to send large
> > messages to Kafka and how you're dealing with them.
> >
> > Thanks,
> > Pradeep


Re: Dealing with large messages

2015-10-05 Thread James Cheng
Here’s an article that Gwen wrote earlier this year on handling large messages 
in Kafka.

http://ingest.tips/2015/01/21/handling-large-messages-kafka/

-James

> On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota  wrote:
>
> Fellow Kafkaers,
>
> We have a pretty heavyweight legacy event logging system for batch
> processing. We're now sending the events into Kafka for realtime
> analytics. But we have some pretty large messages (> 40 MB).
>
> I'm wondering if any of you have use cases where you have to send large
> messages to Kafka and how you're dealing with them.
>
> Thanks,
> Pradeep




Re: Experiences with corrupted messages

2015-10-05 Thread Alexey Sverdelov
Hi Marina,

This is how I "fixed" this problem:
http://stackoverflow.com/questions/32904383/apache-kafka-with-high-level-consumer-skip-corrupted-messages/32945841

It is a workaround, and I hope it will be fixed in one of the next Kafka
releases.
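
For the archives, the gist of it is roughly the following sketch (0.8
high-level consumer API; the actual code is in the answer above): catch the
exception a corrupt message raises and keep going, instead of letting the
consumer thread die.

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;

public class SkipCorruptMessages {

    public static void consume(KafkaStream<byte[], byte[]> stream) {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (true) {
            try {
                if (!it.hasNext()) {
                    break; // end of stream, e.g. consumer shutdown
                }
                MessageAndMetadata<byte[], byte[]> record = it.next();
                process(record.message());
            } catch (RuntimeException e) {
                // Typically a CRC/InvalidMessageException on a corrupt
                // message: log it and move on. Caveat: depending on the
                // client version, the iterator can be left in a failed state
                // after throwing, in which case the connector and stream
                // have to be torn down and recreated.
                System.err.println("Skipping corrupt message: " + e);
            }
        }
    }

    private static void process(byte[] payload) {
        // application logic goes here
    }
}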

Have a nice day,
Alexey

On Fri, Oct 2, 2015 at 2:57 PM, Marina  wrote:

> Hi Lance, I'm very interested in your analysis of handling corrupt
> messages in the high-level consumer as well.
> We also experienced some unexplained "deaths" of some high-level
> consumers, very rarely though, and we could not figure out why they died.
> Now I wonder if this could be due to such corrupted messages. When you
> say "Your only other recourse is to iterate past the problem offset",
> what exactly do you mean?
> 1) Do you mean manually updating the current offset in ZooKeeper (if ZK
> storage is used)? What if the new Kafka-based storage is used?
> 2) Or do you mean skipping the message when iterating over events in the
> consumer code, when reading Kafka's stream?
>
> ConsumerIterator iter = kafkaStream.iterator();
> while (iter.hasNext()) {
>     // skip bad message here somehow?
> }
>
> I would think that if you can get the message inside the while loop, you
> are already past the point at which the consumer dies if the message is
> corrupt. Is that not the case?
> Thanks! Marina
> [Sorry, I did not mean to hijack the thread, but I think it is important
> to understand how to skip corrupted messages for both use cases.]
>
>   From: Lance Laursen 
>  To: users@kafka.apache.org
>  Sent: Thursday, October 1, 2015 4:49 PM
>  Subject: Re: Experiences with corrupted messages
>
> Hey Jörg,
>
> Unfortunately when the high level consumer hits a corrupt message, it
> enters an invalid state and closes. The only way around this is to iterate
> your offset by 1 in order to skip the corrupt message. This is currently
> not automated. You can catch this exception if you are using the simple
> consumer client, but unfortunately mirrormaker uses the high level client.
>
> There have been some corrupt producer message bugs related to using snappy
> compression recently, but this does not seem to be the same as your
> problem.
>
> Does MM stop on the exact same message each time? (See:
> https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-ConsumerOffsetChecker)
> I would suggest triple-checking that your configurations are the same
> across all DCs (you mentioned that MM mirrors successfully to another DC
> with no problem), as well as examining the problem message to see if you
> can find anything different about it when compared to the others (see:
> https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-SimpleConsumerShell).
> Your only other recourse is to iterate past the problem offset.
>
>
>
> On Thu, Oct 1, 2015 at 1:22 AM, Jörg Wagner 
> wrote:
>
> > Hey everyone,
> >
> > I've been having some issues with corrupted messages and mirrormaker as I
> > wrote previously. Since there was no feedback, I want to ask a new
> > question:
> >
> > Did you ever have corrupted messages in Kafka? Did things break? How did
> > you recover or work around that?
> >
> > Thanks
> > Jörg
> >
>
>
>


Re: Offset rollover/overflow?

2015-10-05 Thread Grant Henke
I can't be sure how every client will handle it; a rollover is probably not
likely in practice, and there could potentially be unforeseen issues.

That said, given that offsets are stored in a (signed) long, I would
suspect that it would roll over to negative values and increment from
there. That means instead of 9,223,372,036,854,775,807 potential offset
values, you actually have 18,446,744,073,709,551,614 potential values. To
put that into perspective, if we assign 1 byte to each offset, that's just
over 18 exabytes.
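
For example, Java longs overflow silently, so the wrap-around looks like:

public class OffsetRollover {
    public static void main(String[] args) {
        long offset = Long.MAX_VALUE;   //  9223372036854775807
        System.out.println(offset + 1); // -9223372036854775808 (Long.MIN_VALUE)
    }
}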

You will likely run into many other issues before you are able to retain 18
exabytes in a single Kafka topic. (And if not, I would evaluate breaking
the topic up into multiple smaller ones.)

Thanks,
Grant


On Sat, Oct 3, 2015 at 8:58 PM, Li Tao  wrote:

> It will never happen.
>
> On Thu, Oct 1, 2015 at 4:22 AM, Chad Lung  wrote:
>
> > I saw a previous question on offset rollovers
> > (http://search-hadoop.com/m/uyzND1lrGUW1PgKGG),
> > but it doesn't look like it was ever answered.
> >
> > Does anyone know what happens when the offset max limit is reached?
> > Overflow, or something else?
> >
> > Thanks,
> >
> > Chad
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


How to verify offsets topic exists?

2015-10-05 Thread Stevo Slavić
Hello Apache Kafka community,

In my integration tests, with a single 0.8.2.2 broker, for a newly created
topic with a single partition, after determining through a topic metadata
request that the partition has a leader assigned, when I try to reset the
offset for a given consumer group, I first try to discover the offset
coordinator, and that lookup throws ConsumerCoordinatorNotAvailableException.

On
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI
it is documented that the broker returns ConsumerCoordinatorNotAvailableCode
for consumer metadata requests or offset commit requests if the offsets
topic has not yet been created.

I wonder if this is really the case, that the offsets topic has not been
created. Any tips on how to ensure/verify that the offsets topic exists?

Kind regards,

Stevo Slavic.


Re: How to verify offsets topic exists?

2015-10-05 Thread Grant Henke
Hi Stevo,

There are a couple of options to verify the topic exists:

   1. Consume from a topic with "offsets.storage=kafka". If it's not created
   already, this should create it.
   2. List and describe the topic using the Kafka topics script. Ex:

bin/kafka-topics.sh --zookeeper localhost:2181 --list

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic
__consumer_offsets

   3. Check that the ZNode exists in ZooKeeper. Ex:

bin/zookeeper-shell.sh localhost:2181
ls /brokers/topics/__consumer_offsets
get /brokers/topics/__consumer_offsets
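
If you'd rather check programmatically from a test, here is a sketch
against the 0.8.2 javaapi (note that, depending on broker configuration, a
metadata request can itself trigger auto-creation of the topic):

import java.util.Collections;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class OffsetsTopicCheck {
    public static void main(String[] args) {
        SimpleConsumer consumer = new SimpleConsumer(
                "localhost", 9092, 100000, 64 * 1024, "offsets-topic-check");
        try {
            TopicMetadataRequest request = new TopicMetadataRequest(
                    Collections.singletonList("__consumer_offsets"));
            TopicMetadataResponse response = consumer.send(request);
            for (TopicMetadata metadata : response.topicsMetadata()) {
                // errorCode 0 (NoError) means the topic exists
                System.out.println(metadata.topic()
                        + " errorCode=" + metadata.errorCode()
                        + " partitions=" + metadata.partitionsMetadata().size());
            }
        } finally {
            consumer.close();
        }
    }
}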


Thanks,
Grant

On Mon, Oct 5, 2015 at 10:44 AM, Stevo Slavić  wrote:

> Hello Apache Kafka community,
>
> In my integration tests, with a single 0.8.2.2 broker, for a newly created
> topic with a single partition, after determining through a topic metadata
> request that the partition has a leader assigned, when I try to reset the
> offset for a given consumer group, I first try to discover the offset
> coordinator, and that lookup throws ConsumerCoordinatorNotAvailableException.
>
> On
>
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI
> it is documented that the broker returns ConsumerCoordinatorNotAvailableCode
> for consumer metadata requests or offset commit requests if the offsets
> topic has not yet been created.
>
> I wonder if this is really the case, that the offsets topic has not been
> created. Any tips on how to ensure/verify that the offsets topic exists?
>
> Kind regards,
>
> Stevo Slavic.
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke