Re: Producer becomes slow over time

2015-09-23 Thread Prabhjot Bharaj
Hi,

I would like to dig deep into this issue. I've changed log4j.properties to
log at ALL level everywhere, but I am getting lost in the logs.
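
For anyone reproducing this, logging at ALL level presumably means something
like the following in config/log4j.properties (appender names as in Kafka's
stock file; this is a guess at the exact change, not a quote from it):

log4j.rootLogger=ALL, stdout
log4j.logger.kafka=ALL, kafkaAppender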

Any pointers would be welcome

Please let me know if you need any further information.

Thanks,
Prabhjot

On Wed, Sep 23, 2015 at 6:46 PM, Prabhjot Bharaj 
wrote:

> Hello Folks,
>
> I've noticed that two producer machines that I configured earlier have
> become very slow over time. They are now giving only 17-19 MB/s.
>
> But a producer that I set up today is giving 70 MB/s write throughput.
>
> If I compare the contents of the bin, libs, and config directories,
> nothing differs between the producer machines.
>
> The producers are running on the Kafka machines themselves.
>
> I'd appreciate your expertise here.
>
> Regards,
> Prabhjot
>
>
>


-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: Log Cleaner Thread Stops

2015-09-23 Thread James Cheng

On Sep 18, 2015, at 10:25 AM, Todd Palino  wrote:

> I think the last major issue with log compaction (that it couldn't handle
> compressed messages) was committed as part of
> https://issues.apache.org/jira/browse/KAFKA-1374 in August, but I'm not
> certain what version this will end up in. It may be part of 0.8.2.2.
> 
> Regardless, you'll probably be OK now. We've found that once we clean this
> issue up, it doesn't appear to recur. As long as you're not writing
> compressed messages to a log-compacted topic (and that won't happen with
> __consumer_offsets, as it's managed by the brokers themselves - it would
> only happen if you were using other log-compacted topics), you're likely
> in the clear now.
> 

Todd,

If I understand your description of the problem, you are saying that enabling 
log compaction on a topic with compressed messages can (will?) cause the log 
cleaner to crash when it encounters those compressed messages. And the death of 
the cleaner thread will prevent log compaction from running on other topics, 
even ones that don't have compressed messages.

That means if we have a cluster where we want to use log compaction on *any* 
topic, we need to either:
1) apply https://issues.apache.org/jira/browse/KAFKA-1374 (or upgrade to a 
version where it is applied)
OR
2) make sure that we don't use compressed messages in *any* topic that has log 
compaction turned on.

And, more specifically, if we want to make use of __consumer_offsets, then we 
cannot use compressed messages in any topic that has compaction turned on.

Is that right?
-James

> -Todd
> 
> 
> On Fri, Sep 18, 2015 at 9:54 AM, John Holland <
> john.holl...@objectpartners.com> wrote:
> 
>> Thanks!
>> 
>> I did what you suggested and it worked, except that it was necessary for me
>> to remove the cleaner-offset-checkpoint file from the data directory and
>> restart the servers.  The log indicates all is well.
>> 
>> Do you know what version the fix to this will be in? I'm not looking
>> forward to dealing with this on a recurring basis.
>> 
>> -John
>> 
>> On Fri, Sep 18, 2015 at 8:48 AM Todd Palino  wrote:
>> 
>>> Yes, this is a known concern, and it should be fixed with recent commits.
>>> In the meantime, you'll have to do a little manual cleanup.
>>> 
>>> The problem you're running into is a corrupt message in the offsets topic.
>>> We've seen this a lot. What you need to do is set the topic configuration
>>> to remove the cleanup.policy config, and set retention.ms and segment.ms
>>> to something reasonably low. I suggest using a value of 3 or 4 times your
>>> commit interval for consumers. Then wait until the log segments are reaped
>>> (wait twice as long as the retention.ms you chose, to be safe). Once this
>>> is done, you can set the topic configuration back the way it was (remove
>>> segment.ms and retention.ms configs, and set cleanup.policy=compact).
>>> Lastly, you'll need to do a rolling bounce of the cluster to restart the
>>> brokers (which restarts the log cleaner threads). Technically, you only
>>> need to restart brokers where the threads have died, but it's easier to
>>> just restart all of them.
>>> 
>>> Keep in mind that when you do this, you are deleting old offsets. If your
>>> consumers are all live and healthy, this shouldn't be a problem because
>>> they will just continue to commit their offsets properly. But if you have
>>> an offline consumer, you'll lose the committed offsets by doing this.
>>> 
>>> -Todd
>>> 
>>> 
>>> On Fri, Sep 18, 2015 at 5:31 AM, John Holland <
>>> john.holl...@objectpartners.com> wrote:
>>> 
 I've been experiencing this issue across several of our environments ever
 since we enabled the log cleaner for the __consumer_offsets topic.

 We are on version 0.8.2.1 of kafka, using the new producer.  All of our
 consumers are set to commit to kafka only.

 Below is the stack trace in the log I've encountered across several
 different clusters.  A simple restart of kafka will allow compaction to
 continue on all of the other partitions but the incorrect one will always
 fail.

 Here are the values for it from the kafka-topics --describe command:

 Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:3
 Configs:segment.bytes=104857600,cleanup.policy=compact

 Are there any recommendations on how to prevent this and the best way to
 recover from this exception?  This is causing disk space to fill up quickly
 on the node.

 I did see an open issue that seems very similar to this
 https://issues.apache.org/jira/browse/KAFKA-1641 but this is the
 __consumer_offsets topic which I have not had any part in setting up nor
 producing to.
 
 [2015-09-18 02:57:25,520] INFO Cleaner 0: Beginning cleaning of log
 __consumer_offsets-17. (kafka.log.LogCleaner)
 [2015-09-18 02:57:25,520] INFO Cleaner 0: Building offset map for
 __consumer_offsets-17
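
For reference, Todd's recovery steps quoted above translate to kafka-topics.sh
invocations roughly like the following sketch. The ZooKeeper address and the
retention values are placeholders (pick retention.ms/segment.ms as 3-4x your
consumers' commit interval), and the removal flag is spelled --deleteConfig in
the 0.8.x tool; check bin/kafka-topics.sh --help on your build before running:

# 1. Temporarily swap compaction for short time-based retention:
bin/kafka-topics.sh --zookeeper zk1:2181 --alter --topic __consumer_offsets \
  --deleteConfig cleanup.policy \
  --config retention.ms=240000 --config segment.ms=240000

# 2. Wait at least 2x retention.ms for the bad segments to be reaped, then
#    restore the original settings:
bin/kafka-topics.sh --zookeeper zk1:2181 --alter --topic __consumer_offsets \
  --deleteConfig retention.ms --deleteConfig segment.ms \
  --config cleanup.policy=compact

# 3. Rolling-bounce the brokers so the dead log cleaner threads restart.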

Re: Log Cleaner Thread Stops

2015-09-23 Thread Jason Rosenberg
It looks like that fix will not be included in a release until 0.9.0.0.

I'm thinking maybe it makes sense not to switch to kafka storage for
offsets until then?

Jason

On Fri, Sep 18, 2015 at 1:25 PM, Todd Palino  wrote:

> I think the last major issue with log compaction (that it couldn't handle
> compressed messages) was committed as part of
> https://issues.apache.org/jira/browse/KAFKA-1374 in August, but I'm not
> certain what version this will end up in. It may be part of 0.8.2.2.
>
> Regardless, you'll probably be OK now. We've found that once we clean this
> issue up, it doesn't appear to recur. As long as you're not writing
> compressed messages to a log-compacted topic (and that won't happen with
> __consumer_offsets, as it's managed by the brokers themselves - it would
> only happen if you were using other log-compacted topics), you're likely
> in the clear now.
>
> -Todd
>
> [nested quotes trimmed; they duplicate the messages above. John's original
> stack trace, cut off in the earlier copy, is kept below:]
>
> > > > [2015-09-18 02:57:25,520] INFO Cleaner 0: Beginning cleaning of log
> > > > __consumer_offsets-17. (kafka.log.LogCleaner)
> > > > [2015-09-18 02:57:25,520] INFO Cleaner 0: Building offset map for
> > > > __consumer_offsets-17... (kafka.log.LogCleaner)
> > > > [2015-09-18 02:57:25,609] INFO Cleaner 0: Building offset map for log
> > > > __consumer_offsets-17 for 46 segments in offset range [468079184,
> > > > 528707475). (kafka.log.LogCleaner)
> > > > [2015-09-18 02:57:25,645] ERROR [kafka-log-cleaner-thread-0], Error
> > > > due to  (kafka.log.LogCleaner)
> > > > java.lang.IllegalArgumentException: requirement failed: Last clean
> > > > offset is 468079184 but segment base offset is 0 for log
> > > > __consumer_offsets-17.

Producer becomes slow over time

2015-09-23 Thread Prabhjot Bharaj
Hello Folks,

I've noticed that two producer machines that I configured earlier have become
very slow over time. They are now giving only 17-19 MB/s.

But a producer that I set up today is giving 70 MB/s write throughput.

If I compare the contents of the bin, libs, and config directories, nothing
differs between the producer machines.

The producers are running on the Kafka machines themselves.

I'd appreciate your expertise here.

Regards,
Prabhjot


Re: Which perf-test tool?

2015-09-23 Thread Ben Stopford
Both classes work OK. I prefer the Java one, simply because it has better
output and does less overriding of default values.

However, in both cases you probably need to tweak settings to suit your use
case. Most notably:
acks
batch.size
linger.ms
depending on whether you are interested in latency or throughput. That is
usually sufficient for producer performance measurement.
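
To make those knobs concrete, here is a minimal sketch of a throughput-leaning
configuration for the new (Java) producer; the broker address is a placeholder
and the values are illustrative starting points, not recommendations:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class PerfTunedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Throughput-leaning: weaker acks, larger batches, a little lingering.
        props.put("acks", "1");
        props.put("batch.size", "65536");
        props.put("linger.ms", "5");
        // For latency, prefer linger.ms=0 and the default batch.size instead.
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
        producer.close(); // a real test would send records before closing
    }
}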

At present nothing is deprecated but there are some changes going in to clean 
these up a little. 

B


> On 23 Sep 2015, at 10:14, Markus Jais  wrote:
> 
> Hello,
> 
> I have a question about performance testing:
> 
> Performance tests for producers can run (using a Java class) with:
> 
> bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
> 
> but there is also:
> ./bin/kafka-producer-perf-test.sh
> This is calling a Scala class called kafka.tools.ProducerPerformance.
> 
> Which tool is recommended for 0.8.2.1 and newer?
> 
> What are the differences and limitations? I couldn't find anything in the 
> Kafka docs.
> 
> They seem to be similar according to the source code.
> Is one of the tools deprecated?
> 
> Best,
> 
> Markus



kafkaconsumer pause/resume

2015-09-23 Thread john pradeep
Hi,
Just wondering, when is version 0.9 of the Kafka library being released? I am
particularly interested in the KafkaConsumer pause/resume feature.

Is there any other way to pause a consumer without triggering a rebalance in
0.8.x?

Thanks,
John
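
For readers unfamiliar with it, the pause/resume API John refers to looks
roughly like the sketch below against the 0.9 consumer that was in development
at the time (broker, group, and topic names are placeholders, and the method
signatures were still subject to change; later releases take collections
rather than varargs):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PauseResumeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "my-group");              // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic")); // placeholder topic
        consumer.poll(0); // join the group and receive an assignment

        // Stop fetching without leaving the group; poll() keeps the
        // membership alive, so no rebalance is triggered.
        for (TopicPartition tp : consumer.assignment())
            consumer.pause(tp);
        consumer.poll(100); // returns no records while paused

        for (TopicPartition tp : consumer.assignment())
            consumer.resume(tp);
        consumer.close();
    }
}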


Re: high level consumer timeout?

2015-09-23 Thread noah
I've found that (especially in tests) I can get a ConsumerIterator, but
that the consumer is not actually connected and settled onto its partitions
yet. Unfortunately, I haven't found a good way to establish that aside from
sending pilot messages through first. smallest is easier :)

On Wed, Sep 23, 2015 at 4:26 AM Joris Peeters 
wrote:

> Looks like you are right, although I am not sure I understand why. :)
>
> The outlined scenario is indeed part of a test suite, which goes through
> the following stages (with some temporal separation between each),
>
> 1/ Create a new topic (with some guid in the name), with three partitions
> 2/ Launch the three consumers (all part of the same consumer group, which
> is fixed at compile time).
> 3/ Fire the five messages
>
> Adding 'props.put("auto.offset.reset", "smallest");' to my list of
> settings seems to resolve the hanging issue I mentioned.
>
> The line
> >>
> If this is set to largest, the consumer may lose some messages when the
> number of partitions, for the topics it subscribes to, changes on the
> broker. To prevent data loss during partition addition, set
> auto.offset.reset to smallest
> <<
>
> seems to suggest that problems would only occur if the number of
> partitions changes on the broker, but that is set to (and stays at) 3.
> Whilst I'm quite interested in finding out what is going on here, I'm in
> any case quite thrilled that at least it seems to work now. :) Thanks!
> -J

Duplicate messages when consumers are restarted

2015-09-23 Thread Jananee S
We are using kafka 0.8.2.1 and are noticing a weird behavior when the high
level consumers are restarted.

We have two consumers (C1 and C2, processes on different machines) in a
consumer group. The topic has two partitions. We use kafka as offsets
storage and have auto commit disabled.

Here is the situation

1. C1 is processing a message 'M1'.
2. C1 and C2 are restarted, in that order, before C1 can finish processing
the message and commit.
3. C2 and then C1 both get the same message 'M1'.


I came across some old messages in the forum that talked about this, but
there the messages were compressed. In our case, the messages are not
compressed. Is this a bug in consumer rebalancing?


Thanks,
Jananee


RE: high level consumer timeout?

2015-09-23 Thread Joris Peeters
Looks like you are right, although I am not sure I understand why. :)

The outlined scenario is indeed part of a test suite, which goes through the 
following stages (with some temporal separation between each),

1/ Create a new topic (with some guid in the name), with three partitions
2/ Launch the three consumers (all part of the same consumer group, which is 
fixed at compile time).
3/ Fire the five messages

Adding 'props.put("auto.offset.reset", "smallest");' to my list of settings 
seems to resolve the hanging issue I mentioned.

The line
>>
If this is set to largest, the consumer may lose some messages when the number 
of partitions, for the topics it subscribes to, changes on the broker. To 
prevent data loss during partition addition, set auto.offset.reset to smallest
<<

seems to suggest that problems would only occur if the number of partitions 
changes on the broker, but that is set to (and stays at) 3.
Whilst I'm quite interested in finding out what is going on here, I'm in any 
case quite thrilled that at least it seems to work now. :) Thanks!
-J

-Original Message-
From: noah [mailto:iamn...@gmail.com]
Sent: 23 September 2015 09:44
To: users@kafka.apache.org
Subject: Re: high level consumer timeout?

Assuming this is a test case with a new topic/consumer groups for each run, do 
you have auto.offset.reset=smallest? This happens to me constantly in tests 
because my consumers end up missing the first message since the default is 
largest (in which case auto commit is a red herring.)


Which perf-test tool?

2015-09-23 Thread Markus Jais

Hello,

I have a question about performance testing:

Performance tests for producers can run (using a Java class) with:

bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance

but there is also:
./bin/kafka-producer-perf-test.sh
This is calling a Scala class called kafka.tools.ProducerPerformance.

Which tool is recommended for 0.8.2.1 and newer?

What are the differences and limitations? I couldn't find anything in 
the Kafka docs.


They seem to be similar according to the source code.
Is one of the tools deprecated?

Best,

Markus


Re: high level consumer timeout?

2015-09-23 Thread noah
Assuming this is a test case with a new topic/consumer groups for each run,
do you have auto.offset.reset=smallest? This happens to me constantly in
tests because my consumers end up missing the first message since the
default is largest (in which case auto commit is a red herring.)

On Wed, Sep 23, 2015 at 3:17 AM Joris Peeters 
wrote:

> I have a topic with three partitions, to which I send 5 messages (very
> rapidly after one another) and they get partitioned well ([1,2,2] as it
> happens).
> I'm launching three identical high-level Java consumers (Kafka 0.8.2.1),
> single-threaded, to consume those messages. Once a message is received, the
> consumers have to do a fair amount of work, which typically takes a few
> minutes. Right now I'm committing the offset after every message has been
> completed (so I have auto.commit.enable set to false *).
>
> What I am seeing, with this setup, is that the three consumers each work
> through their first message and then just hang (in the subsequent
> "it.hasNext()"). The remaining two messages do not arrive. When, on the
> other hand, I set auto.commit.enable to true, everything goes well and the
> three consumers churn through the five messages without trouble.
>
> This makes me suspect there is some sort of heartbeat issue going on, that
> is triggered by the auto.commit? We don't want to auto-commit in this case,
> though, so does anyone know of a different solution that could be tried?
>
> My situation sounds somewhat similar to
> http://grokbase.com/t/kafka/users/14ck70w38q/kafka-consumer-session-timeouts
> - but I am not sure if anyone got to the bottom of that.
>
> As always, any help would be much appreciated!
> -Joris.
>
>
> (*) These are the consumer properties I have set explicitly:
>
> props.put("auto.commit.interval.ms", "1000");
> props.put("auto.commit.enable", "false"); // or true.
> props.put("offsets.storage", "kafka");


high level consumer timeout?

2015-09-23 Thread Joris Peeters
I have a topic with three partitions, to which I send 5 messages (very rapidly 
after one another) and they get partitioned well ([1,2,2] as it happens).
I'm launching three identical high-level Java consumers (Kafka 0.8.2.1), 
single-threaded, to consume those messages. Once a message is received, the 
consumers have to do a fair amount of work, which typically takes a few 
minutes. Right now I'm committing the offset after every message has been 
completed (so I have auto.commit.enable set to false *).

What I am seeing, with this setup, is that the three consumers each work 
through their first message and then just hang (in the subsequent 
"it.hasNext()"). The remaining two messages do not arrive. When, on the other 
hand, I set auto.commit.enable to true, everything goes well and the three 
consumers churn through the five messages without trouble.

This makes me suspect there is some sort of heartbeat issue going on, that is 
triggered by the auto.commit? We don't want to auto-commit in this case, 
though, so does anyone know of a different solution that could be tried?

My situation sounds somewhat similar to 
http://grokbase.com/t/kafka/users/14ck70w38q/kafka-consumer-session-timeouts - 
but I am not sure if anyone got to the bottom of that.

As always, any help would be much appreciated!
-Joris.


(*) These are the consumer properties I have set explicitly:

props.put("auto.commit.interval.ms", "1000");
props.put("auto.commit.enable", "false"); // or true.
props.put("offsets.storage", "kafka");






Fwd: MirrorMaker kafka.message.InvalidMessageException

2015-09-23 Thread Jörg Wagner

Cheers!

Does nobody have an idea what to do about InvalidMessageExceptions?

Jörg


 Forwarded Message 
Subject:MirrorMaker kafka.message.InvalidMessageException
Date:   Fri, 18 Sep 2015 11:37:20 +0200
From:   Jörg Wagner 
Reply-To:   users@kafka.apache.org
To: users@kafka.apache.org



Hey everyone!

One of my MirrorMaker instances is exiting with the following error:

[2015-09-18 11:27:35,591] FATAL [mirrormaker-consumer-0] Stream unexpectedly
exited. (kafka.tools.MirrorMaker$ConsumerThread)
kafka.message.InvalidMessageException: Message is corrupt (stored crc =
3256823012, computed crc = 4230971415)
        at kafka.message.Message.ensureValid(Message.scala:166)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:101)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
        at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
        at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at kafka.consumer.KafkaStream.foreach(KafkaStream.scala:25)
        at kafka.tools.MirrorMaker$ConsumerThread.run(MirrorMaker.scala:259)

The same topic has been mirrored to another DC without a problem.

A few questions arise:

 * Why could the other DC mirror the topic successfully?
 * Why does the MM not skip the failing message and carry on?
 * What can I do now to carry on consuming with this MM?
 * How can I automate the answers to the above?

My MM currently also does not want to write a log, although I passed in:
KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:${KAFKA_CONFIG}/mirror/log4j.properties"
Any clues why I don't get any log output? (I got the error above from running
MM interactively.)
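
For comparison, a minimal standalone log4j.properties for the MirrorMaker
process might look like the sketch below, modeled on the appender style of
Kafka's stock config (the file path and appender name are made up for
illustration). Also worth checking: if the log4j.configuration URL does not
resolve, log4j 1.x only prints a warning to stderr and logs nowhere, which
looks exactly like "no log output":

log4j.rootLogger=INFO, mirrorMakerAppender

log4j.appender.mirrorMakerAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.mirrorMakerAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.mirrorMakerAppender.File=/var/log/kafka/mirror-maker.log
log4j.appender.mirrorMakerAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.mirrorMakerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n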

Thanks

Jörg