Clean shutdown but corrupted data on reboot?

2016-12-22 Thread Stephane Maarek
Hi,

I’m shutting down Kafka properly (see the shutdown log at the bottom).
But on reboot, I sometimes get the following:


[2016-12-23 07:45:26,544] INFO Loading logs. (kafka.log.LogManager)

[2016-12-23 07:45:26,609] WARN Found a corrupted index file due to
requirement failed: Corrupt index found, index file
(/mnt/kafka-logs/__consumer_offsets-36/.index) has
non-zero size but the last offset is 0 which is no larger than the base
offset 0.}. deleting
/mnt/kafka-logs/__consumer_offsets-36/.timeindex,
/mnt/kafka-logs/__consumer_offsets-36/.index and
rebuilding index... (kafka.log.Log)

[2016-12-23 07:45:26,617] INFO Recovering unflushed segment 0 in log
__consumer_offsets-36. (kafka.log.Log)

[2016-12-23 07:45:26,621] INFO Completed load of log __consumer_offsets-36
with 1 log segments and log end offset 0 in 28 ms (kafka.log.Log)

[2016-12-23 07:45:26,645] WARN Found a corrupted index file due to
requirement failed: Corrupt index found, index file
(/mnt/kafka-logs/dsresolvedbet-5/00590991.index) has non-zero
size but the last offset is 590991 which is no larger than the base offset
590991.}. deleting
/mnt/kafka-logs/dsresolvedbet-5/00590991.timeindex,
/mnt/kafka-logs/dsresolvedbet-5/00590991.index and rebuilding
index... (kafka.log.Log)

[2016-12-23 07:45:31,683] INFO Recovering unflushed segment 590991 in log
dsresolvedbet-5. (kafka.log.Log)

[2016-12-23 07:45:31,930] INFO Completed load of log dsresolvedbet-5 with 6
log segments and log end offset 650751 in 5301 ms (kafka.log.Log)



I’m a little puzzled by these. Do you have any insight into what the cause
may be?
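For reference, these are the broker flush settings I’ve been wondering about,
in case unflushed segments are the factor here (names only – the values below
are illustrative, not our actual config):

log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.flush.scheduler.interval.ms=2000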




Thanks!

Stephane



Shutdown log:

[2016-12-23 07:44:31,731] INFO [Kafka Server 7], Controlled shutdown
succeeded (kafka.server.KafkaServer)

[2016-12-23 07:44:31,737] INFO [Socket Server on Broker 7], Shutting down
(kafka.network.SocketServer)

[2016-12-23 07:44:31,744] INFO [Socket Server on Broker 7], Shutdown
completed (kafka.network.SocketServer)

[2016-12-23 07:44:31,749] INFO [Kafka Request Handler on Broker 7],
shutting down (kafka.server.KafkaRequestHandlerPool)

[2016-12-23 07:44:31,883] INFO [Kafka Request Handler on Broker 7], shut
down completely (kafka.server.KafkaRequestHandlerPool)

[2016-12-23 07:44:31,885] INFO [ThrottledRequestReaper-Fetch], Shutting
down (kafka.server.ClientQuotaManager$ThrottledRequestReaper)

[2016-12-23 07:44:32,429] INFO [ThrottledRequestReaper-Fetch], Stopped
(kafka.server.ClientQuotaManager$ThrottledRequestReaper)

[2016-12-23 07:44:32,430] INFO [ThrottledRequestReaper-Fetch], Shutdown
completed (kafka.server.ClientQuotaManager$ThrottledRequestReaper)

[2016-12-23 07:44:32,430] INFO [ThrottledRequestReaper-Produce], Shutting
down (kafka.server.ClientQuotaManager$ThrottledRequestReaper)

[2016-12-23 07:44:33,429] INFO [ThrottledRequestReaper-Produce], Stopped
(kafka.server.ClientQuotaManager$ThrottledRequestReaper)

[2016-12-23 07:44:33,429] INFO [ThrottledRequestReaper-Produce], Shutdown
completed (kafka.server.ClientQuotaManager$ThrottledRequestReaper)

[2016-12-23 07:44:33,430] INFO [KafkaApi-7] Shutdown complete.
(kafka.server.KafkaApis)

[2016-12-23 07:44:33,461] INFO [Replica Manager on Broker 7]: Shutting down
(kafka.server.ReplicaManager)

[2016-12-23 07:44:33,461] INFO [ReplicaFetcherManager on broker 7] shutting
down (kafka.server.ReplicaFetcherManager)

[2016-12-23 07:44:33,462] INFO [ReplicaFetcherThread-0-8], Shutting down
(kafka.server.ReplicaFetcherThread)

[2016-12-23 07:44:33,868] INFO [ReplicaFetcherThread-0-8], Stopped
(kafka.server.ReplicaFetcherThread)

[2016-12-23 07:44:33,869] INFO [ReplicaFetcherThread-0-8], Shutdown
completed (kafka.server.ReplicaFetcherThread)

[2016-12-23 07:44:33,872] INFO [ReplicaFetcherThread-0-9], Shutting down
(kafka.server.ReplicaFetcherThread)

[2016-12-23 07:44:34,305] INFO [ReplicaFetcherThread-0-9], Stopped
(kafka.server.ReplicaFetcherThread)

[2016-12-23 07:44:34,305] INFO [ReplicaFetcherThread-0-9], Shutdown
completed (kafka.server.ReplicaFetcherThread)

[2016-12-23 07:44:34,309] INFO [ReplicaFetcherManager on broker 7] shutdown
completed (kafka.server.ReplicaFetcherManager)

[2016-12-23 07:44:34,309] INFO [ExpirationReaper-7], Shutting down
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)

[2016-12-23 07:44:34,498] INFO [ExpirationReaper-7], Stopped
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)

[2016-12-23 07:44:34,498] INFO [ExpirationReaper-7], Shutdown completed
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)

[2016-12-23 07:44:34,498] INFO [ExpirationReaper-7], Shutting down
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)

[2016-12-23 07:44:34,508] INFO [ExpirationReaper-7], Stopped
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)

[2016-12-23 07:44:34,508] INFO [ExpirationReaper-7], Shutdown completed
(kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)


Kafka 0.9.0.1 producer warning in log UNKNOWN_TOPIC_OR_PARTITION

2016-12-22 Thread em vee
Hi ,

I am using the Kafka 0.9.0.1 client. If a message is sent to a topic and the
topic is then deleted, you see the following entries in the log once a
subsequent message is sent to any other topic.


===
WARN  org.apache.kafka.clients.NetworkClient {} - Error while fetching
metadata with correlation id 2181 : {t12425=UNKNOWN_TOPIC_OR_PARTITION}
===

There is an issue open for this already, and it looks like it has been fixed
in kafka-client 0.10.1.0:
https://issues.apache.org/jira/browse/KAFKA-2948

But the 0.10.1.0 Kafka client does not seem to be compatible with 0.9 brokers.
Does any workaround or solution exist to prevent this log message in the
Kafka client 0.9 series?
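One workaround I’m considering (untested on my side, and assuming log4j is the
logging backend) is to silence that particular logger so the WARN stops
flooding the logs, e.g. in log4j.properties:

# hypothetical log4j.properties entry - drops NetworkClient WARNs, keeps ERRORs
log4j.logger.org.apache.kafka.clients.NetworkClient=ERROR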


Thanks,
emvee.


can kafka 10 stream API read the topic from a Kafka 9 cluster?

2016-12-22 Thread Joanne Contact
Hello, I have a program which requires the 0.10.1.0 Streams API. The jar is
packaged by Maven with all dependencies. I tried to consume a Kafka topic
from a Kafka 9 cluster.

It fails with this error:
 org.apache.kafka.common.protocol.types.SchemaException: Error reading
field 'topic_metadata': Error reading array of size 1768180577, only
167 bytes available

I wonder if there is any workaround?

Thanks,

J


subscribe

2016-12-22 Thread em vee



Re: Kafka won't replicate from a specific broker

2016-12-22 Thread Jan Omar
Unfortunately I think you hit this bug:

https://issues.apache.org/jira/browse/KAFKA-4477 


The only option I know of is to restart the affected broker, and to upgrade to
0.10.1.1 as quickly as possible. We haven't seen this issue on the 0.10.1.1 RC0.

Regards

Jan


> On 22 Dec 2016, at 18:16, Ismael Juma  wrote:
> 
> Hi Valentin,
> 
> Is inter.broker.protocol.version set correctly in brokers 1 and 2? It
> should be 0.10.0 so that they can talk to the older broker without issue.
> 
> Ismael
> 
> On Thu, Dec 22, 2016 at 4:42 PM, Valentin Golev 
> wrote:
> 
>> Hello,
>> 
>> I have a three broker Kafka setup (the ids are 1, 2 (kafka 0.10.1.0) and
>> 1001 (kafka 0.10.0.0)). After a failure of two of them a lot of the
>> partitions have the third one (1001) as their leader. It's like this:
>> 
>> Topic: userevents0.open Partition: 5Leader: 1   Replicas:
>> 1,2,1001  Isr: 1,1001,2
>>Topic: userevents0.open Partition: 6Leader: 2   Replicas:
>> 2,1,1001  Isr: 1,2,1001
>>Topic: userevents0.open Partition: 7Leader: 1001Replicas:
>> 1001,2,1  Isr: 1001
>>Topic: userevents0.open Partition: 8Leader: 1   Replicas:
>> 1,1001,2  Isr: 1,1001,2
>>Topic: userevents0.open Partition: 9Leader: 1001Replicas:
>> 2,1001,1  Isr: 1001
>>Topic: userevents0.open Partition: 10   Leader: 1001Replicas:
>> 1001,1,2  Isr: 1001
>> 
>> As you can see, only the partitions with Leaders 1 or 2 have successfully
>> replicated. Brokers 1 and 2, however, are unable to fetch data from the
>> 1001.
>> 
>> All of the partitions are available to the consumers and producers. So
>> everything is fine except replication. 1001 is available from the other
>> servers.
>> 
>> I can't restart the broker 1001 because it seems that it will cause data
>> loss (as you can see, it's the only ISR on many partitions). Restarting the
>> other brokers didn't help at all. Neither did just plain waiting (it's the
>> third day of this going on). So what do I do?
>> 
>> The logs of the broker 2 (the one which tries to fetch data) are full of
>> this:
>> 
>> [2016-12-22 16:38:52,199] WARN [ReplicaFetcherThread-0-1001], Error in
>> fetch kafka.server.ReplicaFetcherThread$FetchRequest@117a49bf
>> (kafka.server.ReplicaFetcherThread)
>> java.io.IOException: Connection to 1001 was disconnected before the
>> response was read
>> [stack trace snipped]
>> 
>> The logs of the broker 1001 are full of this:
>> 
>> [2016-12-22 16:38:54,226] ERROR Processor got uncaught exception.
>> (kafka.network.Processor)
>> java.nio.BufferUnderflowException
>> [stack trace snipped]

Re: Memory / resource leak in 0.10.1.1 release

2016-12-22 Thread Jon Yeargers
Yes - that's the one. It's 100% reproducible (for me).


On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy  wrote:

> Hi Jon,
>
> Is this for the topology where you are doing something like:
>
> topology: kStream -> groupByKey.aggregate(minute) -> foreach
>  \-> groupByKey.aggregate(hour) -> foreach
>
> I'm trying to understand how i could reproduce your problem. I've not seen
> any such issues with 0.10.1.1, but then i'm not sure what you are doing.
>
> Thanks,
> Damian
>
> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers 
> wrote:
>
> > Im still hitting this leak with the released version of 0.10.1.1.
> >
> > Process mem % grows over the course of 10-20 minutes and eventually the
> OS
> > kills it.
> >
> > Messages like this appear in /var/log/messages:
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked
> > oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java cpuset=/
> > mems_allowed=0
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID:
> 9550
> > Comm: java Tainted: GE   4.4.19-29.55.amzn1.x86_64 #1
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware name:
> > Xen HVM domU, BIOS 4.2.amazon 11/11/2016
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> >  88071c517a70 812c958f 88071c517c58
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> >  88071c517b00 811ce76d 8109db14
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > 810b2d91  0010 817d0fe9
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] dump_stack+0x63/0x84
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] dump_header+0x5e/0x1d8
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] ? set_next_entity+0xa4/0x710
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] ? __raw_callee_save___pv_queued_
> spin_unlock+0x11/0x20
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] oom_kill_process+0x205/0x3d0
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] out_of_memory+0x431/0x480
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] __alloc_pages_nodemask+0x91e/0xa60
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] alloc_pages_current+0x88/0x120
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] __page_cache_alloc+0xb4/0xc0
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] filemap_fault+0x188/0x3e0
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] ext4_filemap_fault+0x36/0x50 [ext4]
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] __do_fault+0x3d/0x70
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] handle_mm_fault+0xf27/0x1870
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] ? __raw_callee_save___pv_queued_
> spin_unlock+0x11/0x20
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] __do_page_fault+0x183/0x3f0
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] do_page_fault+0x22/0x30
> >
> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> > [] page_fault+0x28/0x30
> >
>


Re: Kafka won't replicate from a specific broker

2016-12-22 Thread Ismael Juma
Hi Valentin,

Is inter.broker.protocol.version set correctly in brokers 1 and 2? It
should be 0.10.0 so that they can talk to the older broker without issue.
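For example, something like this in server.properties on brokers 1 and 2 (a
sketch only, to keep until 1001 is upgraded):

# sketch: pin the inter-broker protocol while broker 1001 is still on 0.10.0
inter.broker.protocol.version=0.10.0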

Ismael

On Thu, Dec 22, 2016 at 4:42 PM, Valentin Golev 
wrote:

> Hello,
>
> I have a three broker Kafka setup (the ids are 1, 2 (kafka 0.10.1.0) and
> 1001 (kafka 0.10.0.0)). After a failure of two of them a lot of the
> partitions have the third one (1001) as their leader. It's like this:
>
>  Topic: userevents0.open Partition: 5Leader: 1   Replicas:
> 1,2,1001  Isr: 1,1001,2
> Topic: userevents0.open Partition: 6Leader: 2   Replicas:
> 2,1,1001  Isr: 1,2,1001
> Topic: userevents0.open Partition: 7Leader: 1001Replicas:
> 1001,2,1  Isr: 1001
> Topic: userevents0.open Partition: 8Leader: 1   Replicas:
> 1,1001,2  Isr: 1,1001,2
> Topic: userevents0.open Partition: 9Leader: 1001Replicas:
> 2,1001,1  Isr: 1001
> Topic: userevents0.open Partition: 10   Leader: 1001Replicas:
> 1001,1,2  Isr: 1001
>
> As you can see, only the partitions with Leaders 1 or 2 have successfully
> replicated. Brokers 1 and 2, however, are unable to fetch data from the
> 1001.
>
> All of the partitions are available to the consumers and producers. So
> everything is fine except replication. 1001 is available from the other
> servers.
>
> I can't restart the broker 1001 because it seems that it will cause data
> loss (as you can see, it's the only ISR on many partitions). Restarting the
> other brokers didn't help at all. Neither did just plain waiting (it's the
> third day of this going on). So what do I do?
>
> The logs of the broker 2 (the one which tries to fetch data) are full of
> this:
>
> [2016-12-22 16:38:52,199] WARN [ReplicaFetcherThread-0-1001], Error in
> fetch kafka.server.ReplicaFetcherThread$FetchRequest@117a49bf
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 1001 was disconnected before the
> response was read
> [stack trace snipped]
>
> The logs of the broker 1001 are full of this:
>
> [2016-12-22 16:38:54,226] ERROR Processor got uncaught exception.
> (kafka.network.Processor)
> java.nio.BufferUnderflowException
> [stack trace snipped]

Kafka won't replicate from a specific broker

2016-12-22 Thread Valentin Golev
Hello,

I have a three broker Kafka setup (the ids are 1, 2 (kafka 0.10.1.0) and
1001 (kafka 0.10.0.0)). After a failure of two of them a lot of the
partitions have the third one (1001) as their leader. It's like this:

 Topic: userevents0.open Partition: 5Leader: 1   Replicas:
1,2,1001  Isr: 1,1001,2
Topic: userevents0.open Partition: 6Leader: 2   Replicas:
2,1,1001  Isr: 1,2,1001
Topic: userevents0.open Partition: 7Leader: 1001Replicas:
1001,2,1  Isr: 1001
Topic: userevents0.open Partition: 8Leader: 1   Replicas:
1,1001,2  Isr: 1,1001,2
Topic: userevents0.open Partition: 9Leader: 1001Replicas:
2,1001,1  Isr: 1001
Topic: userevents0.open Partition: 10   Leader: 1001Replicas:
1001,1,2  Isr: 1001

As you can see, only the partitions with leaders 1 or 2 have successfully
replicated. Brokers 1 and 2, however, are unable to fetch data from 1001.

All of the partitions are available to the consumers and producers. So
everything is fine except replication. 1001 is available from the other
servers.

I can't restart the broker 1001 because it seems that it will cause data
loss (as you can see, it's the only ISR on many partitions). Restarting the
other brokers didn't help at all. Neither did just plain waiting (it's the
third day of this going on). So what do I do?

The logs of the broker 2 (the one which tries to fetch data) are full of
this:

[2016-12-22 16:38:52,199] WARN [ReplicaFetcherThread-0-1001], Error in
fetch kafka.server.ReplicaFetcherThread$FetchRequest@117a49bf
(kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1001 was disconnected before the
response was read
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
at scala.Option.foreach(Option.scala:257)
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
at
kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
at
kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
at
kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
at
kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
at
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
at
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

The logs of the broker 1001 are full of this:

[2016-12-22 16:38:54,226] ERROR Processor got uncaught exception.
(kafka.network.Processor)
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:506)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:361)
at
kafka.api.FetchRequest$$anonfun$1$$anonfun$apply$1.apply(FetchRequest.scala:55)
at
kafka.api.FetchRequest$$anonfun$1$$anonfun$apply$1.apply(FetchRequest.scala:52)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.api.FetchRequest$$anonfun$1.apply(FetchRequest.scala:52)
at kafka.api.FetchRequest$$anonfun$1.apply(FetchRequest.scala:49)
at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at
scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at kafka.api.FetchRequest$.readFrom(FetchRequest.scala:49)
at
kafka.network.RequestChannel$Request$$anonfun$2.apply(RequestChannel.scala:65)
at
kafka.network.RequestChannel$Request$$anonfun$2.apply(RequestChannel.scala:65)
at

Re: NotLeaderForPartitionException while doing repartitioning

2016-12-22 Thread Dvorkin, Eugene (CORP)
I had this issue lately.
On broker 9, check what errors you got from the state-change log:

kafka-run-class kafka.tools.StateChangeLogMerger --logs /var/log/kafka/state-change.log --topic __consumer_offsets

If it complains about connections, it may be that this broker is not reading
data from ZooKeeper. Check ZooKeeper for this topic – the number of partitions.
If it is there correctly, restart the broker.
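For instance, roughly like this (the host/port and bin path are placeholders,
not your actual setup):

# sketch: inspect the partition assignment ZooKeeper holds for the topic
bin/zookeeper-shell.sh zk-host:2181 get /brokers/topics/__consumer_offsets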




On 12/21/16, 11:49 PM, "Stephane Maarek"  wrote:

Hi,

I’m doing a repartitioning from broker 4 5 6 to broker 7 8 9. I’m getting a
LOT of the following errors (for all topics):

[2016-12-22 04:47:21,957] ERROR [ReplicaFetcherThread-0-9], Error for
partition [__consumer_offsets,29] to broker
9:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
(kafka.server.ReplicaFetcherThread)


1) Is that bad?
2) why is it happening? I’m just running the normal reassign partition tool…


I’m getting this with kafka 0.10.1.0

Regards,
Stephane





Error in kafka-stream example

2016-12-22 Thread Amrit Jangid
Hi All,

I want to try out the Kafka Streams example from here:
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/scala/io/confluent/examples/streams/MapFunctionScalaExample.scala

I get an exception while compiling the code:

[error] val uppercasedWithMapValues: KStream[Array[Byte], String] =
[error]   textLines.mapValues(x => makeCaps(x))
[error]                       ^
[error] /home/ubuntu/stream/MapFunctionScalaExample.scala:32: missing parameter type
[error] val uppercasedWithMap: KStream[Array[Byte], String] =
[error]   textLines.map((key, value) => (key, value.toUpperCase()))
[error]                  ^


Is this example working for anyone? Has anyone tried it out?

This is my sbt file :

name := "Kafka_Stream_Test"
organization := "com.goibibo"
version := "0.1"
scalaVersion := "2.11.0"

javacOptions ++= Seq("-source", "1.8", "-target", "1.8")


resolvers ++= Seq(
  "confluent-repository" at "http://packages.confluent.io/maven/;
)

libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka-streams" % "0.10.1.0-cp2",
  "org.apache.kafka" % "kafka-clients" % "0.10.1.0-cp2"
)

Is there anything I'm missing here?
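From what I’ve read, Scala 2.11 won’t turn a plain lambda into the Java
ValueMapper/KeyValueMapper SAM interfaces unless -Xexperimental is enabled, so
one thing that might compile is passing explicit anonymous classes instead. A
sketch only (reusing the textLines stream and makeCaps function from the
example, untested on my side):

import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{KStream, KeyValueMapper, ValueMapper}

// explicit SAM implementations instead of Scala lambdas
val uppercasedWithMapValues: KStream[Array[Byte], String] =
  textLines.mapValues(new ValueMapper[String, String] {
    override def apply(value: String): String = makeCaps(value)
  })

val uppercasedWithMap: KStream[Array[Byte], String] =
  textLines.map(new KeyValueMapper[Array[Byte], String, KeyValue[Array[Byte], String]] {
    override def apply(key: Array[Byte], value: String): KeyValue[Array[Byte], String] =
      new KeyValue(key, value.toUpperCase())
  })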


Regards,
Amrit


Failure metrics

2016-12-22 Thread Gaurav Abbi
Hi,
I am trying to visualize Kafka metrics that can give me better insight into
failures at the Kafka server level while handling requests.

While looking at the various metrics in JMX, I can see the following two
metrics (full JMX object names are spelled out below):

   - FailedProduceRequestsPerSec
   - FailedFetchRequestsPerSec
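These are the MBeans I believe I am looking at (names as I understand them;
please correct me if they have moved):

kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec

and one way to sample them is the bundled JmxTool, roughly (the host and port
here are placeholders):

kafka-run-class kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi --object-name kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec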

When I visualize the metrics in my production system, I can see that for
the past 60 days, both these metrics show values of 0.

I doubt this is accurate, since we had a couple of incidents where failures
increased while reading or publishing.

Could someone verify if these metrics are still available? Quick searching
has not given me any conclusive answers. I am using Kafka version 0.9.0.1.

A related question: are there metrics that can help visualize failures for
other request types (e.g. Metadata)?

Best Regards,
Gaurav Abbi


what is the key used for change log topic backed by windowed store

2016-12-22 Thread Sachin Mittal
Hi All,
Our stream is something like

builder.stream()
    .groupByKey()
    .aggregate(Initializer, Aggregator, TimeWindows, valueSerde, "table-name")

This creates a changelog topic.
I was wondering what key is used for this topic.

Would it be the key we group by, or a compound key of (our key, window)?

I am asking because we have observed that when the stream runs for a few
days, we start getting an exception saying the message size is greater than
the allowable message size.

However, our messages are aggregated into hourly windows, so the size of a
message should be far less than the max message size.
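For reference, these are the size limits I believe are in play (names only –
the values shown are the stock defaults, not ours):

# broker
message.max.bytes=1000012
# topic-level override
max.message.bytes=1000012
# producer
max.request.size=1048576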

Thanks
Sachin


Common client metrics - per request?

2016-12-22 Thread Stevo Slavić
Hello Apache Kafka community,

Please correct me if I'm wrong: I assume the common Kafka client metrics (see
https://kafka.apache.org/documentation.html#selector_monitoring ) are metrics
aggregated over all the different requests a particular client instance
makes. So e.g. producer common metrics like outgoing-byte-rate and
incoming-byte-rate include not only Produce request/response bytes but
also all other requests the producer makes, like Metadata request/response,
correct?

If so, would it be technically possible to have, in addition or instead,
common client metrics broken down per request/response type?

E.g. I'm seeing that a producer instance's outgoing-byte-rate (data the
producer sends to brokers) is lower than its incoming-byte-rate (data
received by the producer) - I guess that metadata requests are contributing
to incoming-byte-rate, but I cannot see that from the client metrics.

Or is it considered enough to have this per-request breakdown only on the
broker side, not in the client metrics?

Kind regards,
Stevo Slavic.


Re: Kafka Backup Strategy

2016-12-22 Thread Stephane Maarek
Thanks Andrew for the detailed response

We have a replication factor of 3, so we’re safe there. What do you
recommend for min.insync.replicas, acks and the log flush interval?
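For context, what we currently run looks roughly like this (the values are
illustrative, not our exact settings):

# broker, server.properties
min.insync.replicas=2
# producer config
acks=all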

I’m worried about region failures, or someone going in and deleting our
instances and associated volumes (that kind of disaster).

- Mirror Maker is great, but does it keep the topic configuration,
partitioning and offsets? We basically want our consumers to keep on
working like they used to in case we lose a whole cluster.

- Secor, as you said, will allow me to back up the data, but not to reload it
into a cluster, so I don’t think it fits our exact purpose.

- EBS snapshots guarantee some kind of point-in-time recovery (although some
data may be lost, as you said). Shutting down the brokers one at a time before
the EBS backup sounds like an option; the backup will just take a while to
cover the whole cluster, I guess?

I appreciate your feedback and look forward to hearing from you

Regards,
Stephane





On 22 December 2016 at 5:40:30 pm, Andrew Clarkson (
andrew.clark...@rallyhealth.com) wrote:

Hi Stephane,

I say this not to be condescending in any way, but simple replication
*might* cover your needs. This will cover most node failures (causing
unclean shutdown) like disk or power failure. This assumes that one of the
replicas of your data survives (see the configs min.insync.replicas, acks,
and log.flush.interval.*). Making sure that you have the correct ack'ing
and replication strategy will likely cover a lot of the failure/recovery
use cases.

If you need better recovery/availability guarantees than simple
replication, the de facto mechanism is "mirroring" using a tool called
"mirror maker". This would cover cases where an entire
cluster crashed (like an AWS region being down) or other catastrophic
failures. This is the preferred way to do multi-data center (multi-region)
replication.

Back to EBS snapshots. From what I understand, snapshotting the file system
won't give you a full picture of what's going on because brokers flush the
logs infrequently and, as you mentioned, leave logs in a "corrupted" state.

If you need a persistent record in order to rerun expired data (see the
configs log.retention.*), you might want to look at a tool like Secor.
Secor will write all messages to an
S3 bucket from which you could rerun the data if you need to. Sadly, it
doesn't come with a producer to rerun the data and you would have to write
your own.
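
If it helps, a bare-bones replay producer is not much code. A sketch only,
assuming the backed-up messages have been downloaded from S3 as plain
line-delimited text (file name, topic and broker address are placeholders):

import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object ReplayFromBackup {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // placeholder broker address
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)

    val producer = new KafkaProducer[String, String](props)
    // assumes one message per line in the locally downloaded backup file
    Source.fromFile("backup-part-00000.txt").getLines().foreach { line =>
      producer.send(new ProducerRecord[String, String]("replayed-topic", line))
    }
    producer.flush()
    producer.close()
  }
}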

Let me know if that helps!

Thanks much,
Andrew Clarkson

On Wed, Dec 21, 2016 at 9:32 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi,
>
> I have Kafka running on EC2 in AWS.
> I would like to backup my data volumes daily in order to recover to a
> point in time in case of a disaster.
>
> One thing I’m worried about is that if I do an EBS snapshot while Kafka
> is running, it seems a Kafka that recovers on it will have to deal with
> corrupted logs (it goes through a repair / rebuild index process). It
> seems that Kafka on shutdown properly closes the logs.
>
> Questions:
> 1) If I take the EBS snapshots while Kafka is running, is it dangerous
> that a new instance launched from this backup has to go through a repair
> process?
> 2) The other option I see is to stop the Kafka broker, and then take my
> EBS snapshot. But I can’t do that for all brokers simultaneously as I
> would lose my cluster, so therefore if I do: stop kafka broker, take
> snapshot, start kafka, next broker same steps, I would get a clean
> backup, but not a point in time backup… is that an issue?
> 3) Are there any other backup strategies I haven’t considered?
>
> Thanks!
> Stephane
>