Re: Mirror Maker - Message Format Issue?

2016-10-12 Thread Craig Swift
Hello,

Just to close this issue out: the 0.8 producer writing to the 0.10 cluster was
the root issue. By default the mirror maker was unable to produce those
messages to the destination cluster. The workaround was to include a custom
MirrorMakerMessageHandler that did nothing but repackage the message again.
In the future it might be nice if the mirror handled this auto-magically, but
at least the ability to alter the behavior provided an easy fix. Hope this
helps someone else, thanks.
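
For anyone who wants a starting point, a minimal sketch of that kind of
handler against the 0.10.0 handler trait might look like the following (the
class name here is made up; the rest is the MirrorMaker handler API as I
understand it):

import java.util.Collections

import kafka.consumer.BaseConsumerRecord
import kafka.tools.MirrorMaker.MirrorMakerMessageHandler
import org.apache.kafka.clients.producer.ProducerRecord

// Rebuild each record without the consumer-supplied timestamp so the 0.10
// producer assigns its own instead of rejecting the -1 value.
class RepackagingHandler extends MirrorMakerMessageHandler {
  override def handle(record: BaseConsumerRecord): java.util.List[ProducerRecord[Array[Byte], Array[Byte]]] =
    Collections.singletonList(
      new ProducerRecord[Array[Byte], Array[Byte]](record.topic, record.key, record.value))
}

The jar then goes on the mirror maker classpath and the class is passed in
with --message.handler when starting kafka-mirror-maker.sh.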

Craig

Hello,
>
> I think we're misunderstanding the docs on some level and I need a little
> clarification. We have the following setup:
>
> 1) 0.8.2 producer -> writing to Kafka 0.10.0.1 cluster w/ version 10
> message format (source cluster).
> 2) 0.10.0.1 mirror using the 'new consumer' reading from the source
> cluster and writing to Kafka 0.10.0.1 cluster w/version 0.8.2 message
> format (destination cluster). We need some of the features like SSL, hence
> using the new consumer.
> 3) Lots of old 0.8.2 consumers reading from the destination cluster that
> still need to be upgraded.
>
> We're seeing errors from the mirror maker when trying to produce to the
> destination cluster like the following:
>
> java.lang.IllegalArgumentException: Invalid timestamp -1
> at org.apache.kafka.clients.producer.ProducerRecord.<init>(ProducerRecord.java:60)
>
> Is the root problem the 0.8.2 producer sending data to the source cluster
> or the new 10 mirror writing data to the destination cluster in 0.8.2
> format? From the docs we were under the impression that the data would be
> stored in the source cluster in 10 format regardless of the producer and
> the mirror could produce to the destination cluster regardless of its
> message format setting.
>
> Is this current setup non-functional or is there a way to make this work?
> For example, if the mirror producing is the issue could we implement a
> custom MirrorMakerMessageHandler? Any advice and clarification would be
> helpful, thanks.
>
> Craig
>


Mirror Maker - Message Format Issue?

2016-10-11 Thread Craig Swift
Hello,

I think we're misunderstanding the docs on some level and I need a little
clarification. We have the following setup:

1) 0.8.2 producer -> writing to a Kafka 0.10.0.1 cluster w/ the 0.10 message
format (source cluster).
2) 0.10.0.1 mirror using the 'new consumer', reading from the source cluster
and writing to a Kafka 0.10.0.1 cluster w/ the 0.8.2 message format
(destination cluster). We need some of the features like SSL, hence using
the new consumer.
3) Lots of old 0.8.2 consumers reading from the destination cluster that
still need to be upgraded.
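
To be explicit, "0.8.2 message format" on the destination side means the
brokers are pinned to the older on-disk format, i.e. something like this in
the destination cluster's server.properties:

# keep messages in the 0.8.2 format so the existing 0.8.2 consumers can read them
log.message.format.version=0.8.2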

We're seeing errors like the following from the mirror maker when it tries to
produce to the destination cluster:

java.lang.IllegalArgumentException: Invalid timestamp -1
at org.apache.kafka.clients.producer.ProducerRecord.<init>(ProducerRecord.java:60)

Is the root problem the 0.8.2 producer sending data to the source cluster, or
the new 0.10 mirror writing data to the destination cluster in the 0.8.2
format? From the docs we were under the impression that the data would be
stored in the source cluster in the 0.10 format regardless of the producer,
and that the mirror could produce to the destination cluster regardless of its
message format setting.

Is this current setup non-functional or is there a way to make this work?
For example, if the mirror producing is the issue could we implement a
custom MirrorMakerMessageHandler? Any advice and clarification would be
helpful, thanks.

Craig


Re: Kafka 10 Consumer Reading from Kafka 8 Cluster?

2016-10-06 Thread Craig Swift
Ok great - thanks for the clarification. Exactly what I needed. :)

Craig

On Thu, Oct 6, 2016 at 2:09 PM, Scott Reynolds  wrote:

> you cannot use a k10 client with a k8 cluster. The protocol changed
>
> You CAN use a k8 client with a k10 cluster.
>
> On Thu, Oct 6, 2016 at 12:00 PM Craig Swift
>  wrote:
>
> > We're doing some fairly intensive data transformations in the current
> > workers so it's not as straightforward as just reading/producing to
> > another topic. However, if you mean can we mirror the source topic to the
> > kafka 10 cluster and then update the worker to read/write to 10 - that
> > could be an option. I'd still like to know if any of the k10 client
> > consumer (old or new consumer) code can work with a k8 cluster though.
> >
> > Craig
> >
> > On Thu, Oct 6, 2016 at 12:38 PM, David Garcia 
> > wrote:
> >
> > > Any reason you can’t use mirror maker?
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
> > >
> > > -David
> > >
> > > On 10/6/16, 1:32 PM, "Craig Swift" wrote:
> > >
> > > Hello,
> > >
> > > We're in the process of upgrading several of our clusters to Kafka
> > 10.
> > > I
> > > was wondering if it's possible to use the Kafka 10 client code (old
> > or
> > > new)
> > > to read from a source Kafka 8 cluster and then use the new 10
> > producer
> > > to
> > > write to a destination Kafka 10 cluster? I know there's a
> recommended
> > > upgrade path per the documentation but we're unfortunately not able
> > to
> > > update the source cluster quite yet. Thoughts?
> > >
> > > Craig
> > >
> > >
> > >
> >
>


Re: Kafka 10 Consumer Reading from Kafka 8 Cluster?

2016-10-06 Thread Craig Swift
We're doing some fairly intensive data transformations in the current
workers, so it's not as straightforward as just reading/producing to another
topic. However, if you mean can we mirror the source topic to the kafka 10
cluster and then update the worker to read/write to 10 - that could be an
option. I'd still like to know whether the k10 client code (old or new
consumer) can work with a k8 cluster though.

Craig

On Thu, Oct 6, 2016 at 12:38 PM, David Garcia  wrote:

> Any reason you can’t use mirror maker?
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
>
> -David
>
> On 10/6/16, 1:32 PM, "Craig Swift" 
> wrote:
>
> Hello,
>
> We're in the process of upgrading several of our clusters to Kafka 10.
> I
> was wondering if it's possible to use the Kafka 10 client code (old or
> new)
> to read from a source Kafka 8 cluster and then use the new 10 producer
> to
> write to a destination Kafka 10 cluster? I know there's a recommended
> upgrade path per the documentation but we're unfortunately not able to
> update the source cluster quite yet. Thoughts?
>
> Craig
>
>
>


Kafka 10 Consumer Reading from Kafka 8 Cluster?

2016-10-06 Thread Craig Swift
Hello,

We're in the process of upgrading several of our clusters to Kafka 10. I
was wondering if it's possible to use the Kafka 10 client code (old or new)
to read from a source Kafka 8 cluster and then use the new 10 producer to
write to a destination Kafka 10 cluster? I know there's a recommended
upgrade path per the documentation but we're unfortunately not able to
update the source cluster quite yet. Thoughts?

Craig


Re: Running Zookeeper on same instance as kafka brokers in multi node cluster

2016-09-19 Thread Craig Swift
Hello,

Initially we had both ZK and Kafka on the same nodes as well. Over time,
though, this proved problematic as we increased our usage of Kafka and had to
scale out the cluster. ZK in particular can be very touchy if it gets IO
bound, so when the Kafka cluster came under high load we would start to see
slow responses and, in the worst case, a cascading failure. Also, from my
understanding ZK doesn't really scale well past five nodes, so once your
cluster grows large enough your nodes become non-homogeneous since you're
only running ZK on some of them. In our experience it was much cleaner to
spin up a dedicated 3-node ZK cluster for each Kafka cluster we ran. That
allows us to grow the Kafka cluster when needed without ever worrying about
ZK. In most cases we're more concerned about keeping ZK solid, and we find it
fairly easy to reprovision/add Kafka nodes when necessary. Hope that helps.

Craig


On Mon, Sep 19, 2016 at 3:01 PM, Digumarthi, Prabhakar Venkata Surya <
prabhakarvenkatasurya.digumar...@capitalone.com> wrote:

> Hi Team,
>
>
> What are the downsides of installing ZooKeeper and Kafka on the same
> machine in a multi-broker environment?
>
> We are trying to run ZooKeeper and Kafka in AWS and it's becoming difficult
> for us to maintain them, and we've hit some issues. Also, re-provisioning ZK
> and Kafka instances separately is getting complicated.
> Thanks,
> Prabhakar
>


JMXTrans Monitoring for Individual Topics

2016-07-07 Thread Craig Swift
Hello,

I was hoping someone might know this off the top of their head. I haven't
been able to find any documentation on it so I was curious if it was still
available. We're upgrading to Kafka 8.2 from 8.1 (ya, we're a bit behind)
and we used to have the following in jmxtrans that would give us monitoring
on individual topics:

{
  "outputWriters": [
    {
      "@class": "com.googlecode.jmxtrans.model.output.GraphiteWriter",
      "settings": {
        "typeNames": [ "name" ],
        "host": "bla.blabla.net",
        "port": 2013,
        "rootPrefix": "app.infrastructure.kafka.green.green-01"
      }
    }
  ],
  "resultAlias": "topics.topic_name",
  "obj": "kafka.log:type=Log,name=topic_name-*",
  "attr": [ "Value" ]
}

which would yield metrics on size, log end offset and number of log
segments for that topic. Is a similar option still available in 8.2?
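
From what I can tell the per-topic Log beans were renamed in 0.8.2
(KAFKA-1481) to use tags instead of embedding the topic in the bean name, so
I'm guessing the equivalent query would look roughly like this (unverified;
"typeNames" in the writer would presumably switch to topic/partition, with
separate queries for LogEndOffset and NumLogSegments):

{
  "outputWriters": [ ... same GraphiteWriter as above, with "typeNames": [ "topic", "partition" ] ... ],
  "resultAlias": "topics.topic_name",
  "obj": "kafka.log:type=Log,name=Size,topic=topic_name,partition=*",
  "attr": [ "Value" ]
}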


Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-16 Thread Craig Swift
+1

Craig J. Swift
Principal Software Engineer - Data Pipeline
ReturnPath Inc.
Work: 303-999-3220 Cell: 720-560-7038

On Thu, Jun 16, 2016 at 2:50 PM, Henry Cai 
wrote:

> +1 for Lambda expression.
>
> On Thu, Jun 16, 2016 at 1:48 PM, Rajiv Kurian  wrote:
>
> > +1
> >
> > On Thu, Jun 16, 2016 at 1:45 PM, Ismael Juma  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start a discussion on making Java 8 a minimum
> requirement
> > > for Kafka's next feature release (let's say Kafka 0.10.1.0 for now).
> This
> > > is the first discussion on the topic so the idea is to understand how
> > > people feel about it. If people feel it's too soon, then we can pick up
> > the
> > > conversation again after Kafka 0.10.1.0. If the feedback is mostly
> > > positive, I will start a vote thread.
> > >
> > > Let's start with some dates. Java 7 hasn't received public updates
> since
> > > April 2015[1], Java 8 was released in March 2014[2] and Java 9 is
> > scheduled
> > > to be released in March 2017[3].
> > >
> > > The first argument for dropping support for Java 7 is that the last
> > public
> > > release by Oracle contains a large number of known security
> > > vulnerabilities. The effectiveness of Kafka's security features is
> > reduced
> > > if the underlying runtime is not itself secure.
> > >
> > > The second argument for moving to Java 8 is that it adds a number of
> > > compelling features:
> > >
> > > * Lambda expressions and method references (particularly useful for the
> > > Kafka Streams DSL)
> > > * Default methods (very useful for maintaining compatibility when
> adding
> > > methods to interfaces)
> > > * java.util.stream (helpful for making collection transformations more
> > > concise)
> > > * Lots of improvements to java.util.concurrent (CompletableFuture,
> > > DoubleAdder, DoubleAccumulator, StampedLock, LongAdder,
> LongAccumulator)
> > > * Other nice things: SplittableRandom, Optional (and many others I have
> > not
> > > mentioned)
> > >
> > > The third argument is that it will simplify our testing matrix, we
> won't
> > > have to test with Java 7 any longer (this is particularly useful for
> > system
> > > tests that take hours to run). It will also make it easier to support
> > Scala
> > > 2.12, which requires Java 8.
> > >
> > > The fourth argument is that many other open-source projects have taken
> > the
> > > leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop
> 3[7],
> > > Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android
> will
> > > support Java 8 in the next version (although it will take a while
> before
> > > most phones will use that version sadly). This reduces (but does not
> > > eliminate) the chance that we would be the first project that would
> > cause a
> > > user to consider a Java upgrade.
> > >
> > > The main argument for not making the change is that a reasonable number
> > of
> > > users may still be using Java 7 by the time Kafka 0.10.1.0 is released.
> > > More specifically, we care about the subset who would be able to
> upgrade
> > to
> > > Kafka 0.10.1.0, but would not be able to upgrade the Java version. It
> > would
> > > be great if we could quantify this in some way.
> > >
> > > What do you think?
> > >
> > > Ismael
> > >
> > > [1] https://java.com/en/download/faq/java_7.xml
> > > [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> > > [3] http://openjdk.java.net/projects/jdk9/
> > > [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> > > [5]
> https://lucene.apache.org/#highlights-of-this-lucene-release-include
> > > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> > > [7] https://issues.apache.org/jira/browse/HADOOP-11858
> > > [8] https://webtide.com/jetty-9-3-features/
> > > [9] http://markmail.org/message/l7s276y3xkga2eqf
> > > [10]
> > >
> > >
> >
> https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> > > [11] http://markmail.org/message/l7s276y3xkga2eqf
> > >
> >
>


Re: Will Mirror Maker only support 1 to 1?

2015-09-21 Thread Craig Swift
Hello,

We've done a variety of different mirror configurations in the past (we
mirror from AWS into several different data centers), including the first one
you describe. In fact, for some high-volume / large-message topics we found
it useful to split things up and run a dedicated mirror just for the large
topic, as sketched below.
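
As a rough sketch (hostnames and file names made up), each process simply
gets its own consumer config pointing at its own source cluster while sharing
the producer config for the target cluster:

# mirror cluster 2 -> cluster 1
bin/kafka-mirror-maker.sh --consumer.config cluster2-consumer.properties \
  --producer.config cluster1-producer.properties --whitelist '.*' \
  --num.streams 2 --num.producers 4

# mirror cluster 3 -> cluster 1 (or a dedicated mirror for one big topic)
bin/kafka-mirror-maker.sh --consumer.config cluster3-consumer.properties \
  --producer.config cluster1-producer.properties --whitelist 'big_topic' \
  --num.streams 2 --num.producers 4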

Craig J. Swift

On Mon, Sep 21, 2015 at 1:59 AM, Prabhjot Bharaj 
wrote:

> Hi,
>
> Can I have 2 separate mirror maker processes in this format:-
>
> Process 1 - source: 2, target: 1
> Process 2 - source: 3, target: 1
>
> If this is not supported, then will this circular kind of setup work ?
>
> Process 1 (on cluster 1) - source: 2, target: 1
> Process 1 (on cluster 2) - source: 1, target: 2
>
> Regards,
> Prabhjot
>
> On Sat, Sep 19, 2015 at 1:43 AM, Xiang Zhou (Samuel) 
> wrote:
>
> > Hi, folks,
> >
> > I found that the Mirror Maker docs should be updated (KAFKA-2449), since
> > they mention that N sources to 1 destination will not be supported. So is
> > it true that 0.9.0 only supports 1 source to 1 destination? Or will it be
> > extended to support N sources to N destinations?
> >
> > Thanks,
> > Samuel
> >
>
>
>
> --
> -
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"
>


Re: MirrorMaker - Not consuming from all partitions

2015-09-11 Thread Craig Swift
Scary but thanks! :) We'll start digging into the network and see if we can
find a smoking gun. Appreciate the response, thanks again.

Craig J. Swift
Software Engineer - Data Pipeline
ReturnPath Inc.
Work: 303-999-3220 Cell: 720-560-7038

On Fri, Sep 11, 2015 at 11:29 AM, Steve Miller 
wrote:

>I have a vague feeling that I've seen stuff like this when the network
> on the broker that's disappearing is actually unreachable from time to time
> -- though I'd like to believe that's not such an issue when talking to AWS
> (though there could be a lot of screwed-up Internet between you and it,
> depending on exactly what you're doing).
>
>One thing you could consider doing would be to look at some of the
> traffic.  wireshark/tshark knows how to decode Kafka transactions, though
> digging through the output is... exciting, because there can be so very
> much of it.  I'd written up some notes on how to do that, which you can see
> at:
>
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201408.mbox/%3c20140812180358.ga24...@idrathernotsay.com%3E
>
> (though I expect in your case, you'd need to be doing a capture all the
> time, and then figure out when MirrorMaker stops fetching from that broker,
> and then stare at a ton of data to find whatever protocol thing has
> happened -- all of which is ugly enough that I'd been reluctant to mention
> it until now).
>
> -Steve
>
> On Fri, Sep 11, 2015 at 08:52:21AM -0600, Craig Swift wrote:
> > Just wanted to bump this again and see if the community had any thoughts
> or
> > if we're just missing something stupid. For added context the topic we're
> > reading from has 24 partitions and we see roughly 15k messages per
> minute.
> > As I mentioned before the throughput seems fine, but I'm not entirely
> sure
> > how the MirrorMaker cycles through its topics/partitions and why it
> would
> > read very slowly or bypass reading from certain partitions.
> >
> > Craig J. Swift
> > Software Engineer - Data Pipeline
> > ReturnPath Inc.
> > Work: 303-999-3220 Cell: 720-560-7038
> >
> > On Wed, Sep 9, 2015 at 9:29 AM, Craig Swift 
> > wrote:
> >
> > > Hello,
> > >
> > > Hope everyone is doing well. I was hoping to get some assistance with a
> > > strange issue we're experiencing while using the MirrorMaker to pull
> data
> > > down from an 8 node Kafka cluster in AWS into our data center. Both
> Kafka
> > > clusters and the mirror are using version 0.8.1.1 with dedicated
> Zookeeper
> > > clusters for each cluster respectively (running 3.4.5).
> > >
> > > The problem we're seeing is that the mirror starts up and begins
> consuming
> > > from the cluster on a specific topic. It correctly attaches to all 24
> > > partitions for that topic - but inevitably there are a series of
> partitions
> > > that either don't get read or are read at a very slow rate. Those
> > > partitions are always associated with the same brokers. For example,
> all
> > > partitions on broker 2 won't be read or all partitions on broker 2 and
> 4
> > > won't be read. On restarting the mirror, these 'stuck' partitions may
> stay
> > > the same or move. If they move the backlog is drained very quickly. If
> we
> > > add more mirrors for additional capacity the same situation happens
> except
> > > that each mirror has its own set of stuck partitions. I've included
> the
> > > mirror's configurations below along with samples from the logs.
> > >
> > > 1) The partition issue seems to happen when the mirror first starts up.
> > > Once in a blue moon it reads from everything normally, but on restart
> it
> > > can easily get back into this state.
> > >
> > > 2) We're fairly sure it isn't a processing/throughput issue. We can
> turn
> > > the mirror off for a while, incur a large backlog of data, and when it
> is
> > > enabled it chews through the data very quickly minus the handful of
> stuck
> > > partitions.
> > >
> > > 3) We've looked at both the zookeeper and broker logs and there doesn't
> > > seem to be anything out of the normal. We see the mirror connecting,
> there
> > > are a few info messages about zookeeper nodes already existing, etc. No
> > > specific errors.
> > >
> > > 4) We've enabled debugging on the mirror and we've noticed that during
> the
> > > zk heartbeat/updates we're miss

Re: MirrorMaker - Not consuming from all partitions

2015-09-11 Thread Craig Swift
Just wanted to bump this again and see if the community had any thoughts or
if we're just missing something stupid. For added context the topic we're
reading from has 24 partitions and we see roughly 15k messages per minute.
As I mentioned before the throughput seems fine, but I'm not entirely sure
how the MirrorMaker cycles through its topics/partitions and why it would
read very slowly or bypass reading from certain partitions.

Craig J. Swift
Software Engineer - Data Pipeline
ReturnPath Inc.
Work: 303-999-3220 Cell: 720-560-7038

On Wed, Sep 9, 2015 at 9:29 AM, Craig Swift 
wrote:

> Hello,
>
> Hope everyone is doing well. I was hoping to get some assistance with a
> strange issue we're experiencing while using the MirrorMaker to pull data
> down from an 8 node Kafka cluster in AWS into our data center. Both Kafka
> clusters and the mirror are using version 0.8.1.1 with dedicated Zookeeper
> clusters for each cluster respectively (running 3.4.5).
>
> The problem we're seeing is that the mirror starts up and begins consuming
> from the cluster on a specific topic. It correctly attaches to all 24
> partitions for that topic - but inevitably there are a series of partitions
> that either don't get read or are read at a very slow rate. Those
> partitions are always associated with the same brokers. For example, all
> partitions on broker 2 won't be read or all partitions on broker 2 and 4
> won't be read. On restarting the mirror, these 'stuck' partitions may stay
> the same or move. If they move the backlog is drained very quickly. If we
> add more mirrors for additional capacity the same situation happens except
> that each mirror has its own set of stuck partitions. I've included the
> mirror's configurations below along with samples from the logs.
>
> 1) The partition issue seems to happen when the mirror first starts up.
> Once in a blue moon it reads from everything normally, but on restart it
> can easily get back into this state.
>
> 2) We're fairly sure it isn't a processing/throughput issue. We can turn
> the mirror off for a while, incur a large backlog of data, and when it is
> enabled it chews through the data very quickly minus the handful of stuck
> partitions.
>
> 3) We've looked at both the zookeeper and broker logs and there doesn't
> seem to be anything out of the normal. We see the mirror connecting, there
> are a few info messages about zookeeper nodes already existing, etc. No
> specific errors.
>
> 4) We've enabled debugging on the mirror and we've noticed that during the
> zk heartbeat/updates we're missing these messages for the 'stuck'
> partitions:
>
> [2015-09-08 18:38:12,157] DEBUG Reading reply sessionid:0x14f956bd57d21ee,
> packet:: clientPath:null serverPath:null finished:false header:: 357,5
>  replyHeader:: 357,8597251893,0  request::
> '/consumers/mirror-kafkablk-kafka-gold-east-to-kafkablk-den/offsets/MessageHeadersBody/5,#34303537353838,-1
>  response::
> s{4295371756,8597251893,1439969185754,1441759092134,19500,0,0,0,7,0,4295371756}
>  (org.apache.zookeeper.ClientCnxn)
>
> i.e. we see this message for all the processing partitions, but never for
> the stuck ones. There are no errors in the log prior to this though, and
> once in a great while we might see a log entry for one of the stuck
> partitions.
>
> 5) We've checked latency/response time with zookeeper from the brokers and
> the mirror and it appears fine.
>
> Mirror consumer config:
> group.id=mirror-kafkablk-kafka-gold-east-to-kafkablk-den
> consumer.id=mirror-kafkablk-mirror00-den-kafka-gold-east-to-kafkablk-den
> zookeeper.connect=zk.strange.dev.net:2181
> fetch.message.max.bytes=15728640
> socket.receive.buffer.bytes=6400
> socket.timeout.ms=6
> zookeeper.connection.timeout.ms=6
> zookeeper.session.timeout.ms=3
> zookeeper.sync.time.ms=4000
> auto.offset.reset=smallest
> auto.commit.interval.ms=2
>
> Mirror producer config:
> client.id=mirror-kafkablk-mirror00-den-kafka-gold-east-to-kafkablk-den
> metadata.broker.list=kafka00.lan.strange.dev.net:9092,
> kafka01.lan.strange.dev.net:9092,kafka02.lan.strange.dev.net:9092,
> kafka03.lan.strange.dev.net:9092,kafka04.lan.strange.dev.net:9092
> request.required.acks=1
> producer.type=async
> request.timeout.ms=2
> retry.backoff.ms=1000
> message.send.max.retries=6
> serializer.class=kafka.serializer.DefaultEncoder
> send.buffer.bytes=134217728
> compression.codec=gzip
>
> Mirror startup settings:
> --num.streams 2 --num.producers 4
>
> Any thoughts/suggestions would be very helpful. At this point we're
> running out of things to try.
>
>
> Craig J. Swift
> Software Engineer
>
>
>


MirrorMaker - Not consuming from all partitions

2015-09-09 Thread Craig Swift
Hello,

Hope everyone is doing well. I was hoping to get some assistance with a
strange issue we're experiencing while using the MirrorMaker to pull data
down from an 8 node Kafka cluster in AWS into our data center. Both Kafka
clusters and the mirror are using version 0.8.1.1 with dedicated Zookeeper
clusters for each cluster respectively (running 3.4.5).

The problem we're seeing is that the mirror starts up and begins consuming
from the cluster on a specific topic. It correctly attaches to all 24
partitions for that topic, but inevitably there is a set of partitions that
either don't get read or are read at a very slow rate. Those partitions are
always associated with the same brokers. For example, all partitions on
broker 2 won't be read, or all partitions on brokers 2 and 4 won't be read.
On restarting the mirror, these 'stuck' partitions may stay the same or move;
if they move, the backlog is drained very quickly. If we add more mirrors for
additional capacity the same situation happens, except that each mirror has
its own set of stuck partitions. I've included the mirror's configurations
below along with samples from the logs.

1) The partition issue seems to happen when the mirror first starts up.
Once in a blue moon it reads from everything normally, but on restart it
can easily get back into this state.

2) We're fairly sure it isn't a processing/throughput issue. We can turn
the mirror off for a while, incur a large backlog of data, and when it is
enabled it chews through the data very quickly minus the handful of stuck
partitions.

3) We've looked at both the zookeeper and broker logs and there doesn't
seem to be anything out of the normal. We see the mirror connecting, there
are a few info messages about zookeeper nodes already existing, etc. No
specific errors.

4) We've enabled debugging on the mirror and we've noticed that during the
zk heartbeat/updates we're missing these messages for the 'stuck'
partitions:

[2015-09-08 18:38:12,157] DEBUG Reading reply sessionid:0x14f956bd57d21ee,
packet:: clientPath:null serverPath:null finished:false header:: 357,5
 replyHeader:: 357,8597251893,0  request::
'/consumers/mirror-kafkablk-kafka-gold-east-to-kafkablk-den/offsets/MessageHeadersBody/5,#34303537353838,-1
 response::
s{4295371756,8597251893,1439969185754,1441759092134,19500,0,0,0,7,0,4295371756}
 (org.apache.zookeeper.ClientCnxn)

i.e. we see this message for all the processing partitions, but never for
the stuck ones. There are no errors in the log prior to this though, and
once in a great while we might see a log entry for one of the stuck
partitions.

5) We've checked latency/response time with zookeeper from the brokers and
the mirror and it appears fine.

Mirror consumer config:
group.id=mirror-kafkablk-kafka-gold-east-to-kafkablk-den
consumer.id=mirror-kafkablk-mirror00-den-kafka-gold-east-to-kafkablk-den
zookeeper.connect=zk.strange.dev.net:2181
fetch.message.max.bytes=15728640
socket.receive.buffer.bytes=6400
socket.timeout.ms=6
zookeeper.connection.timeout.ms=6
zookeeper.session.timeout.ms=3
zookeeper.sync.time.ms=4000
auto.offset.reset=smallest
auto.commit.interval.ms=2

Mirror producer config:
client.id=mirror-kafkablk-mirror00-den-kafka-gold-east-to-kafkablk-den
metadata.broker.list=kafka00.lan.strange.dev.net:9092,
kafka01.lan.strange.dev.net:9092,kafka02.lan.strange.dev.net:9092,
kafka03.lan.strange.dev.net:9092,kafka04.lan.strange.dev.net:9092
request.required.acks=1
producer.type=async
request.timeout.ms=2
retry.backoff.ms=1000
message.send.max.retries=6
serializer.class=kafka.serializer.DefaultEncoder
send.buffer.bytes=134217728
compression.codec=gzip

Mirror startup settings:
--num.streams 2 --num.producers 4

Any thoughts/suggestions would be very helpful. At this point we're running
out of things to try.


Craig J. Swift
Software Engineer