Re: User interface for kafka: Management/administration

2020-01-22 Thread Stephen Powis
Definitely more focused on the consuming side vs management, but has some
management tooling and has a binary distribution:
https://github.com/sourcelaborg/kafka-webview

On Thu, Jan 23, 2020 at 1:10 PM Sunil CHAUDHARI
 wrote:

> Hi all,
> Please help me to get some user interface for management and
> administration of my kafka cluster.
> There are some open source tools available, but they either have some
> dependencies or need to be built before running.
> Is there any pre-built (ready-to-use) package which I can just download and
> run?
> Our environment has many restrictions, so it's difficult to
> download/install dependencies.
>
> I hope you guys understood my problem.
>
> Regards,
> Sunil.
>


Re: help for regarding my question

2019-11-13 Thread Stephen Powis
true

On Thu, Nov 14, 2019 at 2:03 PM prashanth sri 
wrote:

> I am new to Kafka. Can anyone help me regarding my problem?
>


Re: Any simple UI Tool for Kafka

2019-08-24 Thread Stephen Powis
Pretty sure Kafka-WebView checks most of those boxes :)

https://github.com/SourceLabOrg/kafka-webview



On Sat, Aug 24, 2019 at 12:58 PM Darius Cooper 
wrote:

> I'm looking for recommendations for a simple UI-based tool that will help
> with:
>
> * viewing lists of Kafka topics,
> * viewing consumer groups for each topic,
> * viewing messages for a topic,
> * posting test messages to a topic
>
> The audience would be developers who are using Kafka and could use a
> simple tool within their test environments.
>
> I found a couple of tools:
> * Kafka Tool
> * Yahoo Kafka-manager
>
> I'm looking for recommendations of more tools
>


Re: kafka server shutdown automatically

2019-05-10 Thread Stephen Powis
Looks like someone/something on your system sent it a SIGHUP signal:

[2019-05-09 12:54:56,295] INFO *Terminating process due to signal SIGHUP *
> (org.apache.kafka.common.utils.LoggingSignalHandler)
>
>
On Fri, May 10, 2019 at 5:43 PM  wrote:

> Hi guys,
> my kafka server died without any error log. I see something in
> server.log file like this:
> [2019-05-09 12:02:24,103] INFO [GroupMetadataManager brokerId=0]
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-05-09 12:12:24,103] INFO [GroupMetadataManager brokerId=0]
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-05-09 12:22:24,103] INFO [GroupMetadataManager brokerId=0]
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-05-09 12:32:24,103] INFO [GroupMetadataManager brokerId=0]
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-05-09 12:42:24,103] INFO [GroupMetadataManager brokerId=0]
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-05-09 12:52:24,103] INFO [GroupMetadataManager brokerId=0]
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-05-09 12:54:56,295] INFO Terminating process due to signal SIGHUP
> (org.apache.kafka.common.utils.LoggingSignalHandler)
> [2019-05-09 12:54:56,339] INFO [KafkaServer id=0] shutting down
> (kafka.server.KafkaServer)
> [2019-05-09 12:54:56,341] INFO [KafkaServer id=0] Starting controlled
> shutdown (kafka.server.KafkaServer)
> [2019-05-09 12:54:56,445] INFO [KafkaServer id=0] Controlled shutdown
> succeeded (kafka.server.KafkaServer)
> [2019-05-09 12:54:56,452] INFO [/config/changes-event-process-thread]:
> Shutting down
> (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
> [2019-05-09 12:54:56,464] INFO [/config/changes-event-process-thread]:
> Stopped
> (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
> [2019-05-09 12:54:56,465] INFO [/config/changes-event-process-thread]:
> Shutdown completed
> (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
> [2019-05-09 12:54:56,465] INFO [SocketServer brokerId=0] Stopping
> socket server request processors (kafka.network.SocketServer)
> [2019-05-09 12:54:56,495] INFO [SocketServer brokerId=0] Stopped socket
> server request processors (kafka.network.SocketServer)
> [2019-05-09 12:54:56,500] INFO [data-plane Kafka Request Handler on
> Broker 0], shutting down (kafka.server.KafkaRequestHandlerPool)
> [2019-05-09 12:54:56,504] INFO [data-plane Kafka Request Handler on
> Broker 0], shut down completely (kafka.server.KafkaRequestHandlerPool)
> [2019-05-09 12:54:56,508] INFO [KafkaApi-0] Shutdown complete.
> (kafka.server.KafkaApis)
> [2019-05-09 12:54:56,509] INFO [ExpirationReaper-0-topic]: Shutting
> down (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2019-05-09 12:54:56,628] INFO [ExpirationReaper-0-topic]: Stopped
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2019-05-09 12:54:56,628] INFO [ExpirationReaper-0-topic]: Shutdown
> completed
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2019-05-09 12:54:56,631] INFO [TransactionCoordinator id=0] Shutting
> down. (kafka.coordinator.transaction.TransactionCoordinator)
> [2019-05-09 12:54:56,633] INFO [ProducerId Manager 0]: Shutdown
> complete: last producerId assigned 28000
> (kafka.coordinator.transaction.ProducerIdManager)
> [2019-05-09 12:54:56,634] INFO [Transaction State Manager 0]: Shutdown
> complete (kafka.coordinator.transaction.TransactionStateManager)
> [2019-05-09 12:54:56,634] INFO [Transaction Marker Channel Manager 0]:
> Shutting down
> (kafka.coordinator.transaction.TransactionMarkerChannelManager)
> [2019-05-09 12:54:56,634] INFO [Transaction Marker Channel Manager 0]:
> Stopped (kafka.coordinator.transaction.TransactionMarkerChannelManager)
> [2019-05-09 12:54:56,634] INFO [Transaction Marker Channel Manager 0]:
> Shutdown completed
> (kafka.coordinator.transaction.TransactionMarkerChannelManager)
> [2019-05-09 12:54:56,634] INFO [TransactionCoordinator id=0] Shutdown
> complete. (kafka.coordinator.transaction.TransactionCoordinator)
> [2019-05-09 12:54:56,635] INFO [GroupCoordinator 0]: Shutting down.
> (kafka.coordinator.group.GroupCoordinator)
> [2019-05-09 12:54:56,636] INFO [ExpirationReaper-0-Heartbeat]: Shutting
> down (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2019-05-09 12:54:56,652] INFO [ExpirationReaper-0-Heartbeat]: Stopped
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2019-05-09 12:54:56,652] INFO [ExpirationReaper-0-Heartbeat]: Shutdown
> completed
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2019-05-09 12:54:56,652] INFO [ExpirationReaper-0-Rebalance]: Shutting
> down 

Re: Kafka tests on a remote cluster

2018-12-26 Thread Stephen Powis
Without looking into how the integration tests work, my best guess is that,
within the context they were written to run in, it doesn't make sense to run
them against a remote cluster.  The "internal" cluster is running the same
code, so why require having to coordinate with an external dependency?

For the use case you gave (and I'm not sure whether tests exist that cover
this behavior or not), running the brokers locally in the context of the
tests means that those tests have control over the brokers (i.e. shut them
off, restart them, etc. programmatically) and can validate behavior.
Coordinating these operations on a remote broker would be significantly more
difficult.

Not sure this helps...but perhaps you're either asking the wrong questions
or trying to go about solving your problem using the wrong set of tools?
My gut feeling says if you want to do a full scale multi-server load / HA
test, Kafka's test suite is not the best place to start.

Stephen



On Thu, Dec 27, 2018 at 10:53 AM Parviz deyhim  wrote:

> Hi,
>
> I'm looking to see who has done this before and get some guidance. On a
> frequent basis I'd like to run basic tests on a remote Kafka cluster while
> some random chaos/faults are being performed. In other words, I'd like to run
> chaos engineering tasks (network outage, disk outage, etc.) and see how
> Kafka behaves. For example:
>
> 1) bring some random Broker node down
> 2) send 2000 messages
> 3) consume messages
> 4) confirm there's no data loss
>
> My question: I'm pretty sure most of the scenarios I'm looking to test
> have been covered under Kafka's integration, unit, and other existing tests.
> What I cannot figure out is how to run those tests on a remote cluster vs.
> the local one which the tests seem to run on. For example, I'd like to run the
> following command but have the tests executed on a remote cluster of my
> choice:
>
> ./gradlew cleanTest integrationTest
>
> Any guidance/help would be appreciated.
>
> Thanks
>


Re: Kafka UI

2018-09-14 Thread Stephen Powis
Shameless plug: https://github.com/sourcelaborg/kafka-webview

The project is definitely more geared around viewing data within topics
but you can view details about your clusters as well.  You can view things
such as node membership, configuration of the nodes, topics defined on the
cluster, partition distribution, ISR, etc..

Stephen

On Fri, Sep 14, 2018 at 5:22 PM, Mohit Srivastav 
wrote:

> What are the best/recommended UIs for Kafka to check topics, brokers,
> partitions, etc. - preferably ones which can be deployed without
> Docker? It must be open source and should work with Kafka 2.0+.
>


Re: [DISCUSS] KIP-369 Alternative Partitioner to Support "Always Round-Robin" Selection

2018-08-30 Thread Stephen Powis
Neat, this would be super helpful! I submitted this ages ago:
https://issues.apache.org/jira/browse/KAFKA-
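
For anyone skimming the thread later, a minimal sketch of what an
"always round-robin" Partitioner can look like (illustrative only, not the
KIP's actual implementation; the class name and counter handling are made up):

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Ignores the record key entirely and cycles through partitions.
public class AlwaysRoundRobinPartitioner implements Partitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Take the modulo first so the result stays valid even after the counter wraps.
        return Math.abs(counter.getAndIncrement() % numPartitions);
    }

    @Override
    public void close() { }
}

A class like this would be wired in through the producer's partitioner.class
setting.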

On Fri, Aug 31, 2018 at 5:04 AM, Satish Duggana 
wrote:

> +including both dev and user mailing lists.
>
> Hi,
> Thanks for the KIP.
>
> "* For us, the message keys represent some metadata which we use to either
> ignore messages (if a loop-back to the sender), or log some information.*"
>
> Above statement was mentioned in the KIP about how key value is used. I
> guess the topic is not configured to be compacted and you do not want to
> have partitioning based on that key. IMHO, it qualifies more as a header
> than a key. What do you think about building records with a specific header
> and consumers to execute the logic whether to process or ignore the
> messages based on that header value.
>
> Thanks,
> Satish.
>
>
> On Fri, Aug 31, 2018 at 1:32 AM, Satish Duggana 
> wrote:
>
> > Hi,
> > Thanks for the KIP.
> >
> > "* For us, the message keys represent some metadata which we use to
> > either ignore messages (if a loop-back to the sender), or log some
> > information.*"
> >
> > Above statement was mentioned in the KIP about how key value is used. I
> > guess the topic is not configured to be compacted and you do not want to
> > have partitioning based on that key. IMHO, it qualifies more as a header
> > than a key. What do you think about building records with a specific
> header
> > and consumers to execute the logic whether to process or ignore the
> > messages based on that header value.
> >
> > Thanks,
> > Satish.
> >
> >
> > On Fri, Aug 31, 2018 at 12:02 AM, M. Manna  wrote:
> >
> >> Hi Harsha,
> >>
> >> thanks for reading the KIP.
> >>
> >> The intent is to use the DefaultPartitioner logic for round-robin
> >> selection
> >> of partition regardless of key type.
> >>
> >> Implementing Partitioner interface isn’t the issue here, you would have
> to
> >> do that anyway if  you are implementing your own. But we also want this
> to
> >> be part of formal codebase.
> >>
> >> Regards,
> >>
> >> On Thu, 30 Aug 2018 at 16:58, Harsha  wrote:
> >>
> >> > Hi,
> >> >   Thanks for the KIP. I am trying to understand the intent of the
> >> > KIP.  Is the use case you specified can't be achieved by implementing
> >> the
> >> > Partitioner interface here?
> >> > https://github.com/apache/kafka/blob/trunk/clients/src/main/
> >> java/org/apache/kafka/clients/producer/Partitioner.java#L28
> >> > .
> >> > Use your custom partitioner to be configured in your producer clients.
> >> >
> >> > Thanks,
> >> > Harsha
> >> >
> >> > On Thu, Aug 30, 2018, at 1:45 AM, M. Manna wrote:
> >> > > Hello,
> >> > >
> >> > > I opened a very simple KIP and there exists a JIRA for it.
> >> > >
> >> > > I would be grateful if any comments are available for action.
> >> > >
> >> > > Regards,
> >> >
> >>
> >
> >
>


Re: Mirrormaker producing to only one partition in a topic

2018-05-29 Thread Stephen Powis
Hey Ryan,

I ran into a similar issue and it was how the RoundRobinAssignor/Partitioner
was hashing the keys in my messages.  You may want to look at how that's
implemented and see if it's causing all of your messages to end up in the
same partition.

For what its worth, this ticket has the implementation I put together to
deal with it: https://issues.apache.org/jira/browse/KAFKA-
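
One quick way to check this is to replay the hash that the producer-side
DefaultPartitioner applies to non-null keys and see where your keys land.
A rough standalone sketch (the sample keys and partition count are
placeholders, and the exact formula can vary by client version):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionCheck {
    public static void main(String[] args) {
        int numPartitions = 3;                              // partitions on the target topic
        String[] sampleKeys = {"key-a", "key-b", "key-c"};  // keys taken from your messages
        for (String key : sampleKeys) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            // Same murmur2-based mapping the DefaultPartitioner uses for keyed records.
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.println(key + " -> partition " + partition);
        }
    }
}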

Stephen

On Wed, May 30, 2018 at 5:28 AM, Ryan En  wrote:

> Hi, I'm using Kafka version 0.10.2.0 and trying to use MirrorMaker to mirror
> the messages from one Kafka cluster to another.
>
>
> The source and target Kafka cluster are pretty much set up the same...
> replication factor is 3, number of partitions is 3,
> auto.create.topics.enable is true.
>
>
> I am finding that in the target Kafka cluster, all the messages produced
> by Mirrormaker are going into one partition only.
>
>
> Here are my configs:
>
>
> mirrormaker producer
>
>
> security.protocol=SASL_SSL
> sasl.mechanism=PLAIN
> sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
> required username="" password="xx";
> ssl.truststore.location=/path/to/truststore.jks
> ssl.truststore.password=xxx
> ssl.keystore.location=/path/to/keystore.jks
> ssl.keystore.password=x
> ssl.key.password=
> retries=500
> max.in.flight.requests.per.connection=1
> client.id=
> compression.type=lz4
> bootstrap.servers=:9292
> acks=all
>
>
> mirrormaker consumer
>
>
> security.protocol=SASL_SSL
> sasl.mechanism=PLAIN
> sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
> required username="" password="xx";
> ssl.truststore.location=/path/to/truststore.jks
> ssl.truststore.password=xx
> ssl.keystore.location=/path/to/keystore.jks
> ssl.keystore.password=xx
> ssl.key.password=x
> client.id=
> partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
> auto.commit.interval.ms=1000
> auto.offset.reset=earliest
> bootstrap.servers=:9192
> session.timeout.ms=3
> group.id=xx
> enable.auto.commit=false
>
>
> Any ideas why this may happen?
>
>
> Thanks in advance
>


Re: Using Kafka CLI without specifying the URLs every single time?

2018-04-22 Thread Stephen Powis
We also have created simple wrapper scripts for common operations.

On Sat, Apr 21, 2018 at 2:20 AM, Peter Bukowinski  wrote:

> One solution is to build wrapper scripts around the standard kafka
> scripts. You’d put your relevant cluster parameters (brokers, zookeepers)
> in a single config file (I like yaml), then your script would import that
> config file and pass the appropriate parameters to the kafka command. You
> could call the wrapper scripts by passing the name of the cluster as an
> argument and then passing the standard kafka options, e.g.
>
> ktopics --cluster my_cluster --list
>
>
> -- Peter Bukowinski
>
> > On Apr 20, 2018, at 3:23 AM, Horváth Péter Gergely <
> horvath.peter.gerg...@gmail.com> wrote:
> >
> > Hello All,
> >
> > I am wondering if there is any way to avoid having to enter the host URLs
> for
> > each Kafka CLI command you execute.
> >
> > This is kind of tedious as different CLI commands require specifying
> > different servers (--broker-list, --bootstrap-server and --zookeeper);
> > which is especially painful if the host names are long, and only slightly
> > different (e.g. naming scheme for AWS:
> > ec2-12-34-56-2.region-x.compute.amazonaws.com).
> >
> > I know I could simply export shell variables for each type of endpoint
> and
> > refer that in the command, but that still only eases the pain:
> > export KAFKA_ZK=ec2-12-34-56-2.region-x.compute.amazonaws.com
> > bin/kafka-topics.sh --list --zookeeper ${KAFKA_ZK}
> >
> > Is there by any chance a better way of doing this I am not aware of?
> > Technically I am looking for some solution where I don't have to remember
> > that a Kafka CLI command expects --broker-list, --bootstrap-server or
> > --zookeeper, but can specify these settings once.
> >
> > Thanks,
> > Peter
>


Re: What's the use of timestamp in ProducerRecord?

2018-01-18 Thread Stephen Powis
A common use case is for calculating a "processing lag time" comparing the
record's timestamp (when it was published/persisted) against the current
time that the consumer 'consumed' it.  In theory, the delta is 'how far
behind' the consumer is, though you can come up with a variety of reasons why
this wouldn't actually be true.
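
A compressed sketch of that calculation on the consumer side (broker address,
group id, and topic are placeholders, and it assumes a client recent enough
that poll() takes a Duration):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LagPrinter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "lag-demo");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // record.timestamp() is CreateTime or LogAppendTime depending on
                    // the topic's message.timestamp.type setting.
                    long lagMs = System.currentTimeMillis() - record.timestamp();
                    System.out.printf("partition=%d offset=%d lag=%dms%n",
                            record.partition(), record.offset(), lagMs);
                }
            }
        }
    }
}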

On Fri, Jan 19, 2018 at 12:52 PM, 1095193...@qq.com <1095193...@qq.com>
wrote:

> Kafka does not delete a message when it is consumed; it purges messages
> when they expire. I guess this timestamp is for checking whether a message
> is expired.
>
>
>
> 1095193...@qq.com
>
> From: Jake Yoon
> Date: 2018-01-19 11:46
> To: users
> Subject: What's the use of timestamp in ProducerRecord?
> Hi, I am very new to Kafka.
> And I have a very basic question.
>
> Kafka doc says,
>
> ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value)
>
> and I know the default timestamp is the current time. But I am not sure
> what's the use of it.
>
> - Is it just to log when the record is added?
> - How Kafka use it for?
> - Are there any other uses of it?
> - Can "Consumer" retrieves the timestamp of the "ProducerRecord"?
>
> Thanks,
>


Re: Figuring out lag within Java consumer application

2017-10-12 Thread Stephen Powis
So I have the same use case as the original poster and had the same issue
with the older 0.10.x clients and not being able to determine the tail
offsets even tho the fetch response contains the HW mark.

From what I could understand by tracing through the 0.11.0 consumer code,
it makes additional API/network calls to the kafka cluster to retrieve the
tail/end offsets information.  Assuming I haven't mis-read/mis-understood
the code, for most use cases this probably makes sense.  But in time
sensitive code, it bummed me out to have to make additional calls to get
that information when technically it's already available via the HW
property in the fetches, just the consumer has no access to it.

Is there any talk about exposing this property somewhere in the consumers
in the future?
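
For reference, a sketch of the workaround using the newer consumer's
endOffsets() call (the API mentioned in the quoted reply below). It has to
run on the consumer's own polling thread, and it still pays the extra round
trip described above:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public final class LagCheck {
    private LagCheck() { }

    // Per-partition lag for the partitions currently assigned to this consumer.
    public static Map<TopicPartition, Long> currentLag(KafkaConsumer<?, ?> consumer) {
        Set<TopicPartition> assignment = consumer.assignment();
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assignment);
        Map<TopicPartition, Long> lag = new HashMap<>();
        for (TopicPartition tp : assignment) {
            // position() is the next offset this consumer will fetch for tp.
            lag.put(tp, endOffsets.get(tp) - consumer.position(tp));
        }
        return lag;
    }
}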

On Fri, Oct 13, 2017 at 8:35 AM, Manan G  wrote:

> NM. 0.11 KafkaConsumer seems to have added "endOffsets" API!
>
> On Thu, Oct 12, 2017 at 3:31 PM, Manan G  wrote:
>
> > For my use case, I need to figure out the lag within the Java consumer
> > itself that is consuming some topic. Ideally, the consumer application
> > would monitor the lag every minute or so and take some action on its own
> if
> > consumer falls behind (i.e. spin up more threads to process records - my
> > use case does not care about record order). For our purpose, it is OK if
> > lag information is bit stale.
> >
> > * AFAIK, Java KafkaConsumer APIs do not seem to expose any information
> > directly based on which I can figure out the lag within my Java consumer
> > application.
> >
> > * It seems that, alternatively, I can create separate new KafkaConsumer
> > (or possibly use existing KafkaConsumer my consumer application is
> using),
> > seek to end, and call "position()" API to figure out the end offsets for
> > all partitions I am interested in. Since within my consumer application,
> I
> > already know which offset consumer is at, I can figure out the lag.
> > However, for this one basic information, this solution is bit more
> involved
> > for our framework for various reasons. Also, it requires one to either
> use
> > separate KafkaConsumer just to figure out lag or possibly re-use same
> > KafkaConsumer our consumer application is using, but somehow implement
> the
> > logic of forwarding to end to find out end offset and resetting back (not
> > sure if it's feasible yet without issues).
> >
> > * If I am not mistaken, looking at KafkaConsumer code itself, "high
> > watermark" information seems to be already there in FetchResponse
> > (FetchResponse.PartitionData). It's just that it is not exposed. Is there
> > any way to retrieve this information somehow in my consumer application?
> >
> > In general, wouldn't it be useful for consumer application to have lag
> > information available with simple API call?
> >
> > Thanks,
> > M
> >
> >
> >
>


Re: Consumer service that supports retry with exponential backoff

2017-10-09 Thread Stephen Powis
Pretty much the implementation that we're using, except using partitions
within a single topic because we have a low number of failures passing
through. Granted, the first partition will take the brunt of the load vs. the
higher-order partitions.
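
In case a concrete shape helps, here is a rough sketch of the escalating
retry-topic variant described in the quoted message below. Topic names,
header keys, and the delay encoding are invented; the same idea works with
partitions of a single topic instead of separate topics:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RetryPublisher {
    private static final String[] RETRY_TOPICS =
            {"commands_retry_1h", "commands_retry_2h", "commands_retry_4h", "commands_retry_8h"};

    private final KafkaProducer<String, String> producer;

    public RetryPublisher(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    // Re-publish a failed command to the next retry tier, or dead-letter it
    // once every tier has been exhausted.
    public void scheduleRetry(String key, String value, int attempt) {
        if (attempt >= RETRY_TOPICS.length) {
            producer.send(new ProducerRecord<>("commands_dead_letter", key, value));
            return;
        }
        ProducerRecord<String, String> record =
                new ProducerRecord<>(RETRY_TOPICS[attempt], key, value);
        long delayMs = (1L << attempt) * 60 * 60 * 1000;  // 1h, 2h, 4h, 8h
        // A consumer of each retry topic waits until this timestamp before processing.
        record.headers().add("retry-not-before",
                Long.toString(System.currentTimeMillis() + delayMs).getBytes(StandardCharsets.UTF_8));
        record.headers().add("retry-attempt",
                Integer.toString(attempt + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(record);
    }
}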

On Tue, Oct 10, 2017 at 7:30 AM, Steven Schlansker <
sschlans...@opentable.com> wrote:

>
> > On Oct 9, 2017, at 2:41 PM, John Walker  wrote:
> >
> > I have a pair of services.  One dispatches commands to the other for
> > processing.
> >
> > My consumer sometimes fails to execute commands as a result of transient
> > errors.  To deal with this, commands are retried after an exponentially
> > increasing delay up to a maximum of 4 times.  (Delays: 1hr, 2hr, 4hr,
> 8hr.)
> >  What's the standard way to set up something like this using Kafka?
> >
> > The only solution I've found so far is to setup 5 topics (main_topic,
> > delayed_1hr, delayed_2hr, delayed_4hr,delayed_8hr), and then have pollers
> > that poll each of these topics, enforce delays, and escalate messages
> from
> > one topic to another if errors occur.
>
> Our approach for a similar scheduling problem was to assign each command a
> unique key and a "valid after" date.  Any time you fail to execute a
> command,
> you update the "valid after" with your exponential backoff algorithm, and
> produce
> the updated value over the same key.  Old versions are removed by
> compaction.
>
> Biggest downside is now your queue is no longer ordered, but for our
> problem
> in practice the number of pending messages is relatively small so we simply
> scan all outstanding messages every delivery interval to see if any is
> eligible
> for a retry.
>
>


Re: creating a controlled lag in for a consumer group

2017-09-03 Thread Stephen Powis
Hey Ido.

I haven't tried to do something like this, but we've worked out some future
plans to do something similar, so I have a bit of interest in what you're
saying.

I feel like there are some details left out from your post.  Are your
consumers able to keep up w/ the throughput without the sleep strategy in
place?  How are messages being published across partitions?  Evenly?  I
assume the timestamps on the messages are always increasing?  Where are the
timestamps generated? By the client publishers?  Are the clocks of all the
publishers sync'd?
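
For concreteness, a compressed sketch of the delay-by-sleeping approach
described in the quoted message below, using the record timestamp as a
stand-in for the message's own timestamp field (the processing step is a
placeholder, and long sleeps inside the poll loop also interact with
session/poll timeouts):

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DelayedConsumer {
    private static final long TARGET_LAG_MS = 30_000L;

    // The consumer is assumed to be configured and subscribed elsewhere.
    static void pollWithDelay(KafkaConsumer<String, String> consumer) {
        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                long age = System.currentTimeMillis() - record.timestamp();
                if (age < TARGET_LAG_MS) {
                    try {
                        // Sleeping blocks every partition assigned to this consumer,
                        // which is one way per-partition lag can become uneven.
                        Thread.sleep(TARGET_LAG_MS - age);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                handle(record);
            }
        }
    }

    private static void handle(ConsumerRecord<String, String> record) {
        // placeholder for the real processing
    }
}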

Stephen

On Sun, Sep 3, 2017 at 8:01 PM, Ido Barkan  wrote:

> Hey all.
> We are trying to create a controlled lag (a lag of 30 secs). We are doing
> that by inspecting a timestamp field for each msg on a specific topic and
> doing an actual Thread.sleep for a computed time in the consumer thread
> (until the msg is 30 secs old). We were hoping to see that eventually most
> of the messages are 30 seconds old and the created lag stands for 30 secs
> of processing rate.
> This works for us in low throughput but on high rates we are starting to
> witness strange behavior: lag on partitions is starting to grow in an
> uncontrollable manner and becomes uneven per partition.
> We have tried multiple variations of this: sleeping per bulk of msgs,
> sleeping per msg, capping the sleep time etc but failed to resolve this.
>
> Has anyone tried to do this? any ideas?
>
> Thanks,
> Ido
> --
> Thanks,
> Ido
>


Re: Help please: Topics deleted, but Kafka continues to try and sync them with Zookeeper

2017-07-20 Thread Stephen Powis
I could be totally wrong, but I seem to recall that delete wasn't fully
implemented in 0.8.x?

On Fri, Jul 21, 2017 at 10:10 AM, Carl Haferd 
wrote:

> Chris,
>
> You could first check to make sure that delete.topic.enable is true and try
> deleting again if not.  If that doesn't work with 0.8.1.1 you might need to
> manually remove the topic's log files from the configured log.dirs folder
> on each broker in addition to removing the topic's zookeeper path.
>
> Carl
>
> On Thu, Jul 20, 2017 at 10:06 AM, Chris Neal  wrote:
>
> > Hi all,
> >
> > I have a weird situation here.  I have deleted a few topics on my 0.8.1.1
> > cluster (old, I know...).  The deletes succeeded according to the
> > controller.log:
> >
> > [2017-07-20 16:40:31,175] INFO [TopicChangeListener on Controller 1]: New
> > topics: [Set()], deleted topics:
> > [Set(perf_doorway-supplier-adapter-uat_raw)], new partition replica
> > assignment [Map()]
> > (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > [2017-07-20 16:40:33,507] INFO [TopicChangeListener on Controller 1]: New
> > topics: [Set()], deleted topics:
> > [Set(perf_doorway-supplier-scheduler-uat_raw)], new partition replica
> > assignment [Map()]
> > (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > [2017-07-20 16:40:36,504] INFO [TopicChangeListener on Controller 1]: New
> > topics: [Set()], deleted topics: [Set(perf_gocontent-uat_raw)], new
> > partition replica assignment [Map()]
> > (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > [2017-07-20 16:40:38,290] INFO [TopicChangeListener on Controller 1]: New
> > topics: [Set()], deleted topics: [Set(perf_goplatform-uat_raw)], new
> > partition replica assignment [Map()]
> > (kafka.controller.PartitionStateMachine$TopicChangeListener)
> >
> > I query Zookeeper and the path is not there under /brokers/topics as
> well.
> >
> > But, one of the nodes in my cluster continues to try and use them:
> >
> > [2017-07-20 17:04:36,723] ERROR Conditional update of path
> > /brokers/topics/perf_doorway-supplier-scheduler-uat_raw/
> partitions/3/state
> > with data
> > {"controller_epoch":34,"leader":1,"version":1,"leader_
> > epoch":2,"isr":[1,0]}
> > and expected version 69 failed due to
> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> > NoNode for
> > /brokers/topics/perf_doorway-supplier-scheduler-uat_raw/
> partitions/3/state
> > (kafka.utils.ZkUtils$)
> > [2017-07-20 17:04:36,723] INFO Partition
> > [perf_doorway-supplier-scheduler-uat_raw,3] on broker 1: Cached
> zkVersion
> > [69] not equal to that in zookeeper, skip updating ISR
> > (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,723] INFO Partition
> > [perf_doorway-supplier-scheduler-uat_raw,3] on broker 1: Cached
> zkVersion
> > [69] not equal to that in zookeeper, skip updating ISR
> > (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,764] INFO Partition [perf_goplatform-uat_raw,2] on
> > broker 1: Shrinking ISR for partition [perf_goplatform-uat_raw,2] from
> 1,0
> > to 1 (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,764] INFO Partition [perf_goplatform-uat_raw,2] on
> > broker 1: Shrinking ISR for partition [perf_goplatform-uat_raw,2] from
> 1,0
> > to 1 (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,765] ERROR Conditional update of path
> > /brokers/topics/perf_goplatform-uat_raw/partitions/2/state with data
> > {"controller_epoch":34,"leader":1,"version":1,"leader_
> epoch":2,"isr":[1]}
> > and expected version 70 failed due to
> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> > NoNode for /brokers/topics/perf_goplatform-uat_raw/partitions/2/state
> > (kafka.utils.ZkUtils$)
> > [2017-07-20 17:04:36,765] ERROR Conditional update of path
> > /brokers/topics/perf_goplatform-uat_raw/partitions/2/state with data
> > {"controller_epoch":34,"leader":1,"version":1,"leader_
> epoch":2,"isr":[1]}
> > and expected version 70 failed due to
> > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> > NoNode for /brokers/topics/perf_goplatform-uat_raw/partitions/2/state
> > (kafka.utils.ZkUtils$)
> > [2017-07-20 17:04:36,765] INFO Partition [perf_goplatform-uat_raw,2] on
> > broker 1: Cached zkVersion [70] not equal to that in zookeeper, skip
> > updating ISR (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,765] INFO Partition [perf_goplatform-uat_raw,2] on
> > broker 1: Cached zkVersion [70] not equal to that in zookeeper, skip
> > updating ISR (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,981] INFO Partition [perf_gocontent-uat_raw,1] on
> > broker 1: Shrinking ISR for partition [perf_gocontent-uat_raw,1] from 1,0
> > to 1 (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,981] INFO Partition [perf_gocontent-uat_raw,1] on
> > broker 1: Shrinking ISR for partition [perf_gocontent-uat_raw,1] from 1,0
> > to 1 (kafka.cluster.Partition)
> > [2017-07-20 17:04:36,988] ERROR Conditional update of path
> > 

Is this a decent use case for Kafka Streams?

2017-07-12 Thread Stephen Powis
Hey! I was hoping I could get some input from people more experienced with
Kafka Streams to determine if they'd be a good use case/solution for me.

I have multi-tenant clients submitting data to a Kafka topic that they want
ETL'd to a third party service.  I'd like to batch and group these by
tenant over a time window, somewhere between 1 and 5 minutes.  At the end
of a time window, issue an API request to the third-party service for
each tenant, sending the batch of data over.

Other points of note:
- Ideally we'd have exactly-once semantics, sending data multiple times
would typically be bad.  But we'd need to gracefully handle things like API
request errors / service outages.

- We currently use Storm for doing stream processing, but the long running
time-windows and potentially large amount of data stored in memory make me
a bit nervous to use it for this.

Thoughts?  Thanks in Advance!
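
For what it's worth, a very rough sketch of how that windowed, per-tenant
batching could look in the Streams DSL (topic name, window size, and the
final foreach are placeholders for the real third-party call; assumes a
reasonably recent Streams version):

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;

public class TenantBatcher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tenant-batcher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("tenant-events")            // key = tenant id
               .groupByKey()
               .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))  // 1-5 minute batch window
               .aggregate(() -> "",
                          (tenant, event, batch) -> batch.isEmpty() ? event : batch + "\n" + event,
                          Materialized.with(Serdes.String(), Serdes.String()))
               .toStream()
               // Fires on every update; suppress(Suppressed.untilWindowCloses(...))
               // would restrict output to the final batch per window.
               .foreach((windowedTenant, batch) ->
                       System.out.println("would POST batch for tenant " + windowedTenant.key()));

        new KafkaStreams(builder.build(), props).start();
    }
}

Streams' processing.guarantee=exactly_once covers the Kafka-side semantics,
though the external API call itself would still need to be idempotent.
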
Stephen


Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Stephen Powis
Yep, that's exactly what the output is showing.

On Fri, Jan 13, 2017 at 11:25 AM, Tauzell, Dave <
dave.tauz...@surescripts.com> wrote:

> I haven't used dtrace, but is it possible to have it running and recording
> the ftruncate64 times?  Then when you see one of these long roll times look
> at the dtrace log to see if it was that call?
>
> -Dave
>
> -Original Message-
> From: Stephen Powis [mailto:spo...@salesforce.com]
> Sent: Friday, January 13, 2017 9:25 AM
> To: users@kafka.apache.org
> Subject: Re: Taking a long time to roll a new log segment (~1 min)
>
> So the underlying system call is ftruncate64, logged using dtrace.
>
> # BEGIN stack trace for ftruncate, call took 34160541200ns:
> > args==
> >  0x7f5f9a1134d7 : ftruncate64+0x7/0x30 [/lib64/libc-2.12.so]
> >  0x7f5f987b80c3 : 0x7f5f987b80c3
> > [/usr/lib/jvm/jdk1.8.0_51/jre/lib/amd64/libjava.so+0x180c3/0x22c000]
> > # END stack trace for ftruncate
> >
>
> These index files seem to be typically around 10MB for open log segments,
> and get truncated down to ~1MB after being closed.  I'd imagine truncating
> off ~9MB shouldn't be that heavy of an operation.
>
> ZFS options:
>
> data/kafka   type                    filesystem             -
> data/kafka   creation                Tue Feb  9 22:45 2016  -
> data/kafka   used                    196G                   -
> data/kafka   available               6.65T                  -
> data/kafka   referenced              196G                   -
> data/kafka   compressratio           4.59x                  -
> data/kafka   mounted                 yes                    -
> data/kafka   quota                   none                   default
> data/kafka   reservation             none                   default
> data/kafka   recordsize              128K                   default
> data/kafka   mountpoint              /kafka                 local
> data/kafka   sharenfs                off                    default
> data/kafka   checksum                on                     default
> data/kafka   compression             lz4                    inherited from data
> data/kafka   atime                   off                    inherited from data
> data/kafka   devices                 on                     default
> data/kafka   exec                    on                     default
> data/kafka   setuid                  on                     default
> data/kafka   readonly                off                    default
> data/kafka   zoned                   off                    default
> data/kafka   snapdir                 hidden                 default
> data/kafka   aclinherit              restricted             default
> data/kafka   canmount                on                     default
> data/kafka   xattr                   on                     default
> data/kafka   copies                  1                      default
> data/kafka   version                 5                      -
> data/kafka   utf8only                off                    -
> data/kafka   normalization           none                   -
> data/kafka   casesensitivity         sensitive              -
> data/kafka   vscan                   off                    default
> data/kafka   nbmand                  off                    default
> data/kafka   sharesmb                off                    default
> data/kafka   refquota                none                   default
> data/kafka   refreservation          none                   default
> data/kafka   primarycache            all                    default
> data/kafka   secondarycache          all                    default
> data/kafka   usedbysnapshots         0                      -
> data/kafka   usedbydataset           196G                   -
> data/kafka   usedbychildren          0                      -
> data/kafka   usedbyrefreservation    0                      -
> data/kafka   logbias                 throughput             local
> data/kafka   dedup                   off                    default
> data/kafka   mlslabel                none                   default
> data/kafka   sync                    standard               default
> data/kafka   refcompressratio        4.59x                  -
> data/kafka   written                 196G                   -
> data/kafka   logicalused             900G                   -
> data/kafka   logicalr

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Stephen Powis
data/kafka   overlay   off   default
>

CPU IO Wait is extremely low (below 0.01%) with no recorded spikes.

On Fri, Jan 13, 2017 at 6:57 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Stephen,
>
> I think it's worth capturing this in a JIRA so that it doesn't get lost.
>
> Ismael
>
> On Thu, Jan 12, 2017 at 7:22 PM, Stephen Powis <spo...@salesforce.com>
> wrote:
>
> > I've further narrowed it down to this particular line:
> > https://github.com/apache/kafka/blob/0.10.0/core/src/
> main/scala/kafka/log/
> > OffsetIndex.scala#L294
> >
> > But I'm still at a loss to why this would be slow sometimes and not
> others.
> >
> > On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis <spo...@salesforce.com>
> > wrote:
> >
> > > Using the little bash script in that JIRA ticket to go thru the GC log
> > and
> > > sum up the total pause times I come up with the following.  I don't see
> > > anything that would indicate a ~28 second pause.
> > >
> > > 2017-01-12T07:00 = 72.2961
> > >> 2017-01-12T07:01 = 59.8112
> > >> 2017-01-12T07:02 = 69.6915
> > >> 2017-01-12T07:03 = 74.4093
> > >> 2017-01-12T07:04 = 73.507
> > >> 2017-01-12T07:05 = 73.7028
> > >> 2017-01-12T07:06 = 72.3392
> > >> 2017-01-12T07:07 = 70.502
> > >> 2017-01-12T07:08 = 71.7471
> > >> 2017-01-12T07:09 = 76.8371
> > >> 2017-01-12T07:10 = 72.3836
> > >> 2017-01-12T07:11 = 74.7462
> > >> 2017-01-12T07:12 = 57.1328
> > >> 2017-01-12T07:13 = 80.005
> > >> 2017-01-12T07:14 = 76.0012
> > >> 2017-01-12T07:15 = 79.2221
> > >> 2017-01-12T07:16 = 57.5385
> > >> 2017-01-12T07:17 = 58.1185
> > >> 2017-01-12T07:18 = 72.0524
> > >> 2017-01-12T07:19 = 110.6736
> > >> 2017-01-12T07:20 = 64.9808
> > >> 2017-01-12T07:21 = 66.8767
> > >> 2017-01-12T07:22 = 32.409
> > >> 2017-01-12T07:23 = 85.3671
> > >> 2017-01-12T07:24 = 123.9212
> > >> 2017-01-12T07:25 = 70.3219
> > >>
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis <spo...@salesforce.com
> >
> > > wrote:
> > >
> > >> Just realized that GCEasy doesn't keep reports around for very long
> > >> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
> > >>
> > >> The longest reported GC pause was 30ms, tho they happen somewhat
> > >> frequently at an average of once per 12 seconds.  KAFKA-4616 certainly
> > >> sounds just like my scenario, but I'm having trouble finding the long
> GC
> > >> pauses described in that ticket in my own logs.
> > >>
> > >> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao <xiaotao...@gmail.com>
> > wrote:
> > >>
> > >>> You may run into this bug https://issues.apache.org/jira
> > >>> /browse/KAFKA-4614
> > >>>
> > >>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis <spo...@salesforce.com>
> > >>> wrote:
> > >>>
> > >>> > Per my email to the list in Sept, when I reviewed GC logs then, I
> > >>> didn't
> > >>> > see anything out of the ordinary. (
> > >>> >
> > >>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
> > >>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
> > >>> 40mail.gmail.com%3E
> > >>> > )
> > >>> >
> > >>> > Reviewing the GC logs from this morning around the timestamp in my
> > >>> previous
> > >>> > email, I see GC running roughly every 10-12 seconds, with total
> times
> > >>> > similar to the following:
> > >>> >
> > >>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
> > >>> application
> > >>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
> > >>> 0.0002171
> > >>> > > seconds
> > >>> > >
> > >>> > >
> > >>> > Here is a link to a GCEasy report:
> > >>> >
> > >>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
> > >>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
> > >>> >
> > >>> >
> > 

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Running CentOS 6.7, kernel 3.10.95-1.el6.elrepo.x86_64.  4 SATA disks in
RAID10 with ZFS.

On Thu, Jan 12, 2017 at 2:27 PM, Tauzell, Dave <dave.tauz...@surescripts.com
> wrote:

> You have a local filesystem? Linux?
>
> -Dave
>
> -Original Message-
> From: Stephen Powis [mailto:spo...@salesforce.com]
> Sent: Thursday, January 12, 2017 1:22 PM
> To: users@kafka.apache.org
> Subject: Re: Taking a long time to roll a new log segment (~1 min)
>
> I've further narrowed it down to this particular line:
> https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/
> OffsetIndex.scala#L294
>
> But I'm still at a loss to why this would be slow sometimes and not others.
>
> On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis <spo...@salesforce.com>
> wrote:
>
> > Using the little bash script in that JIRA ticket to go thru the GC log
> > and sum up the total pause times I come up with the following.  I
> > don't see anything that would indicate a ~28 second pause.
> >
> > 2017-01-12T07:00 = 72.2961
> >> 2017-01-12T07:01 = 59.8112
> >> 2017-01-12T07:02 = 69.6915
> >> 2017-01-12T07:03 = 74.4093
> >> 2017-01-12T07:04 = 73.507
> >> 2017-01-12T07:05 = 73.7028
> >> 2017-01-12T07:06 = 72.3392
> >> 2017-01-12T07:07 = 70.502
> >> 2017-01-12T07:08 = 71.7471
> >> 2017-01-12T07:09 = 76.8371
> >> 2017-01-12T07:10 = 72.3836
> >> 2017-01-12T07:11 = 74.7462
> >> 2017-01-12T07:12 = 57.1328
> >> 2017-01-12T07:13 = 80.005
> >> 2017-01-12T07:14 = 76.0012
> >> 2017-01-12T07:15 = 79.2221
> >> 2017-01-12T07:16 = 57.5385
> >> 2017-01-12T07:17 = 58.1185
> >> 2017-01-12T07:18 = 72.0524
> >> 2017-01-12T07:19 = 110.6736
> >> 2017-01-12T07:20 = 64.9808
> >> 2017-01-12T07:21 = 66.8767
> >> 2017-01-12T07:22 = 32.409
> >> 2017-01-12T07:23 = 85.3671
> >> 2017-01-12T07:24 = 123.9212
> >> 2017-01-12T07:25 = 70.3219
> >>
> >
> >
> >
> >
> >
> > On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis
> > <spo...@salesforce.com>
> > wrote:
> >
> >> Just realized that GCEasy doesn't keep reports around for very long
> >> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
> >>
> >> The longest reported GC pause was 30ms, tho they happen somewhat
> >> frequently at an average of once per 12 seconds.  KAFKA-4616
> >> certainly sounds just like my scenario, but I'm having trouble
> >> finding the long GC pauses described in that ticket in my own logs.
> >>
> >> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao <xiaotao...@gmail.com>
> wrote:
> >>
> >>> You may run into this bug https://issues.apache.org/jira
> >>> /browse/KAFKA-4614
> >>>
> >>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis <spo...@salesforce.com>
> >>> wrote:
> >>>
> >>> > Per my email to the list in Sept, when I reviewed GC logs then, I
> >>> didn't
> >>> > see anything out of the ordinary. (
> >>> >
> >>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
> >>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
> >>> 40mail.gmail.com%3E
> >>> > )
> >>> >
> >>> > Reviewing the GC logs from this morning around the timestamp in my
> >>> previous
> >>> > email, I see GC running roughly every 10-12 seconds, with total
> >>> > times similar to the following:
> >>> >
> >>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
> >>> application
> >>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
> >>> 0.0002171
> >>> > > seconds
> >>> > >
> >>> > >
> >>> > Here is a link to a GCEasy report:
> >>> >
> >>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
> >>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
> >>> >
> >>> >
> >>> > Currently using G1 gc with the following settings:
> >>> >
> >>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
> >>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps
> >>> -XX:+PrintGCDetails
> >>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> >>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFi

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
I've further narrowed it down to this particular line:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/OffsetIndex.scala#L294

But I'm still at a loss to why this would be slow sometimes and not others.

On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis <spo...@salesforce.com>
wrote:

> Using the little bash script in that JIRA ticket to go thru the GC log and
> sum up the total pause times I come up with the following.  I don't see
> anything that would indicate a ~28 second pause.
>
> 2017-01-12T07:00 = 72.2961
>> 2017-01-12T07:01 = 59.8112
>> 2017-01-12T07:02 = 69.6915
>> 2017-01-12T07:03 = 74.4093
>> 2017-01-12T07:04 = 73.507
>> 2017-01-12T07:05 = 73.7028
>> 2017-01-12T07:06 = 72.3392
>> 2017-01-12T07:07 = 70.502
>> 2017-01-12T07:08 = 71.7471
>> 2017-01-12T07:09 = 76.8371
>> 2017-01-12T07:10 = 72.3836
>> 2017-01-12T07:11 = 74.7462
>> 2017-01-12T07:12 = 57.1328
>> 2017-01-12T07:13 = 80.005
>> 2017-01-12T07:14 = 76.0012
>> 2017-01-12T07:15 = 79.2221
>> 2017-01-12T07:16 = 57.5385
>> 2017-01-12T07:17 = 58.1185
>> 2017-01-12T07:18 = 72.0524
>> 2017-01-12T07:19 = 110.6736
>> 2017-01-12T07:20 = 64.9808
>> 2017-01-12T07:21 = 66.8767
>> 2017-01-12T07:22 = 32.409
>> 2017-01-12T07:23 = 85.3671
>> 2017-01-12T07:24 = 123.9212
>> 2017-01-12T07:25 = 70.3219
>>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis <spo...@salesforce.com>
> wrote:
>
>> Just realized that GCEasy doesn't keep reports around for very long
>> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
>>
>> The longest reported GC pause was 30ms, tho they happen somewhat
>> frequently at an average of once per 12 seconds.  KAFKA-4616 certainly
>> sounds just like my scenario, but I'm having trouble finding the long GC
>> pauses described in that ticket in my own logs.
>>
>> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao <xiaotao...@gmail.com> wrote:
>>
>>> You may run into this bug https://issues.apache.org/jira
>>> /browse/KAFKA-4614
>>>
>>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis <spo...@salesforce.com>
>>> wrote:
>>>
>>> > Per my email to the list in Sept, when I reviewed GC logs then, I
>>> didn't
>>> > see anything out of the ordinary. (
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
>>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
>>> 40mail.gmail.com%3E
>>> > )
>>> >
>>> > Reviewing the GC logs from this morning around the timestamp in my
>>> previous
>>> > email, I see GC running roughly every 10-12 seconds, with total times
>>> > similar to the following:
>>> >
>>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
>>> application
>>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
>>> 0.0002171
>>> > > seconds
>>> > >
>>> > >
>>> > Here is a link to a GCEasy report:
>>> >
>>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
>>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
>>> >
>>> >
>>> > Currently using G1 gc with the following settings:
>>> >
>>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
>>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps
>>> -XX:+PrintGCDetails
>>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
>>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
>>> > -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
>>> > -XX:+HeapDumpOnOutOfMemoryError
>>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
>>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
>>> > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>>> >
>>> >
>>> >
>>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
>>> > dave.tauz...@surescripts.com
>>> > > wrote:
>>> >
>>> > > Can you collect garbage collection stats and verify there isn't a
>>> long GC
>>> > > happening at the same time?
>>> > >
>>> > > -Dave
>>> > >
>>> > > -Original Message-
>>> &

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Using the little bash script in that JIRA ticket to go thru the GC log and
sum up the total pause times I come up with the following.  I don't see
anything that would indicate a ~28 second pause.

2017-01-12T07:00 = 72.2961
> 2017-01-12T07:01 = 59.8112
> 2017-01-12T07:02 = 69.6915
> 2017-01-12T07:03 = 74.4093
> 2017-01-12T07:04 = 73.507
> 2017-01-12T07:05 = 73.7028
> 2017-01-12T07:06 = 72.3392
> 2017-01-12T07:07 = 70.502
> 2017-01-12T07:08 = 71.7471
> 2017-01-12T07:09 = 76.8371
> 2017-01-12T07:10 = 72.3836
> 2017-01-12T07:11 = 74.7462
> 2017-01-12T07:12 = 57.1328
> 2017-01-12T07:13 = 80.005
> 2017-01-12T07:14 = 76.0012
> 2017-01-12T07:15 = 79.2221
> 2017-01-12T07:16 = 57.5385
> 2017-01-12T07:17 = 58.1185
> 2017-01-12T07:18 = 72.0524
> 2017-01-12T07:19 = 110.6736
> 2017-01-12T07:20 = 64.9808
> 2017-01-12T07:21 = 66.8767
> 2017-01-12T07:22 = 32.409
> 2017-01-12T07:23 = 85.3671
> 2017-01-12T07:24 = 123.9212
> 2017-01-12T07:25 = 70.3219
>





On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis <spo...@salesforce.com>
wrote:

> Just realized that GCEasy doesn't keep reports around for very long
> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
>
> The longest reported GC pause was 30ms, tho they happen somewhat
> frequently at an average of once per 12 seconds.  KAFKA-4616 certainly
> sounds just like my scenario, but I'm having trouble finding the long GC
> pauses described in that ticket in my own logs.
>
> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao <xiaotao...@gmail.com> wrote:
>
>> You may run into this bug https://issues.apache.org/jira
>> /browse/KAFKA-4614
>>
>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis <spo...@salesforce.com> wrote:
>>
>> > Per my email to the list in Sept, when I reviewed GC logs then, I didn't
>> > see anything out of the ordinary. (
>> >
>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
>> 40mail.gmail.com%3E
>> > )
>> >
>> > Reviewing the GC logs from this morning around the timestamp in my
>> previous
>> > email, I see GC running roughly every 10-12 seconds, with total times
>> > similar to the following:
>> >
>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
>> application
>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
>> 0.0002171
>> > > seconds
>> > >
>> > >
>> > Here is a link to a GCEasy report:
>> >
>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
>> >
>> >
>> > Currently using G1 gc with the following settings:
>> >
>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
>> > -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
>> > -XX:+HeapDumpOnOutOfMemoryError
>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
>> > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> >
>> >
>> >
>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
>> > dave.tauz...@surescripts.com
>> > > wrote:
>> >
>> > > Can you collect garbage collection stats and verify there isn't a
>> long GC
>> > > happening at the same time?
>> > >
>> > > -Dave
>> > >
>> > > -Original Message-
>> > > From: Stephen Powis [mailto:spo...@salesforce.com]
>> > > Sent: Thursday, January 12, 2017 8:34 AM
>> > > To: users@kafka.apache.org
>> > > Subject: Re: Taking a long time to roll a new log segment (~1 min)
>> > >
>> > > So per the kafka docs I up'd our FD limit to 100k, and we are no
>> longer
>> > > seeing the process die, which is good.
>> > >
>> > > Unfortunately we're still seeing very high log segment roll times, and
>> > I'm
>> > > unsure if this is considered 'normal', as it tends to block producers
>> > > during this period.
>> > >
>> > > We are running kafka 0.10.0.1, but I patched in some additio

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Just realized that GCEasy doesn't keep reports around for very long
anymore, here is a screencap of the report:  http://imgur.com/a/MEubD

The longest reported GC pause was 30ms, though they happen somewhat frequently
at an average of once per 12 seconds.  KAFKA-4616 certainly sounds just
like my scenario, but I'm having trouble finding the long GC pauses
described in that ticket in my own logs.

On Thu, Jan 12, 2017 at 10:42 AM, tao xiao <xiaotao...@gmail.com> wrote:

> You may run into this bug https://issues.apache.org/jira/browse/KAFKA-4614
>
> On Thu, 12 Jan 2017 at 23:38 Stephen Powis <spo...@salesforce.com> wrote:
>
> > Per my email to the list in Sept, when I reviewed GC logs then, I didn't
> > see anything out of the ordinary. (
> >
> > http://mail-archives.apache.org/mod_mbox/kafka-users/
> 201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz
> 0kCjmdB9c0SDcLQ%40mail.gmail.com%3E
> > )
> >
> > Reviewing the GC logs from this morning around the timestamp in my
> previous
> > email, I see GC running roughly every 10-12 seconds, with total times
> > similar to the following:
> >
> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application
> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
> 0.0002171
> > > seconds
> > >
> > >
> > Here is a link to a GCEasy report:
> >
> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIvLS10b3
> RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
> >
> >
> > Currently using G1 gc with the following settings:
> >
> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
> > -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> > -XX:+HeapDumpOnOutOfMemoryError
> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
> > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> >
> >
> >
> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
> > dave.tauz...@surescripts.com
> > > wrote:
> >
> > > Can you collect garbage collection stats and verify there isn't a long
> GC
> > > happening at the same time?
> > >
> > > -Dave
> > >
> > > -Original Message-
> > > From: Stephen Powis [mailto:spo...@salesforce.com]
> > > Sent: Thursday, January 12, 2017 8:34 AM
> > > To: users@kafka.apache.org
> > > Subject: Re: Taking a long time to roll a new log segment (~1 min)
> > >
> > > So per the kafka docs I up'd our FD limit to 100k, and we are no longer
> > > seeing the process die, which is good.
> > >
> > > Unfortunately we're still seeing very high log segment roll times, and
> > I'm
> > > unsure if this is considered 'normal', as it tends to block producers
> > > during this period.
> > >
> > > We are running kafka 0.10.0.1, but I patched in some additionally
> timing
> > > statements into the kafka.log.log roll() method to narrow down exactly
> > > which part of that method is taking so long.
> > >
> > > Again, typically the process to roll a new log file takes only 1-2ms
> > tops,
> > > but several times a day it takes 30-60+ seconds, across all of our
> > > brokers.  I've narrowed it down to this bit of code causing the issue:
> > > https://github.com/apache/kafka/blob/0.10.0/core/src/
> > > main/scala/kafka/log/Log.scala#L652-L658
> > >
> > > Here's an example of output w/ my additional timing log statements:
> > >
> > > [2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4'
> in
> > > > 28028 ms. (kafka.log.Log)
> > >
> > > [2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> > > > 28027 ms. (kafka.log.Log)
> > > >
> > >
> > > Any thoughts?  Help?  This considered normal?
> > >
> > > Thanks
> > > Stephen
> > >
> > >
> > > On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > I can't speak to the exact details of why fds would be kept open
> > > > longer in that specific case, but are you aware that the
> > > &g

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Per my email to the list in Sept, when I reviewed GC logs then, I didn't
see anything out of the ordinary. (
http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%40mail.gmail.com%3E
)

Reviewing the GC logs from this morning around the timestamp in my previous
email, I see GC running roughly every 10-12 seconds, with total times
similar to the following:

2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application
> threads were stopped: 0.0141281 seconds, Stopping threads took: 0.0002171
> seconds
>
>
Here is a link to a GCEasy report:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIvLS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=


Currently using G1 gc with the following settings:

-Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
-Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/kafka/heapDump.log
-Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps



On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <dave.tauz...@surescripts.com
> wrote:

> Can you collect garbage collection stats and verify there isn't a long GC
> happening at the same time?
>
> -Dave
>
> -Original Message-
> From: Stephen Powis [mailto:spo...@salesforce.com]
> Sent: Thursday, January 12, 2017 8:34 AM
> To: users@kafka.apache.org
> Subject: Re: Taking a long time to roll a new log segment (~1 min)
>
> So per the kafka docs I up'd our FD limit to 100k, and we are no longer
> seeing the process die, which is good.
>
> Unfortunately we're still seeing very high log segment roll times, and I'm
> unsure if this is considered 'normal', as it tends to block producers
> during this period.
>
> We are running kafka 0.10.0.1, but I patched in some additionally timing
> statements into the kafka.log.log roll() method to narrow down exactly
> which part of that method is taking so long.
>
> Again, typically the process to roll a new log file takes only 1-2ms tops,
> but several times a day it takes 30-60+ seconds, across all of our
> brokers.  I've narrowed it down to this bit of code causing the issue:
> https://github.com/apache/kafka/blob/0.10.0/core/src/
> main/scala/kafka/log/Log.scala#L652-L658
>
> Here's an example of output w/ my additional timing log statements:
>
> [2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in
> > 28028 ms. (kafka.log.Log)
>
> [2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> > 28027 ms. (kafka.log.Log)
> >
>
> Any thoughts?  Help?  This considered normal?
>
> Thanks
> Stephen
>
>
> On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > I can't speak to the exact details of why fds would be kept open
> > longer in that specific case, but are you aware that the
> > recommendation for production clusters for open fd limits is much
> > higher? It's been suggested to be 100,000 as a starting point for quite
> awhile:
> > http://kafka.apache.org/documentation.html#os
> >
> > -Ewen
> >
> > On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis <spo...@salesforce.com>
> > wrote:
> >
> > > Hey!
> > >
> > > I've run into something concerning in our production clusterI
> > > believe I've posted this question to the mailing list previously (
> > > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/bro
> > > wser
> > )
> > > but the problem has become considerably more serious.
> > >
> > > We've been fighting issues where Kafka 0.10.0.1 hits its max file
> > > descriptor limit.  Our limit is set to ~16k, and under normal
> > > operation
> > it
> > > holds steady around 4k open files.
> > >
> > > But occasionally Kafka will roll a new log segment, which typically
> > > takes on the order of magnitude of a few milliseconds.
> > > However...sometimes it will take a considerable amount of time, any
> > > where from 40 seconds up to over a minute.  When this happens, it
> > > seems like connections are not released by kafka, and we end up with
> > > thousands of client connections
> > stuck
> > > in CLOSE_WAIT, which pile up and exceed our max file descriptor limit.
> > > This happens all in the span of about a minute.

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
So per the kafka docs I up'd our FD limit to 100k, and we are no longer
seeing the process die, which is good.
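
For anyone else who needs to do the same, a limits.d entry along these lines
is one way to bump it (the path and user name here are just examples, so
adjust for your install):

# /etc/security/limits.d/kafka.conf -- assuming the broker runs as the "kafka" user
kafka  soft  nofile  100000
kafka  hard  nofile  100000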

Unfortunately we're still seeing very high log segment roll times, and I'm
unsure if this is considered 'normal', as it tends to block producers
during this period.

We are running kafka 0.10.0.1, but I patched in some additional timing
statements into the kafka.log.Log roll() method to narrow down exactly
which part of that method is taking so long.

Again, typically the process to roll a new log file takes only 1-2ms tops,
but several times a day it takes 30-60+ seconds, across all of our
brokers.  I've narrowed it down to this bit of code causing the issue:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/Log.scala#L652-L658

Here's an example of output w/ my additional timing log statements:

[2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in
> 28028 ms. (kafka.log.Log)

[2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> 28027 ms. (kafka.log.Log)
>

Any thoughts?  Help?  Is this considered normal?

Thanks
Stephen


On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> I can't speak to the exact details of why fds would be kept open longer in
> that specific case, but are you aware that the recommendation for
> production clusters for open fd limits is much higher? It's been suggested
> to be 100,000 as a starting point for quite awhile:
> http://kafka.apache.org/documentation.html#os
>
> -Ewen
>
> On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis <spo...@salesforce.com>
> wrote:
>
> > Hey!
> >
> > I've run into something concerning in our production clusterI believe
> > I've posted this question to the mailing list previously (
> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/browser
> )
> > but the problem has become considerably more serious.
> >
> > We've been fighting issues where Kafka 0.10.0.1 hits its max file
> > descriptor limit.  Our limit is set to ~16k, and under normal operation
> it
> > holds steady around 4k open files.
> >
> > But occasionally Kafka will roll a new log segment, which typically takes
> > on the order of magnitude of a few milliseconds.  However...sometimes it
> > will take a considerable amount of time, any where from 40 seconds up to
> > over a minute.  When this happens, it seems like connections are not
> > released by kafka, and we end up with thousands of client connections
> stuck
> > in CLOSE_WAIT, which pile up and exceed our max file descriptor limit.
> > This happens all in the span of about a minute.
> >
> > Our logs look like this:
> >
> > [2017-01-08 01:10:17,117] INFO Rolled new log segment for 'MyTopic-8' in
> > > 41122 ms. (kafka.log.Log)
> > > [2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4'
> in
> > 1
> > > ms. (kafka.log.Log)
> > > [2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]:
> > > Removed 0 expired offsets in 0 milliseconds.
> > > (kafka.coordinator.GroupMetadataManager)
> > > [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> > > (kafka.network.Acceptor)
> > > java.io.IOException: Too many open files   at
> > > sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > >
> > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 422)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 250)
> > > at kafka.network.Acceptor.accept(SocketServer.scala:323)
> > > at kafka.network.Acceptor.run(SocketServer.scala:268)
> > > at java.lang.Thread.run(Thread.java:745)
> > > [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> > > (kafka.network.Acceptor)
> > > java.io.IOException: Too many open files
> > > at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 422)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 250)
> > > at kafka.network.Acceptor.accept(SocketServer.scala:323)
> > > at kafka.network.Acceptor.run(SocketServer.scala:268)
> > > at java.lang.Thread.run(Thread.java:745)
> > > .
> > >
> >
> >
> > And then kafka crashes.
> >
> > Has anyone seen this behavior of slow log segmented being rolled?  Any
> > ideas of how to track down what could be causing this?
> >
> > Thanks!
> > Stephen
> >
>


Taking a long time to roll a new log segment (~1 min)

2017-01-09 Thread Stephen Powis
Hey!

I've run into something concerning in our production clusterI believe
I've posted this question to the mailing list previously (
http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/browser)
but the problem has become considerably more serious.

We've been fighting issues where Kafka 0.10.0.1 hits its max file
descriptor limit.  Our limit is set to ~16k, and under normal operation it
holds steady around 4k open files.

But occasionally Kafka will roll a new log segment, which typically takes
on the order of magnitude of a few milliseconds.  However...sometimes it
will take a considerable amount of time, anywhere from 40 seconds up to
over a minute.  When this happens, it seems like connections are not
released by kafka, and we end up with thousands of client connections stuck
in CLOSE_WAIT, which pile up and exceed our max file descriptor limit.
This happens all in the span of about a minute.

Our logs look like this:

[2017-01-08 01:10:17,117] INFO Rolled new log segment for 'MyTopic-8' in
> 41122 ms. (kafka.log.Log)
> [2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4' in 1
> ms. (kafka.log.Log)
> [2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]:
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.GroupMetadataManager)
> [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files   at
> sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>
at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
> [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> (kafka.network.Acceptor)
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at kafka.network.Acceptor.accept(SocketServer.scala:323)
> at kafka.network.Acceptor.run(SocketServer.scala:268)
> at java.lang.Thread.run(Thread.java:745)
> .
>


And then kafka crashes.

Has anyone seen this behavior of slow log segments being rolled?  Any
ideas of how to track down what could be causing this?

Thanks!
Stephen


Re: kafka cluster network bandwidth is too high

2016-12-08 Thread Stephen Powis
Yea, we have a 7 node cluster with ~200 topics and see sustained 100Mbps
going between the nodes.  Very bandwidth hungry :p

On Thu, Dec 8, 2016 at 1:51 AM, Matthias J. Sax 
wrote:

> You cannot send images over the mailing list. They get automatically
> removed.
>
> On 12/6/16 11:55 PM, 陈超 wrote:
> > Hi kafka developer,
> >
> >
> >
> >  I have a kafka cluster with 3 node. And it have 3 topic now. We
> > have not many data into the kafka topic now. But the node sync data to
> > each other node bandwidth is up to 4 Mb/s. I don’t know why so high.
> > This is the picture below:
> >
> >
> >
> > Iftop info:
> >
> > [inline image removed by the mailing list]
> >
> >
> >
> > Ps –mp kafkapid –o THREAD,tid,time
> >
> > 说明:
> > C:\Users\lenovo\AppData\Roaming\Tencent\Users\
> 247135449\QQ\WinTemp\RichOle\5(X37VYS5XVO7@@MF6SMX6T.png
> >
> >
> >
> > Jstack info:
> >
> > 说明:
> > C:\Users\lenovo\AppData\Roaming\Tencent\Users\
> 247135449\QQ\WinTemp\RichOle\SEXO2G62J[L9}GDSO7)BS}B.png
> >
> >
> >
> > Can you help me find the problem?
> >
> > Thank you very much!
> >
>
>


Slow 'rolled new log segment' ?

2016-09-16 Thread Stephen Powis
Hey!

I'm trying to determine why I had a weird hiccup in publishing to my Kafka
cluster.  I'm running Kafka 0.10.0.1.  Here's a graph of my producer times:



You can see at around ~10:08 we had a large spike in kafka's internal
publishing, waiting for the leader.  Our producers are configured to
timeout write requests after 150ms.  So during this bump, lots of producers
timed out.

I first dug into my GC logs, and see nothing relevant or taking over 0.00X
seconds around the same time frame.

Digging thru server logs on the leader, at roughly the same time, I've
found the following:

[2016-09-16 10:08:02,508] INFO Rolled new log segment for 'EventStream-27'
> in 743 ms. (kafka.log.Log)
> [2016-09-16 10:08:34,727] INFO Rolled new log segment for 'EventStream-33'
> in 16 ms. (kafka.log.Log)
> [2016-09-16 10:09:01,485] INFO Rolled new log segment for 'EventStream-15'
> in 14 ms. (kafka.log.Log)
>

Reviewing logs going backwards and forwards, it seems like most new log
segments take ~15ms or less.

Curious if anyone knows of any other potential causes or things to review.
Do you guys think rolling the new log segment could cause this issue?  Any
ideas what would cause this new segment to take so long to get created?

Thanks for the help!
Stephen


Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-16 Thread Stephen Powis
+1

On Thu, Jun 16, 2016 at 5:05 PM, Philippe Derome  wrote:

> I strongly support the motion, having difficulty running (Apache Kafka as
> opposed to Confluent) Stream examples with JDK 8 today.
> On 16 Jun 2016 4:46 p.m., "Ismael Juma"  wrote:
>
> > Hi all,
> >
> > I would like to start a discussion on making Java 8 a minimum requirement
> > for Kafka's next feature release (let's say Kafka 0.10.1.0 for now). This
> > is the first discussion on the topic so the idea is to understand how
> > people feel about it. If people feel it's too soon, then we can pick up
> the
> > conversation again after Kafka 0.10.1.0. If the feedback is mostly
> > positive, I will start a vote thread.
> >
> > Let's start with some dates. Java 7 hasn't received public updates since
> > April 2015[1], Java 8 was released in March 2014[2] and Java 9 is
> scheduled
> > to be released in March 2017[3].
> >
> > The first argument for dropping support for Java 7 is that the last
> public
> > release by Oracle contains a large number of known security
> > vulnerabilities. The effectiveness of Kafka's security features is
> reduced
> > if the underlying runtime is not itself secure.
> >
> > The second argument for moving to Java 8 is that it adds a number of
> > compelling features:
> >
> > * Lambda expressions and method references (particularly useful for the
> > Kafka Streams DSL)
> > * Default methods (very useful for maintaining compatibility when adding
> > methods to interfaces)
> > * java.util.stream (helpful for making collection transformations more
> > concise)
> > * Lots of improvements to java.util.concurrent (CompletableFuture,
> > DoubleAdder, DoubleAccumulator, StampedLock, LongAdder, LongAccumulator)
> > * Other nice things: SplittableRandom, Optional (and many others I have
> not
> > mentioned)
> >
> > The third argument is that it will simplify our testing matrix, we won't
> > have to test with Java 7 any longer (this is particularly useful for
> system
> > tests that take hours to run). It will also make it easier to support
> Scala
> > 2.12, which requires Java 8.
> >
> > The fourth argument is that many other open-source projects have taken
> the
> > leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop 3[7],
> > Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android will
> > support Java 8 in the next version (although it will take a while before
> > most phones will use that version sadly). This reduces (but does not
> > eliminate) the chance that we would be the first project that would
> cause a
> > user to consider a Java upgrade.
> >
> > The main argument for not making the change is that a reasonable number
> of
> > users may still be using Java 7 by the time Kafka 0.10.1.0 is released.
> > More specifically, we care about the subset who would be able to upgrade
> to
> > Kafka 0.10.1.0, but would not be able to upgrade the Java version. It
> would
> > be great if we could quantify this in some way.
> >
> > What do you think?
> >
> > Ismael
> >
> > [1] https://java.com/en/download/faq/java_7.xml
> > [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> > [3] http://openjdk.java.net/projects/jdk9/
> > [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> > [5] https://lucene.apache.org/#highlights-of-this-lucene-release-include
> > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> > [7] https://issues.apache.org/jira/browse/HADOOP-11858
> > [8] https://webtide.com/jetty-9-3-features/
> > [9] http://markmail.org/message/l7s276y3xkga2eqf
> > [10]
> >
> >
> https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> > [11] http://markmail.org/message/l7s276y3xkga2eqf
> >
>


Re: JVM Optimizations

2016-06-09 Thread Stephen Powis
Hey Ben

Using G1 with those settings appears to be working well for us.  Infrequent
younggen/minor GCs averaging a run time of 12ms, no full GCs in the 24
hours logged that I uploaded.  I'd say enable the GC log flags and let it
run for a bit, then change a setting or two and compare.



On Thu, Jun 9, 2016 at 3:59 PM, Ben Osheroff <b...@zendesk.com.invalid>
wrote:

> We've been having quite a few symptoms that appear to be big GC stalls
> (nonsensical ZK session timeouts) with the following config:
>
> -Xmx16g
> -Xms16g
> -server
> -XX:+CMSClassUnloadingEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+UseG1GC
> -XX:+DisableExplicitGC
>
> Next steps will be to turn on gc logging and try to confirm that the ZK
> session timeouts are indeed GC pauses (they look like major
> collections), but meanwhile, does anyone have experience around whether
> these options (taken from https://kafka.apache.org/081/ops.html) helped?
> Would prefer to not just blindly turn on options if possible.
>
> -XX:PermSize=48m
> -XX:MaxPermSize=48m
> -XX:MaxGCPauseMillis=20
> -XX:InitiatingHeapOccupancyPercent=35
>
> Thanks!
> Ben Osheroff
> Zendesk.com
>
> On Thu, Jun 09, 2016 at 03:52:41PM -0400, Stephen Powis wrote:
> > NOTE -- GC tuning is outside the realm of my expertise by all means, so
> I'm
> > not sure I'd use our info as any kind of benchmark.
> >
> > But in the interest of sharing, we use the following options
> >
> > export KAFKA_HEAP_OPTS="-Xmx12G -Xms12G"
> > >
> > > export KAFKA_JVM_PERFORMANCE_OPTS="-server -Djava.awt.headless=true
> > > -XX:MaxPermSize=48M -verbose:gc -Xloggc:/var/log/kafka/gc.log
> > > -XX:+PrintGCDateStamps -XX:+PrintGCDetails
> -XX:+PrintTenuringDistribution
> > > -XX:+PrintGCApplicationStoppedTime -XX:+PrintTLAB
> -XX:+DisableExplicitGC
> > > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=100M
> > > -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC
> > > -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError
> > > -XX:HeapDumpPath=/var/log/kafka/heapDump.log"
> > >
> >
> > You can then take your gc.log files and use an analyzer tool...I've
> > attached a link to one of our brokers gclog run thru gceasy.io.
> >
> > https://protect-us.mimecast.com/s/wXqqBJuqdZb1Tn
> >
> > On Thu, Jun 9, 2016 at 3:39 PM, Lawrence Weikum <lwei...@pandora.com>
> wrote:
> >
> > > Hi Tom,
> > >
> > > Currently we’re using the default settings – no special tuning
> > > whatsoever.  I think the kafka-run-class.sh has this:
> > >
> > >
> > > # Memory options
> > > if [ -z "$KAFKA_HEAP_OPTS" ]; then
> > >   KAFKA_HEAP_OPTS="-Xmx256M"
> > > fi
> > >
> > > # JVM performance options
> > > if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
> > >   KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC
> -XX:MaxGCPauseMillis=20
> > > -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC
> > > -Djava.awt.headless=true"
> > > fi
> > >
> > >
> > > Is this the confluent doc you were referring to?
> > > https://protect-us.mimecast.com/s/arXXBOspkvORCD
> > >
> > > Thanks!
> > >
> > > Lawrence Weikum
> > >
> > >
> > > On 6/9/16, 1:32 PM, "Tom Crayford" <tcrayf...@heroku.com> wrote:
> > >
> > > >Hi Lawrence,
> > > >
> > > >What JVM options were you using? There's a few pages in the confluent
> docs
> > > >on JVM tuning iirc. We simply use the G1 and a 4GB Max heap and things
> > > work
> > > >well (running many thousands of clusters).
> > > >
> > > >Thanks
> > > >Tom Crayford
> > > >Heroku Kafka
> > > >
> > > >On Thursday, 9 June 2016, Lawrence Weikum <lwei...@pandora.com>
> wrote:
> > > >
> > > >> Hello all,
> > > >>
> > > >> We’ve been running a benchmark test on a Kafka cluster of ours
> running
> > > >> 0.9.0.1 – slamming it with messages to see when/if things might
> break.
> > > >> During our test, we caused two brokers to throw OutOfMemory errors
> > > (looks
> > > >> like from the Heap) even though each machine still has 43% of the
> total
> > > >> memory unused.
> > > >>
> > > >> I’m curious what JVM optimizations are recommended for Kafka
> brokers?
> > > Or
> > > >> if there aren’t any that are recommended, what are some
> optimizations
> > > >> others are using to keep the brokers running smoothly?
> > > >>
> > > >> Best,
> > > >>
> > > >> Lawrence Weikum
> > > >>
> > > >>
> > >
> > >
>


Re: JVM Optimizations

2016-06-09 Thread Stephen Powis
NOTE -- GC tuning is outside the realm of my expertise by all means, so I'm
not sure I'd use our info as any kind of benchmark.

But in the interest of sharing, we use the following options

export KAFKA_HEAP_OPTS="-Xmx12G -Xms12G"
>
> export KAFKA_JVM_PERFORMANCE_OPTS="-server -Djava.awt.headless=true
> -XX:MaxPermSize=48M -verbose:gc -Xloggc:/var/log/kafka/gc.log
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintTLAB -XX:+DisableExplicitGC
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
> -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/log/kafka/heapDump.log"
>

You can then take your gc.log files and use an analyzer tool...I've
attached a link to one of our brokers' gc logs run through gceasy.io.

http://gceasy.io/my-gc-report.jsp?p=L2hvbWUvcmFtL3VwbG9hZC9pbnN0YW5jZTEvc2hhcmVkLzIwMTYtNi05L2thZmthU2VydmVyLWdjLmxvZy4wLmN1cnJlbnQtMTktNTAtMjI=

On Thu, Jun 9, 2016 at 3:39 PM, Lawrence Weikum  wrote:

> Hi Tom,
>
> Currently we’re using the default settings – no special tuning
> whatsoever.  I think the kafka-run-class.sh has this:
>
>
> # Memory options
> if [ -z "$KAFKA_HEAP_OPTS" ]; then
>   KAFKA_HEAP_OPTS="-Xmx256M"
> fi
>
> # JVM performance options
> if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
>   KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC
> -Djava.awt.headless=true"
> fi
>
>
> Is this the confluent doc you were referring to?
> http://docs.confluent.io/2.0.0/kafka/deployment.html
>
> Thanks!
>
> Lawrence Weikum
>
>
> On 6/9/16, 1:32 PM, "Tom Crayford"  wrote:
>
> >Hi Lawrence,
> >
> >What JVM options were you using? There's a few pages in the confluent docs
> >on JVM tuning iirc. We simply use the G1 and a 4GB Max heap and things
> work
> >well (running many thousands of clusters).
> >
> >Thanks
> >Tom Crayford
> >Heroku Kafka
> >
> >On Thursday, 9 June 2016, Lawrence Weikum  wrote:
> >
> >> Hello all,
> >>
> >> We’ve been running a benchmark test on a Kafka cluster of ours running
> >> 0.9.0.1 – slamming it with messages to see when/if things might break.
> >> During our test, we caused two brokers to throw OutOfMemory errors
> (looks
> >> like from the Heap) even though each machine still has 43% of the total
> >> memory unused.
> >>
> >> I’m curious what JVM optimizations are recommended for Kafka brokers?
> Or
> >> if there aren’t any that are recommended, what are some optimizations
> >> others are using to keep the brokers running smoothly?
> >>
> >> Best,
> >>
> >> Lawrence Weikum
> >>
> >>
>
>


Re: Test

2016-05-29 Thread Stephen Powis
Hello!

On Sun, May 29, 2016 at 1:02 PM, Igor Kravzov 
wrote:

> Please someone reply. Not sure if subscribed properly.
>


Re: Hash partition of key with skew

2016-05-03 Thread Stephen Powis
Having difficulty following the use cases behind this thread, so this could
be entirely noise... but I'll toss this out there in case it's relevant:
https://issues.apache.org/jira/browse/KAFKA-

We had an issue where a producer was setting the same key on every message
but distributing the publishes equally across all partitions.  We never
really noticed it until we tried to replicate the cluster to another
datacenter and all the messages ended up on a single partition...oops.

I created the above ticket and tossed together a round robin partitioner to
solve the issue.
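
In case it helps anyone hitting the same thing, the partitioner boiled
down to roughly the sketch below (written against the new producer's
Partitioner interface; it is not the exact code attached to the ticket):

import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class RoundRobinPartitioner implements Partitioner {
    // Shared counter; every record bumps it so writes rotate across partitions.
    private final AtomicInteger counter = new AtomicInteger(0);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Ignore the key entirely and just rotate through the partitions.
        return Math.abs(counter.getAndIncrement() % numPartitions);
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // nothing to configure
    }

    @Override
    public void close() {
    }
}

It gets wired in with the partitioner.class producer property.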

On Tue, May 3, 2016 at 1:22 PM, Tauzell, Dave 
wrote:

> Srikanth,
>
> I think the most efficient use of the partitions would be to spread all
> messages evenly across all partitions by *not* using a key. Then, all of
> your consumers in the same consumer group would receive about equal numbers
> of messages.   What will you do with the messages as you pull them off of
> Kafka?
>
> -Dave
>
>
> -Original Message-
> From: Srikanth [mailto:srikanth...@gmail.com]
> Sent: Tuesday, May 03, 2016 12:12 PM
> To: users@kafka.apache.org
> Subject: Re: Hash partition of key with skew
>
> Jens,
> Thanks for the link. That is something to consider. Of course it has
> downsides too.
>
> Wesley,
> That is some good info on hashing. We've explored a couple of these
> options.
> I see that you are hesitant to put these in production. Even we want to
> evaluate or options first.
>
> Dave,
> Our need to do this is similar to Wesley. And consumers of topic can be
> efficient if they get records from one(or a very few) keys.
> Why do you think it is not applicable to Kafka? Are you suggesting that
> there are other ways to handle it when using Kafka?
>
> Srikanth
>
> On Tue, May 3, 2016 at 11:58 AM, Tauzell, Dave <
> dave.tauz...@surescripts.com
> > wrote:
>
> > Yeah, that makes sense for the target system (Cassandra for example),
> > but I don't see that you would need that for Kafka.  Good info on
> > hashing, though, that I am going to take a look at when I get time.
> >
> > -Dave
> >
> > Dave Tauzell | Senior Software Engineer | Surescripts
> > O: 651.855.3042 | www.surescripts.com |   dave.tauz...@surescripts.com
> > Connect with us: Twitter I LinkedIn I Facebook I YouTube
> >
> >
> > -Original Message-
> > From: Wesley Chow [mailto:w...@chartbeat.com]
> > Sent: Tuesday, May 03, 2016 10:51 AM
> > To: users@kafka.apache.org
> > Subject: Re: Hash partition of key with skew
> >
> > I’m not the OP, but in our case, we sometimes want data locality. For
> > example, suppose that we have 100 consumers that are building up a
> > cache of customer -> data mapping. If customer data is spread randomly
> > across all partitions then a query for that customer’s data would have
> > to hit all 100 consumers. If customer data exhibits some locality,
> > then queries for that data only hit a subset of consumers.
> >
> > Wes
> >
> >
> > > On May 3, 2016, at 11:18 AM, Tauzell, Dave
> > > 
> > wrote:
> > >
> > > Do you need the messages to be ordered in some way?   Why pass a key if
> > you don't want all the messages to go to one partition?
> > >
> > > -Dave
> > >
> > > Dave Tauzell | Senior Software Engineer | Surescripts
> > > O: 651.855.3042 | www.surescripts.com 
> > > |
> > dave.tauz...@surescripts.com 
> > > Connect with us: Twitter I LinkedIn I Facebook I YouTube
> > >
> > >
> > > -Original Message-
> > > From: Wesley Chow [mailto:w...@chartbeat.com
> > > ]
> > > Sent: Tuesday, May 03, 2016 9:51 AM
> > > To: users@kafka.apache.org 
> > > Subject: Re: Hash partition of key with skew
> > >
> > > I’ve come up with a couple solutions since we too have a power law
> > distribution. However, we have not put anything into practice.
> > >
> > > Fixed Slicing
> > >
> > > One simple thing to do is to take each key and slice it into some
> > > fixed
> > number of partitions. So your function might be:
> > >
> > > (hash(key) % num) + (hash(key) % 10)
> > >
> > > In order to distribute it across 10 partitions. Or:
> > >
> > > hash(key + ‘0’) % num
> > > hash(key + ‘1’) % num
> > > …
> > > hash(key + ‘9’) % num
> > >
> > >
> > > Hyperspace Hashing
> > >
> > > If your data is multi-dimensional, then you might find hyperspace
> > hashing useful. I’ll give a simple example, but it’s easy to generalize.
> > Suppose that you have two dimensions you’d like to partition on:
> > customer id (C) and city location (L). You’d like to be able to
> > subscribe to all data for some subset of customers, and you’d also
> > like to be able to subscribe to all data for some subset of locations.
> > Suppose that this data goes into a topic with 256 partitions.
> > >
> > > For any piece of data, you’d construct the partition it goes to like
> so:
> > >
> > > ((hash(C) % 16) << 4) + ((hash(L) % 16))

Re: Mirror maker Configs 0.9.0

2016-03-09 Thread Stephen Powis
I've attached my two configs here.  Pay close attention to the
--num-streams argument to mirror-maker.  I have a lot of throughput on my
topics, so I ended up matching the number of streams to the number of
partitions for each of my topics.  A stream is essentially just a consumer
and producer thread.  If you have fewer streams/threads than the number of
partitions * topics replicated, it seems like mirror-maker round-robins the
threads across the partitions/topics, which may be fine based on your
ingest rate and network latency between clusters.

Additionally, I had to raise the java heap size considerably to prevent
mirror-maker from OOMing.

You can check the status of mirror maker and get the lag by running:
kafka-consumer-offset-checker.sh  --topic "your-topic-here" --zookeeper
src-zk-host:2181 --group mirrormaker-consumer-id-here

I *think* I also edited config/tools-log4j.properties and made mirror-maker
log info level messages by changing the first line to read as follows.
This allowed me to more easily debug issues and fine tune my settings.

log4j.rootLogger=INFO, fileAppender, stderr


consumer.properties
##

# Zookeeper connection string
# comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"
zookeeper.connect="source-zk-host1:2181,source-zk-host2:2181"

# timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

#consumer group id
group.id=mirrormaker-consumer-id-here


producer.properties
##
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=dest-kafka-host1:9092,dest-kafka-host2:9092

# name of the partitioner class for partitioning events; default partition
# spreads data randomly (in most cases)
## I needed to use a custom partitioner to get my data to be distributed
## evenly, but you'll probably be fine leaving this commented out.  see
## jira kafka- if you're curious to what I used.
# partitioner.class=

# specify the compression codec for all data generated: none, gzip, snappy, lz4.
# the old config values work as well: 0, 1, 2, 3 for none, gzip,
snappy, lz4, respectively
compression.codec=none

## Increase the batch size
batch.size=65536
client.id=mirror-maker






On Wed, Mar 9, 2016 at 5:43 AM, prabhu v  wrote:

> Thanks for the reply..
>
> I will remove the bootstrap.servers property and add zookeeper.connect in
> consumer properties and let you know
>
> Also, is there any way we can check how much data the target data center is
> lagging behind source DC?
>
>
> On Wed, Mar 9, 2016 at 3:41 PM, Gerard Klijs 
> wrote:
>
> > What do you see in the logs?
> > It could be it goes wrong because you have the bootstrap.servers property
> > which is not supported for the old consumer.
> >
> > On Wed, Mar 9, 2016 at 11:05 AM Gerard Klijs 
> > wrote:
> >
> > > Don't know the actual question, it matters what you want to do.
> > > Just watch out trying to copy every topic using a new consumer, cause
> > then
> > > internal topics are copied, leading to errors.
> > > Here is a template start script we used:
> > >
> > > #!/usr/bin/env bash
> > > export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.local.only=false
> > -Djava.rmi.server.hostname=
> > -Dcom.sun.management.jmxremote.rmi.port="
> > > export JMX_PORT=
> > > /usr/bin/kafka-mirror-maker --consumer.config
> > $HOME_DIR/consumer.properties --producer.config
> > $HOME_DIR/producer.properties --whitelist='' 1>>
> > $LOG_DIR/mirror-maker.log 2>> $LOG_DIR/mirror-maker.log
> > >
> > > Both the consumer and producer configs have sensible defaults, these
> are
> > out consumer.properties template:
> > >
> > > #Consumer template to be used with the mirror maker
> > > zookeeper.connect=
> > > group.id=mirrormaker
> > > auto.offset.reset=smallest
> > > #next property is not available in new consumer
> > > exclude.internal.topics=true
> > >
> > > *And a producer.properties template:*
> > >
> > > #Producer template to be used with the mirror maker
> > > bootstrap.servers=
> > > client.id=mirrormaker
> > >
> > > Because the internal topics can't be excluded in the new consumer yet,
> > we use the old consumer.
> > >
> > > Hope this helps.
> > >
> > >
> > > On Wed, Mar 9, 2016 at 10:57 AM prabhu v 
> > wrote:
> > >
> > >> Hi Experts,
> > >>
> > >> I am trying to mirror
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> Prabhu.V
> > >>
> > >
> >
>
>
>
> --
> Regards,
>
> Prabhu.V
>


Mirrormaker only publishing to a single partition on destination cluster

2016-03-03 Thread Stephen Powis
Hey!

I'm using kafka 0.9.0.1 and trying to replicate a cluster from one
datacenter to another.  mirror-maker properly connects to my source cluster
and consumes messages, but for some reason is only publishing to a single
partition for my topic in the destination cluster.  So all of my partitions
for the topic are empty, except one, which contains everything from the
source cluster.

Has anyone seen this behavior before?  I must have something misconfigured,
but am unable to figure it out from reviewing the online docs.

Thanks!
Stephen


Re: How to bind all Kafka tcp port to private net address

2016-01-29 Thread Stephen Powis
Pretty sure you want to set this option in your server.properties file:

# Hostname the broker will bind to. If not set, the server will bind to all
> interfaces
> #host.name=localhost
>
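
i.e. uncomment it and point it at the private address from your netstat
output, something like:

host.name=10.105.7.243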

On Thu, Jan 28, 2016 at 10:58 PM, costa xu  wrote:

> My version is kafka_2.11-0.9.0.0. I find that the kafka listen on multi tcp
> port on a linux server.
>
> [gdata@gdataqosconnd2 kafka_2.11-0.9.0.0]$ netstat -plnt|grep java
> (Not all processes could be identified, non-owned process info
>  will not be shown, you would have to be root to see it all.)
> tcp0  0 10.105.7.243:9092   0.0.0.0:*
> LISTEN  31011/java
> tcp0  0 0.0.0.0:51367   0.0.0.0:*
> LISTEN  31011/java
> tcp0  0 0.0.0.0:11050.0.0.0:*
> LISTEN  31011/java
> tcp0  0 0.0.0.0:42592   0.0.0.0:*
> LISTEN  31011/java
>
> 10.105.7.243:9092 is the broker's port.0 0.0.0.0:1105 is the jmx port that
> I set in the start script.
> But I dont know what is the 0 0.0.0.0:51367 and 0 0.0.0.0:42592. And more
> tricky, the port will change after restarting of the kafka process.
>
> So  I want to know how to bind the kafka port to private interface just
> like '10.105.7.243'.
> If I can not bind them, can I set the fixed listened port number?
>
> My kafka server.properties is:
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> # see kafka.server.KafkaConfig for additional details and defaults
>
> # Server Basics #
>
> # The id of the broker. This must be set to a unique integer for each
> broker.
> broker.id=1
>
> # Socket Server Settings
> #
>
> listeners=PLAINTEXT://10.105.7.243:9092
>
> # The port the socket server listens on
> #port=9092
>
> # Hostname the broker will bind to. If not set, the server will bind to all
> interfaces
> #host.name=localhost
>
> # Hostname the broker will advertise to producers and consumers. If not
> set, it uses the
> # value for "host.name" if configured.  Otherwise, it will use the value
> returned from
> # java.net.InetAddress.getCanonicalHostName().
> #advertised.host.name=
>
> # The port to publish to ZooKeeper for clients to use. If this is not set,
> # it will publish the same port that the broker binds to.
> #advertised.port=
>
> # The number of threads handling network requests
> num.network.threads=3
>
> # The number of threads doing disk I/O
> num.io.threads=8
>
> # The send buffer (SO_SNDBUF) used by the socket server
> socket.send.buffer.bytes=102400
>
> # The receive buffer (SO_RCVBUF) used by the socket server
> socket.receive.buffer.bytes=102400
>
> # The maximum size of a request that the socket server will accept
> (protection against OOM)
> socket.request.max.bytes=104857600
>
>
> # Log Basics #
>
> # A comma seperated list of directories under which to store log files
> log.dirs=/data/gdata/var/kafka-logs
>
> # The default number of log partitions per topic. More partitions allow
> greater
> # parallelism for consumption, but this will also result in more files
> across
> # the brokers.
> num.partitions=1
>
> # The number of threads per data directory to be used for log recovery at
> startup and flushing at shutdown.
> # This value is recommended to be increased for installations with data
> dirs located in RAID array.
> num.recovery.threads.per.data.dir=1
>
> # Log Flush Policy
> #
>
> # Messages are immediately written to the filesystem but by default we only
> fsync() to sync
> # the OS cache lazily. The following configurations control the flush of
> data to disk.
> # There are a few important trade-offs here:
> #1. Durability: Unflushed data may be lost if you are not using
> replication.
> #2. Latency: Very large flush intervals may lead to latency spikes when
> the flush does occur as there will be a lot of data to flush.
> #3. Throughput: The flush is generally the most expensive operation,
> and a small flush interval may lead to exceessive seeks.
> # The settings below allow one to configure the flush policy to flush data

Re: Kafka Consumer and Topic Partition

2016-01-14 Thread Stephen Powis
I think you need to have unique consumer group ids for each consumer (if
you want both consumers to receive ALL messages) or multiple partitions set
up on your topic with your consumers sharing the same consumer group (each
consumer would then get roughly half of all the messages).
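
Roughly, for the first option (the group ids below are made up for
illustration):

# consumer-a.properties
group.id=group-a

# consumer-b.properties
group.id=group-b

With distinct group ids like this, both consumers see every message.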

On Thu, Jan 14, 2016 at 8:46 PM, Joe San  wrote:

> Kafka Users,
>
> I have been trying out a simple consumer example that is supposed to read
> messages from a specific partition of a topic. I'm not able to get two
> consumer instances up and running. The second consumer instance is idle!
>
> Here is the original post that I created:
>
>
> http://stackoverflow.com/questions/3478/apache-kafka-0-9-0-0-show-all-topics-with-partitions
>
> Thanks and Regards,
> Joe
>


Re: How to chose the size of a Kafka broker

2016-01-13 Thread Stephen Powis
I can't really answer your question, but you don't mention your network
layout/hardware.  You may want to add that as a data point in your decision
(you wouldn't want to overrun your network device(s) on the brokers).


On Wed, Jan 13, 2016 at 7:09 PM, Vladoiu Catalin 
wrote:

> Hi guys,
>
> I've run into a long conversation with my colleagues when we discussed the
> size of the Brokers for our new Kafka cluster and we still haven't reached
> a final conclusion.
>
> Our main concern is the size of the requests 10-20MB per request (producer
> will send big requests), maybe more and we estimate that we will have 4-5TB
> per day.
>
> Our debate is between:
> 1. Having a smaller cluster(not so many brokers) but big config, something
> like this:
> Disk: 11 x 4TB, CPU: 48 Core, RAM: 252 GB. We chose this configuration
> because our Hadoop cluster has that config and can easily handle that
> amount of data.
> 2. Having a bigger number of brokers but smaller broker config.
>
> I was hoping that somebody with more experience in using Kafka can advise
> on this.
>
> Thanks,
> Catalin
>


MirrorMaker --whitelist option in 0.9.0.0

2016-01-08 Thread Stephen Powis
Hey!

I'm having a little trouble with mirror maker: if I supply --whitelist "*"
it will not consume any topics.  If I do --whitelist "TopicName" it works
for the specified topic.  Any idea why my wildcard does not work?

Thanks
Stephen


Re: Upgraded from 0.8.2 to 0.9.0 and FetchFollower request time has increased

2016-01-08 Thread Stephen Powis
Hey Ismael,

It did, just took a few hours :)   It seems like it's just hit where it was
before the upgrade.




Thanks!
Stephen

On Fri, Jan 8, 2016 at 8:03 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Stephen,
>
> Have things settled down since?
>
> Thanks,
> Ismael
>
> On Fri, Jan 8, 2016 at 7:39 AM, Stephen Powis <spo...@salesforce.com>
> wrote:
>
> > Hey!
> >
> > So I upgraded our production kafka cluster from 0.8.2 to 0.9.0 this
> > morning, and it seems like everything went smoothly.
> >
> > As the last step when I changed the inter.broker.protocol.version to
> > 0.9.0.0 and did a rolling restart, I noticed that the FetchFollower
> request
> > times increased fairly significantly.  Does anyone know if this is
> expected
> > with the new protocol version?
> >
> > Link to graphite graph: http://i.imgur.com/uhvjCBH.png
> >
> > Perhaps I'm just being paranoid and it will settle out?  This is the same
> > graph over a larger time period - http://i.imgur.com/FB5Ey2a.png
> >
> > Thanks!
> >
>


Re: MirrorMaker --whitelist option in 0.9.0.0

2016-01-08 Thread Stephen Powis
Hmm, so after I sent my email to the list, I realized that I forgot to
include this: I started with --whitelist ".*", but after that didn't work I
swapped to "*" after running across the docs here:
https://kafka.apache.org/090/ops.html#basic_ops_mirror_maker

Neither were working for me :/

On Fri, Jan 8, 2016 at 9:54 PM, tao xiao <xiaotao...@gmail.com> wrote:

> Mirror maker white list uses Java pattern
> https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html. In
> you case the whitelist is ".*"
>
> On Fri, 8 Jan 2016 at 20:54 Stephen Powis <spo...@salesforce.com> wrote:
>
> > Hey!
> >
> > I'm having a little trouble with mirror maker, if I supply --whitelist
> "*"
> > it will not consume any topics.  If do --whitelist "TopicName" it works,
> > for the specified topic.  Any idea why my wildcard does not work?
> >
> > Thanks
> > Stephen
> >
>


Upgraded from 0.8.2 to 0.9.0 and FetchFollower request time has increased

2016-01-07 Thread Stephen Powis
Hey!

So I upgraded our production kafka cluster from 0.8.2 to 0.9.0 this
morning, and it seems like everything went smoothly.

As the last step when I changed the inter.broker.protocol.version to
0.9.0.0 and did a rolling restart, I noticed that the FetchFollower request
times increased fairly significantly.  Does anyone know if this is expected
with the new protocol version?

Link to graphite graph: http://i.imgur.com/uhvjCBH.png

Perhaps I'm just being paranoid and it will settle out?  This is the same
graph over a larger time period - http://i.imgur.com/FB5Ey2a.png

Thanks!


mirror maker against 0.8.2 source cluster and 0.9.0 destination cluster

2016-01-06 Thread Stephen Powis
Hey!

So I'm trying to get mirror maker going between two different clusters.  My
source cluster is version 0.8.2 and my destination cluster is 0.9.0 running
the mirror maker code from the 0.9.0 release.  Does anyone know if this is
possible to do?  I'm aware that the protocol changed slightly between
versions.

Attempting to run ./kafka-console-consumer.sh from the 0.9.0 release and
consume from my 0.8.2 cluster seems to fail, which is leading me to believe
that mirror maker will have the same issue.

Attached are the errors I receive from kafka-console-consumer.sh running
from the 0.9.0 release against a 0.8.2 cluster.

./kafka-console-consumer.sh  --zookeeper W.X.Y.Z:2181 --topic MyTopic

[2016-01-06 06:17:40,587] WARN
[ConsumerFetcherThread-console-consumer-73608_hostname-1452079050696-728e3f7e-0-100],
Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@6bb31fae.
Possible cause: java.lang.IllegalArgumentException
(kafka.consumer.ConsumerFetcherThread)
...

Thanks!


Question about what happens when producer timeout occurs

2015-11-06 Thread Stephen Powis
Hello!

I'm curious as to what exactly happens in a certain scenario with
kafka, and was unable to find it clearly documented on the website
anywhere (or I'm just blind).

When producing messages into kafka, the producer can specify a timeout
parameter (see producer API here:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ProduceAPI)

I'm curious as to what exactly happens when this timeout is exceeded?

For example, I have a multi-node cluster, my topic has a replica
factor of 3, and my producer is configured with RequiredAcks = -1, and
Timeout=100.
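
(For concreteness, in Java-client terms that setup corresponds roughly to
the sketch below; the broker addresses, topic and serializers are made up,
and I'm assuming "timeout.ms" is the client config that maps to the Timeout
field of the produce request.)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // made-up hosts
        props.put("acks", "-1");        // RequiredAcks = -1: wait for all in-sync replicas
        props.put("timeout.ms", "100"); // the 100ms timeout described above
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("MyTopic", "some-key", "some-value"));
        producer.close();
    }
}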

My producer connects to the leader and submits a message.  The API doc
says that the write on the leader will not abide by the timeout
defined in the request.  So let's assume that the write is successful
on the leader within 10ms.  Now the leader goes to its two replicas
and attempts to get the data written there.  Let's say the first
replica writes within 10ms, but the 2nd replica times out (takes over
100ms).  The leader then responds back to the producer with error code
7.

In this situation does that mean that the write into the cluster
failed and does not exist on any node?  Or does it mean that the write
exists on the leader and replica #1, but not replica #2?

Hopefully my question is clear,

Thanks!
Stephen