Re: How to log/analyze the consumer lag in kafka streaming application
Hi Sachin,

If you check kafka-run-class.bat you can see that when the environment variable KAFKA_LOG4J_OPTS is not provided, a default log4j configuration under "tools" is loaded. So setting the environment variable to something like "-Dlog4j.configuration=file:///D:/kafka_2.10-0.10.1.1/log4js.properties" should do the trick for you. (Note: Kafka 0.10.x uses log4j 1.x, which reads log4j.configuration, not the log4j2-style log4j.configurationFile.)

Joris

On Sat, Feb 4, 2017 at 11:38 AM Sachin Mittal wrote:
> You can see it somehow does not pick my log4j properties but always
> picks some tools-log4j.properties.
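The KAFKA_LOG4J_OPTS route above can be sketched as follows. This is the Unix form of the same wrapper script (the thread uses the .bat equivalent), and the install path and properties filename are the ones from the thread; the launch line is illustrative, not verified against this exact Kafka version.

```shell
# kafka-run-class only falls back to the bundled tools-log4j.properties
# when KAFKA_LOG4J_OPTS is unset, so set it before launching:
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:///D:/kafka_2.10-0.10.1.1/log4js.properties"

# The app would then be started through the same wrapper script, e.g.:
# bin/kafka-run-class.sh TestKafkaWindowStream

# Show what the wrapper will pass to the JVM:
echo "$KAFKA_LOG4J_OPTS"
```

The wrapper script simply appends this variable to the java command line, which is why setting it wins over the default.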
Re: How to log/analyze the consumer lag in kafka streaming application
Hi,

As suggested, this is how I am starting my stream:

D:\kafka_2.10-0.10.1.1>bin\windows\kafka-run-class.bat -Dlog4j.debug -Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties TestKafkaWindowStream
log4j: Using URL [file:D:/kafka_2.10-0.10.1.1/config/tools-log4j.properties] for automatic log4j configuration.

You can see it somehow does not pick my log4j properties but always picks some tools-log4j.properties. I have tried many different ways to specify log4j.configurationFile, but it always seems to pick up a different log4j properties file.

So is it something that we should always configure our log4j under tools-log4j.properties?

Thanks
Sachin

On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy wrote:
> -Dlog4j.configuration=your-log4j.properties
Re: How to log/analyze the consumer lag in kafka streaming application
If you are using jmxterm then you are going to connect to a running JVM, so you don't need to set StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG. You need to connect jmxterm to the MBean server that will be running in the JVM of your streams app, and you'll need to provide an appropriate JMX port for it to connect to. So, when running your streams application you should start it with:

-Dcom.sun.management.jmxremote.port=your-port

Thanks

On Fri, 27 Jan 2017 at 10:24 Sachin Mittal wrote:
> Say I decide to use jmxterm, which is a CLI-based client which I can
> easily use where my streams app is running. With respect to that, what
> value should I assign to the METRICS_REPORTER_CLASSES_CONFIG property?
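The advice above amounts to something like the following sketch. The port number is a hypothetical choice, and the flags beyond jmxremote.port are common companions rather than anything stated in the thread; an open, unauthenticated JMX port should only be used on a trusted network.

```shell
# Assemble illustrative JMX flags for the streams app's JVM:
JMX_OPTS="-Dcom.sun.management.jmxremote"
JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.port=9010"
JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.ssl=false"

# The streams app would then be started with these flags, e.g.:
# java $JMX_OPTS -cp "libs/*" TestKafkaWindowStream

echo "$JMX_OPTS"
```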
Re: How to log/analyze the consumer lag in kafka streaming application
Hi,

I understood what I need to do. I think it is not clear though regarding StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG.

Say I decide to use jmxterm, which is a CLI-based client which I can easily use where my streams app is running. With respect to that, what value should I assign to the METRICS_REPORTER_CLASSES_CONFIG property?

I am following this guide:
https://cwiki.apache.org/confluence/display/KAFKA/jmxterm+quickstart

However it is not clear what config I need to provide.

Thanks
Sachin

On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy wrote:
> This is done via StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG
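Assuming the app was started with com.sun.management.jmxremote.port set as suggested in the reply, a jmxterm session might look like the sketch below. jmxterm is interactive, so the session is held in a string here; the port, client-id, bean name, and metric name are assumptions based on the kafka.streams JMX domain, not copied from a live session.

```shell
# Transcript sketch of an interactive jmxterm session (hypothetical names):
SESSION=$(cat <<'EOF'
$ java -jar jmxterm-1.0-uber.jar
$> open localhost:9010
$> beans -d kafka.streams
$> get -b kafka.streams:type=stream-metrics,client-id=my-app-1 commit-latency-avg
$> close
EOF
)
echo "$SESSION"
```

The key point from the thread stands: no METRICS_REPORTER_CLASSES_CONFIG value is needed for this, because the JMX MBeans are exposed by default.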
Re: How to log/analyze the consumer lag in kafka streaming application
Hi Sachin,

You can configure an implementation of org.apache.kafka.common.metrics.MetricsReporter. This is done via StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG.

There is a list of JMX reporters here:
https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
I'm sure there are plenty more available on GitHub. It is also fairly simple to write your own.

As for your log4j.properties, you should be able to run with:
-Dlog4j.configuration=your-log4j.properties

Thanks,
Damian

On Fri, 27 Jan 2017 at 07:59 Sachin Mittal wrote:
> So what are the other possible options? Can I log the metrics values to
> a log file, or can I enable logging in general? If yes, where do I
> place my log4j.properties?
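For completeness, wiring in a custom reporter via StreamsConfig boils down to a single config entry; the key below is the one behind METRICS_REPORTER_CLASSES_CONFIG, and the class name is a hypothetical placeholder. It is not needed for plain JMX browsing, since Kafka's built-in JmxReporter is registered by default.

```properties
# Comma-separated list of MetricsReporter implementations
# (hypothetical class shown):
metric.reporters=com.example.MyMetricsReporter
```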
Re: How to log/analyze the consumer lag in kafka streaming application
Hi,

Thanks for sharing the info.

I am reading this document for more understanding:
http://kafka.apache.org/documentation.html#monitoring

Is there any special way I need to start my kafka cluster or streams application (or configure them) to report these metrics?

I suppose both the cluster and the streams application report separate metrics. I mean that to collect streams metrics I need to connect to the JMX port on the machine where my streams app is running, right?

One issue I see is that the machines where both the cluster and the streams application are running are not accessible from outside, where I could run any UI-based application like jconsole to report on these metrics.

So what are the other possible options? Can I log the metrics values to a log file, or can I enable logging in general? If yes, where do I place my log4j.properties? I tried making it part of the jar which has my main class, but I don't see any logs getting generated.

Thanks
Sachin

On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax wrote:
> You should check out Kafka Streams Metrics (for upcoming 0.10.2 they
> are even more detailed).
Re: How to log/analyze the consumer lag in kafka streaming application
You should check out Kafka Streams Metrics (for the upcoming 0.10.2 they are even more detailed).

There is not a lot of documentation for 0.10.0 or 0.10.1, but it works the same way as the consumer/producer metrics that are documented.

-Matthias

On 1/24/17 10:38 PM, Sachin Mittal wrote:
> 1. Is there any mechanism in kafka streams to log time spent within
> each pipeline stage?
> 2. Also if I want to turn on custom logging to log some timings, how
> can I do the same?
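On the thread's original question of measuring consumer lag itself: besides the Streams metrics, broker-side lag can be inspected with the consumer-groups tool, since a streams app's application.id doubles as its consumer group id. The group id and broker address below are assumptions, and some 0.10.x versions of the tool also want a --new-consumer flag.

```shell
# Hypothetical group id and broker address; against a live cluster this
# prints CURRENT-OFFSET, LOG-END-OFFSET and LAG per partition.
CMD='bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-streams-app-id'

# Run it against a live cluster, e.g.:
# $CMD

echo "$CMD"
```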
How to log/analyze the consumer lag in kafka streaming application
Hi All,

I am running a kafka streaming application with a simple pipeline of:
source topic -> group -> aggregate by key -> for each -> save to a sink

My source topic gets messages at a rate of 5000 - 1 messages per second. During peak load we see the delay reaching 3 million messages.

So I need to figure out where the delay might be happening.

1. Is there any mechanism in kafka streams to log the time spent within each pipeline stage?

2. Also, if I want to turn on custom logging to log some timings, how can I do the same?

I have a log4j.properties and I am packaging it inside the jar which has the main class. I place that jar in the libs folder of the kafka installation. However, I see no logs generated under the logs folder.

So where are we supposed to add the log4j.properties?

Thanks
Sachin
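A minimal log4j 1.x configuration of the sort discussed in the thread might look like the sketch below (Kafka 0.10.x ships log4j 1.2, which is selected with -Dlog4j.configuration rather than being picked up from inside the application jar when launched via kafka-run-class). The appender, pattern, and logger levels are illustrative choices, not prescribed by the thread.

```properties
log4j.rootLogger=INFO, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n

# Illustrative: turn up streams internals while investigating lag
log4j.logger.org.apache.kafka.streams=DEBUG
```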