Re: kafka streams consumer group reporting lag even on source topics removed from topology
Well, it's kind of expected behavior. It's a split-brain problem. In the end, you use the same `application.id` / `group.id`, and thus the committed offsets for the removed topics are still in the `__consumer_offsets` topic and associated with the consumer group. If a tool inspects lag, it compares the latest committed offsets to end-offsets for everything it finds in the `__consumer_offsets` topic for the group in question -- the tool cannot know that you changed the application and that it does not read from those topics any longer (and thus does not commit any longer).

I am not sure off the top of my head if you could do a manual cleanup for the `application.id` and topics in question and delete the committed offsets from the `__consumer_offsets` topic -- try checking out the `Admin` client and/or the command line tools... I know that it's possible to delete committed offsets for a consumer group (if a group becomes inactive, the broker would also clean up all group metadata after a configurable timeout), but I am not sure if that works only for the entire consumer group (i.e., all topics) or if you can do it on a per-topic basis, too.

HTH,
-Matthias

On 8/16/23 2:11 AM, Pushkar Deole wrote:

Hi streams Dev community @matthias, @bruno

Any inputs on the above issue? Is this a bug in the streams library wherein, with input topics removed from the streams processor topology, the underlying consumer group still reports lag against those?

On Wed, Aug 9, 2023 at 4:38 PM Pushkar Deole wrote:

Hi All,

I have a streams application with 3 instances with application-id set to applicationV1. The application uses the Processor API, reading from source topics, processing the data, and writing to a destination topic. Currently it consumes from 6 source topics; however, we don't need to process data from 2 of those topics any more, so we removed 2 topics from the source topics list.
We have configured a Datadog dashboard to report and alert on consumer lag, so after removing the 2 source topics and deploying the application, we started getting several alerts about consumer lag on the applicationV1 consumer group, which is the underlying consumer group of the streams application. When we looked at the consumer group from the kafka CLI, we could see that it is reporting lag against the topics removed from the source topic list, which shows up as increasing lag in Datadog monitoring.

Can someone advise if this is expected behavior? In my opinion it is not, since the streams application no longer has those topics as part of its sources, so it should not report lag on them.
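Following up on Matthias's suggestion: committed offsets can in fact be deleted on a per-topic-partition basis via the Admin client's `deleteConsumerGroupOffsets` (available since Kafka 2.4 / KIP-496). A minimal sketch, where the broker address and the removed topic/partition names are placeholders -- the group id matches the `application.id` from the thread:

```scala
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.admin.{Admin, AdminClientConfig}
import org.apache.kafka.common.TopicPartition

object DeleteRemovedTopicOffsets {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // placeholder broker address
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    val admin = Admin.create(props)
    try {
      // partitions of the topics removed from the topology (placeholder names)
      val partitions = Set(
        new TopicPartition("removed-topic-a", 0),
        new TopicPartition("removed-topic-b", 0)
      ).asJava
      // deletes committed offsets only for these partitions; the request is
      // rejected if the group is still actively subscribed to these topics
      admin.deleteConsumerGroupOffsets("applicationV1", partitions)
        .all()
        .get()
    } finally admin.close()
  }
}
```

The CLI equivalent is `kafka-consumer-groups.sh --bootstrap-server <broker> --delete-offsets --group applicationV1 --topic removed-topic-a` (repeat `--topic` per topic). Either way, the deletion only succeeds once the group no longer has an active subscription to those topics, which is exactly the situation after the topology change.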
Re: Kafka connect Graceful stop of task failed
Hi Robson,

Thank you for the detailed bug report. I believe the behavior that you're describing is caused by this flaw: https://issues.apache.org/jira/browse/KAFKA-15090 which is still under discussion.

Since the above flaw was introduced in 3.0, source connectors need to return from poll() before the graceful shutdown timeout to avoid the error. You may be able to mitigate the error if the connector allows you to reduce its poll timeout/interval to something less than the graceful timeout, but that will depend on the specific connector implementation, so check the documentation for your connector. I know some implementations have received patches to compensate for this behavior in the framework, so also consider upgrading or checking the release notes for your connectors.

As for the effects of this error: whenever a non-graceful stop occurs, the runtime will immediately close the producer, so the task will not be able to write any further records. However, it will still leave resources for that task (threads, in-memory records, database connections, etc.) allocated until the task finally returns from poll(). While this is not desirable behavior, it seems to be treated as just a nuisance error by most operators.

I hope this gives some context for the error message you're seeing.

Thanks,
Greg
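To make the mitigation Greg describes concrete for this particular setup: with the Confluent JDBC source connector, the relevant knob is `poll.interval.ms`, which bounds how long an idle task sits in poll() waiting for the next table query. A hypothetical config sketch (connector name and connection URL are placeholders, and whether this helps depends on the connector version, as noted above) -- the idea is simply to keep the poll interval below the worker's `task.shutdown.graceful.timeout.ms`:

```json
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/mydb",
    "mode": "bulk",
    "poll.interval.ms": "2000"
  }
}
```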
Kafka connect Graceful stop of task failed
Hello,

I'm using Kafka Connect 7.4.0 to read data from Postgres views and write to other Postgres tables, so I'm using the JDBC source and sink connectors. Everything works fine, but whenever I stop a source connector via the REST API:

  DELETE http://kafka-connect:8083/connectors/connector_name_here

the connector stops fine, but not the task:

  Graceful stop of connector (connector-name-here) succeeded.
  Graceful stop of task (task-name-here) failed.

It only happens with the *source* connector tasks. The sink connectors and tasks shut down gracefully and fine. The timeout for task shutdown has been increased, but it didn't help:

  task.shutdown.graceful.timeout.ms=6

The connectors run once per day (during the night) to load a lot of data, and the error happens when I try to delete the connectors in the middle of the day. That is, they are not actually executing/loading any data; they have finished already.

  offset.flush.interval.ms=1 in development and integration environments.
  offset.flush.interval.ms=6 in production and uat.

The rest of the config is pretty much the default. What could be the issue? The errors of the graceful stop of the tasks are triggering our alert system, so I'm trying to get rid of those.

Thanks a lot,
Robson
list offsets of compacted topics
Hey Folks,

I have a process which aims to measure time lag (in Scala). When the process bootstraps, it looks at the history of offsets and collects the offset that existed at different timestamps (7 days ago, 6 days ago... etc., with more frequency as it gets closer to *now*). In order to do that it uses the consumer.offsetsForTimes method and keeps that information in memory.

I observed that in some cases, mainly on *some* compacted topic partitions (observed on a few partitions from __consumer_offsets and __transaction_state), the resulting offsets aren't monotonically increasing as the times get closer to now. In these specific cases (again, not all partitions) I see for example (a few data points from many):

  time (seconds)   offset
  1691512025       16101908
  1691550724       15121538
  1691645078       15473125
  1691789229       15473125
  1692104539       16078952
  1692116770       16101908
  1692116809       16101908
  1692116833       16101908

The code looks like:

  BootstrapTimes.reverse // bootstrap from old to new time
    .foreach { duration =>
      val time = now - duration
      val lookup = partitions
        .map { _ -> (time.inMillis: java.lang.Long) }
        .toMap
        .asJava
      val offsets =
        try consumer.offsetsForTimes(lookup).asScala.toMap
        catch KafkaConsumerException.rethrow
      val partitionToOffset = offsets.collect {
        case (partition, value) if value != null => partition -> value.offset
      }
    }

So no returned offset is past the one for the earliest time, which seems odd? What could the reason for it be? Can you help me understand why it would mostly be observed with compacted topics (but once in a while in other topics too)?
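For what it's worth, the regression above can be detected mechanically. A small self-contained sketch (names hypothetical) that, given the (timestamp, offset) pairs collected during bootstrap in old-to-new order, flags every adjacent pair where a later timestamp maps to a smaller offset:

```scala
object OffsetMonotonicityCheck {
  // given (timestampSeconds, offset) pairs ordered old -> new,
  // return the adjacent pairs where the returned offset decreases
  def regressions(points: Seq[(Long, Long)]): Seq[((Long, Long), (Long, Long))] =
    points.sliding(2).collect {
      case Seq(prev @ (_, prevOffset), curr @ (_, currOffset)) if currOffset < prevOffset =>
        (prev, curr)
    }.toSeq

  def main(args: Array[String]): Unit = {
    // the data points from the email, oldest first
    val observed = Seq(
      (1691512025L, 16101908L),
      (1691550724L, 15121538L),
      (1691645078L, 15473125L),
      (1691789229L, 15473125L),
      (1692104539L, 16078952L),
      (1692116770L, 16101908L),
      (1692116809L, 16101908L),
      (1692116833L, 16101908L)
    )
    // only the first step regresses: 16101908 -> 15121538
    println(regressions(observed))
  }
}
```

On the "why": offsetsForTimes returns the earliest offset whose record timestamp is greater than or equal to the target time, so if record timestamps are not themselves monotonic within the partition (possible with CreateTime timestamps), the returned mapping need not be monotonic in the target time either. Whether that is what's happening on these compacted partitions I can't say for sure.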