Re: kafka streams consumer group reporting lag even on source topics removed from topology

2023-08-16 Thread Matthias J. Sax

Well, it's kinda expected behavior. It's a split brain problem.

In the end, you use the same `application.id` / `group.id`, and thus the 
committed offsets for the removed topics are still in the 
`__consumer_offsets` topic and associated with the consumer group.


If a tool inspects lag by comparing the latest committed offsets to the 
end offsets, it looks at everything it finds in the `__consumer_offsets` 
topic for the group in question -- the tool cannot know that you changed 
the application and that it no longer reads from those topics (and thus 
no longer commits offsets for them).
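
For illustration, here is a rough sketch (Scala, against the Java Admin 
API) of what such a lag tool effectively does; the bootstrap address is a 
placeholder and real tools differ in detail, but the idea is the same:

import java.util.Properties
import org.apache.kafka.clients.admin.{Admin, OffsetSpec}
import scala.jdk.CollectionConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder
val admin = Admin.create(props)

// every partition with a committed offset for the group -- including
// topics the application no longer subscribes to
val committed = admin
  .listConsumerGroupOffsets("applicationV1")
  .partitionsToOffsetAndMetadata()
  .get()
  .asScala

// end offsets for exactly those partitions
val endOffsets = admin
  .listOffsets(committed.keys.map(_ -> OffsetSpec.latest()).toMap.asJava)
  .all()
  .get()
  .asScala

// "lag" = end offset minus committed offset, per partition
val lag = committed.map { case (tp, meta) =>
  tp -> (endOffsets(tp).offset() - meta.offset())
}

admin.close()

Nothing in this computation knows about the current topology -- it only 
sees what is stored for the group, which is why the removed topics still 
show up.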


I am not sure off the top of my head if you could do a manual cleanup for 
the `application.id` and topics in question and delete the committed 
offsets from the `__consumer_offsets` topic -- try checking out the `Admin` 
client and/or the command line tools...


I know that it's possible to delete committed offsets for a consumer 
group (if a group becomes inactive, the broker would also clean up all 
group metadata after a configurable timeout), but I am not sure if that 
works only for the entire consumer group (ie, all topics) or if you can 
do it on a per-topic basis, too.
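
As a starting point, here is a hedged sketch with the `Admin` client; the 
group id, topic name, and partition count are assumptions, and IIRC the 
broker rejects the request if the group still actively subscribes to the 
topic, so it should only be run after the new topology (without the topic) 
has been deployed:

import java.util.Properties
import org.apache.kafka.clients.admin.Admin
import org.apache.kafka.common.TopicPartition
import scala.jdk.CollectionConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder
val admin = Admin.create(props)

// assumed: group "applicationV1", removed topic "old-input-topic", 6 partitions
val partitionsToClear = (0 until 6)
  .map(p => new TopicPartition("old-input-topic", p))
  .toSet
  .asJava

// delete the committed offsets for just these partitions of the group
admin.deleteConsumerGroupOffsets("applicationV1", partitionsToClear)
  .all()
  .get()

admin.close()

The `kafka-consumer-groups.sh` tool has a corresponding `--delete-offsets` 
option (with `--group` and `--topic`) that should achieve the same from 
the command line.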



HTH,
  -Matthias


On 8/16/23 2:11 AM, Pushkar Deole wrote:

Hi streams dev community, @matthias, @bruno

Any inputs on the above issue? Is this a bug in the streams library,
wherein after input topics are removed from the streams processor
topology, the underlying consumer group still reports lag against them?

On Wed, Aug 9, 2023 at 4:38 PM Pushkar Deole  wrote:


Hi All,

I have a streams application with 3 instances, with application-id set to
applicationV1. The application uses the processor API, reading from source
topics, processing the data, and writing to a destination topic.
Currently it consumes from 6 source topics; however, we no longer need to
process data from 2 of those topics, so we removed those 2 topics from
the source topics list. We have configured a Datadog dashboard to report
and alert on consumer lag, so after removing the 2 source topics and
deploying the application, we started getting several alerts about consumer
lag on the applicationV1 consumer group, which is the underlying consumer
group of the streams application. When we looked at the consumer group with
the kafka CLI, we could see that it is reporting lag against the topics
removed from the source topic list, which shows up as increasing lag in
Datadog monitoring.

Can someone advise if this is expected behavior? In my opinion, this is
not expected: since the streams application no longer has those topics as
part of its sources, it should not report lag on them.





Re: Kafka connect Graceful stop of task failed

2023-08-16 Thread Greg Harris
Hi Robson,

Thank you for the detailed bug report.

I believe the behavior that you're describing is caused by this flaw:
https://issues.apache.org/jira/browse/KAFKA-15090 which is still under
discussion. Since the above flaw was introduced in 3.0, source
connectors need to return from poll() before the graceful shutdown
timeout to avoid the error.
You may be able to mitigate the error if the connector allows you to
reduce its poll timeout/interval to something less than the graceful
timeout, but that will depend on the specific connector
implementation, so check the documentation for your connector. I know
some implementations have received patches to compensate for this
behavior in the framework, so also consider upgrading or checking
release notes for your connectors.
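
For example, with the Confluent JDBC source connector (assuming that is
the connector in use), the relevant knobs would look roughly like this --
values are illustrative, and exact property names depend on the connector
version:

# Connect worker config: how long the framework waits for a graceful task stop
task.shutdown.graceful.timeout.ms=10000

# JDBC source connector config: how long the task sleeps inside poll()
# between polls; keeping it below the graceful timeout lets poll() return
# before the worker gives up waiting
poll.interval.ms=5000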

As for the effects of this error: whenever a non-graceful stop occurs,
the runtime will immediately close the producer so that the task will
not be able to write any further records. However, it will still leave
resources for that task (threads, in-memory records, database
connections, etc.) allocated until the task does finally return from
poll(). While this is not desirable behavior, it seems to be treated
as just a nuisance error by most operators.

I hope this gives some context for the error message you're seeing.

Thanks,
Greg


Kafka connect Graceful stop of task failed

2023-08-16 Thread Robson Hermes
Hello

I'm using Kafka Connect 7.4.0 to read data from Postgres views and write to
other Postgres tables, using the JDBC source and sink connectors.
Everything works fine, but whenever I stop the source connectors via the REST API:

DELETE http://kafka-connect:8083/connectors/connector_name_here

The connector stops fine, but not the task:


Graceful stop of connector (connector-name-here) succeeded.

Graceful stop of task (task-name-here) failed.


It only happens with the *source* connector tasks. The sink connectors
and their tasks shut down gracefully.

The timeout for task shutdown has been increased, but didn't help:

task.shutdown.graceful.timeout.ms=6



The connectors run once per day (during the night) to load a lot of data,
and the error happens when I try to delete the connectors in the middle of
the day. That is, they are not actually executing/loading any data at that
point; the load has finished already.

offset.flush.interval.ms=1 in development and integration environments.

offset.flush.interval.ms=6 in production and UAT.


The rest of the config is pretty much the default.

What could be the issue?

The graceful-stop errors for the tasks are triggering our alert system, so
I'm trying to get rid of those.


Thanks a lot

Robson


list offsets of compacted topics

2023-08-16 Thread Shahar Cizer Kobrinsky
Hey Folks,

I have a process which aims to measure time lag (in Scala).

When the process bootstraps, it looks at the history of offsets and collects
the offset that existed at different timestamps (7 days ago, 6 days ago...
etc., with more frequency as it gets closer to *now*). In order to do that it
uses the "consumer.offsetsForTimes" method and keeps that information in
memory. I observed that in some cases, mainly on *some* compacted topic
partitions (observed on a few partitions of __consumer_offsets and
__transaction_state), the resulting offsets aren't monotonically increasing
as the times get closer to now.

In these specific cases (again, not all partitions) I see for example (a
few data points from many):

time (in seconds)/offset:
1691512025/16101908
1691550724/15121538
1691645078/15473125
1691789229/15473125
1692104539/16078952
1692116770/16101908
1692116809/16101908
1692116833/16101908
..
..

The code looks like this (collection converters come from
scala.jdk.CollectionConverters; BootstrapTimes, now, partitions, consumer,
and KafkaConsumerException are defined elsewhere in the process):

import scala.jdk.CollectionConverters._

BootstrapTimes.reverse // bootstrap from old to new time
  .foreach { duration =>
    val time = now - duration
    // one target timestamp (in millis) per partition
    val lookup = partitions
      .map {
        _ -> (time.inMillis: java.lang.Long)
      }
      .toMap
      .asJava
    val offsets =
      try consumer.offsetsForTimes(lookup).asScala.toMap
      catch KafkaConsumerException.rethrow
    // partitions with no record at or after the target time come back as null
    val partitionToOffset =
      offsets.collect { case (partition, value) if value != null =>
        partition -> value.offset
      }
  }

So none of the later data points have an offset beyond the one returned for
the earliest timestamp, which seems odd. What could the reason for it be?
Can you help me understand why it is mostly observed with compacted topics
(but once in a while in other topics too)?


Re: kafka streams consumer group reporting lag even on source topics removed from topology

2023-08-16 Thread Pushkar Deole
Hi streams dev community, @matthias, @bruno

Any inputs on the above issue? Is this a bug in the streams library,
wherein after input topics are removed from the streams processor
topology, the underlying consumer group still reports lag against them?

On Wed, Aug 9, 2023 at 4:38 PM Pushkar Deole  wrote:

> Hi All,
>
> I have a streams application with 3 instances, with application-id set to
> applicationV1. The application uses the processor API, reading from source
> topics, processing the data, and writing to a destination topic.
> Currently it consumes from 6 source topics; however, we no longer need to
> process data from 2 of those topics, so we removed those 2 topics from
> the source topics list. We have configured a Datadog dashboard to report
> and alert on consumer lag, so after removing the 2 source topics and
> deploying the application, we started getting several alerts about
> consumer lag on the applicationV1 consumer group, which is the underlying
> consumer group of the streams application. When we looked at the consumer
> group with the kafka CLI, we could see that it is reporting lag against
> the topics removed from the source topic list, which shows up as
> increasing lag in Datadog monitoring.
>
> Can someone advise if this is expected behavior? In my opinion, this is
> not expected: since the streams application no longer has those topics as
> part of its sources, it should not report lag on them.
>