[ https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323648#comment-15323648 ]
Ewen Cheslack-Postava commented on KAFKA-3821:
----------------------------------------------
I was thinking about this a bit more, along with the fact that having to
shoehorn extra data into offsets is not ideal.
Maybe a better way to expose this would be to provide a separate {{data}} field
or something like that, which is also key-value based (similar to source
partition/source offset) and has as flexible a data structure. We could manage
the two types of data together and they'd have the same basic semantics, but
allow you to decouple state/data changes that really only have to happen once
in a while from the offsets, which really do need to be associated with every
message. We could possibly then use a subclass like {{DataOnlySourceRecord}}
which doesn't cause anything to be written to a topic, but still applies the data
changes. (I think it might be nice to introduce a parent interface for both
instead of still calling it {{SourceRecord}}, but I'm not sure we could do that
in a compatible way.)
[~rhauch] Thoughts? Would this be a better fit for what you're trying to
accomplish? The change to the output of {{poll()}} to contain more than just
records to be written to Kafka is a bit weird, but might make sense for these
use cases and provides framework support for having those changes get
"committed" asynchronously but still at a safe point.
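
To make the idea concrete, here is a rough, self-contained sketch of how a {{poll()}} result that mixes ordinary records with data-only records might be handled. It deliberately does not use the real Connect API: {{PolledItem}}, {{WritableRecord}}, {{FrameworkSketch}} and {{kv}} are hypothetical names invented for illustration, and {{DataOnlyRecord}} only stands in for the {{DataOnlySourceRecord}} idea above. The in-memory maps are an assumption too; they stand in for wherever the framework would actually store offsets and data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataOnlyRecordSketch {

    /** Hypothetical parent type for anything a task's poll() may return. */
    interface PolledItem {
        Map<String, Object> sourcePartition();
    }

    /** Stand-in for SourceRecord: a value to write plus the offset that goes with it. */
    static final class WritableRecord implements PolledItem {
        final Map<String, Object> partition;
        final Map<String, Object> offset;
        final String value;

        WritableRecord(Map<String, Object> partition, Map<String, Object> offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }

        @Override
        public Map<String, Object> sourcePartition() {
            return partition;
        }
    }

    /** Stand-in for the proposed data-only record: carries data, nothing to write. */
    static final class DataOnlyRecord implements PolledItem {
        final Map<String, Object> partition;
        final Map<String, Object> data;

        DataOnlyRecord(Map<String, Object> partition, Map<String, Object> data) {
            this.partition = partition;
            this.data = data;
        }

        @Override
        public Map<String, Object> sourcePartition() {
            return partition;
        }
    }

    /** Stand-in for the framework side that consumes the output of poll(). */
    static final class FrameworkSketch {
        final List<String> written = new ArrayList<>();
        final Map<Map<String, Object>, Map<String, Object>> offsets = new HashMap<>();
        final Map<Map<String, Object>, Map<String, Object>> data = new HashMap<>();

        void process(List<PolledItem> polled) {
            for (PolledItem item : polled) {
                if (item instanceof WritableRecord) {
                    WritableRecord r = (WritableRecord) item;
                    written.add(r.value);                      // would be a produce to Kafka
                    offsets.put(r.sourcePartition(), r.offset);
                } else if (item instanceof DataOnlyRecord) {
                    DataOnlyRecord d = (DataOnlyRecord) item;
                    data.put(d.sourcePartition(), d.data);     // state change only, no write
                }
            }
        }
    }

    /** Small helper that builds a single-entry mutable map. */
    static Map<String, Object> kv(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> partition = kv("db", "inventory");
        Map<String, Object> snapshotOffset = kv("snapshot", true);

        List<PolledItem> polled = new ArrayList<>();
        polled.add(new WritableRecord(partition, snapshotOffset, "row-1"));
        polled.add(new WritableRecord(partition, snapshotOffset, "row-2"));
        // Snapshot finished: record that fact without producing another message.
        polled.add(new DataOnlyRecord(partition, kv("snapshotComplete", true)));

        FrameworkSketch framework = new FrameworkSketch();
        framework.process(polled);

        System.out.println("records written: " + framework.written);
        System.out.println("data for partition: " + framework.data.get(partition));
    }
}
```

In this sketch the offsets and the data are keyed by the same source partition but updated independently, which is the decoupling described above. How and when a real framework would flush either map is left open here.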
> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
> Key: KAFKA-3821
> URL: https://issues.apache.org/jira/browse/KAFKA-3821
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 0.9.0.1
> Reporter: Randall Hauch
> Assignee: Ewen Cheslack-Postava
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a
> database). Once the task completes those records, the connector wants to
> update the offsets (e.g., the snapshot is complete) but has no more records
> to be written to a topic. With this change, the task could simply supply an
> updated offset.