[ 
https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323648#comment-15323648
 ] 

Ewen Cheslack-Postava commented on KAFKA-3821:
----------------------------------------------

I was thinking about this a bit more, along with the fact that having to 
shoehorn extra data into offsets is not ideal.

Maybe a better way to expose this would be to provide a separate {{data}} field 
or something like that, which is also key-value based (similar to source 
partition/source offset) and has as flexible a data structure. We could manage 
the two types of data together and they'd have the same basic semantics, but 
this would allow you to decouple state/data changes that really only have to 
happen once in a while from the offsets, which really do need to be associated 
with every 
message. We could possibly then use a subclass like {{DataOnlySourceRecord}} 
which doesn't trigger any record to be written to Kafka, but still applies the 
data changes. (I think it might be nice to introduce a parent interface for both 
changes. (I think it might be nice to introduce a parent interface for both 
instead of still calling it {{SourceRecord}}, but I'm not sure we could do that 
in a compatible way.)
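
A rough sketch of how this might look, purely for illustration: every type below ({{SourceEntry}}, {{TopicRecord}}, {{DataOnlyRecord}}, {{Framework}}) is a hypothetical stand-in and not part of the Kafka Connect API. The sketch assumes {{poll()}} returns a list of entries sharing a parent interface, and that the framework writes only the topic-bound entries while recording offsets and data for all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these types exist in Kafka Connect.
public class DataOnlyRecordSketch {

    /** Assumed parent interface for anything a task could return from poll(). */
    interface SourceEntry {
        Map<String, ?> sourcePartition();
        Map<String, ?> data(); // key-value state, may be empty
    }

    /** Stand-in for a regular record: it carries an offset and is written to a topic. */
    static final class TopicRecord implements SourceEntry {
        final Map<String, ?> partition;
        final Map<String, ?> offset;
        final String topic;
        final String value;

        TopicRecord(Map<String, ?> partition, Map<String, ?> offset, String topic, String value) {
            this.partition = partition;
            this.offset = offset;
            this.topic = topic;
            this.value = value;
        }

        public Map<String, ?> sourcePartition() { return partition; }
        public Map<String, ?> data() { return Collections.emptyMap(); }
    }

    /** Stand-in for the proposed data-only record: nothing is written to a topic. */
    static final class DataOnlyRecord implements SourceEntry {
        final Map<String, ?> partition;
        final Map<String, ?> data;

        DataOnlyRecord(Map<String, ?> partition, Map<String, ?> data) {
            this.partition = partition;
            this.data = data;
        }

        public Map<String, ?> sourcePartition() { return partition; }
        public Map<String, ?> data() { return data; }
    }

    /** Toy framework loop: writes topic records, tracks offsets and data per partition. */
    static final class Framework {
        final List<String> written = new ArrayList<>();
        final Map<Map<String, ?>, Map<String, ?>> offsets = new HashMap<>();
        final Map<Map<String, ?>, Map<String, ?>> data = new HashMap<>();

        void process(List<SourceEntry> polled) {
            for (SourceEntry entry : polled) {
                if (entry instanceof TopicRecord) {
                    TopicRecord record = (TopicRecord) entry;
                    written.add(record.topic + ":" + record.value);
                    offsets.put(record.sourcePartition(), record.offset);
                }
                if (!entry.data().isEmpty()) {
                    // Data changes are applied even when no record is written.
                    data.put(entry.sourcePartition(), entry.data());
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, ?> partition = Collections.singletonMap("db", "inventory");
        List<SourceEntry> polled = Arrays.asList(
            new TopicRecord(partition, Collections.singletonMap("pos", 1L), "customers", "row-1"),
            new DataOnlyRecord(partition, Collections.singletonMap("snapshotComplete", true)));

        Framework framework = new Framework();
        framework.process(polled);

        System.out.println("written: " + framework.written);
        System.out.println("offsets: " + framework.offsets.get(partition));
        System.out.println("data:    " + framework.data.get(partition));
    }
}
```

In this sketch the snapshot-complete flag travels in the data map, so the offset attached to each written record does not need to carry it.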

[~rhauch] Thoughts? Would this be a better fit for what you're trying to 
accomplish? The change to the output of {{poll()}} to contain more than just 
records to be written to Kafka is a bit weird, but might make sense for these 
use cases and provides framework support for having those changes get 
"committed" asynchronously but still at a safe point.

> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3821
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>            Assignee: Ewen Cheslack-Postava
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for 
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown 
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a 
> database). Once the task completes those records, the connector wants to 
> update the offsets (e.g., the snapshot is complete) but has no more records 
> to be written to a topic. With this change, the task could simply supply an 
> updated offset.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
