[ 
https://issues.apache.org/jira/browse/KAFKA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989989#comment-15989989
 ] 

Matthias J. Sax commented on KAFKA-4785:
----------------------------------------

IMHO, KAFKA-4219 is quite different:

This JIRA is a "bug guard": currently, if a custom TS extractor is provided, 
users need to make sure that it returns the embedded record timestamp for 
internal topics. With this JIRA, we do this automatically and thus prevent 
potential bugs in custom TS extractors. Thus, it's about _reading_.

KAFKA-4219, however, is about a "new feature" that allows assigning timestamps 
differently on _write_. Not sure if this would be some API or config, or just a 
change to how Streams assigns timestamps internally.
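
To illustrate the read-side guard this JIRA is about (not the write-side feature of KAFKA-4219), here is a minimal, self-contained sketch. It does not use the Kafka Streams API: `Rec`, `isInternalRepartitionTopic`, and `timestampFor` are hypothetical names made up for illustration, and recognizing internal topics by a `-repartition` suffix is an assumption of this sketch, not necessarily how the fix would be implemented.

```java
import java.util.function.ToLongFunction;

public class RepartitionTimestampGuard {

    /** Hypothetical stand-in for a consumed record; NOT Kafka's ConsumerRecord. */
    static final class Rec {
        final String topic;
        final long metadataTimestamp; // timestamp carried in the record metadata
        final long payloadTimestamp;  // timestamp embedded in the record value

        Rec(String topic, long metadataTimestamp, long payloadTimestamp) {
            this.topic = topic;
            this.metadataTimestamp = metadataTimestamp;
            this.payloadTimestamp = payloadTimestamp;
        }
    }

    /** Assumption of this sketch: internal repartition topics end in "-repartition". */
    static boolean isInternalRepartitionTopic(String topic) {
        return topic.endsWith("-repartition");
    }

    /**
     * Bypass the user's extractor for internal repartition topics and return
     * the metadata timestamp; use the user's extractor for all other topics.
     */
    static long timestampFor(Rec rec, ToLongFunction<Rec> customExtractor) {
        if (isInternalRepartitionTopic(rec.topic)) {
            return rec.metadataTimestamp;
        }
        return customExtractor.applyAsLong(rec);
    }

    public static void main(String[] args) {
        // A custom extractor that reads the timestamp from the payload.
        ToLongFunction<Rec> custom = r -> r.payloadTimestamp;

        Rec input = new Rec("orders", 100L, 42L);
        Rec internal = new Rec("my-app-agg-repartition", 100L, 42L);

        System.out.println(timestampFor(input, custom));    // prints 42
        System.out.println(timestampFor(internal, custom)); // prints 100
    }
}
```

With such a guard in place, a custom extractor never sees records from internal topics, so it cannot break the timestamp that Streams wrote there.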

> Records from internal repartitioning topics should always use 
> RecordMetadataTimestampExtractor
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4785
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4785
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Matthias J. Sax
>            Assignee: Jeyhun Karimov
>
> Users can specify what timestamp extractor should be used to decode the 
> timestamp of input topic records. As long as RecordMetadataTimestamp or 
> WallclockTime is used, this is fine. 
> However, for custom timestamp extractors it might be invalid to apply this 
> custom extractor to records received from internal repartitioning topics. The 
> reason is that Streams sets the current "stream time" as record metadata 
> timestamp explicitly before writing to intermediate repartitioning topics 
> because this timestamp should be used by downstream subtopologies. A custom 
> timestamp extractor might return something different, breaking this assumption.
> Thus, for reading data from intermediate repartitioning topics, the configured 
> timestamp extractor should not be used, but the record's metadata timestamp 
> should be extracted as record timestamp.
> In order to get the same behavior for intermediate user topics (ie, used 
> in {{through()}}), we can leverage KAFKA-4144 and internally set an extractor 
> for those "intermediate sources" that returns the record's metadata timestamp 
> in order to overwrite the global extractor from {{StreamsConfig}} (ie, set 
> {{FailOnInvalidTimestampExtractor}}).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
