[ https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252742#comment-14252742 ]
Saisai Shao commented on SPARK-3146: ------------------------------------ Hi all, thanks a lot for your comments. My original purpose of this proposal is to solve SPARK-2388 and make this solution more general, I'm not sure if other receivers have such requirements, also is it reasonable to add this interceptor to some receivers like socket, file ? We can further improve the flexibility of streaming API by this interceptors, like filter out some unwanted messages, track the latency by timestamping the message once received. But the problem is that such flexibility will make user abusing this API to process the data once received, rather than by transforming function, this potentially disobey the semantics of Spark Streaming. I think probably we should redesign the whole thing to make it both flexible and meaningful. > Improve the flexibility of Spark Streaming Kafka API to offer user the > ability to process message before storing into BM > ------------------------------------------------------------------------------------------------------------------------ > > Key: SPARK-3146 > URL: https://issues.apache.org/jira/browse/SPARK-3146 > Project: Spark > Issue Type: Improvement > Components: Streaming > Affects Versions: 1.0.2, 1.1.0 > Reporter: Saisai Shao > > Currently Spark Streaming Kafka API stores the key and value of each message > into BM for processing, potentially this may lose the flexibility for > different requirements: > 1. currently topic/partition/offset information for each message is discarded > by KafkaInputDStream. In some scenarios people may need this information to > better filter the message, like SPARK-2388 described. > 2. People may need to add timestamp for each message when feeding into Spark > Streaming, which can better measure the system latency. > 3. Checkpointing the partition/offsets or others... > So here we add a messageHandler in interface to give people the flexibility > to preprocess the message before storing into BM. In the meantime time this > improvement keep compatible with current API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org