[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate
[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570208#comment-15570208 ]

Cody Koeninger commented on SPARK-11698:

Would a custom ConsumerStrategy for the new consumer added in SPARK-12177 allow you to address this issue? You could supply a Consumer implementation that overrides poll.

> Add option to ignore kafka messages that are out of limit rate
> --------------------------------------------------------------
>
> Key: SPARK-11698
> URL: https://issues.apache.org/jira/browse/SPARK-11698
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Liang-Chi Hsieh
>
> With spark.streaming.kafka.maxRatePerPartition, we can control the maximum
> rate limit. However, we cannot ignore the messages that exceed the limit;
> they are carried over and consumed in the next iteration. We have a use
> case where we need to ignore those messages and process only the latest
> messages in the next iteration.
> In other words, we simply want to consume part of the messages in each
> iteration and discard the remaining messages that were not consumed.
> We add an option for this purpose.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
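To make the request concrete, here is a minimal sketch (in Python, purely illustrative; the function name and parameters are invented and this is not Spark's actual implementation) of how a direct stream clamps each batch's Kafka offset range with spark.streaming.kafka.maxRatePerPartition, and what the option proposed in this issue would change:

```python
def next_offset_range(from_offset, latest_offset, max_rate_per_partition,
                      batch_interval_secs, ignore_backlog=False):
    """Return the (from, until) offsets for the next micro-batch.

    Without the option, messages beyond the rate limit stay in the backlog
    and are consumed in later batches. With ignore_backlog=True (the
    behaviour proposed in this issue), the batch jumps ahead so that only
    the newest messages within the rate limit are read; older messages are
    skipped permanently.
    """
    max_messages = max_rate_per_partition * batch_interval_secs
    if ignore_backlog and latest_offset - from_offset > max_messages:
        # Skip the backlog: start at the newest max_messages messages.
        from_offset = latest_offset - max_messages
    until_offset = min(from_offset + max_messages, latest_offset)
    return from_offset, until_offset

# Default behaviour: a backlog of 10_000 messages, limit of 1_000 per batch,
# is drained 1_000 at a time starting from the oldest offset.
print(next_offset_range(0, 10_000, 1_000, 1))        # (0, 1000)
# Proposed option: drop the backlog and read only the newest 1_000.
print(next_offset_range(0, 10_000, 1_000, 1, True))  # (9000, 10000)
```

A ConsumerStrategy with an overridden poll, as suggested above, could achieve the skip by seeking the consumer toward the log end before the offset range is computed.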
[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004050#comment-15004050 ]

Cody Koeninger commented on SPARK-11698:

That looks like a reasonable way to solve your particular use case. My preference would be to make the strategy for generating the next batch of offsets user-configurable in a more general way, rather than adding a knob for each possible use case. Have you seen the discussion in https://issues.apache.org/jira/browse/SPARK-10320, and would that approach accommodate this modification? Outside of the specifics of the direct stream, this is also related to having different strategies for dealing with backpressure, and I'm not sure what the long-term plan there is.
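The "user-configurable strategy" idea above can be sketched as follows (names invented for illustration; these are not Spark APIs): instead of a per-use-case flag, the user supplies a function that maps the current and latest offsets to the next batch's range, so both the default in-order behaviour and this issue's skip-to-latest behaviour are just different strategies.

```python
from typing import Callable, Tuple

# A strategy maps (current offset, latest available offset) -> (from, until).
OffsetStrategy = Callable[[int, int], Tuple[int, int]]

def consume_in_order(max_messages: int) -> OffsetStrategy:
    """Default-style strategy: read forward from the backlog, never skip."""
    def strategy(current, latest):
        return current, min(current + max_messages, latest)
    return strategy

def skip_to_latest(max_messages: int) -> OffsetStrategy:
    """Strategy matching this issue: read only the newest messages."""
    def strategy(current, latest):
        start = max(current, latest - max_messages)
        return start, latest
    return strategy

def plan_batches(strategy: OffsetStrategy, latest: int, start: int = 0, n: int = 3):
    """Simulate n micro-batches against a static log of `latest` messages."""
    current, plan = start, []
    for _ in range(n):
        frm, until = strategy(current, latest)
        plan.append((frm, until))
        current = until
    return plan

# In-order drains the backlog 100 messages at a time; skip-to-latest jumps
# straight to the newest 100 and discards the rest.
print(plan_batches(consume_in_order(100), latest=1000))
print(plan_batches(skip_to_latest(100), latest=1000))
```

This keeps the core batch-generation loop unchanged and pushes the policy decision, including any future backpressure-driven policy, behind one pluggable interface.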
[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003478#comment-15003478 ]

Liang-Chi Hsieh commented on SPARK-11698:

Yes, but it is intentional. We don't want to increase data latency under heavy data load, so we need to skip some data in each iteration and keep consuming the latest data.
[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003301#comment-15003301 ]

Felix Cheung commented on SPARK-11698:

Should we ignore messages? Doesn't that mean possible data loss?
[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002318#comment-15002318 ]

Apache Spark commented on SPARK-11698:

User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/9665