[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate

2016-10-12 Thread Cody Koeninger (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570208#comment-15570208 ]

Cody Koeninger commented on SPARK-11698:


Would a custom ConsumerStrategy for the new consumer added in SPARK-12177 
allow you to address this issue? You could supply a Consumer implementation 
that overrides poll, along the lines of the sketch below.
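
A minimal sketch of what that could look like against the 
spark-streaming-kafka-0-10 ConsumerStrategy API from SPARK-12177. The class 
names, the subscribe-based setup, and the seek-to-end policy inside poll are 
all illustrative assumptions, not a vetted implementation; it also assumes 
Kafka 0.10.1+, where seekToEnd takes a Collection:

{code:scala}
import java.{lang => jl, util => ju}

import org.apache.kafka.clients.consumer.{Consumer, ConsumerRecords, KafkaConsumer}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.ConsumerStrategy

// Hypothetical consumer that jumps to the log end before every poll, so a
// backlog that built up between batches is never read.
class LatestOnlyConsumer[K, V](kafkaParams: ju.Map[String, Object])
  extends KafkaConsumer[K, V](kafkaParams) {

  override def poll(timeout: Long): ConsumerRecords[K, V] = {
    if (!assignment().isEmpty) {
      // Drop any unread backlog by seeking to the latest offset of each
      // currently assigned partition before fetching.
      seekToEnd(assignment())
    }
    super.poll(timeout)
  }
}

// Hypothetical strategy that hands the customized consumer to the direct
// stream in place of the stock KafkaConsumer.
class LatestOnlyStrategy[K, V](
    topics: ju.Collection[String],
    kafkaParams: ju.Map[String, Object])
  extends ConsumerStrategy[K, V] {

  override def executorKafkaParams: ju.Map[String, Object] = kafkaParams

  override def onStart(
      currentOffsets: ju.Map[TopicPartition, jl.Long]): Consumer[K, V] = {
    val consumer = new LatestOnlyConsumer[K, V](kafkaParams)
    consumer.subscribe(topics)
    consumer
  }
}
{code}

Whether seeking inside poll interacts cleanly with how the direct stream 
computes each batch's offset ranges would need to be verified against its 
internals.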

> Add option to ignore kafka messages that are out of limit rate
> --
>
> Key: SPARK-11698
> URL: https://issues.apache.org/jira/browse/SPARK-11698
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Liang-Chi Hsieh
>
> With spark.streaming.kafka.maxRatePerPartition, we can control the maximum 
> rate limit. However, we cannot skip the messages that exceed the limit; they 
> are simply consumed in the next iteration. We have a use case where we need 
> to ignore those messages and process only the latest messages in the next 
> iteration.
> In other words, we want to consume only part of the messages in each 
> iteration and drop the remainder instead of carrying it over.
> We add an option for this purpose.
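
For context, this is how the existing per-partition cap is configured; the 
application name, batch interval, and rate value below are illustrative:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Existing behavior: read at most 1000 records per second per partition in
// each batch. Records beyond the cap are NOT dropped; they are carried over
// and read in later batches, which is what this ticket wants to avoid.
val conf = new SparkConf()
  .setAppName("RateLimitedStream")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
val ssc = new StreamingContext(conf, Seconds(5))
{code}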






[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate

2015-11-13 Thread Cody Koeninger (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004050#comment-15004050 ]

Cody Koeninger commented on SPARK-11698:


That looks like a reasonable way to solve your particular use case.

I think my preference would be to make the strategy for generating the next 
batch of offsets user-configurable in a more general way, rather than adding 
knobs for each possible use case; a rough sketch of that idea follows this 
message.

Have you seen the discussion in 
https://issues.apache.org/jira/browse/SPARK-10320? Would the approach 
discussed there accommodate this modification?

Outside of the specifics of the direct stream, this is also related to having 
different strategies for dealing with backpressure, and I'm not sure what the 
long-term plan there is.
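
No such pluggable API exists in Spark today; purely to illustrate the idea, 
a user-configurable offset policy might look something like this (every name 
here is hypothetical):

{code:scala}
import org.apache.kafka.common.TopicPartition

// Hypothetical: a pluggable policy for choosing each batch's offset ranges,
// rather than a dedicated knob per use case.
trait OffsetRangeStrategy {
  /** Pick the (from, until) offsets each partition reads this batch. */
  def nextRanges(
      consumed: Map[TopicPartition, Long],  // last offsets read so far
      latest: Map[TopicPartition, Long],    // latest offsets available
      maxPerPartition: Long                 // per-partition rate cap
  ): Map[TopicPartition, (Long, Long)]
}

// Today's behavior: continue from the consumed offset, capped by the limit;
// any backlog is carried over to later batches.
object CarryOverStrategy extends OffsetRangeStrategy {
  def nextRanges(
      consumed: Map[TopicPartition, Long],
      latest: Map[TopicPartition, Long],
      maxPerPartition: Long): Map[TopicPartition, (Long, Long)] =
    consumed.map { case (tp, from) =>
      tp -> (from, math.min(latest(tp), from + maxPerPartition))
    }
}

// This ticket's behavior: read at most maxPerPartition of the newest
// messages and deliberately drop the older backlog.
object SkipBacklogStrategy extends OffsetRangeStrategy {
  def nextRanges(
      consumed: Map[TopicPartition, Long],
      latest: Map[TopicPartition, Long],
      maxPerPartition: Long): Map[TopicPartition, (Long, Long)] =
    latest.map { case (tp, until) =>
      tp -> (math.max(consumed.getOrElse(tp, 0L), until - maxPerPartition), until)
    }
}
{code}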




[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate

2015-11-12 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003301#comment-15003301 ]

Felix Cheung commented on SPARK-11698:
--

Should we ignore messages? Doesn't that mean possible data loss?




[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate

2015-11-12 Thread Liang-Chi Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003478#comment-15003478 ]

Liang-Chi Hsieh commented on SPARK-11698:
-

Yes, but it is intentional. We don't want to increase data latency under 
heavy data load, so we need to drop some data in each iteration and keep 
consuming the latest data.




[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate

2015-11-12 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002318#comment-15002318 ]

Apache Spark commented on SPARK-11698:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/9665
