In my use case, if I need to stop Spark Streaming for a while, a large
backlog of data accumulates on the Kafka topic-partitions. After I restart
the Spark Streaming job, the workers run out of heap memory while fetching
the first batch.

I am wondering:

* Is there a way to throttle reading from Kafka in Spark Streaming jobs?
(The closest knob I came across is sketched after this list.)
* Is there a way to control how far a Kafka DStream can read into a
topic-partition (via offsets, for example)? Setting this to a small
number would force the DStream to read less data initially.
* Is there a way to limit the consumption rate on the Kafka side? (This
one is not really about Spark Streaming and may not belong in this
group, but I am raising it here anyway.)
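
The property below is the closest thing I found for the first question.
Whether the Kafka receiver actually honors it is an assumption on my
part, and the rate value is just a placeholder:

import org.apache.spark.SparkConf

// Assumption: caps the number of records per second each receiver
// accepts; I have not confirmed that the Kafka receiver honors it.
val conf = new SparkConf()
  .setAppName("kafka-backlog-test")
  .set("spark.streaming.receiver.maxRate", "1000")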

I have also looked at the code example below, but throttling does not seem
to be supported there.

KafkaUtils.createStream ...
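
Spelled out, this is roughly the call I was looking at (the ZooKeeper
quorum, group id, and topic name are placeholders for this sketch); I did
not see any parameter that caps the fetch size or sets a starting offset:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-backlog-test")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholders: ZooKeeper quorum, consumer group id, and a map of
// topic -> number of receiver threads.
val stream = KafkaUtils.createStream(
  ssc,
  "zk-host:2181",
  "my-consumer-group",
  Map("my-topic" -> 1))

stream.count().print()

ssc.start()
ssc.awaitTermination()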
Thanks, All
-- 
Chen Song
