I guess what you want is not really streaming. If you create a streaming context at time t, you will receive data arriving from time t onward, not data that was published before t.
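That said, with the direct Kafka API (KafkaUtils.createDirectStream, available since Spark 1.3) you can ask for the earliest retained offsets, so the stream also picks up data published before the context was created. A minimal sketch, assuming the spark-streaming-kafka artifact is on the classpath; the broker address and topic name are placeholders:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class ReadFromEarliest {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("read-from-earliest");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092"); // placeholder broker
    // "smallest" starts each partition at its earliest retained offset rather
    // than only at messages arriving after the stream is created.
    kafkaParams.put("auto.offset.reset", "smallest");

    Set<String> topics = new HashSet<String>(Arrays.asList("mytopic")); // placeholder topic

    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);

    stream.print();
    jssc.start();
    jssc.awaitTermination();
  }
}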
Looks like you want a queue. Let Kafka write to a queue, consume messages from the queue, and stop when the queue is empty (see the sketch after the quoted message).

On 29 Apr 2015 14:35, "dgoldenberg" <dgoldenberg...@gmail.com> wrote:

> Hi,
>
> I'm wondering about the use case where you're not doing continuous,
> incremental streaming of data out of Kafka, but rather want to publish
> data once with your Producer(s), consume it once in your Consumer, and
> then terminate the consumer Spark job.
>
> JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
>     Durations.milliseconds(...));
>
> The batchDuration parameter is "the time interval at which streaming
> data will be divided into batches". Can this be worked somehow to cause
> Spark Streaming to just get all the available data, let all the RDDs
> within the Kafka discretized stream get processed, and then terminate,
> rather than wait another interval and try to process more data from
> Kafka?
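To have the job terminate once the backlog is drained, which is the "stop when the queue is empty" idea above, here is one hedged sketch: flag the first empty batch from inside foreachRDD and stop the context from the driver thread. Same placeholder broker and topic as the sketch above; the `drained` flag and the polling loop are just one illustrative way to do this, not an established API:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DrainKafkaOnce {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("drain-kafka-once");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    // Same setup as the sketch above: placeholder broker and topic, reading
    // from the earliest retained offsets.
    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");
    kafkaParams.put("auto.offset.reset", "smallest");
    Set<String> topics = new HashSet<String>(Arrays.asList("mytopic"));

    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);

    // Flipped once a batch interval produces no records, i.e. the "queue" is
    // empty. foreachRDD functions run on the driver, so this flag is local.
    final AtomicBoolean drained = new AtomicBoolean(false);

    stream.foreachRDD(rdd -> {
      if (rdd.isEmpty()) {
        drained.set(true);
      } else {
        System.out.println("processed " + rdd.count() + " records"); // real work goes here
      }
      return null; // the 1.x Java API's foreachRDD takes a Function<R, Void>
    });

    jssc.start();

    // Stop from the driver thread rather than from inside foreachRDD, so the
    // streaming scheduler is not asked to shut itself down mid-batch.
    while (!drained.get()) {
      Thread.sleep(500);
    }
    jssc.stop(true, true); // stop the SparkContext too, finish in-flight batches
  }
}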