Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

shyla deshpande Mon, 07 Aug 2017 19:32:52 -0700

Thanks TD for the response. I forgot to mention that I am not using
structured streaming.

I was looking into KafkaUtils.createRDD, and it looks like I need to get the
earliest and the latest offset for each partition to build the
Array(offsetRange). I wanted to know if there was an easier way.
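
A minimal sketch of that per-partition offset lookup, under stated assumptions:
it uses the kafka-python client (not mentioned in this thread) to ask the
brokers for the earliest and end offsets, and it returns plain tuples that
mirror the fields of Spark's OffsetRange (topic, partition, fromOffset,
untilOffset) rather than constructing Spark objects. The helper names
build_offset_ranges and fetch_offsets are hypothetical.

```python
from typing import Dict, List, Tuple

# (topic, partition, from_offset, until_offset); until_offset is exclusive,
# which matches what Kafka reports as a partition's end offset.
OffsetTuple = Tuple[str, int, int, int]


def build_offset_ranges(topic: str,
                        earliest: Dict[int, int],
                        latest: Dict[int, int]) -> List[OffsetTuple]:
    """Pair each partition's earliest offset with its end offset."""
    ranges = []
    for partition in sorted(earliest):
        start, end = earliest[partition], latest[partition]
        if end > start:  # skip partitions that currently hold no records
            ranges.append((topic, partition, start, end))
    return ranges


def fetch_offsets(topic: str, bootstrap_servers: str):
    """Ask the brokers for per-partition earliest and end offsets.

    Assumes the kafka-python package is installed; it is imported lazily so
    that build_offset_ranges can be used without it.
    """
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers)
    try:
        # partitions_for_topic returns None for an unknown topic
        partition_ids = consumer.partitions_for_topic(topic) or set()
        partitions = [TopicPartition(topic, p) for p in partition_ids]
        earliest = consumer.beginning_offsets(partitions)
        latest = consumer.end_offsets(partitions)
    finally:
        consumer.close()
    return ({tp.partition: off for tp, off in earliest.items()},
            {tp.partition: off for tp, off in latest.items()})
```

Each resulting tuple carries the four values an offset range needs, so the
list can be mapped one-to-one onto the Array(offsetRange) passed to
KafkaUtils.createRDD.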

One reason we are hesitant to use structured streaming is that I need to
persist the data in a Cassandra database, and I believe writing to Cassandra
from structured streaming is not production ready.

On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> It's best to use DataFrames. You can read from Kafka either as a stream or
> as a batch. More details here:
>
> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
>
> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande <deshpandesh...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> What is the easiest way to read all the data from Kafka in a batch
>> program for a given topic?
>> I have 10 Kafka partitions, but there is not much data. I would like to
>> read from the earliest offset in all the partitions for a topic.
>>
>> I appreciate any help. Thanks
>>
>
>
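
For reference, the batch read that the quoted reply links to can be sketched
roughly as follows. This is an illustration rather than code from the thread:
it assumes PySpark with the spark-sql-kafka-0-10 package on the classpath, and
the helper names kafka_batch_options and read_topic are hypothetical.

```python
from typing import Dict


def kafka_batch_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Options for a bounded (batch) read of one topic, start to end."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # already the default for batch queries
        "endingOffsets": "latest",      # already the default for batch queries
    }


def read_topic(spark, bootstrap_servers: str, topic: str):
    """Load the whole topic as a DataFrame; `spark` is a SparkSession."""
    return (spark.read
            .format("kafka")
            .options(**kafka_batch_options(bootstrap_servers, topic))
            .load())
```

With this source there is no need to compute offset ranges by hand. The key
and value columns come back as binary, so a typical next step is
selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").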
