Hi,
Here is my use case:
I have a Kafka topic. The job is fairly simple - it reads the topic and saves
the data to several HDFS paths.
I create the RDD with the following code:

val r = KafkaUtils.createRDD[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
  context, kafkaParams, range)

Then I am trying to cache this RDD (rdd.cache()) and then save it to several
HDFS locations. But it seems that KafkaRDD fetches the data from the Kafka
broker every time I call saveAsNewAPIHadoopFile.

How can I cache the data from Kafka in memory?
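
For reference, here is roughly the whole job in one piece. The broker address,
topic name, offsets, output paths, and the SequenceFile output format below are
placeholders I filled in for illustration, not my real config:

import kafka.serializer.DefaultDecoder
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val context = new SparkContext(new SparkConf().setAppName("kafka-to-hdfs"))

// Placeholder Kafka settings and offset range.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val range = Array(OffsetRange.create("my-topic", 0, 0L, 1000000L))

val r = KafkaUtils.createRDD[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
  context, kafkaParams, range)

// Mark the RDD for caching before the first action. I expected the first
// save to materialize the cache and the later saves to reuse it, but in
// practice every save fetches from Kafka again.
r.cache()

// Each save writes the same data to a different HDFS path.
for (path <- Seq("/data/out/a", "/data/out/b", "/data/out/c")) {
  r.map { case (_, v) => (NullWritable.get(), new BytesWritable(v)) }
    .saveAsNewAPIHadoopFile[SequenceFileOutputFormat[NullWritable, BytesWritable]](path)
}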

P.S. When I do a repartition it seems to work properly (Kafka is read only
once), but Spark stores the shuffled data locally (see the sketch after this).
Is it possible to keep the data in memory?
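
The repartition variant from the P.S. is essentially this. The partition count
and the StorageLevel.MEMORY_ONLY choice are my guesses at a sensible setup, not
something I have verified to avoid the local files:

import org.apache.spark.storage.StorageLevel

// Repartitioning right after the read means Kafka is consumed only once:
// the later saves read the shuffled copy instead of going back to the
// broker. But Spark writes the shuffled data to local disk, and that is
// the part I would like to keep in memory.
val shuffled = r.repartition(context.defaultParallelism)
shuffled.persist(StorageLevel.MEMORY_ONLY)

// ...then the same save loop as above, but over `shuffled` instead of `r`.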