Re: [KafkaRDD]: rdd.cache() does not seem to work

2016-01-13 Thread Tathagata Das
> ... and then save this rdd to several hdfs locations.
> But it seems that KafkaRDD is fetching data from kafka broker every
> time I call saveAsNewAPIHadoopFile.
>
> How can I cache data from Kafka in memory?
>
> P.S. When I do repartition it seems to work properly (read kafka only
> once) but spark stores shuffled data locally.

Re: [KafkaRDD]: rdd.cache() does not seem to work

2016-01-12 Thread Понькин Алексей
> How can I cache data from Kafka in memory?
>
> P.S. When I do repartition it seems to work properly (read kafka only
> once) but spark stores shuffled data locally.
> Is it possible to keep data in memory?

Re: [KafkaRDD]: rdd.cache() does not seem to work

2016-01-11 Thread charles li
> P.S. When I do repartition it seems to work properly (read kafka only
> once) but spark stores shuffled data locally.
> Is it possible to keep data in memory?

[KafkaRDD]: rdd.cache() does not seem to work

2016-01-11 Thread ponkin
Hi,

Here is my use case: I have a kafka topic. The job is fairly simple - it reads the topic and saves data to several hdfs paths. I create the rdd with the following code:

val r = KafkaUtils.createRDD[Array[Byte],Array[Byte],DefaultDecoder,DefaultDecoder](context,kafkaParams,range)

Then I am trying to cache this rdd and save it to several hdfs locations. But it seems that KafkaRDD is fetching data from the kafka broker every time I call saveAsNewAPIHadoopFile.

How can I cache data from Kafka in memory?

P.S. When I do repartition it seems to work properly (read kafka only once) but spark stores the shuffled data locally. Is it possible to keep data in memory?
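The behavior described above is consistent with standard RDD laziness: cache() only *marks* an RDD for caching, and nothing is materialized until an action actually runs, so each independent save can recompute (here, re-fetch from the broker). Below is a minimal plain-Scala sketch of that mechanic - no Spark dependency; the LazyDataset class and the readsFromKafka counter are illustrative stand-ins, not Spark API:

```scala
// Plain-Scala model of a lazy, optionally cached dataset.
// compute() stands in for "fetch this partition from the Kafka broker".
object LazyCacheSketch {
  final class LazyDataset[A](compute: () => Seq[A]) {
    private var cached: Option[Seq[A]] = None
    private var cachingEnabled = false

    // Like rdd.cache(): only marks the dataset; nothing is read yet.
    def cache(): this.type = { cachingEnabled = true; this }

    // Like an action (count, saveAsNewAPIHadoopFile): forces evaluation.
    // If caching was enabled, the first action materializes the data.
    def collect(): Seq[A] = cached match {
      case Some(data) => data
      case None =>
        val data = compute()
        if (cachingEnabled) cached = Some(data)
        data
    }
  }

  def main(args: Array[String]): Unit = {
    var readsFromKafka = 0
    def freshDataset() =
      new LazyDataset(() => { readsFromKafka += 1; Seq(1, 2, 3) })

    // Without cache(): every action recomputes, i.e. re-reads the source.
    val uncached = freshDataset()
    uncached.collect(); uncached.collect()
    println(s"uncached reads: $readsFromKafka") // 2

    // With cache(): the first action materializes; later actions hit memory.
    readsFromKafka = 0
    val cachedDs = freshDataset().cache()
    cachedDs.collect(); cachedDs.collect()
    println(s"cached reads: $readsFromKafka") // 1
  }
}
```

In Spark terms, the usual pattern is to follow cache() (or persist()) with one eager action such as count() before the repeated saves, so that later saveAsNewAPIHadoopFile calls read the materialized blocks rather than going back to Kafka - provided the cached partitions actually fit in the configured storage level.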