I think I want to use persist then, and keep my intermediate RDDs in memory with spill to disk.
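Roughly what I have in mind, as a minimal sketch rather than the actual job: it assumes a local-mode SparkContext and a made-up intermediate RDD (`PersistSketch`, `run`, `intermediate`, and the sample data are all hypothetical names for illustration). The persist call only caches the RDD for reuse inside the application; the final output would still go through the separate write API quoted below.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical helper: builds a small RDD, persists it, and runs two actions on it.
  def run(): (Long, Map[Int, Long]) = {
    // Local mode is an assumption for illustration; a real job gets its master from spark-submit.
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for an expensive intermediate RDD.
      val intermediate = sc.parallelize(1 to 1000).map(n => (n % 10, n.toLong))

      // Keep partitions serialized in memory and spill to local disk when memory runs short.
      // Nothing is materialized until the first action runs.
      intermediate.persist(StorageLevel.MEMORY_AND_DISK_SER)

      // The first action computes and caches the partitions; the second can reuse them.
      val count = intermediate.count()
      val perKey = intermediate.reduceByKey(_ + _).collect().toMap

      // Release the cached blocks once the intermediate result is no longer needed.
      intermediate.unpersist()
      (count, perKey)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (count, perKey) = run()
    println(s"records: $count, keys: ${perKey.size}")
  }
}
```

The persisted blocks live only as long as the application (or until unpersist is called), so this does not replace saving the final records with saveAsNewAPIHadoopFile.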

On Wed, Jul 1, 2015 at 8:28 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:

> I think the persist API is internal to the RDD, whereas the write API is for
> saving content to disk.
> RDD persist will dump your objects' serialized bytes to disk. If you want to
> change that behavior, you need to override the serialization of the class
> you are storing in the RDD.
> On Jul 1, 2015 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>
>> This is my write API. How do I integrate it here?
>>
>> protected def writeOutputRecords(detailRecords:
>> RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
>>   val writeJob = new Job()
>>   val schema = SchemaUtil.outputSchema(_detail)
>>   AvroJob.setOutputKeySchema(writeJob, schema)
>>   val outputRecords = detailRecords.coalesce(100)
>>   outputRecords.saveAsNewAPIHadoopFile(outputDir,
>>     classOf[AvroKey[GenericRecord]],
>>     classOf[org.apache.hadoop.io.NullWritable],
>>     classOf[AvroKeyOutputFormat[GenericRecord]],
>>     writeJob.getConfiguration)
>> }
>>
>> On Wed, Jul 1, 2015 at 8:11 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
>>>
>>> On Wed, Jul 1, 2015 at 11:01 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
>>> wrote:
>>>
>>>> How do I persist an RDD using StorageLevel.MEMORY_AND_DISK_SER?
>>>>
>>>> --
>>>> Deepak
>>
>> --
>> Deepak

--
Deepak