So do you want to change the behavior of the persist API, or write the RDD to disk?

On Jul 1, 2015 9:13 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
> I think I want to use persist then, and write my intermediate RDDs to
> memory and disk.
>
> On Wed, Jul 1, 2015 at 8:28 AM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> I think the persist API is internal to the RDD, whereas the write API is
>> for saving content on disk.
>> RDD persist will dump your objects' bytes, serialized, onto disk. If you
>> want to change that behavior, you need to override the serialization of
>> the class that you are storing in the RDD.
>> On Jul 1, 2015 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>>
>>> This is my write API. How do I integrate it here?
>>>
>>> protected def writeOutputRecords(
>>>     detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)],
>>>     outputDir: String) {
>>>   val writeJob = new Job()
>>>   val schema = SchemaUtil.outputSchema(_detail)
>>>   AvroJob.setOutputKeySchema(writeJob, schema)
>>>   val outputRecords = detailRecords.coalesce(100)
>>>   outputRecords.saveAsNewAPIHadoopFile(outputDir,
>>>     classOf[AvroKey[GenericRecord]],
>>>     classOf[org.apache.hadoop.io.NullWritable],
>>>     classOf[AvroKeyOutputFormat[GenericRecord]],
>>>     writeJob.getConfiguration)
>>> }
>>>
>>> On Wed, Jul 1, 2015 at 8:11 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
>>>>
>>>> On Wed, Jul 1, 2015 at 11:01 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
>>>> wrote:
>>>>
>>>>> How do I persist an RDD using StorageLevel.MEMORY_AND_DISK_SER?
>>>>>
>>>>> --
>>>>> Deepak
>>>
>>> --
>>> Deepak
>
> --
> Deepak
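A minimal sketch of Koert's one-liner in context, assuming a local Spark 1.3+ setup and a made-up intermediate RDD (`PersistSketch`, `buildIntermediate`, and `run` are hypothetical names, not from the thread). It illustrates the distinction Raghavendra draws: `persist` only marks the RDD for caching, the blocks are stored on the first action and reused by later ones, and nothing here produces output files. Writing results (for example via `saveAsNewAPIHadoopFile` inside `writeOutputRecords`) stays a separate call on the persisted RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical stand-in for the expensive intermediate RDD in the thread.
  def buildIntermediate(sc: SparkContext): RDD[(Int, String)] =
    sc.parallelize(1 to 1000).map(i => (i % 10, s"record-$i"))

  // Persists the intermediate RDD, runs two actions over it, and returns
  // (record count, distinct key count, storage level while cached).
  def run(sc: SparkContext): (Long, Long, StorageLevel) = {
    val intermediate = buildIntermediate(sc)
    // Keep partitions as serialized bytes in memory; partitions that
    // do not fit in memory are stored on local disk instead.
    intermediate.persist(StorageLevel.MEMORY_AND_DISK_SER)
    // The first action computes the partitions and stores them.
    val records = intermediate.count()
    // Later actions read the stored blocks instead of recomputing the lineage.
    val keys = intermediate.map(_._1).distinct().count()
    val level = intermediate.getStorageLevel
    // Cached blocks are not output files; writing results is a separate step
    // (e.g. the hypothetical call writeOutputRecords(intermediateAvro, outputDir)).
    intermediate.unpersist()
    (records, keys, level)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (records, keys, level) = run(sc)
      println(s"$records records, $keys distinct keys, cached at $level")
    } finally sc.stop()
  }
}
```

Calling `unpersist()` once the write is done frees the cached blocks; if the RDD is only written once and never reused, persisting it gains little.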
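Raghavendra's point is that the `_SER` storage levels store whatever bytes Spark's configured serializer produces. Rather than overriding the stored class's serialization, a commonly used alternative is to switch that serializer to Kryo through the `spark.serializer` setting. This may matter for the RDD passed to `writeOutputRecords`: as far as I know, `AvroKey` and `NullWritable` do not implement `java.io.Serializable`, so persisting that RDD with `MEMORY_AND_DISK_SER` under the default Java serializer may fail. The sketch below assumes a local master; `PlainRecord`, `KryoPersistSketch`, `kryoConf`, and `sumIds` are hypothetical names, and real Avro records may need additional Kryo configuration not shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type that does NOT extend java.io.Serializable,
// standing in for types such as NullWritable or AvroKey.
class PlainRecord(val id: Int, val label: String)

object KryoPersistSketch {
  // Use Kryo for serialized storage levels (and shuffles) instead of Java serialization.
  def kryoConf(): SparkConf =
    new SparkConf()
      .setAppName("kryo-persist-sketch")
      .setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[PlainRecord]))

  // Persists non-Serializable records as serialized bytes, then sums their ids.
  def sumIds(sc: SparkContext): Long = {
    val records = sc.parallelize(1 to 100).map(i => new PlainRecord(i, s"r$i"))
    records.persist(StorageLevel.MEMORY_AND_DISK_SER)
    val total = records.map(_.id.toLong).reduce(_ + _)
    records.unpersist()
    total
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(kryoConf())
    try println(s"sum of ids = ${sumIds(sc)}")
    finally sc.stop()
  }
}
```

Registering classes is optional; unregistered classes still serialize, but Kryo then writes the full class name alongside each object, which makes the cached blocks larger.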