Re: send transformed RDD to s3 from slaves
Update: You can now answer this on Stack Overflow for a 100-point bounty:
http://stackoverflow.com/questions/33704073/how-to-send-transformed-data-from-partitions-to-s3

On Fri, Nov 13, 2015 at 4:56 PM, Walrus theCat wrote:
> Hi,
>
> I have an RDD which crashes the driver when being collected. I want to
> send the data on its partitions out to S3 without bringing it back to the
> driver. I try calling rdd.foreachPartition, but the data that gets sent has
> not gone through the chain of transformations that I need. It's the data
> as it was ingested initially. After specifying my chain of
> transformations, but before calling foreachPartition, I call rdd.count in
> order to force the RDD to transform. The data it sends out is still not
> transformed. How do I get the RDD to send out transformed data when
> calling foreachPartition?
>
> Thanks
Re: send transformed RDD to s3 from slaves
Hi Walrus,

Try caching the results just before calling rdd.count.

Regards,
Ajay

> On Nov 13, 2015, at 7:56 PM, Walrus theCat wrote:
>
> Hi,
>
> I have an RDD which crashes the driver when being collected. I want to send
> the data on its partitions out to S3 without bringing it back to the driver.
> I try calling rdd.foreachPartition, but the data that gets sent has not gone
> through the chain of transformations that I need. It's the data as it was
> ingested initially. After specifying my chain of transformations, but before
> calling foreachPartition, I call rdd.count in order to force the RDD to
> transform. The data it sends out is still not transformed. How do I get the
> RDD to send out transformed data when calling foreachPartition?
>
> Thanks

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
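For reference, here is a rough sketch of what caching before the count could look like in PySpark. It is not taken from the thread. The function name cache_and_send and its parameters are made up for illustration, and it assumes source_rdd is an ordinary pyspark RDD. Note that caching by itself only avoids recomputing the lineage; the part that matters for the reported symptom is that count and foreachPartition are both called on the RDD returned by map, not on the original one.

```python
def cache_and_send(source_rdd, transform, send_partition):
    """Cache the transformed RDD, force it with count(), then send each partition.

    All three names in the signature are hypothetical; `send_partition` is any
    function that accepts an iterator over the records of one partition.
    """
    # map() is lazy and returns a new RDD; source_rdd itself is left unchanged.
    transformed = source_rdd.map(transform)

    # cache() only marks the RDD for persistence; nothing is computed yet.
    transformed.cache()

    # count() is an action, so it runs the transformations and fills the cache.
    record_count = transformed.count()

    # Run the side effect on the transformed RDD, not on source_rdd.
    transformed.foreachPartition(send_partition)
    return record_count
```

The send_partition function runs on the executors, so any S3 client it uses would have to be created inside that function rather than on the driver.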
Re: send transformed RDD to s3 from slaves
Maybe you want to use rdd.saveAsTextFile()?

> On Nov 13, 2015, at 4:56 PM, Walrus theCat wrote:
>
> Hi,
>
> I have an RDD which crashes the driver when being collected. I want to send
> the data on its partitions out to S3 without bringing it back to the driver.
> I try calling rdd.foreachPartition, but the data that gets sent has not gone
> through the chain of transformations that I need. It's the data as it was
> ingested initially. After specifying my chain of transformations, but before
> calling foreachPartition, I call rdd.count in order to force the RDD to
> transform. The data it sends out is still not transformed. How do I get the
> RDD to send out transformed data when calling foreachPartition?
>
> Thanks
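A minimal sketch of the saveAsTextFile route, assuming a PySpark RDD and a cluster whose Hadoop configuration already has S3 credentials set up. The helper name save_transformed and the URI in the comment are placeholders, not something from the thread. saveAsTextFile is an action, so it triggers the transformations itself, and each executor writes its own partition as a part file under the output path, so nothing is collected on the driver.

```python
def save_transformed(source_rdd, transform, output_uri):
    """Apply `transform` to every record and write the result as text files.

    `output_uri` is a directory-style path such as the hypothetical
    "s3a://example-bucket/output/run-1"; the right URI scheme depends on
    how the cluster's Hadoop S3 support is configured.
    """
    # Keep a reference to the RDD returned by map(); source_rdd is unchanged.
    transformed = source_rdd.map(transform)

    # Writes the string form of each record, one part file per partition.
    transformed.saveAsTextFile(output_uri)
```

The output path must not already exist, otherwise the job fails rather than overwriting it.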
send transformed RDD to s3 from slaves
Hi,

I have an RDD which crashes the driver when being collected. I want to send
the data on its partitions out to S3 without bringing it back to the driver.
I try calling rdd.foreachPartition, but the data that gets sent has not gone
through the chain of transformations that I need. It's the data as it was
ingested initially. After specifying my chain of transformations, but before
calling foreachPartition, I call rdd.count in order to force the RDD to
transform. The data it sends out is still not transformed. How do I get the
RDD to send out transformed data when calling foreachPartition?

Thanks
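One possible explanation for the symptom described above, which the thread does not confirm, is that the results of the transformations were never kept. RDDs are immutable: map and the other transformations return a new RDD and leave the original untouched, so calling count or foreachPartition on the original variable still sees the ingested data. The sketch below illustrates this with LocalRDD, a tiny pure-Python stand-in written only for this example. It is not Spark's class; it just mimics the lazy, immutable behaviour of map, count and foreachPartition so the snippet runs without a cluster.

```python
class LocalRDD:
    """Tiny local stand-in for an RDD: immutable, with a lazy map()."""

    def __init__(self, partitions, funcs=()):
        self._partitions = partitions  # list of lists, one list per partition
        self._funcs = tuple(funcs)     # pending transformations, applied lazily

    def map(self, func):
        # Like Spark's map(): returns a NEW object and leaves self untouched.
        return LocalRDD(self._partitions, self._funcs + (func,))

    def _compute(self, partition):
        # Apply every pending transformation to each record of one partition.
        for record in partition:
            for func in self._funcs:
                record = func(record)
            yield record

    def count(self):
        return sum(1 for part in self._partitions for _ in self._compute(part))

    def foreachPartition(self, func):
        # Hand each partition's (transformed) records to func as an iterator.
        for part in self._partitions:
            func(self._compute(part))


raw = LocalRDD([["a", "b"], ["c"]])

# Pitfall: the result of map() is discarded, so `raw` is still untransformed,
# and forcing it with count() does not change that.
raw.map(str.upper)
raw.count()
sent_wrong = []
raw.foreachPartition(lambda records: sent_wrong.extend(records))

# Fix: keep the returned object and call foreachPartition on it.
transformed = raw.map(str.upper)
sent_right = []
transformed.foreachPartition(lambda records: sent_right.extend(records))

print(sent_wrong)  # ['a', 'b', 'c']
print(sent_right)  # ['A', 'B', 'C']
```

With a real RDD the same rule applies: whatever variable holds the end of the transformation chain is the one to call foreachPartition on.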