Re: send transformed RDD to s3 from slaves

2015-11-16 Thread Walrus theCat
Update:

You can now answer this on Stack Overflow for a 100-reputation bounty:

http://stackoverflow.com/questions/33704073/how-to-send-transformed-data-from-partitions-to-s3

On Fri, Nov 13, 2015 at 4:56 PM, Walrus theCat wrote:

> Hi,
>
> I have an RDD which crashes the driver when being collected.  I want to
> send the data on its partitions out to S3 without bringing it back to the
> driver. I try calling rdd.foreachPartition, but the data that gets sent has
> not gone through the chain of transformations that I need.  It's the data
> as it was ingested initially.  After specifying my chain of
> transformations, but before calling foreachPartition, I call rdd.count in
> order to force the RDD to transform.  The data it sends out is still not
> transformed.  How do I get the RDD to send out transformed data when
> calling foreachPartition?
>
> Thanks
>


Re: send transformed RDD to s3 from slaves

2015-11-14 Thread Ajay
Hi Walrus,

Try caching the results just before calling rdd.count.

Regards,
Ajay
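
The thread never shows the poster's code, so the cause here is a guess. One common way to end up sending untransformed data is calling foreachPartition on the original RDD variable: transformations such as map return a new RDD and leave the original unchanged, so calling count does not change what the original variable holds. A minimal PySpark-style sketch of caching and then exporting from the transformed RDD follows. The names transform, send_partition, and export_transformed are hypothetical, and send is a placeholder callable standing in for an S3 upload.

```python
def transform(record):
    # Hypothetical stand-in for the poster's chain of transformations.
    return record.strip().upper()


def send_partition(records, send):
    # Runs on an executor: pushes every record of one partition through
    # `send`, a hypothetical callable standing in for an S3 upload.
    sent = 0
    for record in records:
        send(record)
        sent += 1
    return sent


def export_transformed(rdd, send):
    # map() returns a NEW RDD; the original `rdd` is left untouched.
    transformed = rdd.map(transform).cache()
    # count() is an action, so it computes and caches the partitions.
    transformed.count()
    # Call foreachPartition on the transformed RDD, not on `rdd`.
    # `send` must be serializable so Spark can ship it to the executors.
    transformed.foreachPartition(
        lambda records: send_partition(records, send)
    )
    return transformed
```

If this guess is right, the fix is using the RDD returned by the transformations. The cache call only avoids recomputing the chain between count and foreachPartition.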

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: send transformed RDD to s3 from slaves

2015-11-14 Thread Andrew Ehrlich
Maybe you want to use rdd.saveAsTextFile()?
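
A sketch of this suggestion in PySpark style: saveAsTextFile is an action, so it runs the transformation chain on the executors and each partition writes its own part file, with nothing collected to the driver. The function name save_transformed, the transform argument, and the output URI in the comment are hypothetical. Writing to an S3 URI also assumes the cluster's Hadoop S3 connector and credentials are already configured.

```python
def save_transformed(rdd, transform, output_uri):
    # Keep the RDD returned by map(); the input `rdd` itself is unchanged.
    transformed = rdd.map(transform)
    # Each partition is written as a separate part file under output_uri,
    # for example a hypothetical "s3a://example-bucket/output/".
    transformed.saveAsTextFile(output_uri)
    return transformed
```

The output directory must not already exist, since saveAsTextFile refuses to overwrite an existing path.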







send transformed RDD to s3 from slaves

2015-11-13 Thread Walrus theCat
Hi,

I have an RDD which crashes the driver when being collected.  I want to
send the data on its partitions out to S3 without bringing it back to the
driver. I try calling rdd.foreachPartition, but the data that gets sent has
not gone through the chain of transformations that I need.  It's the data
as it was ingested initially.  After specifying my chain of
transformations, but before calling foreachPartition, I call rdd.count in
order to force the RDD to transform.  The data it sends out is still not
transformed.  How do I get the RDD to send out transformed data when
calling foreachPartition?

Thanks