subject:"MapValues and Shuffle Reads"

Re: MapValues and Shuffle Reads

2015-02-17 Thread Imran Rashid

Hi Darin, When you say you see 400GB of shuffle writes from the first code snippet, what do you mean? There is no action in that first set, so it won't do anything. By itself, it won't do any shuffle writing, or anything else for that matter. Most likely

MapValues and Shuffle Reads

2015-02-17 Thread Darin McBeath

In the following code, I read in a large sequence file (1TB) from S3, spread across 1024 partitions. When I look at the job/stage summary, I see about 400GB of shuffle writes, which seems to make sense as I'm doing a hash partition on this file.
// Get the baseline input file

Re: MapValues and Shuffle Reads

2015-02-17 Thread Imran Rashid

Hi Darin, When you say you see 400GB of shuffle writes from the first code snippet, what do you mean? There is no action in that first set, so it won't do anything. By itself, it won't do any shuffle writing, or anything else for that matter. Most likely, the .count() on your second code
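The point about "no action" can be illustrated with a minimal sketch. This is a plain-Python analogy using generators, not Spark code: the helper `map_values` and the `calls` list are hypothetical names introduced only for illustration, and a generator is assumed here to stand in for a lazy transformation while consuming it stands in for an action such as `.count()`.

```python
# Plain-Python analogy (NOT Spark code): a generator is lazy in roughly the
# way an RDD transformation is, and consuming it plays the role of an action.

calls = []  # hypothetical log of every key the "transformation" touches


def map_values(pairs, fn):
    """Lazily apply fn to each value (loosely mirrors the idea of mapValues)."""
    for key, value in pairs:
        calls.append(key)  # record that real work happened for this element
        yield key, fn(value)


data = [("a", 1), ("b", 2), ("c", 3)]

# Defining the pipeline performs no work, like transformations with no action.
pipeline = map_values(data, lambda v: v * 10)
work_before_action = len(calls)

# Consuming the pipeline is the stand-in for an action such as count().
count = sum(1 for _ in pipeline)
work_after_action = len(calls)

print(work_before_action, count, work_after_action)
```

In this sketch nothing is recorded in `calls` until the pipeline is consumed, which is the same reason a stage's work shows up only once an action triggers it.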

Re: MapValues and Shuffle Reads

2015-02-17 Thread Darin McBeath

really want to do in the first place. Thanks again for your insights.

Darin.

From: Imran Rashid iras...@cloudera.com
To: Darin McBeath ddmcbe...@yahoo.com
Cc: User user@spark.apache.org
Sent: Tuesday, February 17, 2015 3:29 PM
Subject: Re: MapValues and Shuffle

4 matches
