Re: Fully in-memory shuffles

2015-06-11 Thread Patrick Wendell
Hey Corey,

Yes, when shuffles are smaller than available memory to the OS, most
often the outputs never get stored to disk. I believe this holds same
for the YARN shuffle service, because the write path is actually the
same, i.e. we don't fsync the writes and force them to disk. I would
guess in such shuffles the bottleneck is serializing the data rather
than raw IO, so I'm not sure explicitly buffering the data in the JVM
process would yield a large improvement.

Writing shuffle to an explicitly pinned memory filesystem is also
possible (per Davies suggestion), but it's brittle because the job
will fail if shuffle output exceeds memory.

- Patrick

On Wed, Jun 10, 2015 at 9:50 PM, Davies Liu dav...@databricks.com wrote:
 If you have enough memory, you can put the temporary work directory in
 tempfs (in memory file system).

 On Wed, Jun 10, 2015 at 8:43 PM, Corey Nolet cjno...@gmail.com wrote:
 Ok so it is the case that small shuffles can be done without hitting any
 disk. Is this the same case for the aux shuffle service in yarn? Can that be
 done without hitting disk?

 On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com wrote:

 In many cases the shuffle will actually hit the OS buffer cache and
 not ever touch spinning disk if it is a size that is less than memory
 on the machine.

 - Patrick

 On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet cjno...@gmail.com wrote:
  So with this... to help my understanding of Spark under the hood-
 
  Is this statement correct When data needs to pass between multiple
  JVMs, a
  shuffle will always hit disk?
 
  On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen rosenvi...@gmail.com
  wrote:
 
  There's a discussion of this at
  https://github.com/apache/spark/pull/5403
 
 
 
  On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com wrote:
 
  Is it possible to configure Spark to do all of its shuffling FULLY in
  memory (given that I have enough memory to store all the data)?
 
 
 
 
 



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Fully in-memory shuffles

2015-06-11 Thread Mark Hamstra

 I would guess in such shuffles the bottleneck is serializing the data
 rather than raw IO, so I'm not sure explicitly buffering the data in the
 JVM process would yield a large improvement.


 Good guess!  It is very hard to beat the performance of retrieving shuffle
outputs from the OS buffer cache, and very easy to actually make things
slower while trying to do so.

On Thu, Jun 11, 2015 at 4:33 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Corey,

 Yes, when shuffles are smaller than available memory to the OS, most
 often the outputs never get stored to disk. I believe this holds same
 for the YARN shuffle service, because the write path is actually the
 same, i.e. we don't fsync the writes and force them to disk. I would
 guess in such shuffles the bottleneck is serializing the data rather
 than raw IO, so I'm not sure explicitly buffering the data in the JVM
 process would yield a large improvement.

 Writing shuffle to an explicitly pinned memory filesystem is also
 possible (per Davies suggestion), but it's brittle because the job
 will fail if shuffle output exceeds memory.

 - Patrick

 On Wed, Jun 10, 2015 at 9:50 PM, Davies Liu dav...@databricks.com wrote:
  If you have enough memory, you can put the temporary work directory in
  tempfs (in memory file system).
 
  On Wed, Jun 10, 2015 at 8:43 PM, Corey Nolet cjno...@gmail.com wrote:
  Ok so it is the case that small shuffles can be done without hitting any
  disk. Is this the same case for the aux shuffle service in yarn? Can
 that be
  done without hitting disk?
 
  On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  In many cases the shuffle will actually hit the OS buffer cache and
  not ever touch spinning disk if it is a size that is less than memory
  on the machine.
 
  - Patrick
 
  On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet cjno...@gmail.com
 wrote:
   So with this... to help my understanding of Spark under the hood-
  
   Is this statement correct When data needs to pass between multiple
   JVMs, a
   shuffle will always hit disk?
  
   On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen rosenvi...@gmail.com
   wrote:
  
   There's a discussion of this at
   https://github.com/apache/spark/pull/5403
  
  
  
   On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com
 wrote:
  
   Is it possible to configure Spark to do all of its shuffling FULLY
 in
   memory (given that I have enough memory to store all the data)?
  
  
  
  
  
 
 

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Fully in-memory shuffles

2015-06-10 Thread Josh Rosen
There's a discussion of this at https://github.com/apache/spark/pull/5403



On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com wrote:

 Is it possible to configure Spark to do all of its shuffling FULLY in
 memory (given that I have enough memory to store all the data)?






Re: Fully in-memory shuffles

2015-06-10 Thread Patrick Wendell
In many cases the shuffle will actually hit the OS buffer cache and
not ever touch spinning disk if it is a size that is less than memory
on the machine.

- Patrick

On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet cjno...@gmail.com wrote:
 So with this... to help my understanding of Spark under the hood-

 Is this statement correct When data needs to pass between multiple JVMs, a
 shuffle will always hit disk?

 On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen rosenvi...@gmail.com wrote:

 There's a discussion of this at https://github.com/apache/spark/pull/5403



 On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com wrote:

 Is it possible to configure Spark to do all of its shuffling FULLY in
 memory (given that I have enough memory to store all the data)?






-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Fully in-memory shuffles

2015-06-10 Thread Corey Nolet
So with this... to help my understanding of Spark under the hood-

Is this statement correct When data needs to pass between multiple JVMs, a
shuffle will *always* hit disk?

On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen rosenvi...@gmail.com wrote:

 There's a discussion of this at https://github.com/apache/spark/pull/5403



 On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com wrote:

 Is it possible to configure Spark to do all of its shuffling FULLY in
 memory (given that I have enough memory to store all the data)?







Re: Fully in-memory shuffles

2015-06-10 Thread Corey Nolet
Ok so it is the case that small shuffles can be done without hitting any
disk. Is this the same case for the aux shuffle service in yarn? Can that
be done without hitting disk?

On Wed, Jun 10, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com wrote:

 In many cases the shuffle will actually hit the OS buffer cache and
 not ever touch spinning disk if it is a size that is less than memory
 on the machine.

 - Patrick

 On Wed, Jun 10, 2015 at 5:06 PM, Corey Nolet cjno...@gmail.com wrote:
  So with this... to help my understanding of Spark under the hood-
 
  Is this statement correct When data needs to pass between multiple
 JVMs, a
  shuffle will always hit disk?
 
  On Wed, Jun 10, 2015 at 10:11 AM, Josh Rosen rosenvi...@gmail.com
 wrote:
 
  There's a discussion of this at
 https://github.com/apache/spark/pull/5403
 
 
 
  On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com wrote:
 
  Is it possible to configure Spark to do all of its shuffling FULLY in
  memory (given that I have enough memory to store all the data)?