Re: ephemeral storage level in spark ?

2014-04-06 Thread Matei Zaharia
The off-heap storage level is currently tied to Tachyon, but it might support 
other forms of off-heap storage later. However, it’s not really designed to be 
mixed with the other storage levels. For this use case you may want to rely on 
memory locality and have some custom code to push the data to the accelerator. 
That said, if you can think of a general way to extend the storage level 
concept to handle this, do send a proposal.

Matei
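
A minimal sketch of the "custom code" approach suggested above, assuming the
worker keeps its own record of which blocks are already resident on the
device. Everything here is hypothetical and not part of Spark: the
DeviceBlockCache class, the copy_to_device callback, and the block IDs are
illustrative names only. Spark stays the source of truth, and the device copy
is treated as ephemeral and may be evicted at any time.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class DeviceBlockCache:
    """Hypothetical per-worker record of blocks already copied to a device."""

    def __init__(self, copy_to_device: Callable[[List[float]], object],
                 capacity: int = 4) -> None:
        self._copy_to_device = copy_to_device  # assumed device-transfer hook
        self._capacity = capacity              # max blocks kept on the device
        self._resident: "OrderedDict[Hashable, object]" = OrderedDict()
        self.copies = 0                        # number of host-to-device copies

    def get_or_copy(self, block_id: Hashable,
                    load_block: Callable[[], List[float]]) -> object:
        """Return a device handle, copying only if the block is not resident."""
        if block_id in self._resident:
            # Block is still on the device: skip the copy, mark as recently used.
            self._resident.move_to_end(block_id)
            return self._resident[block_id]
        # Miss: read the block from the source of truth and copy it over.
        handle = self._copy_to_device(load_block())
        self.copies += 1
        self._resident[block_id] = handle
        if len(self._resident) > self._capacity:
            # Device storage is ephemeral: evict the least recently used block.
            self._resident.popitem(last=False)
        return handle


if __name__ == "__main__":
    # Stand-in for a real transfer: the "device handle" is just a tuple.
    cache = DeviceBlockCache(copy_to_device=tuple, capacity=2)
    cache.get_or_copy("block-0", lambda: [1.0, 2.0])
    cache.get_or_copy("block-0", lambda: [1.0, 2.0])  # served without a copy
    print("copies performed:", cache.copies)
```

In a Spark job, an object like this would presumably live once per worker
process and be consulted from inside a per-partition function (for example via
mapPartitionsWithIndex on an RDD persisted in memory), with the partition index
standing in for the block ID. Whether tasks for the same block land on the same
worker again depends on the scheduler's locality preferences, so the cache can
only ever be a best-effort optimization.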




ephemeral storage level in spark ?

2014-04-05 Thread Mridul Muralidharan
Hi,

We have a requirement to use (potentially) ephemeral storage that is not
within the VM but is strongly tied to a worker node. The source of truth
for a block would still be within Spark, but to actually do computation we
would need to copy the data to an external device. The data might lie
around there for a while, so data locality really helps: if the same block
is computed on again and is already present on the device, we can avoid a
subsequent copy.

I was wondering if the recently added storage level for Tachyon would
help in this case (note: Tachyon itself won't help; just the storage level
might).
What sort of guarantees does it provide? How extensible is it? Or is
it strongly tied to Tachyon with only a generic name?


Thanks,
Mridul


Re: ephemeral storage level in spark ?

2014-04-05 Thread Haoyuan Li
Hi Mridul,

Do you mean the scenario where different Spark applications need to read the
same raw data, which is stored on a remote cluster or remote machines, and
the goal is to load that remote raw data only once?

Haoyuan


-- 
Haoyuan Li
Algorithms, Machines, People Lab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/


Re: ephemeral storage level in spark ?

2014-04-05 Thread Mridul Muralidharan
No, I am thinking along the lines of writing to an accelerator card or a
dedicated card with its own memory.

Regards,
Mridul