Hi Mridul,

Do you mean the scenario where different Spark applications need to read the same raw data, which is stored on a remote cluster or on remote machines, and the goal is to load that remote raw data only once?

Haoyuan

On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan <mri...@gmail.com> wrote:

> Hi,
>
> We have a requirement to use a (potential) ephemeral storage, which
> is not within the VM, which is strongly tied to a worker node. So
> source of truth for a block would still be within spark; but to
> actually do computation, we would need to copy data to external device
> (where it might lie around for a while : so data locality really
> really helps if we can avoid a subsequent copy if it is already
> present on computations on same block again).
>
> I was wondering if the recently added storage level for tachyon would
> help in this case (note, tachyon wont help; just the storage level
> might).
> What sort of guarantees does it provide ? How extensible is it ? Or is
> it strongly tied to tachyon with only a generic name ?
>
>
> Thanks,
> Mridul
>

--
Haoyuan Li
Algorithms, Machines, People Lab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/