Re: on shark, is tachyon less efficient than memory_only cache strategy ?

Haoyuan Li Sun, 13 Jul 2014 11:48:06 -0700

Qingyang,

Are you asking about Spark or Shark? (Your first email said "Shark"; your
last email said "Spark".)


Best,

Haoyuan


On Wed, Jul 9, 2014 at 7:40 PM, qingyang li <[email protected]>
wrote:

> Could I set a cache policy so that Spark loads data from Tachyon only once
> for all SQL queries, for example CacheAllPolicy, FIFOCachePolicy, or
> LRUCachePolicy? I have tried all three policies, but they did not help.
> I think that if Spark reloads the data for every SQL query, it will slow
> queries down compared with the case where the data is managed by Spark
> itself.
>
>
>
>
> 2014-07-09 1:19 GMT+08:00 Haoyuan Li <[email protected]>:
>
> > Yes. For Shark, the two modes, "shark.cache=tachyon" and
> > "shark.cache=memory", have the same ser/de overhead. In Tachyon mode,
> > Shark loads data from outside of the process, with the following benefits:
> >
> >
> >    - In-memory data sharing across multiple Shark instances (i.e.
> >    stronger isolation)
> >    - Instant recovery of in-memory tables
> >    - Reduced heap size => faster GC in Shark
> >    - If the table is larger than the memory size, only the hot columns
> >    will be cached in memory
> >
> > from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html and
> > https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon
> >
> > Haoyuan
> >
> >
> > On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson <[email protected]>
> > wrote:
> >
> > > Shark's in-memory format is already serialized (it's compressed and
> > > column-based).
> > >
> > >
> > > On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan <[email protected]>
> > > wrote:
> > >
> > > > You are ignoring serde costs :-)
> > > >
> > > > - Mridul
> > > >
> > > > On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson <[email protected]>
> > > > wrote:
> > > > > Tachyon should only be marginally less performant than memory_only,
> > > > > because we mmap the data from Tachyon's ramdisk. We do not have to,
> > > > > say, transfer the data over a pipe from Tachyon; we can directly read
> > > > > from the buffers in the same way that Shark reads from its in-memory
> > > > > columnar format.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jul 8, 2014 at 1:18 AM, qingyang li <[email protected]>
> > > > > wrote:
> > > > >
> > > > > >> Hi, when I create a table, I can specify the cache strategy using
> > > > > >> shark.cache. I think "shark.cache=memory_only" means the data is
> > > > > >> managed by Spark and lives in the same JVM as the executor, while
> > > > > >> "shark.cache=tachyon" means the data is managed by Tachyon, which is
> > > > > >> off-heap, so the data is not in the same JVM as the executor and
> > > > > >> Spark will load it from Tachyon for each SQL query. So, is Tachyon
> > > > > >> less efficient than the memory_only cache strategy?
> > > > > >> If yes, can we let Spark load all the data from Tachyon once for all
> > > > > >> SQL queries? I would like to use the Tachyon cache strategy, since
> > > > > >> Tachyon is more HA than memory_only.
> > > > >>
> > > >
> > >
> >
> >
> >
> > --
> > Haoyuan Li
> > AMPLab, EECS, UC Berkeley
> > http://www.cs.berkeley.edu/~haoyuan/
> >
>



-- 
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
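
For readers following the mmap point quoted above (reading Tachyon's ramdisk
through a memory map rather than copying data over a pipe), here is a minimal
Python sketch of the general idea. It is not Shark or Tachyon code: the
function names, the temporary file standing in for a ramdisk file, and the
byte layout are all assumptions made for illustration.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the file into this process and slice the requested byte range.

    Pages are faulted in on demand, so only the touched range is read.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


def read_via_copy(path: str, offset: int, length: int) -> bytes:
    """Baseline: seek and copy the range through an ordinary read call."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo() -> bool:
    # Hypothetical stand-in for a column file on a ramdisk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"col0col1col2")
        path = tmp.name
    try:
        via_mmap = read_via_mmap(path, 4, 4)
        via_copy = read_via_copy(path, 4, 4)
        return via_mmap == via_copy == b"col1"
    finally:
        os.remove(path)
```

Both functions return the same bytes; the difference is that the mapped
version lets the reader address the file contents directly instead of
streaming them through an intermediate channel.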
