Shark. Thanks for replying. Let me state my question more clearly.

----------------------------------------------
I created a table using:

    create table xxx1 tblproperties("shark.cache"="tachyon") as select * from xxx2

When executing a query (for example, select * from xxx1) in Shark, Shark reads the data from Tachyon's memory into its own memory. I think that if Shark loads the data from Tachyon every time a query is executed, it is less efficient. Could we use a cache policy (such as CacheAllPolicy, FIFOCachePolicy, or LRUCachePolicy) to cache the data and avoid reading it from Tachyon for each SQL query?
----------------------------------------------
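To illustrate the kind of policy I have in mind, here is a minimal Python sketch of an LRU cache placed in front of a slow loader. It is only an illustration, not Shark or Tachyon code: `LRUCache`, `load_from_tachyon`, and the capacity are hypothetical names chosen for the example, and I am not claiming that Shark's LRUCachePolicy is implemented this way.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache in front of a slow loader (not Shark's API)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # called only on a cache miss
        self.entries = OrderedDict()  # least recently used entry first
        self.loads = 0                # number of times the loader ran

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark as most recently used, skip the loader.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: load once, then evict the least recently used entry
        # if the cache is over capacity.
        value = self.loader(key)
        self.loads += 1
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return value


def load_from_tachyon(table):
    # Stand-in for the expensive read; a real loader would fetch the
    # table's data from Tachyon here.
    return "rows of " + table


cache = LRUCache(capacity=2, loader=load_from_tachyon)
cache.get("xxx1")
cache.get("xxx1")  # second query is served from the cache
```

With a policy like this, the second query on xxx1 would not go back to the loader at all, which is the behaviour I am asking about.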

2014-07-14 2:47 GMT+08:00 Haoyuan Li <haoyuan...@gmail.com>:

> Qingyang,
>
> Are you asking about Spark or Shark? (The first email said "Shark", the last
> email said "Spark".)
>
> Best,
>
> Haoyuan
>
>
> On Wed, Jul 9, 2014 at 7:40 PM, qingyang li <liqingyang1...@gmail.com> wrote:
>
> > Could I set a cache policy so that Spark loads the data from Tachyon only
> > once for all SQL queries? For example, by using CacheAllPolicy,
> > FIFOCachePolicy, or LRUCachePolicy. I have tried those three policies, but
> > they had no effect.
> > I think that if Spark always loads the data for each SQL query, it will hurt
> > query speed: it will take more time than when the data are managed by
> > Spark itself.
> >
> > 2014-07-09 1:19 GMT+08:00 Haoyuan Li <haoyuan...@gmail.com>:
> >
> > > Yes. For Shark, the two modes, "shark.cache=tachyon" and
> > > "shark.cache=memory", have the same ser/de overhead. Shark loads data from
> > > outside of the process in Tachyon mode, with the following benefits:
> > >
> > > - In-memory data sharing across multiple Shark instances (i.e. stronger
> > >   isolation)
> > > - Instant recovery of in-memory tables
> > > - Reduced heap size => faster GC in Shark
> > > - If the table is larger than the memory size, only the hot columns will
> > >   be cached in memory
> > >
> > > from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html and
> > > https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon
> > >
> > > Haoyuan
> > >
> > >
> > > On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson <ilike...@gmail.com> wrote:
> > >
> > > > Shark's in-memory format is already serialized (it's compressed and
> > > > column-based).
> > > >
> > > > On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> > > >
> > > > > You are ignoring serde costs :-)
> > > > >
> > > > > - Mridul
> > > > >
> > > > > On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> > > > >
> > > > > > Tachyon should only be marginally less performant than memory_only,
> > > > > > because we mmap the data from Tachyon's ramdisk. We do not have to, say,
> > > > > > transfer the data over a pipe from Tachyon; we can directly read from the
> > > > > > buffers in the same way that Shark reads from its in-memory columnar format.
> > > > > >
> > > > > > On Tue, Jul 8, 2014 at 1:18 AM, qingyang li <liqingyang1...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi, when I create a table, I can specify the cache strategy using
> > > > > > > shark.cache. I think "shark.cache=memory_only" means the data are managed
> > > > > > > by Spark and live in the same JVM as the executor, while
> > > > > > > "shark.cache=tachyon" means the data are managed by Tachyon, which is
> > > > > > > off-heap, and are not in the same JVM as the executor, so Spark will load
> > > > > > > the data from Tachyon for each SQL query. So, is Tachyon less efficient
> > > > > > > than the memory_only cache strategy?
> > > > > > > If yes, can we let Spark load all the data once from Tachyon for all SQL
> > > > > > > queries, if I want to use the Tachyon cache strategy because Tachyon is
> > > > > > > more HA than memory_only?
> > >
> > > --
> > > Haoyuan Li
> > > AMPLab, EECS, UC Berkeley
> > > http://www.cs.berkeley.edu/~haoyuan/
>
> --
> Haoyuan Li
> AMPLab, EECS, UC Berkeley
> http://www.cs.berkeley.edu/~haoyuan/