Yeah, Tachyon does sound like a good option here.  Especially if you have
nested data, it's likely that Parquet in Tachyon will always be better
supported.
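
For reference, reading Parquet that lives in Tachyon is just a matter of
pointing Spark SQL at a tachyon:// path. A minimal sketch against the Spark
1.2-era API; the Tachyon master host/port, the path, and the table/column
names are placeholders:

import org.apache.spark.sql.SQLContext

// Assumes a running SparkContext named `sc` (e.g. in spark-shell).
val sqlContext = new SQLContext(sc)

// parquetFile reads the schema (including nested fields) from the Parquet
// footers; the tachyon:// host, port, and path below are placeholders.
val events = sqlContext.parquetFile("tachyon://tachyon-master:19998/warehouse/events")
events.registerTempTable("events")

// Query the table as usual; the date literal is illustrative.
sqlContext.sql("SELECT count(*) FROM events WHERE date = 20141219").collect()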

On Fri, Dec 19, 2014 at 2:17 PM, Sadhan Sood <sadhan.s...@gmail.com> wrote:
>
> Hey Michael,
>
> Thank you for clarifying that. Is Tachyon the right way to get compressed
> data in memory, or should we explore the option of adding compression to
> cached data? We ask because our uncompressed data set is too big to fit in
> memory right now. I also see a benefit of Tachyon beyond storing compressed
> data in memory: we wouldn't have to create a separate table for caching
> some partitions, like 'cache table table_cached as select * from table
> where date = 201412XX' - the way we are doing it right now.
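>
> For concreteness, a minimal sketch of that pattern against the Spark
> 1.2-era SQLContext (the table name and the date below are placeholders):
>
> // Cache a single date partition (illustrative date) under a separate name.
> sqlContext.sql(
>   "CACHE TABLE table_cached AS SELECT * FROM table WHERE date = 20141218")
>
> // Later queries read the in-memory columnar copy.
> sqlContext.sql("SELECT count(*) FROM table_cached").collect()
>
> // Free the memory when that partition is no longer needed.
> sqlContext.sql("UNCACHE TABLE table_cached")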
>
>
> On Thu, Dec 18, 2014 at 6:46 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>>
>> There is only column-level encoding (run-length encoding, delta encoding,
>> dictionary encoding) and no generic compression.
>>
>> On Thu, Dec 18, 2014 at 12:07 PM, Sadhan Sood <sadhan.s...@gmail.com>
>> wrote:
>>>
>>> Hi All,
>>>
>>> Wondering, when caching a table backed by LZO-compressed Parquet data,
>>> whether Spark also compresses it (using LZO/gzip/Snappy) along with the
>>> column-level encoding, or whether it just does the column-level encoding,
>>> when "spark.sql.inMemoryColumnarStorage.compressed" is set to true. I ask
>>> because when I try to cache the data, I notice the memory being used is
>>> almost as much as the uncompressed size of the data.
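>>>
>>> For reference, a minimal sketch of the setup in question (the table name
>>> is a placeholder); the resulting in-memory size shows up on the web UI's
>>> Storage tab:
>>>
>>> sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
>>> sqlContext.cacheTable("table")
>>>
>>> // cacheTable is lazy, so run a query to materialize the columnar buffers.
>>> sqlContext.sql("SELECT count(*) FROM table").collect()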
>>>
>>> Thanks!
>>>
>>
