Re: Improving Non-dictionary storage & performance.

Kumar Vishal Wed, 01 Mar 2017 04:48:33 -0800

Hi Ravi,
Sorting of data for no-dictionary columns should be based on the data type,
and the same applies to filters. Please add this point.


-Regards
Kumar Vishal
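
To illustrate the point about type-aware sorting and filtering, here is a minimal Python sketch. It is not CarbonData code; the sample values and the helper name `greater_than` are made up for illustration. It shows that comparing the raw bytes of numeric values gives a different order than comparing by data type, and that a range filter (point 4 in the quoted mail below) can use binary search once the data is sorted by type.

```python
from bisect import bisect_right

# Hypothetical INT column values as they might arrive in text form.
raw = ["100", "9", "25", "3"]

# Comparing the UTF-8 bytes puts "100" before "25" and "9".
byte_order = sorted(raw, key=lambda s: s.encode("utf-8"))

# Comparing by the declared data type (INT here) gives numeric order.
typed_order = sorted(int(s) for s in raw)


def greater_than(sorted_values, bound):
    """Return the values > bound via binary search on type-sorted data."""
    return sorted_values[bisect_right(sorted_values, bound):]


print(byte_order)                     # ['100', '25', '3', '9']
print(typed_order)                    # [3, 9, 25, 100]
print(greater_than(typed_order, 9))   # [25, 100]
```

The binary search is only valid because the sort order and the filter comparison use the same type-based ordering.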

On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <ravi.pes...@gmail.com>
wrote:

> Hi,
>
> To make the storage and performance of non-dictionary columns more
> efficient, I am suggesting the following improvements.
>
> 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT columns direct
> dictionary.
>    Right now only date and timestamp are direct dictionary columns. We can
> make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if these columns
> are included in SORT_COLUMNS.
>
> 2. Consider delta/value compression while storing direct dictionary values.
> Right now the INT datatype is always used to store direct dictionary values,
> so we can consider value/delta compression to compact the storage.
>
> 3. Use a separator instead of LV format to store String values in
> no-dictionary format.
> Currently String datatypes for non-dictionary columns are stored in
> LV (length-value) format, where the length is always a Short (2 bytes).
> To keep storage compact we can use a separator (a 0 byte) instead,
> which takes just a single byte. While reading, we can traverse the data
> and get the offsets like we are doing now.
>
> 4. Add range filters for no-dictionary columns.
> Currently range filters such as greater-than/less-than are not implemented
> for no-dictionary columns. We should implement them to avoid row-level
> filtering and improve performance.
>
> Regards,
> Ravindra.
>
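The delta compression idea in point 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the CarbonData implementation: the function names, the little-endian layout, and the choice of unsigned widths are all hypothetical. The minimum value is stored once and each value is written as an offset from it, using the narrowest integer width that fits the largest offset.

```python
import struct

# (struct format code, largest unsigned value it holds), narrowest first.
_WIDTHS = [("B", 0xFF), ("H", 0xFFFF), ("I", 0xFFFFFFFF),
           ("Q", 0xFFFFFFFFFFFFFFFF)]


def delta_encode(values):
    """Store min(values) once, then each value as an unsigned offset from it.

    Assumes a non-empty list of integers whose spread fits in 64 bits.
    """
    base = min(values)
    span = max(values) - base
    code = next(c for c, limit in _WIDTHS if span <= limit)
    payload = struct.pack(f"<{len(values)}{code}",
                          *(v - base for v in values))
    return base, code, payload


def delta_decode(base, code, payload):
    """Rebuild the original values from the base and the packed offsets."""
    count = len(payload) // struct.calcsize(f"<{code}")
    return [base + d for d in struct.unpack(f"<{count}{code}", payload)]


values = [1_488_000_000, 1_488_000_010, 1_488_000_200]
base, code, payload = delta_encode(values)
print(code, len(payload))                 # B 3
print(delta_decode(base, code, payload))  # the original three values
```

Here three values that would take 12 bytes as fixed 4-byte INTs need 3 bytes of offsets plus the base, because the largest offset (200) fits in one byte.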

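The storage difference between the LV format and the separator format in point 3 can be shown with a small Python sketch. The function names are hypothetical, and the big-endian unsigned 2-byte length is an assumption made for illustration; the mail only says the length is a Short of 2 bytes. The separator variant also assumes that no value contains a 0 byte, which the sketch checks explicitly.

```python
import struct


def encode_lv(strings):
    """Length-value: a 2-byte length prefix followed by the UTF-8 bytes."""
    out = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return bytes(out)


def encode_separated(strings):
    """Separator format: UTF-8 bytes followed by a single 0 byte."""
    out = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        if b"\x00" in data:
            raise ValueError("value contains the separator byte")
        out += data + b"\x00"
    return bytes(out)


def decode_separated(buf):
    """Split on the 0 byte; the piece after the last separator is empty."""
    return [p.decode("utf-8") for p in buf.split(b"\x00")[:-1]]


values = ["carbon", "data", ""]
lv = encode_lv(values)
sep = encode_separated(values)
print(len(lv), len(sep))        # 16 13
print(decode_separated(sep))    # ['carbon', 'data', '']
```

The separator format saves one byte per value. The trade-off is that the reader has to scan the data to find the offsets, and values containing a 0 byte need separate handling.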