Hi Ravi,

Sorting of data for no-dictionary columns should be based on the column's data type, and the same applies to filters. Please add this point.
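To illustrate the point, here is a minimal sketch in Python. It assumes that no-dictionary values are compared as the raw bytes of their textual form, which is an assumption made for illustration and not a statement about the current implementation. The variable names are hypothetical.

```python
# Hypothetical column values of an integer no-dictionary column.
values = [100, 9, -5, 42]

# Assumption: values are compared as the UTF-8 bytes of their text form.
# This yields lexicographic order, not numeric order.
bytewise = sorted(values, key=lambda v: str(v).encode("utf-8"))

# Comparing according to the column's data type yields numeric order.
typed = sorted(values)

# The same mismatch shows up in a filter such as "value > 50".
bytewise_filter = [v for v in values if str(v).encode("utf-8") > b"50"]
typed_filter = [v for v in values if v > 50]

print(bytewise, typed)
print(bytewise_filter, typed_filter)
```

With these inputs the byte-wise ordering is [-5, 100, 42, 9] while the typed ordering is [-5, 9, 42, 100], and the byte-wise filter keeps 9 instead of 100. That is why both sorting and filtering need to follow the data type.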
-Regards
Kumar Vishal

On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <ravi.pes...@gmail.com> wrote:

> Hi,
>
> In order to make storage and performance of non-dictionary columns more
> efficient, I am suggesting the following improvements.
>
> 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary.
> Right now only date and timestamp are direct dictionary columns. We can
> make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if these columns
> are included in SORT_COLUMNS.
>
> 2. Consider delta/value compression while storing direct dictionary values.
> Right now it always uses the INT datatype to store direct dictionary values.
> So we can consider value/delta compression to compact the storage.
>
> 3. Use a separator instead of LV format to store String values in
> no-dictionary format.
> Currently String datatypes for non-dictionary columns are stored in
> LV (length-value) format, where we always use a Short (2 bytes) as the
> length. In order to keep storage compact we can use a separator (a 0 byte),
> which takes just a single byte. While reading, we can traverse through the
> data and get the offsets like we are doing now.
>
> 4. Add range filters for no-dictionary columns.
> Currently range filters like greater-than/less-than filters are not
> implemented for no-dictionary columns. So we should implement them to avoid
> row-level filtering and improve the performance.
>
> Regards,
> Ravindra.
>
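For reference, point 3 of the quoted proposal can be sketched as follows. This is a minimal illustration in Python, not the project's actual code: the function names are hypothetical, the 2-byte length is assumed to be big-endian, and the separator layout assumes that no value contains a 0 byte.

```python
import struct


def encode_lv(values):
    # Length-value layout: a 2-byte length (assumed big-endian), then the bytes.
    out = bytearray()
    for v in values:
        out += struct.pack(">H", len(v)) + v
    return bytes(out)


def decode_lv(buf):
    # Walk the buffer, reading each length and then that many bytes.
    values, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        values.append(buf[pos:pos + length])
        pos += length
    return values


def encode_sep(values):
    # Separator layout: each value is followed by a single 0x00 byte.
    # Assumption: values never contain 0x00; otherwise this scheme is ambiguous.
    out = bytearray()
    for v in values:
        if b"\x00" in v:
            raise ValueError("value contains the separator byte")
        out += v + b"\x00"
    return bytes(out)


def decode_sep(buf):
    # Traverse the data and recover offsets by scanning for the separator.
    values, start = [], 0
    while start < len(buf):
        end = buf.index(b"\x00", start)
        values.append(buf[start:end])
        start = end + 1
    return values


rows = [b"carbon", b"", b"data"]
lv = encode_lv(rows)
sep = encode_sep(rows)
print(len(lv), len(sep))
```

For these three values the LV layout takes 16 bytes and the separator layout takes 13, one byte less per value. The separator layout has to scan every byte to find the offsets, whereas the LV layout can jump from one length field to the next.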