Hi Likun,

Yes, we had better keep dictionary as the default until we optimize no-dictionary columns. As you mentioned, we can suggest 2-pass mode for the first load; subsequent loads will then use single-pass mode to improve performance.
Regards,
Ravindra.

On 2 March 2017 at 06:48, Jacky Li <jacky.li...@qq.com> wrote:

> Hi Ravindra & Vishal,
>
> Yes, I think this work needs to be done before switching to no-dictionary
> as the default. So as of now, we should use dictionary as the default.
> I think we can suggest that users load data as follows:
> 1. First load: use 2-pass mode. The first scan should discover the
>    cardinality and check it against the user-specified option. We should
>    define rules to pass or fail the validation, and finalize the load
>    options for subsequent loads.
> 2. Subsequent loads: use single-pass mode with the options defined by the
>    first load.
>
> What do you think?
>
> Regards,
> Jacky
>
> > On 1 March 2017, at 11:31 PM, Ravindra Pesala <ravi.pes...@gmail.com> wrote:
> >
> > Hi Vishal,
> >
> > You are right, that's why we can use no-dictionary only for the String
> > datatype. Please look at my first point: we can always use direct
> > dictionary for the possible data types like short, int, long, double &
> > float for sort_columns.
> >
> > Regards,
> > Ravindra.
> >
> > On 1 March 2017 at 18:18, Kumar Vishal <kumarvishal1...@gmail.com> wrote:
> >
> >> Hi Ravi,
> >> Sorting of data for no-dictionary columns should be based on the data
> >> type, and the same applies to filters. Please add this point.
> >>
> >> -Regards
> >> Kumar Vishal
> >>
> >> On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <ravi.pes...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> In order to make the storage and performance of non-dictionary columns
> >>> more efficient, I am suggesting the following improvements.
> >>>
> >>> 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary.
> >>> Right now only date and timestamp are direct dictionary columns. We can
> >>> make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if these
> >>> columns are included in SORT_COLUMNS.
> >>>
> >>> 2. Consider delta/value compression while storing direct dictionary
> >>> values.
> >>> Right now it always uses the INT datatype to store direct dictionary
> >>> values, so we can consider value/delta compression to compact the
> >>> storage.
> >>>
> >>> 3. Use a separator instead of LV format to store String values in
> >>> no-dictionary format.
> >>> Currently String datatypes for non-dictionary columns are stored in LV
> >>> (length-value) format, where we always use a Short (2 bytes) as the
> >>> length. To keep storage compact we can use a separator (a 0 byte), which
> >>> takes just a single byte. While reading, we can traverse the data and
> >>> get the offsets like we are doing now.
> >>>
> >>> 4. Add range filters for no-dictionary columns.
> >>> Currently range filters like greater-than/less-than filters are not
> >>> implemented for no-dictionary columns. We should implement them to avoid
> >>> row-level filtering and improve performance.
> >>>
> >>> Regards,
> >>> Ravindra.
> >>>
> >>
> >
> > --
> > Thanks & Regards,
> > Ravi
> >

--
Thanks & Regards,
Ravi
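
As a rough illustration of point 3 in the original proposal (a 0-byte separator instead of a 2-byte LV length), here is a minimal Python sketch. It is not CarbonData code: the function names `encode_lv`, `encode_separated` and `decode_separated` are hypothetical, and it assumes the values are UTF-8 strings that never contain a 0 byte, which the separator layout would need to guarantee in some way.

```python
import struct


def encode_lv(values):
    """Length-value layout: a 2-byte big-endian length, then the UTF-8 bytes."""
    out = bytearray()
    for value in values:
        raw = value.encode("utf-8")
        # A 2-byte unsigned length caps each value at 65535 bytes;
        # struct.pack raises struct.error beyond that.
        out += struct.pack(">H", len(raw))
        out += raw
    return bytes(out)


def encode_separated(values):
    """Separator layout: the UTF-8 bytes followed by a single 0x00 byte."""
    out = bytearray()
    for value in values:
        raw = value.encode("utf-8")
        if b"\x00" in raw:
            # Assumption of this sketch: such values are rejected, not escaped.
            raise ValueError("value contains the separator byte")
        out += raw
        out.append(0)
    return bytes(out)


def decode_separated(data):
    """Traverse the data once, cutting a value at every separator byte."""
    values = []
    start = 0
    for offset, byte in enumerate(data):
        if byte == 0:
            values.append(data[start:offset].decode("utf-8"))
            start = offset + 1
    return values


values = ["carbon", "", "data"]
lv_bytes = encode_lv(values)
separated_bytes = encode_separated(values)
print(len(lv_bytes), len(separated_bytes))
print(decode_separated(separated_bytes) == values)
```

For these three values (10 bytes of string data), the LV layout takes 16 bytes and the separator layout 13, so the saving is one byte per value. The cost is that a reader has to scan for separators to find offsets instead of jumping by length.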