+1

I agree.

For non-dictionary columns in sort_columns:
1. Sort the column data in ColumnChunk2

2. Compress the column by datatype:
string: RLE, or Snappy if RLE does not compress well
short, int, bigint: delta encoding plus a number compressor
  (ValueCompressor and NumberCompressor)
float, double: delta encoding plus Snappy
  (ValueCompressor and SnappyCompressor)
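
To make the two encodings in step 2 concrete, here is a minimal sketch in Python. It is not CarbonData code: the function names are hypothetical and do not correspond to ValueCompressor or NumberCompressor, and the Snappy fallback is left out. It only shows why sorted data suits RLE (equal values become adjacent runs) and delta encoding (neighbouring differences are small).

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence as [(value, run_length), ...]."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_decode(runs):
    """Expand [(value, run_length), ...] back into the original list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def delta_encode(sorted_ints):
    """Keep the first value, then the difference to each previous value."""
    if not sorted_ints:
        return []
    deltas = [sorted_ints[0]]
    for prev, cur in zip(sorted_ints, sorted_ints[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

On a sorted column the deltas are never negative and are usually small, which is what lets a later number compressor store them in fewer bytes.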

3. Store the column by datatype:
string: byte[], with a null character as the separator
short, int, bigint: byte[], using max/min to calculate a fixed length
  for storing each delta value
float, double: byte[], decompressed to float[] or double[]
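
The storage layouts in step 3 could look roughly like the sketch below. It makes several assumptions that the proposal does not state: the delta is taken against the column minimum, the byte order is big-endian, strings are UTF-8 and never contain the null character, and the input is non-empty. All function names are hypothetical.

```python
def pack_fixed_width(values):
    """Store each value as (value - min) in a fixed number of bytes.

    The width comes from the max/min spread, so every entry has the
    same length and entry i starts at byte offset i * width.
    """
    lo, hi = min(values), max(values)
    width = max(1, ((hi - lo).bit_length() + 7) // 8)
    body = b"".join((v - lo).to_bytes(width, "big") for v in values)
    return lo, width, body


def unpack_fixed_width(lo, width, body):
    """Read fixed-width entries and add the minimum back."""
    return [
        lo + int.from_bytes(body[i:i + width], "big")
        for i in range(0, len(body), width)
    ]


def pack_strings(strings):
    """Join UTF-8 strings with a null byte (assumes no embedded NUL)."""
    return b"\x00".join(s.encode("utf-8") for s in strings)


def unpack_strings(data):
    """Split on the null byte and decode each piece."""
    return [piece.decode("utf-8") for piece in data.split(b"\x00")]
```

With a fixed width, a reader can jump straight to any row without scanning, which helps when a filter only needs a sub-range of the chunk.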

4. Filter the column
Column level: ExcludeFilterExecuterImpl, IncludeFilterExecuterImpl,
RangeFilterExecuter
At column level, RangeFilterExecuter should calculate the index range
(start and end) within the sorted data chunk to get the bitset of the
uncompressed result.
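
As an illustration of the index-range idea for the range filter, the sketch below uses binary search on a sorted chunk. It is only a sketch under assumptions: both filter bounds are treated as inclusive, a Python list of booleans stands in for a bitset, and the function names are hypothetical rather than taken from RangeFilterExecuter.

```python
from bisect import bisect_left, bisect_right


def range_to_index_bounds(sorted_values, low, high):
    """Return (start, end) so that sorted_values[start:end] holds every
    value v with low <= v <= high. The interval is half-open."""
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return start, end


def range_filter_bitset(sorted_values, low, high):
    """Build a per-row bitset with the matching index range set."""
    start, end = range_to_index_bounds(sorted_values, low, high)
    bits = [False] * len(sorted_values)
    for i in range(start, end):
        bits[i] = True
    return bits
```

Because the chunk is sorted, two binary searches replace a comparison per row, and the matching rows always form one contiguous run in the bitset.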

@Ravindra, please correct me if I am wrong.





--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Improving-Non-dictionary-storage-performance-tp8146p8412.html