Hi

Thank you for starting a good discussion.

I propose adopting a strict check mechanism to avoid the problems you
mention below.
The behavior should be the same for both dimensions and measures. In short,
we need to process each value according to the actual data type the user
specified.
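
As a rough illustration of what I mean by a strict check, here is a minimal
sketch. The class name, the parseStrict helper and the data type strings are
hypothetical, not existing Carbon code, and returning null for a rejected
value is only an assumption about how bad values could be handled.

```java
class StrictParse {
    // Hypothetical helper: parse a column value using the parser of its
    // declared data type instead of going through Double.
    // Returns null when the value is fractional, out of range, or not numeric.
    static Long parseStrict(String value, String dataType) {
        try {
            String v = value.trim();
            switch (dataType) {
                case "SHORT":
                    // Short.parseShort rejects values outside -32768..32767
                    return (long) Short.parseShort(v);
                case "INT":
                    // Integer.parseInt rejects values outside the int range
                    return (long) Integer.parseInt(v);
                default:
                    return Long.parseLong(v);
            }
        } catch (NumberFormatException e) {
            // Assumption: a rejected value is surfaced as null
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("123", "INT"));     // 123
        System.out.println(parseStrict("40000", "SHORT")); // null (out of range)
        System.out.println(parseStrict("12.7", "INT"));    // null (not integral)
    }
}
```

With something like this, the same rule could be applied to dimensions and
measures alike.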

Regards
Liang


manishgupta88 wrote
> Hi All,
> 
> Currently in Carbon we treat Short and Int as Long, and when storing
> them in Carbon data files we use delta compression, which compresses the
> data based on the min and max values of the column.
> 
> While parsing values for these data types, we use the Double parser and
> extract the long value from the result. The code snippet is:
> Double.valueOf(msrValue).longValue()
> 
> This has the following problems.
> 
> 1. Measure values beyond the range of Int and Short are parsed
> successfully. This conflicts with the behavior when the same measure is
> included in dictionary_include and becomes a dimension. At query time,
> each dimension value is parsed according to its data type for result
> conversion; a NumberFormatException is thrown and null is displayed in
> the result, whereas for a measure the loaded value is displayed. This
> also impacts aggregate queries. That is why a strict check mechanism is
> adopted for parsing dimension values.
> 
> 2. Data inconsistency for measures: for decimal values, only the part
> before the decimal point is kept for Int and Short data types.
> 
> 3. For measures, if values beyond the data type range are allowed,
> compression becomes less effective.
> 
> Please comment on what the parsing behavior should be. Should Carbon
> adopt a strict check mechanism or a lenient one, considering that the
> behavior should be the same for both dimensions and measures, since both
> are ultimately table columns?
> 
> Regards
> Manish Gupta





