Hi, thank you for starting a good discussion.
I propose a strict check mechanism to avoid the problems you mention below. The behavior should be the same for both dimensions and measures. In short, values should be parsed according to the actual data type specified by the user.

Regards
Liang

manishgupta88 wrote
> Hi All,
>
> Currently in carbon we treat Short and Int as long and at the time of
> storing in carbon data files delta compression is used which compresses the
> data based on min and max values of the column.
>
> While parsing the values for these datatypes, we use Double data type
> parser and extract long value from that. Code snippet as below.
> Double.valueOf(msrValue).longValue()
>
> This has the following problems.
>
> 1. Measure Values beyond the range of Int and Short are parsed
> successfully. This behavior conflicts when the same measure is included as
> dictionary_include and becomes a dimension. When we query then each
> dimension value is parsed for its datatype for result conversion and at
> that time NumberFormatException is thrown and null is displayed in the
> result while for measure the loaded values are displayed. This also impacts
> aggregate queries. That is why strict check mechanism is adopted for
> dimensions values parsing.
>
> 2. Data inconsistency in case of measures as for decimal values, the value
> before decimal will only be considered for Int and Short datatypes.
>
> 3. For measures, if values beyond the datatype range are allowed the
> compression will decrease.
>
> Please comment as what should be the parsing behavior. Carbon should adopt
> a strict check mechanism or lenient check mechanism considering that the
> behavior should be same for both dimensions and measures as both are
> finally table columns.
>
> Regards
> Manish Gupta
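
To make the difference between the two behaviors concrete, here is a minimal sketch. Only the expression Double.valueOf(msrValue).longValue() comes from the thread. The class name ParseCheckSketch, the method names, and the choice to return null for a rejected value are assumptions for illustration and are not CarbonData code.

```java
// Hypothetical sketch: names are illustrative, not CarbonData APIs.
class ParseCheckSketch {

    // Lenient parsing as quoted in the thread: parse as Double, keep the long part.
    // Out-of-range values are accepted and any fractional part is dropped.
    static long parseLenient(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    // Strict parsing for an Int column.
    // Assumption: a value that is not a valid int is reported as null.
    static Integer parseStrictInt(String value) {
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Strict parsing for a Short column, with the same null-on-failure assumption.
    static Short parseStrictShort(String value) {
        try {
            return Short.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Lenient: both inputs load successfully.
        System.out.println(parseLenient("3000000000")); // beyond the Int range
        System.out.println(parseLenient("12.9"));       // fraction is discarded

        // Strict: the same inputs are rejected, a valid value is kept.
        System.out.println(parseStrictInt("3000000000"));
        System.out.println(parseStrictInt("12.9"));
        System.out.println(parseStrictInt("42"));
        System.out.println(parseStrictShort("40000"));  // beyond the Short range
    }
}
```

With the strict variants, a measure and a dictionary_include dimension built from the same input would accept or reject the same values, which is the consistency asked for above.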