[ https://issues.apache.org/jira/browse/CARBONDATA-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravindra Pesala resolved CARBONDATA-542.
----------------------------------------

> Parsing values for measures and dimensions during data load should adopt a
> strict check
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-542
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-542
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Manish Gupta
>            Priority: Minor
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, CarbonData treats Short and Int values as Long. When storing them
> in CarbonData files, it applies delta compression, which compresses the data
> based on the column's min and max values.
> When parsing values of these data types, we use the Double parser and
> extract the long value from the result:
>
>     Double.valueOf(msrValue).longValue()
>
> This causes the following problems:
> 1. Measure values outside the range of Int and Short are parsed successfully.
> This conflicts with the behavior when the same column is included in
> dictionary_include and becomes a dimension. At query time, each dimension
> value is parsed according to its data type for result conversion; a
> NumberFormatException is thrown and null is displayed in the result, whereas
> for the measure the loaded values are displayed. This also affects aggregate
> queries. That is why a strict check is adopted when parsing dimension values.
> 2. Measures become inconsistent with the input data: for decimal values, only
> the integer part before the decimal point is kept for Int and Short data types.
> 3. For measures, allowing values beyond the data type's range reduces
> compression, because delta compression depends on the column's min and max
> values.
> Therefore, we will have to adopt strict parsing for both dimensions and
> measures.
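The lenient parsing call quoted in the issue and a strict alternative can be compared in a small Java program. This is a sketch, not CarbonData code: the class and helper names (`MeasureParsingDemo`, `lenientParse`, `strictParseInt`, `strictParseShort`, `tryStrict`) are hypothetical. The strict path simply relies on the JDK's `Integer.parseInt` and `Short.parseShort`, which throw `NumberFormatException` for decimal input and for out-of-range values.

```java
/**
 * Compares the lenient parsing quoted in CARBONDATA-542 with a strict check.
 * Class and method names are hypothetical, for illustration only.
 */
public class MeasureParsingDemo {

    // Lenient path quoted in the issue: parse as double, then truncate to long.
    // "1.9" becomes 1 and "3000000000" is accepted even for an Int column.
    static long lenientParse(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    // Strict Int check: Integer.parseInt throws NumberFormatException for
    // decimal input and for values outside the int range.
    static long strictParseInt(String value) {
        return Integer.parseInt(value);
    }

    // Strict Short check: Short.parseShort throws NumberFormatException for
    // decimal input and for values outside the short range.
    static long strictParseShort(String value) {
        return Short.parseShort(value);
    }

    // Returns the strictly parsed value as text, or "null" when the check fails,
    // similar to how an unparseable dimension value is shown as null.
    static String tryStrict(String value, boolean isShort) {
        try {
            long parsed = isShort ? strictParseShort(value) : strictParseInt(value);
            return String.valueOf(parsed);
        } catch (NumberFormatException e) {
            return "null";
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"42", "3000000000", "1.9", "40000"};
        for (String in : inputs) {
            System.out.println(in
                + " -> lenient=" + lenientParse(in)
                + ", strict int=" + tryStrict(in, false)
                + ", strict short=" + tryStrict(in, true));
        }
    }
}
```

With these inputs, the lenient path accepts `3000000000` and turns `1.9` into `1`, while the strict path rejects both (and rejects `40000` as a Short). This mirrors problems 1 and 2 in the issue description.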
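Problem 3 follows from how min/max-based delta compression chooses its storage width. The sketch below is a hypothetical, simplified model and not CarbonData's actual encoder: each value is stored as its delta from the column minimum, using the fewest whole bytes that can hold `max - min`. The names `DeltaWidthDemo`, `bytesForRange`, `bytesPerValue` and `encode` are illustrative only.

```java
/**
 * Hypothetical, simplified model of min/max-based delta compression: each
 * value is stored as (value - min) using the fewest whole bytes that can hold
 * (max - min). Not CarbonData's actual encoder.
 */
public class DeltaWidthDemo {

    // Bytes needed to store an unsigned delta in the range [0, range].
    static int bytesForRange(long range) {
        if (range < 0) {
            // max - min overflowed a long; this simple model cannot encode it.
            throw new IllegalArgumentException("range overflow: " + range);
        }
        int bits = 64 - Long.numberOfLeadingZeros(range);
        return Math.max(1, (bits + 7) / 8);
    }

    // Per-value width chosen for a column from its min and max values.
    static int bytesPerValue(long[] column) {
        if (column.length == 0) {
            throw new IllegalArgumentException("empty column");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : column) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return bytesForRange(max - min);
    }

    // Encodes each value as its delta from the column minimum.
    static long[] encode(long[] column) {
        long min = Long.MAX_VALUE;
        for (long v : column) {
            min = Math.min(min, v);
        }
        long[] deltas = new long[column.length];
        for (int i = 0; i < column.length; i++) {
            deltas[i] = column[i] - min;
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] inRange = {10, 200, 30000};
        long[] withOutlier = {10, 200, 30000, 3000000000L};
        System.out.println("in-range column: " + bytesPerValue(inRange) + " byte(s)/value");
        System.out.println("with outlier: " + bytesPerValue(withOutlier) + " byte(s)/value");
    }
}
```

In this model, a Short-range column such as `{10, 200, 30000}` needs 2 bytes per value, but admitting a single out-of-range value like `3000000000` raises the width to 4 bytes for every value in the column.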