[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275518#comment-17275518 ]
Jialin Qiao commented on IOTDB-1140:
------------------------------------

In this example:

(1, 1000), (2, 1100), (3, 1800), (4, 1400), (5, 1500)

The time column is (1, 2, 3, 4, 5).
The value column is (1000, 1100, 1800, 1400, 1500).

The two columns are encoded independently. If you eliminate the value 1800, timestamp 3 needs to be removed too:

(1, 1000), (2, 1100), (4, 1400), (5, 1500)

> optimize regular data encoding
> ------------------------------
>
>                 Key: IOTDB-1140
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1140
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: Core/Engine
>            Reporter: Chao Wang
>            Assignee: Chao Wang
>            Priority: Critical
>
> Current regular data encoding algorithm:
> # Calculate the difference between each pair of adjacent values. The
> smallest difference is used as the regular interval (the "equal
> frequency").
> # Determine the data range of this batch of data from the difference
> between the last value and the first value.
> # Traverse this batch of data with a BitSet whose bits are true by
> default, comparing the difference between each pair of adjacent values
> with the regular interval. If a difference is not equal to the interval,
> calculate how many intervals it spans and set the bits at the
> corresponding positions to false, indicating that those points are
> missing.
>
> This algorithm can only identify missing points. If the data contains an
> erroneous point, it throws an exception, because the BitSet can only
> indicate whether a point exists at each interval position in a segment
> of data.
>
> There is room for optimization here. If a column of values contains an
> abnormal value, taking the minimum difference directly skews the
> algorithm.
> Sample: 1000, 1100, 1800, 1400, 1500...
> The current algorithm cannot be used on this data.
> 1800 is an erroneous point; we should identify the erroneous point and
> revise the data.
> The revised data should be: 1000, 1100, 1300, 1400, 1500

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
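
For reference, the three steps of the current algorithm quoted above can be sketched in Python as follows. This is only a rough illustration of the description in the ticket, not IoTDB's actual encoder: the function name encode_regular and the use of a list of booleans in place of a Java BitSet are assumptions made for the example.

```python
def encode_regular(values):
    """Sketch of the regular-data encoding steps described in the ticket.

    Returns (first_value, step, present), where present[i] says whether
    the point first_value + i * step exists in the input.
    Hypothetical helper; a list of booleans stands in for the BitSet.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")

    # Step 1: the smallest adjacent difference is taken as the interval.
    diffs = [b - a for a, b in zip(values, values[1:])]
    step = min(diffs)
    if step <= 0:
        raise ValueError(f"data is not increasing, smallest difference is {step}")

    # Step 2: the range of the batch is last value minus first value.
    slots = (values[-1] - values[0]) // step + 1

    # Step 3: bits are true by default; gaps are switched to false.
    present = [True] * slots
    pos = 0
    for d in diffs:
        if d % step != 0:
            # A point that is off the grid cannot be expressed in the bitmap.
            raise ValueError(f"difference {d} is not a multiple of {step}")
        for missing in range(pos + 1, pos + d // step):
            present[missing] = False
        pos += d // step
    return values[0], step, present
```

With a series that only has a missing point, such as 1000, 1100, 1300, 1400, the sketch returns an interval of 100 and marks the third position as missing. With the sample from the ticket (1000, 1100, 1800, 1400, 1500) the smallest difference is -400, so the sketch raises an error, which matches the limitation the ticket describes.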