[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275523#comment-17275523 ]

Chao Wang commented on IOTDB-1140:
----------------------------------

Thanks, I understand what you mean. I have some ideas we can think about:

# Revising the value to corrected data better reflects the benefit of this algorithm.
# This is what users want. They occasionally insert an incorrect row by mistake, and the algorithm can supply a corrected value rather than delete the row.
# When a page is written, the time column is encoded first and the value column afterwards. By the time a value is found to be incorrect, it is difficult to remove its timestamp.

> optimize regular data encoding
> ------------------------------
>
>                 Key: IOTDB-1140
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1140
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: Core/Engine
>            Reporter: Chao Wang
>            Assignee: Chao Wang
>            Priority: Critical
>
> Current regular data encoding algorithm:
> # Calculate the difference between each pair of adjacent values. The smallest
> difference is used as the fixed interval (the "equal frequency").
> # Determine the data range of the batch from the difference between the last
> value and the first value.
> # Traverse the batch with a BitSet whose bits default to true, comparing the
> difference between each pair of adjacent values with the fixed interval. If a
> difference is not equal to the interval, calculate how many intervals it spans
> and set the bits at the corresponding positions to false, marking those points
> as missing.
>
> This algorithm can only identify missing points. If the data contains an error
> point, it throws an exception, because a BitSet can only indicate whether a
> point exists at each interval within a segment of data.
>
> But there is room for optimization. If a column of values contains an abnormal
> value, taking the minimum difference directly skews the algorithm.
> Sample: 1000,1100,1800,1400,1500...
> The current algorithm cannot be used on this data.
> 1800 is an error point; we should identify error points and revise the data.
> The revised data should be: 1000,1100,1300,1400,1500

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
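The revise-in-place idea from this comment can be sketched in Java as below. The issue does not say how the corrected value 1300 is derived, so the rule used here is an assumption: a point is treated as an error when it lies outside the range spanned by its two neighbours, and it is replaced by its successor minus the interval. The class name, the method name and the explicit `step` parameter are hypothetical and are not IoTDB code.

```java
final class ReviseSketch {

    /**
     * Returns a copy of {@code values} in which suspected error points are
     * overwritten instead of removed, so the value column keeps the same
     * length as the already-encoded time column.
     *
     * ASSUMED rule (not stated in the issue): a point is an error when its
     * neighbours ascend but the point itself lies outside [prev, next];
     * the replacement is {@code next - step}.
     */
    static long[] revise(long[] values, long step) {
        long[] out = values.clone();
        for (int i = 1; i + 1 < out.length; i++) {
            long prev = out[i - 1];      // predecessor, possibly already revised
            long next = values[i + 1];   // successor as originally written
            boolean neighboursAscend = prev < next;
            boolean outsideNeighbours = out[i] < prev || out[i] > next;
            if (neighboursAscend && outsideNeighbours) {
                out[i] = next - step;
            }
        }
        return out;
    }
}
```

With the sample from the issue and an assumed interval of 100, only the third value changes. Because the array length is preserved, no timestamp has to be removed after the time column has been encoded, which is the concern raised in point 3.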
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275518#comment-17275518 ]

Jialin Qiao commented on IOTDB-1140:
------------------------------------

In this example:

(1, 1000), (2, 1100), (3, 1800), (4, 1400), (5, 1500)

The time column is (1, 2, 3, 4, 5).
The value column is (1000, 1100, 1800, 1400, 1500).

The two columns are encoded independently. If you eliminate the value 1800, timestamp 3 needs to be removed too:

(1, 1000), (2, 1100), (4, 1400), (5, 1500)
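The alignment constraint in this comment can be illustrated with a small Java sketch: dropping a value means dropping the timestamp at the same index, otherwise the two columns no longer line up. The class and method names are hypothetical, and plain arrays stand in for IoTDB's encoded columns.

```java
final class AlignedDropSketch {

    /**
     * Removes the point at {@code index} from BOTH columns.
     * Returns {timesWithoutPoint, valuesWithoutPoint}; the inputs are not modified.
     */
    static long[][] dropPoint(long[] times, long[] values, int index) {
        if (times.length != values.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        if (index < 0 || index >= times.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return new long[][] {without(times, index), without(values, index)};
    }

    /** Copies {@code src} into a new array that skips position {@code index}. */
    private static long[] without(long[] src, int index) {
        long[] dst = new long[src.length - 1];
        System.arraycopy(src, 0, dst, 0, index);
        System.arraycopy(src, index + 1, dst, index, src.length - index - 1);
        return dst;
    }
}
```

Calling `dropPoint` with the example columns and index 2 yields the times (1, 2, 4, 5) and the values (1000, 1100, 1400, 1500), matching the four pairs listed in the comment.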
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275517#comment-17275517 ]

Chao Wang commented on IOTDB-1140:
----------------------------------

"should we abort the timestamp at the same time?"

Sorry, I do not understand what you mean. Could you explain? Thanks.
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275515#comment-17275515 ]

Jialin Qiao commented on IOTDB-1140:
------------------------------------

Oh, the regular method is designed to encode the time column. Maybe we need to restrict the usage of this encoding to timestamps.

Eliminating the wrong value is OK, but should we abort the timestamp at the same time?
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275512#comment-17275512 ]

Chao Wang commented on IOTDB-1140:
----------------------------------

My idea is that, in otherwise regular data, there is occasionally a mutated value or two, and we can eliminate those values. This suits users.
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275509#comment-17275509 ]

Chao Wang commented on IOTDB-1140:
----------------------------------

Thanks [~qiaojialin], the values are not sorted. Sample:

create timeseries root.db_0.tab0.salary with datatype=INT64,encoding=REGULAR;

(time, salary) ==> (1, 1000), (2, 1100), (3, 1800), (4, 1400), (5, 1500)

The series is sorted only by time, not by salary.
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275501#comment-17275501 ]

Jialin Qiao commented on IOTDB-1140:
------------------------------------

Hi, in IoTDB a sequence of time series is sorted and then encoded. So out-of-order data will not happen; it will be 1000,1100,1400,1500,1800.

A more complicated case than this one is:

1000, 1201, 1299, 1303

Here the data is not at a fixed frequency.

Maybe we could check the data and use TS2_DIFF if it is not at a fixed frequency?
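The check suggested in this comment could look roughly like the Java sketch below. It assumes that "fixed frequency" means every adjacent difference is a positive multiple of the smallest difference (so missing points are still allowed). The enum is a local stand-in for illustration, not IoTDB's own encoding type, and the method name is hypothetical.

```java
final class EncodingChoiceSketch {

    /** Local stand-in for the two encodings named in the thread. */
    enum Choice { REGULAR, TS2_DIFF }

    /**
     * Picks REGULAR only when the data sits on a fixed-interval grid;
     * otherwise falls back to TS2_DIFF.
     */
    static Choice choose(long[] values) {
        if (values.length < 2) {
            return Choice.TS2_DIFF; // no interval can be inferred from fewer than two points
        }
        long minDelta = Long.MAX_VALUE;
        for (int i = 1; i < values.length; i++) {
            minDelta = Math.min(minDelta, values[i] - values[i - 1]);
        }
        if (minDelta <= 0) {
            return Choice.TS2_DIFF; // repeated or descending values: not regular
        }
        for (int i = 1; i < values.length; i++) {
            if ((values[i] - values[i - 1]) % minDelta != 0) {
                return Choice.TS2_DIFF; // a gap that is not a whole number of intervals
            }
        }
        return Choice.REGULAR;
    }
}
```

For 1000, 1201, 1299, 1303 the differences are 201, 98 and 4; 201 is not a multiple of 4, so the sketch falls back to TS2_DIFF.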
[jira] [Commented] (IOTDB-1140) optimize regular data encoding
[ https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275462#comment-17275462 ]

Chao Wang commented on IOTDB-1140:
----------------------------------

[~jixuan1989] [~qiaojialin] Hi, we can discuss this issue.
[jira] [Created] (IOTDB-1140) optimize regular data encoding
Chao Wang created IOTDB-1140:
--------------------------------

             Summary: optimize regular data encoding
                 Key: IOTDB-1140
                 URL: https://issues.apache.org/jira/browse/IOTDB-1140
             Project: Apache IoTDB
          Issue Type: Improvement
          Components: Core/Engine
            Reporter: Chao Wang
            Assignee: Chao Wang


Current regular data encoding algorithm:
# Calculate the difference between each pair of adjacent values. The smallest difference is used as the fixed interval (the "equal frequency").
# Determine the data range of the batch from the difference between the last value and the first value.
# Traverse the batch with a BitSet whose bits default to true, comparing the difference between each pair of adjacent values with the fixed interval. If a difference is not equal to the interval, calculate how many intervals it spans and set the bits at the corresponding positions to false, marking those points as missing.

This algorithm can only identify missing points. If the data contains an error point, it cannot be stored, because a BitSet can only indicate whether a point exists at each interval within a segment of data.

But there is room for optimization. If a column of values contains an abnormal value, taking the minimum difference directly skews the algorithm.

Sample: 1000,1100,1800,1400,1500...
The current algorithm cannot be used on this data.
1800 is an error point; we should identify error points and revise the data.
The revised data should be: 1000,1100,1300,1400,1500
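The three steps described in this issue can be sketched in Java as follows. This is a reading of the description, not the IoTDB encoder: the class and method names are hypothetical, and rejecting an error point with `IllegalArgumentException` is a stand-in for whatever the real code does when such data cannot be stored.

```java
import java.util.BitSet;

final class RegularEncodingSketch {

    /**
     * Builds the presence bitmap: one bit per expected slot between the first
     * and last value, true = point present, false = point missing.
     */
    static BitSet presenceBitmap(long[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        // Step 1: the smallest adjacent difference is taken as the interval.
        long minDelta = Long.MAX_VALUE;
        for (int i = 1; i < values.length; i++) {
            minDelta = Math.min(minDelta, values[i] - values[i - 1]);
        }
        if (minDelta <= 0) {
            // An abnormal value (e.g. 1800 followed by 1400) skews the minimum.
            throw new IllegalArgumentException("non-positive interval " + minDelta);
        }
        // Step 2: the range (last - first) fixes the number of slots.
        long range = values[values.length - 1] - values[0];
        int slots = (int) (range / minDelta) + 1;
        // Step 3: bits default to true; clear the slots that were skipped.
        BitSet present = new BitSet(slots);
        present.set(0, slots);
        int pos = 0;
        for (int i = 1; i < values.length; i++) {
            long delta = values[i] - values[i - 1];
            if (delta % minDelta != 0) {
                throw new IllegalArgumentException("error point at index " + i);
            }
            int steps = (int) (delta / minDelta);
            present.clear(pos + 1, pos + steps); // no-op when steps == 1
            pos += steps;
        }
        return present;
    }
}
```

For 1000, 1100, 1200, 1400, 1500 the interval is 100 and the bitmap has six slots with only slot 3 (the absent 1300) cleared. For the sample in the issue, the step from 1800 to 1400 makes the smallest difference negative, so the sketch rejects the batch, which is the weakness the issue describes.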
[jira] [Commented] (IOTDB-1139) cluster - SELECT LAST null
[ https://issues.apache.org/jira/browse/IOTDB-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274324#comment-17274324 ]

Houliang Qi commented on IOTDB-1139:
------------------------------------

!image-2021-01-29-18-56-08-148.png!

I tried locally; however, it's OK.

sorry, I can not

> cluster - SELECT LAST null
> --------------------------
>
>                 Key: IOTDB-1139
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1139
>             Project: Apache IoTDB
>          Issue Type: Bug
>            Reporter: Haimei Guo
>            Priority: Major
>         Attachments: image-2021-01-29-17-53-17-966.png,
> image-2021-01-29-18-53-43-055.png, image-2021-01-29-18-56-08-148.png
>
>
> In a cluster, SELECT LAST throws an exception for some timeseries,
> e.g.:
> insert into root.临时数据.可删.boolen(timestamp,66) values(111,444)
> SELECT last 66 FROM root.临时数据.可删.boolen ;
> !image-2021-01-29-17-53-17-966.png!
[jira] [Created] (IOTDB-1139) cluster - SELECT LAST null
Haimei Guo created IOTDB-1139:
---------------------------------

             Summary: cluster - SELECT LAST null
                 Key: IOTDB-1139
                 URL: https://issues.apache.org/jira/browse/IOTDB-1139
             Project: Apache IoTDB
          Issue Type: Bug
            Reporter: Haimei Guo
         Attachments: image-2021-01-29-17-53-17-966.png


In a cluster, SELECT LAST should return a result:

SELECT last 66 FROM root.临时数据.可删.boolen ;

!image-2021-01-29-17-53-17-966.png!
[jira] [Created] (IOTDB-1138) Compaction encountered some errors
Houliang Qi created IOTDB-1138:
----------------------------------

             Summary: Compaction encountered some errors
                 Key: IOTDB-1138
                 URL: https://issues.apache.org/jira/browse/IOTDB-1138
             Project: Apache IoTDB
          Issue Type: Bug
          Components: Core/Engine
            Reporter: Houliang Qi
         Attachments: image-2021-01-29-17-23-53-280.png


When using iotdb-benchmark to write continuously, some merge errors are reported in the log, as follows:

!image-2021-01-29-17-23-53-280.png!

Besides, some bugs have been introduced by merge, for example:

[https://github.com/apache/iotdb/issues/2549]

[https://github.com/apache/iotdb/issues/2545]

[https://github.com/apache/iotdb/pulls?q=is%3Apr+merge+is%3Aclosed]

We are very happy to see the introduction of merge in version 0.11, but we are also a little worried about whether merge can really be used in a production environment. Is it necessary to add a lot of tests specifically for merge?
[jira] [Created] (IOTDB-1137) MNode.getLeafCount error when existing sub-device
Zesong Sun created IOTDB-1137:
---------------------------------

             Summary: MNode.getLeafCount error when existing sub-device
                 Key: IOTDB-1137
                 URL: https://issues.apache.org/jira/browse/IOTDB-1137
             Project: Apache IoTDB
          Issue Type: Bug
            Reporter: Zesong Sun
            Assignee: Zesong Sun
             Fix For: 0.12.0


When a sub-device exists, getLeafCount() may return a wrong result.

For example, we have 3 timeseries: root.a.b.c, root.a.b.c.d, root.a.b.c.d.e

The "leaf" count of root.a should be 3 (c, d and e), which is the number of timeseries.

Besides, the function name is ambiguous and should be changed to getMeasurementMNodeCount.
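The difference between counting leaves and counting measurements in the example above can be shown with a small Java sketch. The `Node` class below is a hypothetical stand-in with only a name, a measurement flag and children; it is not IoTDB's MNode, and the method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MeasurementCountSketch {

    /** Minimal tree node: a node may be a measurement AND still have children. */
    static final class Node {
        final String name;
        boolean measurement;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        Node child(String childName) {
            return children.computeIfAbsent(childName, Node::new);
        }
    }

    /** Registers a path such as "root.a.b.c"; its last node becomes a measurement. */
    static void addTimeseries(Node root, String path) {
        String[] parts = path.split("\\.");
        Node cur = root;
        for (int i = 1; i < parts.length; i++) { // parts[0] is "root" itself
            cur = cur.child(parts[i]);
        }
        cur.measurement = true;
    }

    /** Counts nodes without children: misses measurements that have sub-nodes. */
    static int countLeaves(Node n) {
        if (n.children.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Node c : n.children.values()) {
            total += countLeaves(c);
        }
        return total;
    }

    /** Counts every node flagged as a measurement, wherever it sits in the tree. */
    static int countMeasurements(Node n) {
        int total = n.measurement ? 1 : 0;
        for (Node c : n.children.values()) {
            total += countMeasurements(c);
        }
        return total;
    }
}
```

After registering root.a.b.c, root.a.b.c.d and root.a.b.c.d.e, a pure leaf count under root.a sees only e, while the measurement count sees c, d and e. That is why a name like getMeasurementMNodeCount describes the intended result more accurately.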
[GitHub] [iotdb-client-go] manlge closed pull request #12: Integration test
manlge closed pull request #12:
URL: https://github.com/apache/iotdb-client-go/pull/12

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org