[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Chao Wang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275523#comment-17275523
 ] 

Chao Wang commented on IOTDB-1140:
--

Thanks, I see what you mean.

I have some ideas we can think about:
 # Revise the value to the correct data, which better reflects the benefits of 
this algorithm.
 # This is what users want. They occasionally insert an incorrect row of data 
by mistake, and the algorithm can supply a corrected value instead of deleting 
the row.
 # When a page is written, the time column is encoded first, and then the 
value column is encoded. If a value is found to be incorrect at that point, it 
is difficult to go back and remove its timestamp.
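
A minimal sketch of the "revise instead of delete" idea is shown here. The ticket does not say how the corrected value is chosen, so this assumes that the interval is the most common positive adjacent difference and that a value breaking ascending order is replaced by its successor minus that interval. The name `revise_error_points` is hypothetical and is not part of IoTDB.

```python
from collections import Counter


def revise_error_points(values):
    """Return (step, revised_values) for a mostly fixed-interval sequence.

    Assumption (not from the ticket): the step is the most common positive
    adjacent difference, and an interior value that is not strictly between
    its neighbours is replaced by (next value - step).
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    positive = [d for d in diffs if d > 0]
    if not positive:
        raise ValueError("no positive adjacent difference to use as the step")
    step = Counter(positive).most_common(1)[0][0]

    revised = list(values)
    # Only interior points are checked; the first and last are trusted.
    for i in range(1, len(revised) - 1):
        prev, cur, nxt = revised[i - 1], revised[i], revised[i + 1]
        if not (prev < cur < nxt):
            revised[i] = nxt - step
    return step, revised


step, revised = revise_error_points([1000, 1100, 1800, 1400, 1500])
print(step, revised)  # 100 [1000, 1100, 1300, 1400, 1500]
```

On the sample from the issue this reproduces the revised sequence given there, but other correction rules (for example, previous value plus step) would give a different result.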

> optimize regular data encoding
> --
>
> Key: IOTDB-1140
> URL: https://issues.apache.org/jira/browse/IOTDB-1140
> Project: Apache IoTDB
>  Issue Type: Improvement
>  Components: Core/Engine
>Reporter: Chao Wang
>Assignee: Chao Wang
>Priority: Critical
>
> Current regular data encoding algorithm:
>  # Calculate the difference between each pair of adjacent values. The 
> smallest difference is used as the fixed interval (the "equal frequency").
>  # Determine the data range of this batch from the difference between the 
> last value and the first value.
>  # Traverse the batch using a BitSet in which every bit is true by default, 
> and compare the difference between each pair of adjacent values with the 
> fixed interval. If a difference is not equal to the fixed interval, calculate 
> how many intervals it spans and set the bits at the corresponding positions 
> to false, indicating that those points are missing.
>  
> This algorithm can only identify missing points. If the data contains an 
> error point, it throws an exception, because a BitSet can only indicate 
> whether the point at each interval exists in a segment of data.
>  
> But there is room for optimization.
> If there is an abnormal value in a column of values, taking the minimum 
> difference directly skews the algorithm.
> Sample: 1000,1100,1800,1400,1500...
> The current algorithm cannot handle this data.
> 1800 is an error point. We should identify the error point and revise the 
> data.
> The revised data should be: 1000,1100,1300,1400,1500



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Jialin Qiao (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275518#comment-17275518
 ] 

Jialin Qiao commented on IOTDB-1140:


In this example:

(1, 1000), (2, 1100), (3, 1800), (4, 1400), (5, 1500)

The time column is (1, 2, 3, 4, 5)

The value column is (1000, 1100, 1800, 1400, 1500)

The two columns are encoded independently. If you eliminate the value 1800, 
timestamp 3 needs to be removed too:

(1, 1000), (2, 1100), (4, 1400), (5, 1500)
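
To make the point concrete, here is a small sketch that keeps the two columns aligned when a row is dropped. The helper name `drop_rows` is hypothetical; it only illustrates that a removal has to be applied to both columns before either one is encoded.

```python
def drop_rows(times, values, bad_rows):
    """Remove the given row indexes from both columns so they stay aligned."""
    if len(times) != len(values):
        raise ValueError("time and value columns must have the same length")
    keep = [i for i in range(len(times)) if i not in bad_rows]
    return [times[i] for i in keep], [values[i] for i in keep]


times, values = drop_rows(
    [1, 2, 3, 4, 5],
    [1000, 1100, 1800, 1400, 1500],
    {2},  # index of the row (3, 1800)
)
print(times)   # [1, 2, 4, 5]
print(values)  # [1000, 1100, 1400, 1500]
```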

 



[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Chao Wang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275517#comment-17275517
 ] 

Chao Wang commented on IOTDB-1140:
--

"should we abort the timestamp at the same time?"

Sorry, I don't understand what you mean. Could you explain? Thanks.



[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Jialin Qiao (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275515#comment-17275515
 ] 

Jialin Qiao commented on IOTDB-1140:


Oh, the regular method is designed to encode the time column. Maybe we need to 
restrict the usage of this encoding to timestamps. 

Eliminating the wrong value is OK, but should we abort the timestamp at the same time?



[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Chao Wang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275512#comment-17275512
 ] 

Chao Wang commented on IOTDB-1140:
--

My idea is that in data that normally suits regular encoding, one or two values 
occasionally deviate, and we can eliminate those values. This suits what users 
need.



[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Chao Wang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275509#comment-17275509
 ] 

Chao Wang commented on IOTDB-1140:
--

Thanks [~qiaojialin],

The values are not sorted.

Sample:

create timeseries root.db_0.tab0.salary with datatype=INT64,encoding=REGULAR ;

(time, salary) ==> (1, 1000), (2, 1100), (3, 1800), (4, 1400), (5, 1500)

The time series is sorted only by time, not by salary.

 



[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Jialin Qiao (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275501#comment-17275501
 ] 

Jialin Qiao commented on IOTDB-1140:


Hi, in IoTDB, a sequence of time series data is sorted and then encoded.

So out-of-order data will not happen; it will be 1000,1100,1400,1500,1800.

 

A more complicated case than this one is: 1000, 1201, 1299, 1303

The data is not at a fixed frequency.

 

Maybe we could check the data and use TS_2DIFF if it is not at a fixed 
frequency?
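
A rough sketch of such a check is given here. It assumes "fixed frequency" means every adjacent difference is a positive multiple of the smallest difference, so that gaps can still be recorded as missing points. The function names and the returned strings are illustrative labels only, not IoTDB APIs.

```python
def is_fixed_frequency(values):
    """True if every adjacent difference is a positive multiple of the smallest one."""
    if len(values) < 2:
        return True  # nothing to compare; treated as regular in this sketch
    diffs = [b - a for a, b in zip(values, values[1:])]
    step = min(diffs)
    if step <= 0:
        return False  # duplicates or out-of-order values
    return all(d % step == 0 for d in diffs)


def choose_encoding(values):
    """Pick an encoding label for a batch, falling back when it is irregular."""
    return "REGULAR" if is_fixed_frequency(values) else "TS_2DIFF"


print(choose_encoding([1000, 1100, 1300, 1400]))  # REGULAR (1200 is a missing point)
print(choose_encoding([1000, 1201, 1299, 1303]))  # TS_2DIFF
```

The check costs one extra pass over the batch before encoding.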

 



[jira] [Commented] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Chao Wang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275462#comment-17275462
 ] 

Chao Wang commented on IOTDB-1140:
--

[~jixuan1989] [~qiaojialin] Hi, we can discuss this issue.



[jira] [Created] (IOTDB-1140) optimize regular data encoding

2021-01-29 Thread Chao Wang (Jira)
Chao Wang created IOTDB-1140:


 Summary: optimize regular data encoding
 Key: IOTDB-1140
 URL: https://issues.apache.org/jira/browse/IOTDB-1140
 Project: Apache IoTDB
  Issue Type: Improvement
  Components: Core/Engine
Reporter: Chao Wang
Assignee: Chao Wang


Current regular data encoding algorithm:
 # Calculate the difference between each pair of adjacent values. The smallest 
difference is used as the fixed interval (the "equal frequency").
 # Determine the data range of this batch from the difference between the last 
value and the first value.
 # Traverse the batch using a BitSet in which every bit is true by default, and 
compare the difference between each pair of adjacent values with the fixed 
interval. If a difference is not equal to the fixed interval, calculate how 
many intervals it spans and set the bits at the corresponding positions to 
false, indicating that those points are missing.

 

This algorithm can only identify missing points. If the data contains an error 
point, it cannot be stored, because a BitSet can only indicate whether the 
point at each interval exists in a segment of data.

 

But there is room for optimization.

If there is an abnormal value in a column of values, taking the minimum 
difference directly skews the algorithm.

Sample: 1000,1100,1800,1400,1500...

The current algorithm cannot handle this data.

1800 is an error point. We should identify the error point and revise the data.

The revised data should be: 1000,1100,1300,1400,1500
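
The three steps described above can be sketched as follows. This is only an illustration of the description in this ticket, not the IoTDB encoder itself: the function names are hypothetical, a Python list of booleans stands in for the Java BitSet, and the error handling is an assumption.

```python
def encode_regular(values):
    """Encode a fixed-interval sequence as (first, step, bitmap)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Step 1: the smallest adjacent difference is the interval.
    diffs = [b - a for a, b in zip(values, values[1:])]
    step = min(diffs)
    if step <= 0:
        raise ValueError("error point: values are not strictly increasing")
    # Step 2: the range of the batch gives the number of slots.
    slots = (values[-1] - values[0]) // step + 1
    # Step 3: every slot is present by default; gaps are marked as missing.
    bitmap = [True] * slots
    pos = 0
    for d in diffs:
        if d % step != 0:
            raise ValueError("error point: difference is not a multiple of the step")
        for k in range(1, d // step):
            bitmap[pos + k] = False
        pos += d // step
    return values[0], step, bitmap


def decode_regular(first, step, bitmap):
    """Rebuild the values from the start, the interval and the presence bitmap."""
    return [first + i * step for i, present in enumerate(bitmap) if present]


first, step, bitmap = encode_regular([1000, 1100, 1300, 1400])
print(step, bitmap)                         # 100 [True, True, False, True, True]
print(decode_regular(first, step, bitmap))  # [1000, 1100, 1300, 1400]
```

With the sample 1000,1100,1800,1400,1500 the smallest difference is -400, so this sketch raises an error instead of producing a bitmap, which is the limitation the ticket describes.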





[jira] [Commented] (IOTDB-1139) cluster - SELECT LAST null

2021-01-29 Thread Houliang Qi (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274324#comment-17274324
 ] 

Houliang Qi commented on IOTDB-1139:


!image-2021-01-29-18-56-08-148.png!

 

I tried it locally, and it works fine.

 

Sorry, I cannot reproduce it.

> cluster - SELECT LAST null
> --
>
> Key: IOTDB-1139
> URL: https://issues.apache.org/jira/browse/IOTDB-1139
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Haimei Guo
>Priority: Major
> Attachments: image-2021-01-29-17-53-17-966.png, 
> image-2021-01-29-18-53-43-055.png, image-2021-01-29-18-56-08-148.png
>
>
> In the cluster, SELECT LAST throws an exception for some timeseries.
>  
> e.g.
> insert into root.临时数据.可删.boolen(timestamp,66) values(111,444)
> SELECT last 66 FROM root.临时数据.可删.boolen ;
> !image-2021-01-29-17-53-17-966.png!
>  
>  





[jira] [Created] (IOTDB-1139) cluster - SELECT LAST null

2021-01-29 Thread Haimei Guo (Jira)
Haimei Guo created IOTDB-1139:
-

 Summary: cluster - SELECT LAST null
 Key: IOTDB-1139
 URL: https://issues.apache.org/jira/browse/IOTDB-1139
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Haimei Guo
 Attachments: image-2021-01-29-17-53-17-966.png

In the cluster, SELECT LAST should return a result.

SELECT last 66 FROM root.临时数据.可删.boolen ;

!image-2021-01-29-17-53-17-966.png!

 

 





[jira] [Created] (IOTDB-1138) Compaction encountered some errors

2021-01-29 Thread Houliang Qi (Jira)
Houliang Qi created IOTDB-1138:
--

 Summary: Compaction encountered some errors
 Key: IOTDB-1138
 URL: https://issues.apache.org/jira/browse/IOTDB-1138
 Project: Apache IoTDB
  Issue Type: Bug
  Components: Core/Engine
Reporter: Houliang Qi
 Attachments: image-2021-01-29-17-23-53-280.png

When using iotdb-benchmark to write continuously, some merge errors will be 
reported in the log as follows:

!image-2021-01-29-17-23-53-280.png!

Besides, merge has introduced some bugs, for example:

 

[https://github.com/apache/iotdb/issues/2549]

[https://github.com/apache/iotdb/issues/2545]

[https://github.com/apache/iotdb/pulls?q=is%3Apr+merge+is%3Aclosed]

 

We are very happy to see merge introduced in version 0.11, but we are also a 
little worried about whether merge can really be used in a production 
environment.

Is it necessary to add many more tests specifically for merge?





[jira] [Created] (IOTDB-1137) MNode.getLeafCount error when existing sub-device

2021-01-29 Thread Zesong Sun (Jira)
Zesong Sun created IOTDB-1137:
-

 Summary: MNode.getLeafCount error when existing sub-device
 Key: IOTDB-1137
 URL: https://issues.apache.org/jira/browse/IOTDB-1137
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Zesong Sun
Assignee: Zesong Sun
 Fix For: 0.12.0


When a sub-device exists, getLeafCount() may return the wrong result.
For example:
We have 3 timeseries: root.a.b.c, root.a.b.c.d, root.a.b.c.d.e
The "leaf" count of root.a should be 3 (c, d and e), which represents the 
number of timeseries.

Besides, the function name is ambiguous and should be changed to 
getMeasurementMNodeCount.
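
The difference between counting leaves and counting measurements can be sketched with a toy tree. The `Node` class and the functions here are hypothetical stand-ins, not the IoTDB MNode implementation, and the leaf-only count is just one way a wrong number could arise.

```python
class Node:
    """Toy metadata node; a measurement node may still have children."""

    def __init__(self, name):
        self.name = name
        self.is_measurement = False
        self.children = {}


def add_timeseries(root, path):
    """Create the nodes along a dotted path and mark the last one as a measurement."""
    parts = path.split(".")
    if parts[0] != root.name:
        raise ValueError("path must start at the root")
    node = root
    for part in parts[1:]:
        node = node.children.setdefault(part, Node(part))
    node.is_measurement = True


def leaf_count(node):
    """Count only nodes without children."""
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children.values())


def measurement_count(node):
    """Count every measurement node, whether or not it has children."""
    own = 1 if node.is_measurement else 0
    return own + sum(measurement_count(c) for c in node.children.values())


root = Node("root")
for path in ("root.a.b.c", "root.a.b.c.d", "root.a.b.c.d.e"):
    add_timeseries(root, path)

a = root.children["a"]
print(leaf_count(a))         # 1 (only e has no children)
print(measurement_count(a))  # 3 (c, d and e)
```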





[GitHub] [iotdb-client-go] manlge closed pull request #12: Integration test

2021-01-29 Thread GitBox


manlge closed pull request #12:
URL: https://github.com/apache/iotdb-client-go/pull/12


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org