Re: Support SDT compression

2020-10-02 Thread Julian Feinauer
Thanks!

I think it could be a very cool index. Due to linearity it would be pretty easy 
to translate regular queries into queries on the SDT signal, which has far fewer 
data points.

Or, in some sense, this is a visualisation-optimized version of the time series.

Really cool feature!

Julian
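
A minimal sketch of the query-translation idea above, assuming the retained SDT
points are available as two parallel arrays sorted by time. A point query is
answered by linear interpolation between the two neighbouring retained points.
The class and method names are hypothetical and are not part of IoTDB.

```java
import java.util.Arrays;

public class SdtPointQuery {
    /**
     * Estimates the value at time t from the retained SDT points.
     * times must be strictly increasing; values[i] belongs to times[i].
     */
    public static double valueAt(long[] times, double[] values, long t) {
        if (t < times[0] || t > times[times.length - 1]) {
            throw new IllegalArgumentException("t is outside the stored range");
        }
        int idx = Arrays.binarySearch(times, t);
        if (idx >= 0) {
            return values[idx]; // t is itself a retained point
        }
        int right = -idx - 1;   // insertion point: first retained time > t
        int left = right - 1;
        double fraction = (double) (t - times[left]) / (times[right] - times[left]);
        return values[left] + fraction * (values[right] - values[left]);
    }

    public static void main(String[] args) {
        long[] times = {0L, 10L, 30L};
        double[] values = {1.0, 2.0, 0.0};
        System.out.println(valueAt(times, values, 5L));  // prints 1.5
        System.out.println(valueAt(times, values, 20L)); // prints 1.0
    }
}
```

The result is only an approximation of the original series: it is within the
configured tolerance of the raw data, not equal to it.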


From: Jialin Qiao 
Sent: Saturday, October 3, 2020 5:04:20 AM
To: dev@iotdb.apache.org 
Subject: Re: Support SDT compression

Hi,

Yes, it's lossy. Users need to configure the error tolerance.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院


Re: Support SDT compression

2020-10-02 Thread Jialin Qiao
Hi,

Yes, it's lossy. Users need to configure the error tolerance.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -Original Message-
> From: "Julian Feinauer" 
> Sent: 2020-10-03 02:09:44 (Saturday)
> To: "dev@iotdb.apache.org" 
> Cc: 
> Subject: Re: Support SDT compression
> 
> Hi,
> 
> I read the document (which is excellent, good work!) and it sounds very 
> interesting, but as far as I understand the algorithm, it's lossy.
> Is this true?
> Or am I missing something?
> 
> Thanks!
> Julian
> 

Re: Support SDT compression

2020-10-02 Thread Julian Feinauer
Hi,

I read the document (which is excellent, good work!) and it sounds very 
interesting, but as far as I understand the algorithm, it's lossy.
Is this true?
Or am I missing something?

Thanks!
Julian

On 29.09.20, 12:04, "Jialin Qiao" wrote:

Hi,

Good summary~

> Page header needs to maintain a Map

It's better to keep the structure of PageHeader the same as 0.10.
You can store this information in the PageData. 
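
As a rough illustration of keeping the per-segment counts in the page data
instead of the page header, here is a minimal sketch. The byte layout (first
value, number of segments, then one point count and one end value per segment)
is purely an assumption for illustration and is not the actual IoTDB page
format. Decoding without timestamps assumes evenly spaced points inside each
segment. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SdtSegmentCounts {
    /** Writes the first value, the segment count, then (pointCount, endValue) per segment. */
    public static ByteBuffer write(float firstValue, int[] pointCounts, float[] endValues) {
        int size = Float.BYTES + Integer.BYTES
                + pointCounts.length * (Integer.BYTES + Float.BYTES);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putFloat(firstValue);
        buf.putInt(pointCounts.length);
        for (int i = 0; i < pointCounts.length; i++) {
            buf.putInt(pointCounts[i]);   // points after the segment start, end point included
            buf.putFloat(endValues[i]);   // value of the segment's end point
        }
        buf.flip();
        return buf;
    }

    /** Rebuilds an approximate value column by interpolating inside each segment. */
    public static List<Float> read(ByteBuffer buf) {
        List<Float> out = new ArrayList<>();
        float start = buf.getFloat();
        out.add(start);
        int segments = buf.getInt();
        for (int s = 0; s < segments; s++) {
            int count = buf.getInt();
            float end = buf.getFloat();
            for (int k = 1; k <= count; k++) {
                out.add(start + (end - start) * k / count);
            }
            start = end; // the end point of one segment starts the next
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer page = write(0f, new int[] {3, 1}, new float[] {3f, 0f});
        System.out.println(read(page)); // prints [0.0, 1.0, 2.0, 3.0, 0.0]
    }
}
```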

> Encoder, Decoder will take both a data column and a series.

Try to provide interfaces with primitive data types, so that a point of a series 
is written with encodeFloatPoint(long time, float value).

Besides, there are some more details to consider:

- How to store the endpoint of each segment
- In the SDT with timestamp encoding, how to store the timestamps? (Maybe 
just using TS2_DIFF is fine)
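
To make the primitive-typed interface and the segment endpoint question above
more concrete, here is a minimal sketch of a swinging-door filter behind
encodeFloatPoint(long time, float value). It is only an illustration under
stated assumptions: timestamps are strictly increasing, the class name, the
compDev parameter (the CD / tolerated error), flush() and the getters are
hypothetical, and the retained points are kept in plain lists rather than
being written with TS2_DIFF or any real IoTDB encoder.

```java
import java.util.ArrayList;
import java.util.List;

public class SdtFloatEncoder {
    private final double compDev;                 // compression deviation (CD)
    private final List<Long> keptTimes = new ArrayList<>();
    private final List<Float> keptValues = new ArrayList<>();

    private boolean hasAnchor = false;            // segment start already stored?
    private long anchorTime;
    private float anchorValue;
    private boolean hasPending = false;           // last seen point not stored yet?
    private long lastTime;
    private float lastValue;
    private double upperSlope;                    // steepest slope of the upper door so far
    private double lowerSlope;                    // flattest slope of the lower door so far

    public SdtFloatEncoder(double compDev) {
        this.compDev = compDev;
    }

    public void encodeFloatPoint(long time, float value) {
        if (!hasAnchor) {
            keep(time, value);                    // first point opens the first segment
            return;
        }
        double dt = time - anchorTime;
        double up = (value - (anchorValue + compDev)) / dt;
        double low = (value - (anchorValue - compDev)) / dt;
        double newUpper = Math.max(upperSlope, up);
        double newLower = Math.min(lowerSlope, low);
        if (hasPending && newUpper > newLower) {
            // Doors have opened: the previous point ends this segment
            // and becomes the start of the next one.
            keep(lastTime, lastValue);
            dt = time - anchorTime;
            upperSlope = (value - (anchorValue + compDev)) / dt;
            lowerSlope = (value - (anchorValue - compDev)) / dt;
        } else {
            upperSlope = newUpper;
            lowerSlope = newLower;
        }
        lastTime = time;
        lastValue = value;
        hasPending = true;
    }

    /** Stores the last seen point so the final segment has an endpoint. */
    public void flush() {
        if (hasPending) {
            keep(lastTime, lastValue);
        }
    }

    private void keep(long time, float value) {
        keptTimes.add(time);
        keptValues.add(value);
        anchorTime = time;
        anchorValue = value;
        hasAnchor = true;
        hasPending = false;
        upperSlope = Double.NEGATIVE_INFINITY;
        lowerSlope = Double.POSITIVE_INFINITY;
    }

    public List<Long> getKeptTimes() { return keptTimes; }

    public List<Float> getKeptValues() { return keptValues; }

    public static void main(String[] args) {
        SdtFloatEncoder enc = new SdtFloatEncoder(0.5);
        float[] values = {0f, 1f, 2f, 3f, 0f};
        for (int i = 0; i < values.length; i++) {
            enc.encodeFloatPoint(i, values[i]);
        }
        enc.flush();
        System.out.println(enc.getKeptTimes());  // prints [0, 3, 4]
        System.out.println(enc.getKeptValues()); // prints [0.0, 3.0, 0.0]
    }
}
```

In this sketch every segment endpoint is also the start of the next segment,
so each endpoint is stored only once.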

This is not a small change to IoTDB...

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -Original Message-
> From: "Haimei Guo" 
> Sent: 2020-09-29 15:19:49 (Tuesday)
> To: dev@iotdb.apache.org
> Cc: 
> Subject: Re: Support SDT compression
> 
> Hi,
> 
> Following is a summary of SDT's encoding and decoding implementation in
> IoTDB.
> 
> - SDT mainly calculates the upward and downward slopes from the segment
>   starting point to each data point. If a point falls within the
>   compression deviation (CD) range, it is discarded. If it exceeds the CD,
>   the original data point is stored.
> - In IoTDB, SDT can act as a new Encoding method. It works inside each
>   Page (PageWriter and PageReader).
> - Encoding both with and without timestamps will be supported.
> - For encoding without timestamps, we will record the count of data
>   points in each segment in the page header. Page header needs to
>   maintain a Map.
> - Encoder will be changed to encode(long time, long value).
> - Data buffer will be stored in each Encoder.
> - Decoder will be changed to getTime(), getXXValue().
> - Encoder and Decoder will each handle a whole series rather than a
>   single data column.
> 
> 
> If you have any questions or comments, you are more than welcome to reply!
> 
> 
> Thank you,
> 
> Haimei
> 
> 
> On Mon, Sep 28, 2020 at 1:20 PM Jialin Qiao 
> wrote:
> 
> > Hi Haimei,
> >
> > Good work! This doc is comprehensive :)
> >
> > As for the implementation in IoTDB, here are some points:
> >
> > (1) First, SDT could act as a new Encoding method in IoTDB. It works
> > inside each Page (PageWriter and PageReader).
> > (2) The interface of Encoder could be changed to encode(long time, XX
> > value). The interface of Decoder could be changed to getTime(),
> > getXXValue(). That is, the encoder and decoder are responsible not just
> > for one data column but for a whole series. This involves some
> > refactoring of the Encoder and Decoder; the data buffers should be
> > stored inside each encoder.
> > (3) For SDT without timestamps, we need to record the point count of
> > each segment.
> > (4) We could offer two encodings: SDT with timestamps and SDT without
> > timestamps.
> >
> > Thanks,
> > --
> > Jialin Qiao
> > School of Software, Tsinghua University
> >
> > 乔嘉林
> > 清华大学 软件学院
> >
> > > -Original Message-
> > > From: "runhus...@foxmail.com" 
> > > Sent: 2020-09-25 11:56:16 (Friday)
> > > To: dev 
> > > Cc:
> > > Subject: Re: Support SDT compression
> > >
> > > Great work!
> > >
> > >
> > >
> > > Thanks.
> > >
> > > Chao Wang
> > > BONC Ltd
> > >
> > >
> > > From: Eileen Guo
> > > Date: 2020-09-25 11:47
> > > To: dev
> > > Subject: Support SDT compression
> > > Hi all,
> > >
> > > I've completed a design draft for supporting swinging door compression.
> > >
> > > Jira: jira SDT link
> > > <https://issues.apache.org/jira/browse/IOTDB-890?filter=-4=assignee%20in%20(haimeiguo)%20order%20by%20created%20DESC>
> > >
> > > design doc: SDT design doc link
> > > <https://docs.google.com/document/d/1VeTwVsm4CkQSVR65bWw9pKg6gRdiDYUz0lBBhWXHl5A/edit?usp=sharing>
> > >
> > >
> > > The doc explains the SDT algorithm, the compression and decompression
> > > process, performance tests, and the SDT + IoTDB implementation and usage.
> > >
> > > There are still some open questions about where to use this algorithm.
> > > If you have any ideas, you are welcome to comment.
> > >
> > > Thank you!
> > > Haimei Guo
> >