Re: Var-Length-Numeric encoding?

2022-06-19 Thread Yuan Tian
Hi Chris,

You're right. IoTDB does use ZigZag encoding for variable length
signed integers. And as you said, if the number is negative, it is
x-ored with all bits set to 1, so this is identical to flipping the
bits.

Shifting the negative int32 value left by one and then flipping all
the bits is exactly what ZigZag encoding requires.
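
To make the two steps concrete, here is a minimal sketch of the mapping
described above. It is not the code from ReadWriteForEncodingUtils; the
class and method names are hypothetical and chosen only for illustration.
It shows the branching form (shift, then flip if negative) next to the
common branch-free form, plus the inverse.

```java
/** Hypothetical helper, not the IoTDB class: a minimal ZigZag sketch for int32. */
final class ZigZagSketch {

    /** Branching form as described in this thread: shift left, then flip all bits if negative. */
    static int encodeBranching(int value) {
        int shifted = value << 1;
        return value < 0 ? ~shifted : shifted;
    }

    /** Branch-free form: (value >> 31) is all ones for negatives and all zeros otherwise. */
    static int encode(int value) {
        return (value << 1) ^ (value >> 31);
    }

    /** Inverse: drop the sign bit, then flip all bits back if the sign bit was set. */
    static int decode(int encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 1, -2, 2, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            int e = encode(v);
            // Print the encoded value as unsigned, since it is meant to be read that way.
            System.out.println(v + " -> " + Integer.toUnsignedString(e) + " -> " + decode(e));
        }
    }
}
```

With this mapping 0, -1, 1, -2, 2 become 0, 1, 2, 3, 4, so the sign ends
up in the least significant bit and small magnitudes stay small.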


Best,
---
Yuan Tian



RE: Var-Length-Numeric encoding?

2022-06-17 Thread Christofer Dutz
Hi Xiangdong,

I doubt you invented a new encoding form. So, in general, I was asking which 
form this actually is.
Julian already pointed out that bit of code.

So, as far as I can see, the sign information is in the least significant bit. 
This would usually be an indicator for ZigZag encoding. The only part I don’t 
quite understand is the bit-flipping in case of negative values. In case of 
ZigZag encoding, the value would be shifted left by one and the last bit would 
be set as the new first bit (so effectively the last bit would just be rotated 
to become the first). In IoTDB it seems as if the left-shifted value is 
inverted. I don’t quite understand why that is happening. I could imagine that 
for small negative integers (small as in “close to 0”) the two's complement 
notation has many 1s, therefore it would consume a lot of memory in serialized 
form. So, flipping the entire number would get rid of these 1s and hence 
reduce the size of the serialized form.
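
The size argument can be checked with a small sketch. This is only an
illustration under assumptions, not the IoTDB implementation: the class and
method names are hypothetical, and the writer is the ordinary 7-bits-per-byte
var-length scheme with the high bit of each byte marking that more bytes
follow.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch, not IoTDB code: byte sizes of 7-bit var-length encodings. */
final class VarIntSizeSketch {

    /** Writes value as an unsigned varint: 7 payload bits per byte, high bit set while more bytes follow. */
    static byte[] writeUnsignedVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value & 0x7F);
        return out.toByteArray();
    }

    /** ZigZag-maps a signed int so that small magnitudes become small unsigned values. */
    static int zigZag(int value) {
        return (value << 1) ^ (value >> 31);
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, -64, -65, 300}) {
            int raw = writeUnsignedVarInt(v).length;         // two's complement bits written as-is
            int zz = writeUnsignedVarInt(zigZag(v)).length;  // ZigZag-mapped first
            System.out.println(v + ": raw = " + raw + " bytes, ZigZag = " + zz + " bytes");
        }
    }
}
```

Written as-is, -1 has all 32 bits set and needs five bytes, while its
ZigZag-mapped value 1 fits into a single byte.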

But going through this document again: 
https://golb.hplar.ch/2019/06/variable-length-int-java.html

If the number is negative, it is XORed with all bits set to 1 … so this is 
identical to flipping the bits … this is actually really cool and efficient.

So, I would like to confirm that IoTDB uses ZigZag encoding for variable length 
signed integers. A comment in the utils class stating which encoding is 
actually used would be a great addition. I’ll probably add one asap.

Chris






Re: Var-Length-Numeric encoding?

2022-06-17 Thread Xiangdong Huang
Hi,

I think the encoding implementation is in
src/main/java/org/apache/iotdb/tsfile/utils/ReadWriteForEncodingUtils.java
@Yuan Tian implemented it.

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer wrote on Mon, Jun 13, 2022 at 17:47:

> Hi,
>
> I can only comment on floating points: we don't.
> Currently we also only have var-length encoding for u32 (not for u64).
>
> Regarding ZigZag encoding, perhaps anybody else can jump in here?
>
> Julian
>
> From: Christofer Dutz
> Date: Monday, 13 June 2022 at 09:50
> To: dev@iotdb.apache.org
> Subject: Var-Length-Numeric encoding?
>
> Hi all,
>
> Just out of curiosity. Julian told me TSFiles make use of variable length
> encoding of numeric types.
> I would expect the encoding for unsigned integers to be the "ordinary" one
> where 7 bits of a byte are being used for encoding the numeric value and
> new bytes are added as long as the first bit is 1.
> However, I would be interested in which encoding is being used for
> signed integers. Julian posted a reply in the #iotdb Slack channel, but
> I'm unsure which official encoding type this is.
> It most likely looks like ZigZag encoding, but I'm a bit unsure if it
> really is.
> Could anyone here please shed a bit of light on this? And do we have
> var-length encoding for floating-point types too?
>
> Chris
>