Re: Var-Length-Numeric encoding?
Hi Chris,

You're right. IoTDB does use ZigZag encoding for variable-length signed integers. And as you said, if the number is negative, it is XORed with all bits set to 1, which is identical to flipping the bits. Shifting the negative int32 value left by one and then flipping all the bits is exactly what ZigZag encoding requires.

Best,
---
Yuan Tian

On Fri, Jun 17, 2022 at 6:55 PM Christofer Dutz wrote:
> Hi Xiangdong,
>
> I doubt you invented a new encoding form, so in general I was asking which
> form this actually is. Julian already pointed out that bit of code.
>
> As far as I can see, the sign information is in the least significant bit.
> This would usually be an indicator of ZigZag encoding. The only part I don't
> quite understand is the bit-flipping in the case of negative values. In
> ZigZag encoding, the value would be shifted left by one and the last bit
> would be set as the new first bit (so effectively the last bit would just be
> rotated to become the first). In IoTDB it seems as if the left-shifted value
> is inverted, and I don't quite understand why that is happening. I could
> imagine that for small negative integers (small as in "close to 0") the
> two's complement notation has many 1s, so it would consume a lot of space in
> serialized form. Flipping the entire number would get rid of these 1s and
> hence reduce the size of the serialized form.
>
> But going through this document again:
> https://golb.hplar.ch/2019/06/variable-length-int-java.html
>
> If the number is negative, it is XORed with all bits set to 1, so this is
> identical to flipping the bits. This is actually really cool and efficient.
>
> So, I would like to confirm that IoTDB uses ZigZag encoding for
> variable-length signed integers. A comment in the utils class stating which
> encoding is actually used would be a great addition. I'll probably add one
> asap.
>
> Chris
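For readers following the thread, the ZigZag mapping confirmed in this reply can be sketched in a few lines of Java. This is only an illustration of the general technique for 32-bit values; the class and method names (ZigZagSketch, encode, decode) are made up for this example and are not taken from IoTDB's ReadWriteForEncodingUtils.

```java
final class ZigZagSketch {
    // Maps signed ints to small non-negative codes: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
    static int encode(int n) {
        // n >> 31 is an arithmetic shift: 0 for non-negative n, all ones for negative n.
        // XOR with all ones flips every bit of the left-shifted value;
        // XOR with zero leaves it unchanged.
        return (n << 1) ^ (n >> 31);
    }

    // Inverse of encode.
    static int decode(int z) {
        // z >>> 1 drops the sign flag; -(z & 1) is all ones when the flag was set,
        // so the XOR undoes the bit flip applied to negative values.
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, -1, 1, -2, 2, Integer.MAX_VALUE, Integer.MIN_VALUE}) {
            int z = encode(n);
            System.out.println(n + " -> " + Integer.toUnsignedString(z) + " -> " + decode(z));
        }
    }
}
```

The sign ends up in the least significant bit, and values close to zero (positive or negative) map to small codes, which is what makes the result cheap to store in a variable-length format.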
RE: Var-Length-Numeric encoding?
Hi Xiangdong,

I doubt you invented a new encoding form, so in general I was asking which form this actually is. Julian already pointed out that bit of code.

As far as I can see, the sign information is in the least significant bit. This would usually be an indicator of ZigZag encoding. The only part I don't quite understand is the bit-flipping in the case of negative values. In ZigZag encoding, the value would be shifted left by one and the last bit would be set as the new first bit (so effectively the last bit would just be rotated to become the first). In IoTDB it seems as if the left-shifted value is inverted, and I don't quite understand why that is happening. I could imagine that for small negative integers (small as in "close to 0") the two's complement notation has many 1s, so it would consume a lot of space in serialized form. Flipping the entire number would get rid of these 1s and hence reduce the size of the serialized form.

But going through this document again:
https://golb.hplar.ch/2019/06/variable-length-int-java.html

If the number is negative, it is XORed with all bits set to 1, so this is identical to flipping the bits. This is actually really cool and efficient.

So, I would like to confirm that IoTDB uses ZigZag encoding for variable-length signed integers. A comment in the utils class stating which encoding is actually used would be a great addition. I'll probably add one asap.

Chris

From: Xiangdong Huang
Sent: Friday, 17 June 2022 09:33
To: dev; Yuan Tian
Subject: Re: Var-Length-Numeric encoding?

Hi,

I think the encoding implementation is in
src/main/java/org/apache/iotdb/tsfile/utils/ReadWriteForEncodingUtils.java
@Yuan Tian implemented it.

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院
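The size argument in Chris's reply above (small negative numbers have many 1 bits in two's complement and are therefore expensive to serialize) can be checked with a short sketch. It assumes the usual 7-bits-per-byte variable-length format; the names VarIntSizeSketch, varIntLength and zigZag are invented for this illustration and do not come from the IoTDB code base.

```java
final class VarIntSizeSketch {
    // Number of bytes a 7-bits-per-byte varint needs when the 32 bits of
    // value are treated as an unsigned quantity.
    static int varIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) { // more than 7 significant bits remain
            value >>>= 7;              // unsigned shift, so the loop terminates
            length++;
        }
        return length;
    }

    // Standard 32-bit ZigZag mapping: the sign moves to the least significant bit.
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    public static void main(String[] args) {
        for (int n : new int[] {-1, -64, -65, 63, 64}) {
            System.out.println(n + ": raw=" + varIntLength(n)
                    + " bytes, zigzag=" + varIntLength(zigZag(n)) + " bytes");
        }
    }
}
```

Written directly, -1 needs the maximum of 5 bytes because all 32 bits are set, while its ZigZag code is 1 and fits in a single byte. The price is one bit of range per byte count: 64 fits in one byte raw, but its ZigZag code 128 needs two.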
Re: Var-Length-Numeric encoding?
Hi,

I think the encoding implementation is in
src/main/java/org/apache/iotdb/tsfile/utils/ReadWriteForEncodingUtils.java
@Yuan Tian implemented it.

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

On Mon, Jun 13, 2022 at 17:47, Julian Feinauer wrote:
> Hi,
>
> I can only comment on floating points: we don't.
> Currently we also only have var-length encoding for u32 (not for u64).
>
> Regarding ZigZag encoding, perhaps somebody else can jump in here?
>
> Julian
>
> From: Christofer Dutz
> Date: Monday, 13 June 2022 at 09:50
> To: dev@iotdb.apache.org
> Subject: Var-Length-Numeric encoding?
>
> Hi all,
>
> Just out of curiosity: Julian told me TSFiles make use of variable-length
> encoding of numeric types.
> I would expect the encoding for unsigned integers to be the "ordinary" one,
> where 7 bits of each byte are used for encoding the numeric value and new
> bytes are added as long as the first bit is 1.
> However, I would be interested in which encoding is being used for signed
> integers. Julian posted a reply in the #iotdb Slack channel, but I'm unsure
> which official encoding type this is.
> It most likely looks like ZigZag encoding, but I'm a bit unsure if it
> really is.
> Could anyone here please shed a bit of light on this? And do we have
> var-length encoding for floating-point types too?
>
> Chris
Var-Length-Numeric encoding?
Hi all,

Just out of curiosity: Julian told me TSFiles make use of variable-length encoding of numeric types. I would expect the encoding for unsigned integers to be the "ordinary" one, where 7 bits of each byte are used for encoding the numeric value and new bytes are added as long as the first bit is 1.

However, I would be interested in which encoding is being used for signed integers. Julian posted a reply in the #iotdb Slack channel, but I'm unsure which official encoding type this is. It most likely looks like ZigZag encoding, but I'm a bit unsure if it really is.

Could anyone here please shed a bit of light on this? And do we have var-length encoding for floating-point types too?

Chris
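The "ordinary" unsigned encoding described in this message (7 payload bits per byte, with the top bit of each byte signalling that another byte follows) might look roughly like the sketch below. It assumes the least significant 7-bit group is written first, as in Protocol Buffers varints; the thread does not state which byte order IoTDB uses, and the names UnsignedVarIntSketch, write and read are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

final class UnsignedVarIntSketch {
    // Encodes the 32 bits of value as an unsigned varint, low 7-bit group first.
    static byte[] write(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 payload bits, continuation bit set
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    // Decodes one varint from the buffer, reading at most 5 bytes for a 32-bit value.
    static int read(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            byte b = in.get();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result; // continuation bit clear: this was the last byte
            }
        }
        throw new IllegalArgumentException("varint is longer than 5 bytes");
    }

    public static void main(String[] args) {
        byte[] encoded = write(300);
        System.out.println(encoded.length + " bytes, decoded: " + read(ByteBuffer.wrap(encoded)));
    }
}
```

With this layout, 300 is written as the two bytes 0xAC 0x02, and any value below 128 takes a single byte. A signed value would first be passed through a ZigZag mapping and then written with the same routine.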