Hi Xiangdong,

I doubt you invented a new encoding form, so in general I was asking which form this actually is. Julian already pointed out that bit of code.
As far as I can see, the sign information is in the least significant bit. This would usually be an indicator for ZigZag encoding. The only part I didn’t quite understand was the bit-flipping in case of negative values. With ZigZag encoding I expected the value to be shifted left by one and the sign bit to be placed in the freed least significant position (so effectively the sign bit would just be rotated around to the other end). In IoTDB it seems as if the left-shifted value is inverted, and at first I didn’t understand why that is happening. I could imagine that for small negative integers (small as in “close to 0”) the two's complement notation has many 1s, so it would consume a lot of memory in serialized form. Flipping the entire number would get rid of these 1s and hence reduce the size of the serialized form.

But going through this document again: https://golb.hplar.ch/2019/06/variable-length-int-java.html

If the number is negative, the shifted value is XORed with all bits set to 1, which is identical to flipping the bits. This is actually really cool and efficient.

So, I would like to confirm that IoTDB uses ZigZag encoding for variable-length signed integers. A comment in the utils class stating which encoding is actually used would be a great addition. I’ll probably add one asap.

Chris

From: Xiangdong Huang <saint...@gmail.com>
Sent: Friday, 17 June 2022 09:33
To: dev <dev@iotdb.apache.org>; Yuan Tian <jackiet...@apache.org>
Subject: Re: Var-Length-Numeric encoding?

Hi,

I think the encoding implementation is in src/main/java/org/apache/iotdb/tsfile/utils/ReadWriteForEncodingUtils.java

@Yuan Tian implemented it.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University
黄向东 清华大学 软件学院

Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Monday, 13 June 2022 at 17:47:

Hi,

I can only comment on floating points: we don't. Currently we also only have var-length encoding for u32 (not for u64).
Regarding ZigZag encoding, perhaps somebody else can jump in here?

Julian

Julian Feinauer
CEO
j.feina...@pragmaticminds.de
+49 (0) 7021 87868-01 | Jesinger Str. 57, 73230 Kirchheim unter Teck
www.pragmaticindustries.de

From: Christofer Dutz <christofer.d...@c-ware.de>
Date: Monday, 13 June 2022 at 09:50
To: dev@iotdb.apache.org
Subject: Var-Length-Numeric encoding?

Hi all,

Just out of curiosity: Julian told me TSFiles make use of variable-length encoding of numeric types. I would expect the encoding for unsigned integers to be the "ordinary" one, where 7 bits of each byte are used for the numeric value and further bytes follow as long as the first bit is 1. However, I would be interested in which encoding is being used for signed integers. Julian posted a reply in the #iotdb Slack channel, but I'm unsure which official encoding type this is. It most likely looks like ZigZag encoding, but I'm a bit unsure if it really is.

Could anyone here please shed a bit of light on this? And do we have var-length encoding for floating-point types too?

Chris
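
For reference, the two schemes discussed in this thread can be sketched in Java as follows. This is only an illustration of the general technique, not the code in ReadWriteForEncodingUtils: the class name VarIntSketch and all method names are hypothetical, and the byte order (least significant 7-bit group first, as in Protocol Buffers) is an assumption that the thread does not confirm for IoTDB.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only; all names here are hypothetical, not IoTDB's API. */
final class VarIntSketch {

    /** ZigZag: shift left by one and put the sign into the least significant bit. */
    static int zigZagEncode(int value) {
        // (value >> 31) is all 1s for negative values and all 0s otherwise,
        // so the XOR flips every bit of the shifted value only when value < 0.
        return (value << 1) ^ (value >> 31);
    }

    /** Inverse of zigZagEncode. */
    static int zigZagDecode(int encoded) {
        // -(encoded & 1) is all 1s when the sign bit (LSB) is set, else 0.
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    /** Unsigned varint: 7 payload bits per byte, high bit set while more bytes follow. */
    static byte[] writeUnsignedVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & 0xFFFFFF80) != 0) {   // more than 7 significant bits left
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value & 0x7F);              // final byte, continuation flag clear
        return out.toByteArray();
    }

    /** Reads a value produced by writeUnsignedVarInt. */
    static int readUnsignedVarInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                        // continuation flag clear: last byte
            }
            shift += 7;
        }
        return result;
    }

    /** Signed varint: ZigZag first, then the unsigned scheme. */
    static byte[] writeSignedVarInt(int value) {
        return writeUnsignedVarInt(zigZagEncode(value));
    }

    static int readSignedVarInt(byte[] bytes) {
        return zigZagDecode(readUnsignedVarInt(bytes));
    }
}
```

With this sketch, -1 maps to 1 under ZigZag and therefore fits into a single byte, whereas feeding the raw two's complement pattern of -1 into the unsigned scheme would take five bytes. That is the size saving for small negative numbers discussed above.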