From 48d74effe7043576008f31551e7f1ac08d24496b Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 17 Aug 2022 20:48:43 +0300
Subject: [PATCH v1] Clarify the comments about varlena header encoding

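The existing comment about the varlena header encoding is somewhat
misleading. Rewrite it to show the full header for each case and to
state the length, alignment and padding rules explicitly.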

Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: TODO FIXME
Discussion: TODO FIXME
---
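Note for reviewers (not part of the commit): the little-endian layout
described in the comment can be sketched as the standalone C below. This is
only an illustration under stated assumptions. The names hdr_kind,
classify_header and header_length are hypothetical and are not the macros
postgres.h actually uses; the bytes are read explicitly in little-endian
order so the sketch does not depend on the host.

```c
#include <stdint.h>

/* Hypothetical names; little-endian on-disk layout only. */
typedef enum
{
	HDR_4B_PLAIN,				/* xxxxxx00: 4-byte header, uncompressed */
	HDR_4B_COMPRESSED,			/* xxxxxx10: 4-byte header, compressed */
	HDR_1B_EXTERNAL,			/* 00000001: TOAST pointer, tag byte follows */
	HDR_1B_SHORT				/* xxxxxxx1: 1-byte header, short datum */
} hdr_kind;

/* Classify a header by the flag bits in its physically first byte. */
static hdr_kind
classify_header(const unsigned char *p)
{
	if (p[0] == 0x01)
		return HDR_1B_EXTERNAL;
	if (p[0] & 0x01)
		return HDR_1B_SHORT;
	if (p[0] & 0x02)
		return HDR_4B_COMPRESSED;
	return HDR_4B_PLAIN;
}

/*
 * Return the total length, header included.  For a TOAST pointer the
 * length depends on the tag byte, so this sketch simply returns 0.
 */
static uint32_t
header_length(const unsigned char *p)
{
	uint32_t	word;

	switch (classify_header(p))
	{
		case HDR_1B_EXTERNAL:
			return 0;
		case HDR_1B_SHORT:
			return (uint32_t) p[0] >> 1;	/* shift out the flag bit */
		default:
			word = (uint32_t) p[0] |
				((uint32_t) p[1] << 8) |
				((uint32_t) p[2] << 16) |
				((uint32_t) p[3] << 24);
			return word >> 2;	/* shift out the two flag bits */
	}
}
```

A zero first byte classifies as HDR_4B_PLAIN here, so a real reader has to
skip zero pad bytes before looking for an unaligned header; the sketch
leaves that step out.
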
 src/include/postgres.h | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/src/include/postgres.h b/src/include/postgres.h
index 31358110dc..0f9dac73ec 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -178,27 +178,36 @@ typedef struct
 /*
  * Bit layouts for varlena headers on big-endian machines:
  *
- * 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
- * 01xxxxxx 4-byte length word, aligned, *compressed* data (up to 1G)
- * 10000000 1-byte length word, unaligned, TOAST pointer
- * 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 126b)
+ * 00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)
+ * 01xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, compressed data (up to 1G)
+ * 10000000 xxxxxxxx, TOAST pointer (struct varatt_external)
+ * 1xxxxxxx, uncompressed data (up to 126b)
  *
  * Bit layouts for varlena headers on little-endian machines:
  *
- * xxxxxx00 4-byte length word, aligned, uncompressed data (up to 1G)
- * xxxxxx10 4-byte length word, aligned, *compressed* data (up to 1G)
- * 00000001 1-byte length word, unaligned, TOAST pointer
- * xxxxxxx1 1-byte length word, unaligned, uncompressed data (up to 126b)
+ * xxxxxx00 xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)
+ * xxxxxx10 xxxxxxxx xxxxxxxx xxxxxxxx, compressed data (up to 1G)
+ * 00000001 xxxxxxxx, TOAST pointer (struct varatt_external)
+ * xxxxxxx1, uncompressed data (up to 126b)
+ *
+ * The "xxx" bits hold the total length of the attribute, which always includes
+ * the varlena header itself.
  *
- * The "xxx" bits are the length field (which includes itself in all cases).
  * In the big-endian case we mask to extract the length, in the little-endian
- * case we shift.  Note that in both cases the flag bits are in the physically
- * first byte.  Also, it is not possible for a 1-byte length word to be zero;
- * this lets us disambiguate alignment padding bytes from the start of an
- * unaligned datum.  (We now *require* pad bytes to be filled with zero!)
+ * case we shift. Note that in both cases the flag bits are stored in the
+ * physically first byte.
+ *
+ * In the first two cases, where the length is encoded in 30 bits, the header
+ * is aligned to 4 bytes. In the other two cases the header is unaligned.
+ * Padding bytes must be zero-filled and a 1-byte header can never be zero, so
+ * padding cannot be mistaken for the start of an unaligned datum.
+ *
+ * In the second case the 4 bytes following the header (va_tcinfo) store the
+ * size of the uncompressed data and the compression method.
  *
- * In TOAST pointers the va_tag field (see varattrib_1b_e) is used to discern
- * the specific type and length of the pointer datum.
+ * In the third case the second byte is the va_tag field (see varattrib_1b_e),
+ * which identifies the type and hence the length of the pointer datum. On disk
+ * it is currently always VARTAG_ONDISK, i.e. sizeof(varatt_external) + 2.
  */
 
 /*
-- 
2.37.1

