emkornfield commented on code in PR #464:
URL: https://github.com/apache/parquet-format/pull/464#discussion_r1875068149
##########
VariantEncoding.md:
##########
@@ -373,25 +374,37 @@ The Decimal type contains a scale, but no precision. The implied precision of a
| Object | `2` | A collection of (string-key, variant-value) pairs |
| Array | `3` | An ordered sequence of variant values |
-| Logical Type  | Physical Type               | Type ID | Equivalent Parquet Type   | Binary format |
-|---------------|-----------------------------|---------|---------------------------|---------------|
-| NullType      | null                        | `0`     | any                       | none |
-| Boolean       | boolean (True)              | `1`     | BOOLEAN                   | none |
-| Boolean       | boolean (False)             | `2`     | BOOLEAN                   | none |
-| Exact Numeric | int8                        | `3`     | INT(8, signed)            | 1 byte |
-| Exact Numeric | int16                       | `4`     | INT(16, signed)           | 2 byte little-endian |
-| Exact Numeric | int32                       | `5`     | INT(32, signed)           | 4 byte little-endian |
-| Exact Numeric | int64                       | `6`     | INT(64, signed)           | 8 byte little-endian |
-| Double        | double                      | `7`     | DOUBLE                    | IEEE little-endian |
-| Exact Numeric | decimal4                    | `8`     | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
-| Exact Numeric | decimal8                    | `9`     | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
-| Exact Numeric | decimal16                   | `10`    | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
-| Date          | date                        | `11`    | DATE                      | 4 byte little-endian |
-| Timestamp     | timestamp                   | `12`    | TIMESTAMP(true, MICROS)   | 8-byte little-endian |
-| TimestampNTZ  | timestamp without time zone | `13`    | TIMESTAMP(false, MICROS)  | 8-byte little-endian |
-| Float         | float                       | `14`    | FLOAT                     | IEEE little-endian |
-| Binary        | binary                      | `15`    | BINARY                    | 4 byte little-endian size, followed by bytes |
-| String        | string                      | `16`    | STRING                    | 4 byte little-endian size, followed by UTF-8 encoded bytes |
+*Variant primitive types*
+
+| Logical Type  | Physical Type               | Type ID | Equivalent Parquet Type                  | Binary format |
+|---------------|-----------------------------|---------|------------------------------------------|---------------|
+| NullType      | null                        | `0`     | any                                      | none |
+| Boolean       | boolean (True)              | `1`     | BOOLEAN                                  | none |
+| Boolean       | boolean (False)             | `2`     | BOOLEAN                                  | none |
+| Exact Numeric | int8                        | `3`     | INT(8, signed)                           | 1 byte |
+| Exact Numeric | int16                       | `4`     | INT(16, signed)                          | 2 byte little-endian |
+| Exact Numeric | int32                       | `5`     | INT(32, signed)                          | 4 byte little-endian |
+| Exact Numeric | int64                       | `6`     | INT(64, signed)                          | 8 byte little-endian |
+| Double        | double                      | `7`     | DOUBLE                                   | IEEE little-endian |
+| Exact Numeric | decimal4                    | `8`     | DECIMAL(precision, scale)                | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| Exact Numeric | decimal8                    | `9`     | DECIMAL(precision, scale)                | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| Exact Numeric | decimal16                   | `10`    | DECIMAL(precision, scale)                | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| Date          | date                        | `11`    | DATE                                     | 4 byte little-endian |
+| Timestamp     | timestamp with time zone    | `12`    | TIMESTAMP(isAdjustedToUTC=true, MICROS)  | 8-byte little-endian |
+| TimestampNTZ  | timestamp without time zone | `13`    | TIMESTAMP(isAdjustedToUTC=false, MICROS) | 8-byte little-endian |
+| Float         | float                       | `14`    | FLOAT                                    | IEEE little-endian |
+| Binary        | binary                      | `15`    | BINARY                                   | 4 byte little-endian size, followed by bytes |
+| String        | string                      | `16`    | STRING                                   | 4 byte little-endian size, followed by UTF-8 encoded bytes |
+| TimeNTZ       | time without time zone      | `21`    | TIME(isAdjustedToUTC=false, MICROS)      | 8-byte little-endian |
+| Timestamp     | timestamp with time zone    | `22`    | TIMESTAMP(isAdjustedToUTC=true, NANOS)   | 8-byte little-endian |
+| TimestampNTZ  | timestamp without time zone | `23`    | TIMESTAMP(isAdjustedToUTC=false, NANOS)  | 8-byte little-endian |
+| UUID          | uuid                        | `24`    | UUID                                     | 16-byte big-endian |
+
+The *Logical Type* column indicates logical equivalence of physically encoded types.
Review Comment:
```suggestion
The *Type Equivalence Class* column indicates logical equivalence of physically encoded types.
```
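
For readers following the table, the binary formats in the added rows could be read roughly as in the sketch below. It is only an illustration, not part of the PR: `decode_primitive` is a hypothetical helper, the payload is assumed to be the bytes that follow the value header, the decimal widths (4, 8, and 16 bytes) are inferred from the physical type names, and unscaled decimal values are assumed to be signed. The type IDs are the ones shown in this revision of the table.

```python
import struct
import uuid
from decimal import Decimal


def decode_primitive(type_id: int, payload: bytes):
    """Hypothetical decoder for some primitive type IDs from the table.

    `payload` is assumed to hold only the bytes after the value header.
    """
    if type_id == 0:                      # NullType: no payload
        return None
    if type_id in (1, 2):                 # Boolean True / False: no payload
        return type_id == 1
    if type_id in (3, 4, 5, 6):           # int8..int64, little-endian, signed
        width = {3: 1, 4: 2, 5: 4, 6: 8}[type_id]
        return int.from_bytes(payload[:width], "little", signed=True)
    if type_id == 7:                      # double, IEEE little-endian
        return struct.unpack("<d", payload[:8])[0]
    if type_id in (8, 9, 10):             # decimal4 / decimal8 / decimal16
        width = {8: 4, 9: 8, 10: 16}[type_id]   # assumed from the type names
        scale = payload[0]                # 1 byte scale in range [0, 38]
        unscaled = int.from_bytes(payload[1:1 + width], "little", signed=True)
        # Build the Decimal from a tuple so no context rounding is applied.
        sign = 1 if unscaled < 0 else 0
        digits = tuple(int(d) for d in str(abs(unscaled)))
        return Decimal((sign, digits, -scale))
    if type_id == 14:                     # float, IEEE little-endian
        return struct.unpack("<f", payload[:4])[0]
    if type_id in (15, 16):               # binary / string with 4-byte size
        size = int.from_bytes(payload[:4], "little")
        data = payload[4:4 + size]
        return data.decode("utf-8") if type_id == 16 else data
    if type_id == 24:                     # UUID, 16 bytes big-endian
        return uuid.UUID(bytes=payload[:16])
    raise ValueError(f"type ID {type_id} is not handled in this sketch")
```

Date, time, and timestamp IDs are left out on purpose; they would follow the same little-endian integer pattern with a unit-specific interpretation.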