alkis commented on code in PR #34:
URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644575190

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+### Encoding
+
+| Encoding | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN | | | | |
+| PLAIN_DICTIONARY | | | | |
+| RLE_DICTIONARY | | | | |
+| RLE | | | | |
+| BIT_PACKED | | | | |

Review Comment:
`BIT_PACKED (deprecated)`?
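
Background for the `BIT_PACKED` question: the Parquet encodings spec describes `BIT_PACKED` as a deprecated, bit-packed-only encoding that fills each byte starting from the most significant bit, whereas the bit-packed runs of the RLE/bit-packing hybrid (`RLE`) fill each byte starting from the least significant bit. The Python sketch below illustrates only that bit ordering. `pack_msb_first` and `pack_lsb_first` are hypothetical helpers, not code from any Parquet implementation, and the hybrid's run headers and 8-value grouping are deliberately left out.

```python
def _check(v: int, bit_width: int) -> None:
    if not 0 <= v < (1 << bit_width):
        raise ValueError(f"{v} does not fit in {bit_width} bits")


def pack_msb_first(values, bit_width: int) -> bytes:
    """Pack values MSB-first, the bit order of the deprecated BIT_PACKED encoding."""
    out = bytearray()
    acc, n_bits = 0, 0  # pending bits and how many of them are valid
    for v in values:
        _check(v, bit_width)
        acc = (acc << bit_width) | v
        n_bits += bit_width
        while n_bits >= 8:
            n_bits -= 8
            out.append((acc >> n_bits) & 0xFF)  # emit the oldest 8 bits
            acc &= (1 << n_bits) - 1            # keep only the leftover bits
    if n_bits:
        out.append((acc << (8 - n_bits)) & 0xFF)  # zero-pad on the right
    return bytes(out)


def pack_lsb_first(values, bit_width: int) -> bytes:
    """Pack values LSB-first, the bit order used inside RLE hybrid bit-packed runs."""
    out = bytearray()
    acc, n_bits = 0, 0
    for v in values:
        _check(v, bit_width)
        acc |= v << n_bits  # new value goes above the pending bits
        n_bits += bit_width
        while n_bits >= 8:
            out.append(acc & 0xFF)  # emit the lowest 8 bits
            acc >>= 8
            n_bits -= 8
    if n_bits:
        out.append(acc & 0xFF)
    return bytes(out)


print(pack_msb_first(range(8), 3).hex())  # 053977
print(pack_lsb_first(range(8), 3).hex())  # 88c6fa
```

Packing the values 0 through 7 at bit width 3 gives different bytes under the two orderings, which is one reason a reader cannot treat the two encodings interchangeably.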

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+### Compression
+
+| Compression | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED | | | | |
+| SNAPPY | | | | |
+| GZIP | | | | |
+| LZO | | | | |
+| BROTLI | | | | |
+| LZ4 | | | | |

Review Comment:
`LZ4 (deprecated)`?
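
Background for the `LZ4` question: in the `CompressionCodec` enum of `parquet.thrift`, `LZ4` (id 5) is the older codec that the parquet-format compression documentation marks as deprecated, with `LZ4_RAW` (id 7) as the recommended replacement. Below is a minimal Python sketch of how the table rows could carry that flag. The `table_label` helper and the hand-maintained `DEPRECATED` set are hypothetical and exist only for illustration.

```python
from enum import IntEnum


class CompressionCodec(IntEnum):
    """Codec ids as listed in the CompressionCodec enum of parquet.thrift."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4
    LZ4 = 5      # deprecated; LZ4_RAW is the recommended replacement
    ZSTD = 6
    LZ4_RAW = 7


# Hand-maintained for this sketch; not read from any library.
DEPRECATED = frozenset({CompressionCodec.LZ4})


def table_label(codec: CompressionCodec) -> str:
    """Row label for the implementation-status table, flagging deprecated codecs."""
    return f"{codec.name} (deprecated)" if codec in DEPRECATED else codec.name


for codec in CompressionCodec:
    print(table_label(codec))
```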

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+### Logical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING | | | | |
+| ENUM | | | | |
+| UUID | | | | |
+| 8 and 16 bit signed INT | | | | |

Review Comment:
This should be 8, 16, 32, 64 (or folded into the row below, as @pitrou suggests).

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN | | | | |
+| INT32 | | | | |
+| INT64 | | | | |
+| INT96 | | | | |
+| FLOAT | | | | |
+| DOUBLE | | | | |
+| BYTE_ARRAY | | | | |
+| FIXED_LEN_BYTE_ARRAY | | | | |
+
+### Logical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING | | | | |
+| ENUM | | | | |
+| UUID | | | | |
+| 8 and 16 bit signed INT | | | | |
+| 8, 16, 32, 64 bit unsigned INT | | | | |
+| DECIMAL (INT32) | | | | |
+| DECIMAL (INT64) | | | | |
+| DECIMAL (BYTE_ARRAY) | | | | |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
+| DATE | | | | |
+| TIME (INT32) | | | | |
+| TIME (INT64) | | | | |
+| TIMESTAMP (INT32) | | | | |
+| TIMESTAMP (INT64) | | | | |
+| INTERVAL | | | | |
+| JSON | | | | |
+| BSON | | | | |
+| LIST | | | | |
+| MAP | | | | |
+| UNKNOWN | | | | |
+
+### Encoding
+
+| Encoding | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN | | | | |
+| PLAIN_DICTIONARY | | | | |
+| RLE_DICTIONARY | | | | |
+| RLE | | | | |
+| BIT_PACKED | | | | |
+| DELTA_BINARY_PACKED | | | | |
+| DELTA_LENGTH_BYTE_ARRAY | | | | |
+| DELTA_BYTE_ARRAY | | | | |
+| BYTE_STREAM_SPLIT | | | | |
+
+### Compression
+
+| Compression | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED | | | | |
+| SNAPPY | | | | |
+| GZIP | | | | |
+| LZO | | | | |
+| BROTLI | | | | |
+| LZ4 | | | | |
+| ZSTD | | | | |
+| LZ4_RAW | | | | |
+
+### Other format level features
+
+| | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| xxHash Bloom filters | | | | |
+| bloom filter length | | | | |
+| Statistics min_value, max_value | | | | |
+| Column index | | | | |
+| Offset index | | | | |
+| Modular encryption | | | | |
+| Page CRC32 checksum | | | | |
+| Modular encryption | | | | |
+
+### High level data API-s for parquet feature usage
+
+| Format | C++ | Java | Go | Rust |
+| -------------------------------------------- | ----- | ------ | ----- | ----- |
+| Hive-style partitioning | | | | |
+| Partition pruning on the partition column | | | | |
+| External column data | | | | |

Review Comment:
Isn't this about supporting column data stored in other files?
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L868

##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+### Logical types
+
+| Data type | C++ | Java | Go | Rust |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING | | | | |
+| ENUM | | | | |
+| UUID | | | | |
+| 8 and 16 bit signed INT | | | | |
+| 8, 16, 32, 64 bit unsigned INT | | | | |
+| DECIMAL (INT32) | | | | |
+| DECIMAL (INT64) | | | | |
+| DECIMAL (BYTE_ARRAY) | | | | |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY) | | | | |
+| DATE | | | | |
+| TIME (INT32) | | | | |
+| TIME (INT64) | | | | |
+| TIMESTAMP (INT32) | | | | |
+| TIMESTAMP (INT64) | | | | |
+| INTERVAL | | | | |

Review Comment:
If it is not in the LogicalType union, we might as well call it deprecated and forget it. Out of curiosity, is this type even useful?

- It is 12 bytes, which means it will be slower to process than 8-byte types, just like INT96.
- It stores three unsigned 32-bit ints (months, days, millis), so the months field alone covers roughly 357 million years, but only in the positive direction.
- An int64 that stores millis alone covers roughly 292 million years in either direction (584 million years in total).

Woot?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
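
A quick check of the range numbers in the `INTERVAL` comment, assuming the layout given in the Parquet logical types spec (a 12-byte `FIXED_LEN_BYTE_ARRAY` holding three little-endian unsigned 32-bit integers: months, days, milliseconds) and a 365.25-day year. `decode_interval` is a hypothetical helper for illustration only. The comparison covers range alone and ignores that months and days have no fixed length in milliseconds.

```python
import struct

# 365.25-day year; good enough for order-of-magnitude comparisons.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def decode_interval(raw: bytes) -> tuple:
    """Split a 12-byte INTERVAL value into (months, days, millis)."""
    if len(raw) != 12:
        raise ValueError("INTERVAL must be a FIXED_LEN_BYTE_ARRAY of length 12")
    # Three little-endian unsigned 32-bit integers.
    return struct.unpack("<III", raw)


# Largest value of the unsigned 32-bit months field, in years (non-negative only).
interval_months_years = (2**32 - 1) / 12
# Largest positive value of a signed int64 millisecond count, in years.
int64_millis_years = (2**63 - 1) / MS_PER_YEAR

print(decode_interval(struct.pack("<III", 14, 3, 500)))  # (14, 3, 500)
print(f"INTERVAL months: {interval_months_years / 1e6:.1f} million years, forward only")
print(f"int64 millis: {int64_millis_years / 1e6:.1f} million years in either direction")
```

With these assumptions the script reports about 357.9 million years for the months field and about 292.3 million years per direction for an int64 millisecond count.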