alkis commented on code in PR #34:
URL: https://github.com/apache/parquet-site/pull/34#discussion_r1644575190


##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96                                     |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8 and 16 bit signed INT                   |       |        |       |       |
+| 8, 16, 32, 64 bit unsigned INT            |       |        |       |       |
+| DECIMAL (INT32)                           |       |        |       |       |
+| DECIMAL (INT64)                           |       |        |       |       |
+| DECIMAL (BYTE_ARRAY)                      |       |        |       |       |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY)            |       |        |       |       |
+| DATE                                      |       |        |       |       |
+| TIME (INT32)                              |       |        |       |       |
+| TIME (INT64)                              |       |        |       |       |
+| TIMESTAMP (INT32)                         |       |        |       |       |
+| TIMESTAMP (INT64)                         |       |        |       |       |
+| INTERVAL                                  |       |        |       |       |
+| JSON                                      |       |        |       |       |
+| BSON                                      |       |        |       |       |
+| LIST                                      |       |        |       |       |
+| MAP                                       |       |        |       |       |
+| UNKNOWN                                   |       |        |       |       |
+
+### Encoding
+
+| Encoding                                  | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN                                     |       |        |       |       |
+| PLAIN_DICTIONARY                          |       |        |       |       |
+| RLE_DICTIONARY                            |       |        |       |       |
+| RLE                                       |       |        |       |       |
+| BIT_PACKED                                |       |        |       |       |

Review Comment:
   `BIT_PACKED (deprecated)`?



##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96                                     |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8 and 16 bit signed INT                   |       |        |       |       |
+| 8, 16, 32, 64 bit unsigned INT            |       |        |       |       |
+| DECIMAL (INT32)                           |       |        |       |       |
+| DECIMAL (INT64)                           |       |        |       |       |
+| DECIMAL (BYTE_ARRAY)                      |       |        |       |       |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY)            |       |        |       |       |
+| DATE                                      |       |        |       |       |
+| TIME (INT32)                              |       |        |       |       |
+| TIME (INT64)                              |       |        |       |       |
+| TIMESTAMP (INT32)                         |       |        |       |       |
+| TIMESTAMP (INT64)                         |       |        |       |       |
+| INTERVAL                                  |       |        |       |       |
+| JSON                                      |       |        |       |       |
+| BSON                                      |       |        |       |       |
+| LIST                                      |       |        |       |       |
+| MAP                                       |       |        |       |       |
+| UNKNOWN                                   |       |        |       |       |
+
+### Encoding
+
+| Encoding                                  | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN                                     |       |        |       |       |
+| PLAIN_DICTIONARY                          |       |        |       |       |
+| RLE_DICTIONARY                            |       |        |       |       |
+| RLE                                       |       |        |       |       |
+| BIT_PACKED                                |       |        |       |       |
+| DELTA_BINARY_PACKED                       |       |        |       |       |
+| DELTA_LENGTH_BYTE_ARRAY                   |       |        |       |       |
+| DELTA_BYTE_ARRAY                          |       |        |       |       |
+| BYTE_STREAM_SPLIT                         |       |        |       |       |
+
+### Compression
+
+| Compression                               | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED                              |       |        |       |       |
+| SNAPPY                                    |       |        |       |       |
+| GZIP                                      |       |        |       |       |
+| LZO                                       |       |        |       |       |
+| BROTLI                                    |       |        |       |       |
+| LZ4                                       |       |        |       |       |

Review Comment:
   `LZ4 (deprecated)`?



##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96                                     |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8 and 16 bit signed INT                   |       |        |       |       |

Review Comment:
   This should be 8, 16, 32, 64 (or folded below like @pitrou suggests).



##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96                                     |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8 and 16 bit signed INT                   |       |        |       |       |
+| 8, 16, 32, 64 bit unsigned INT            |       |        |       |       |
+| DECIMAL (INT32)                           |       |        |       |       |
+| DECIMAL (INT64)                           |       |        |       |       |
+| DECIMAL (BYTE_ARRAY)                      |       |        |       |       |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY)            |       |        |       |       |
+| DATE                                      |       |        |       |       |
+| TIME (INT32)                              |       |        |       |       |
+| TIME (INT64)                              |       |        |       |       |
+| TIMESTAMP (INT32)                         |       |        |       |       |
+| TIMESTAMP (INT64)                         |       |        |       |       |
+| INTERVAL                                  |       |        |       |       |
+| JSON                                      |       |        |       |       |
+| BSON                                      |       |        |       |       |
+| LIST                                      |       |        |       |       |
+| MAP                                       |       |        |       |       |
+| UNKNOWN                                   |       |        |       |       |
+
+### Encoding
+
+| Encoding                                  | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| PLAIN                                     |       |        |       |       |
+| PLAIN_DICTIONARY                          |       |        |       |       |
+| RLE_DICTIONARY                            |       |        |       |       |
+| RLE                                       |       |        |       |       |
+| BIT_PACKED                                |       |        |       |       |
+| DELTA_BINARY_PACKED                       |       |        |       |       |
+| DELTA_LENGTH_BYTE_ARRAY                   |       |        |       |       |
+| DELTA_BYTE_ARRAY                          |       |        |       |       |
+| BYTE_STREAM_SPLIT                         |       |        |       |       |
+
+### Compression
+
+| Compression                               | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| UNCOMPRESSED                              |       |        |       |       |
+| SNAPPY                                    |       |        |       |       |
+| GZIP                                      |       |        |       |       |
+| LZO                                       |       |        |       |       |
+| BROTLI                                    |       |        |       |       |
+| LZ4                                       |       |        |       |       |
+| ZSTD                                      |       |        |       |       |
+| LZ4_RAW                                   |       |        |       |       |
+
+### Other format level features
+
+|                                           | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| xxHash Bloom filters                      |       |        |       |       |
+| bloom filter length                       |       |        |       |       |
+| Statistics min_value, max_value           |       |        |       |       |
+| Column index                              |       |        |       |       |
+| Offset index                              |       |        |       |       |
+| Modular encryption                        |       |        |       |       |
+| Page CRC32 checksum                       |       |        |       |       |
+| Modular encryption                        |       |        |       |       |
+
+### High level data API-s for parquet feature usage
+
+| Format                                       | C++   | Java   | Go    | Rust  |
+| -------------------------------------------- | ----- | ------ | ----- | ----- |
+| Hive-style partitioning                      |       |        |       |       |
+| Partition pruning on the partition column    |       |        |       |       |
+| External column data                         |       |        |       |       |

Review Comment:
   Isn't this about supporting reads of column data stored in other files? See
   https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L868



##########
content/en/docs/File Format/implementationstatus.md:
##########
@@ -0,0 +1,101 @@
+---
+title: "Implementation status"
+linkTitle: "Implementation status"
+weight: 8
+---
+### Physical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| BOOLEAN                                   |       |        |       |       |
+| INT32                                     |       |        |       |       |
+| INT64                                     |       |        |       |       |
+| INT96                                     |       |        |       |       |
+| FLOAT                                     |       |        |       |       |
+| DOUBLE                                    |       |        |       |       |
+| BYTE_ARRAY                                |       |        |       |       |
+| FIXED_LEN_BYTE_ARRAY                      |       |        |       |       |
+
+### Logical types
+
+| Data type                                 | C++   | Java   | Go    | Rust  |
+| ----------------------------------------- | ----- | ------ | ----- | ----- |
+| STRING                                    |       |        |       |       |
+| ENUM                                      |       |        |       |       |
+| UUID                                      |       |        |       |       |
+| 8 and 16 bit signed INT                   |       |        |       |       |
+| 8, 16, 32, 64 bit unsigned INT            |       |        |       |       |
+| DECIMAL (INT32)                           |       |        |       |       |
+| DECIMAL (INT64)                           |       |        |       |       |
+| DECIMAL (BYTE_ARRAY)                      |       |        |       |       |
+| DECIMAL (FIXED_LEN_BYTE_ARRAY)            |       |        |       |       |
+| DATE                                      |       |        |       |       |
+| TIME (INT32)                              |       |        |       |       |
+| TIME (INT64)                              |       |        |       |       |
+| TIMESTAMP (INT32)                         |       |        |       |       |
+| TIMESTAMP (INT64)                         |       |        |       |       |
+| INTERVAL                                  |       |        |       |       |

Review Comment:
   If it is not in the union we might as well call it deprecated and forget it.
   
   Out of curiosity is this type even useful?
   - it is 12 bytes, which means it will be slower to process than 8-byte types, just like INT96
   - given that it stores 3 ints for months, days, millis, it has a range of 357 million years in either direction
   - an int64 that stores millis alone has a range of 584 million years
   
   Woot?
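
   The figures above can be reproduced with a quick back-of-the-envelope check. This is only a sketch: it assumes each of the three INTERVAL fields is 32 bits wide (12 bytes / 3) and uses an average year of 365.25 days, which is my assumption and not something the format defines. The helper names are made up for illustration.

```python
# Rough range arithmetic for the INTERVAL vs. int64-millis comparison.
# Assumption: an average (Julian) year of 365.25 days.
MS_PER_DAY = 24 * 60 * 60 * 1000
DAYS_PER_YEAR = 365.25


def years_from_months(months: int) -> float:
    """Convert a month count to years."""
    return months / 12


def years_from_millis(millis: int) -> float:
    """Convert a millisecond count to (average) years."""
    return millis / (MS_PER_DAY * DAYS_PER_YEAR)


# Span of a 32-bit month counter (the months field alone dominates the range).
interval_years = years_from_months(2**32)

# Total span of all 64-bit values interpreted as milliseconds.
int64_total_years = years_from_millis(2**64)

# Span in each direction if the 64-bit value is signed.
int64_signed_years = years_from_millis(2**63)

print(f"32-bit months:          ~{interval_years / 1e6:.1f} million years")
print(f"64-bit millis (total):  ~{int64_total_years / 1e6:.1f} million years")
print(f"64-bit millis (signed): ~{int64_signed_years / 1e6:.1f} million years each way")
```

   Under these assumptions the 584 million figure corresponds to the full 64-bit span; a signed int64 covers about half of that in each direction, which is still the same order of magnitude as the 32-bit month counter.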



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
