gene-db commented on code in PR #48215: URL: https://github.com/apache/spark/pull/48215#discussion_r1773613194
##########
common/variant/README.md:
##########
@@ -364,11 +362,7 @@ The Decimal type contains a scale, but no precision. The implied precision of a

 The *Logical Type* column indicates logical equivalence of physically encoded types. For example, a user expression operating on a string value containing "hello" should behave the same, whether it is encoded with the short string optimization, or long string encoding. Similarly, user expressions operating on an *int8* value of 1 should behave the same as a decimal16 with scale 2 and unscaled value 100.

-The year-month and day-time interval types have one byte at the beginning indicating the start and end fields. In the case of the year-month interval, the least significant bit denotes the start field and the next least significant bit denotes the end field. The remaining 6 bits are unused. A field value of 0 represents YEAR and 1 represents MONTH. In the case of the day-time interval, the least significant 2 bits denote the start field and the next least significant 2 bits denote the end field. The remaining 4 bits are unused. A field value of 0 represents DAY, 1 represents HOUR, 2 represents MINUTE, and 3 represents SECOND.
-
-Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types.
-
-[1] The parquet format does not have pure equivalents for the year-month and day-time interval types. Year-month intervals are usually represented using int32 values and the day-time intervals are usually represented using int64 values. However, these values don't include the start and end fields of these types. Therefore, Spark stores them in the column metadata.
+Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types. Type IDs 19 and 20 were used to represent interval types for which support has been temporarily disabled; therefore, these type IDs should not be used by new types.

Review Comment:
   Why can't future types use 19 and 20? Shouldn't intervals just be removed completely, and added back later, once the exact encoded format is agreed upon?

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4461,6 +4461,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)

+  val ALLOW_INTERVAL_TYPES_IN_VARIANT = buildConf("spark.sql.variant.allowIntervalTypes")

Review Comment:
   I think making this a conf might be dangerous. We should simply never create a variant value that contains an interval, since interval is not a valid type in variant right now. It is possible that a user enables this flag and writes a variant containing this version of the interval encoding, but the spec later adopts a different one. Those stored variant values would then be corrupt.
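
As a side note on the README hunk above: the header-byte layout for the interval types (the paragraph being removed) can be made concrete with a small decoder. The sketch below is illustrative only; the object and method names are hypothetical, and this is not code from the PR.

    // Illustrative sketch (not from the PR): decoding the interval header byte
    // as described in the removed README paragraph.
    object IntervalHeaderSketch {
      // Year-month header: bit 0 = start field, bit 1 = end field; the
      // remaining 6 bits are unused. Field values: 0 = YEAR, 1 = MONTH.
      def decodeYearMonth(header: Byte): (Int, Int) =
        (header & 0x01, (header >> 1) & 0x01)

      // Day-time header: bits 0-1 = start field, bits 2-3 = end field; the
      // remaining 4 bits are unused.
      // Field values: 0 = DAY, 1 = HOUR, 2 = MINUTE, 3 = SECOND.
      // e.g. decodeDayTime(0x04.toByte) == (0, 1), i.e. start = DAY, end = HOUR.
      def decodeDayTime(header: Byte): (Int, Int) =
        (header & 0x03, (header >> 2) & 0x03)
    }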
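
And a note on the SQLConf hunk, which is truncated at the reviewed line: for readers unfamiliar with the builder pattern, a complete conf definition would look roughly like the sketch below. The doc text, version string, and default value are assumptions for illustration, not taken from the PR.

    val ALLOW_INTERVAL_TYPES_IN_VARIANT = buildConf("spark.sql.variant.allowIntervalTypes")
      .doc("When true, allows year-month and day-time interval types in variant values. " +
        "(Assumed wording; interval is not currently a valid variant type.)")
      .version("4.0.0")  // assumed version, not confirmed by the PR
      .booleanConf
      .createWithDefault(false)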