mihailom-db opened a new pull request, #51335: URL: https://github.com/apache/spark/pull/51335
### What changes were proposed in this pull request?
This PR changes how the parser treats data types. It distinguishes types with parameters from types without parameters and groups the grammar rules accordingly.

### Why are the changes needed?
The changes are needed for several reasons:
1. The context of primitiveDataType keeps growing. This is not good practice, because it leaves many null fields that only take up memory.
2. Types are used inconsistently. TIMESTAMP_NTZ is handled in a separate rule, but it is also listed among the primitive types.
3. The primitive type rule should contain only primitive types. Adding ARRAY, STRUCT, and MAP to the rule just because it is convenient is not good practice.
4. The current structure cannot be extended with additional type features. For example, we introduced collations for STRING, but what if we wanted to introduce CHAR/VARCHAR with collations? The current structure gives us no way to express a type as CHAR(5) COLLATE UTF8_BINARY (we could only write CHAR COLLATE UTF8_BINARY (5)).

### Does this PR introduce _any_ user-facing change?
No. This is an internal refactoring.

### How was this patch tested?
All existing tests should pass; this is only a code refactoring.

### Was this patch authored or co-authored using generative AI tooling?
No.
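
As a rough illustration of point 4 under "Why are the changes needed?", the sketch below shows the idea of treating parameters and collation as independent, ordered suffixes of a type name. It is not Spark's ANTLR grammar or the code in this PR: the regular expression, the `ParsedType` class, the `parse_type` function, and the three type sets are hypothetical, and collation support for CHAR/VARCHAR is assumed purely for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical grouping for illustration; the real rule names and
# type lists in the Spark grammar may differ.
TYPES_WITHOUT_PARAMS = {"BOOLEAN", "INT", "DATE", "STRING"}
TYPES_WITH_PARAMS = {"CHAR", "VARCHAR", "DECIMAL"}
# Assumption: CHAR/VARCHAR accept a collation, as the PR description speculates.
COLLATABLE_TYPES = {"STRING", "CHAR", "VARCHAR"}

# name, then optional "(p1, p2, ...)", then optional "COLLATE <name>".
TYPE_PATTERN = re.compile(
    r"^\s*([A-Za-z_]+)\s*"
    r"(?:\(\s*(\d+(?:\s*,\s*\d+)*)\s*\))?\s*"
    r"(?:COLLATE\s+([A-Za-z0-9_]+))?\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class ParsedType:
    name: str
    params: Tuple[int, ...]
    collation: Optional[str]


def parse_type(text: str) -> ParsedType:
    """Parse a type string whose parameters precede its collation."""
    match = TYPE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse type: {text!r}")

    name = match.group(1).upper()
    if name not in TYPES_WITH_PARAMS and name not in TYPES_WITHOUT_PARAMS:
        raise ValueError(f"unknown type: {name}")

    raw_params = match.group(2)
    params = tuple(int(p) for p in raw_params.split(",")) if raw_params else ()
    if params and name not in TYPES_WITH_PARAMS:
        raise ValueError(f"{name} does not take parameters")

    collation = match.group(3).upper() if match.group(3) else None
    if collation is not None and name not in COLLATABLE_TYPES:
        raise ValueError(f"{name} does not support collations")

    return ParsedType(name, params, collation)
```

In this sketch, `parse_type("CHAR(5) COLLATE UTF8_BINARY")` is accepted because the parameter list and the collation clause are separate optional parts that follow the type name in a fixed order, while `parse_type("CHAR COLLATE UTF8_BINARY (5)")` raises `ValueError`.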
