[PR] [WIP] Fix inconsistencies and refactor primitive types in parser [spark]

via GitHub Tue, 01 Jul 2025 04:57:39 -0700


mihailom-db opened a new pull request, #51335:
URL: https://github.com/apache/spark/pull/51335


   ### What changes were proposed in this pull request?
   This PR changes how the parser handles data types. It distinguishes types 
that take parameters from types that do not, and groups the grammar rules 
accordingly.
   
   
   ### Why are the changes needed?
   The changes are needed for several reasons:
   1. The context of primitiveDataType keeps growing. This is not good 
practice, because many of its fields are null and only take up memory.
   2. Types are used inconsistently. TIMESTAMP_NTZ is handled in a separate 
rule, but it is also listed among the primitive types.
   3. The primitive type rule should contain only primitive types. Adding 
ARRAY, STRUCT, and MAP to it just because it is convenient is not good practice.
   4. The current structure does not allow extending types with additional 
features. For example, we introduced STRING collations, but what if we wanted 
to introduce CHAR/VARCHAR with collations? The current structure makes it 
impossible to write a type as CHAR(5) COLLATE UTF8_BINARY (we could only write 
CHAR COLLATE UTF8_BINARY (5)).
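
To make points 2 and 4 concrete, here is a minimal, language-neutral sketch in Python of the grouping idea. It is not Spark's ANTLR grammar or its parser code: the type sets, the `ParsedType` class, and `parse_type` are all hypothetical names chosen for illustration. It assumes each type belongs to either a "without parameters" or a "with parameters" group, and that a collation clause, when allowed, follows the parameters.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical groupings for illustration; the real grammar's rules and
# type lists may differ.
TYPES_WITHOUT_PARAMS = {"BOOLEAN", "INT", "DATE", "TIMESTAMP_NTZ"}
TYPES_WITH_PARAMS = {"CHAR", "VARCHAR", "DECIMAL"}
COLLATABLE_TYPES = {"STRING", "CHAR", "VARCHAR"}
KNOWN_TYPES = TYPES_WITHOUT_PARAMS | TYPES_WITH_PARAMS | COLLATABLE_TYPES


@dataclass(frozen=True)
class ParsedType:
    name: str
    params: Tuple[int, ...] = ()      # empty for parameterless types
    collation: Optional[str] = None   # set only when a COLLATE clause is given


# Shape: NAME [ "(" int {"," int} ")" ] [ COLLATE name ]
# The collation clause comes after the parameters, never before them.
_TYPE_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z_]+)"
    r"(?:\s*\(\s*(?P<params>\d+(?:\s*,\s*\d+)*)\s*\))?"
    r"(?:\s+COLLATE\s+(?P<collation>[A-Za-z0-9_]+))?\s*$",
    re.IGNORECASE,
)


def parse_type(text: str) -> ParsedType:
    """Parse a type string into name, parameters, and optional collation."""
    match = _TYPE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse type: {text!r}")

    name = match.group("name").upper()
    raw_params = match.group("params")
    params = tuple(int(p) for p in raw_params.split(",")) if raw_params else ()
    collation = match.group("collation")

    if name not in KNOWN_TYPES:
        raise ValueError(f"unknown type: {name}")
    if params and name not in TYPES_WITH_PARAMS:
        raise ValueError(f"{name} does not take parameters")
    if collation and name not in COLLATABLE_TYPES:
        raise ValueError(f"{name} does not support COLLATE")

    return ParsedType(name, params, collation.upper() if collation else None)
```

Under these assumptions, `parse_type("CHAR(5) COLLATE UTF8_BINARY")` yields a type with name `CHAR`, parameters `(5,)`, and collation `UTF8_BINARY`, while `parse_type("CHAR COLLATE UTF8_BINARY (5)")` raises `ValueError`. Because parameters and collation are separate optional parts, a single result type carries only the fields it needs, and each type appears in exactly one parameter group.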
   
   
   ### Does this PR introduce _any_ user-facing change?
   No. This is internal refactoring.
   
   
   ### How was this patch tested?
   All existing tests should pass; this is just code refactoring.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   

