findepi commented on PR #12488: URL: https://github.com/apache/datafusion/pull/12488#issuecomment-2353208973
This won't pass the tests yet, but I'm creating the PR to have a conversation first.

cc @alamb @comphead

Especially if we go with @notfilippo's https://github.com/apache/datafusion/issues/11513, we need to have an answer to: what is a type? What isn't a type? What's logical and what's physical? But even if we don't do the decoupling linked above, we still need to answer this question.

- Do we need a scalar value to represent "would be dictionary but is a single value" (being removed here)?
- Do we need a scalar value to represent "would be RLE but is a single value" (similar, doesn't exist today)?
- Do we need a scalar value to represent "a string", "a string but actually maybe a long one", "a string but IF it was encoded in an array, it would use SSO / German strings", or "a string but IF it was encoded in an array, it would use prefix compression"?
- Do we need a scalar to represent "an integer", "an integer but stored as varint", etc.?

To me, those are properties of the physical representation of a series of values, not attributes of a single value, so ScalarValue doesn't have to be concerned with them. I think ScalarValue reflects some of them because we expect it to define a type, and we expect a type to *sometimes* define the physical representation. This is blurry.

Side note: if we want a scalar value to represent all possible aspects of array representation, then arrow's builtin `Scalar` seems to be ready for that.

cc @milevin @sadboy
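To make the distinction concrete, here is a rough sketch of the idea, not DataFusion or Arrow code. `LogicalScalar`, `EncodedStrings`, and `value_at` are hypothetical names made up for illustration. It assumes a scalar only carries the logical value, while dictionary or run-length encoding exists only at the level of the series:

```rust
// Hypothetical types for illustration only; these are NOT DataFusion or Arrow APIs.

/// A single logical value. It carries no information about array encoding.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum LogicalScalar {
    Int64(i64),
    Utf8(String),
}

/// Possible physical layouts for a series of strings.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum EncodedStrings {
    /// Every value is stored directly.
    Plain(Vec<String>),
    /// Distinct values are stored once; each row refers to one by index.
    Dictionary { values: Vec<String>, keys: Vec<usize> },
    /// Consecutive repeats are collapsed into (value, run length) pairs.
    RunLength(Vec<(String, usize)>),
}

impl EncodedStrings {
    /// Extract the value at `row` as a logical scalar, dropping the encoding.
    /// Returns `None` when `row` is out of bounds.
    fn value_at(&self, row: usize) -> Option<LogicalScalar> {
        match self {
            EncodedStrings::Plain(values) => {
                values.get(row).cloned().map(LogicalScalar::Utf8)
            }
            EncodedStrings::Dictionary { values, keys } => keys
                .get(row)
                .and_then(|&key| values.get(key))
                .cloned()
                .map(LogicalScalar::Utf8),
            EncodedStrings::RunLength(runs) => {
                // Walk the runs until the one containing `row` is found.
                let mut remaining = row;
                for (value, len) in runs {
                    if remaining < *len {
                        return Some(LogicalScalar::Utf8(value.clone()));
                    }
                    remaining -= len;
                }
                None
            }
        }
    }
}
```

Under these assumptions, the same row read from a plain, dictionary-encoded, or run-length-encoded series yields an equal `LogicalScalar`, which is the property I'd argue a scalar should have. Whether ScalarValue should actually look like this is exactly the open question.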
