milenkovicm opened a new issue, #16980: URL: https://github.com/apache/datafusion/issues/16980
### Describe the bug

The current implementation of `ComposedPhysicalExtensionCodec` is unsound. Its approach, which relies on `try_any`, may decode a node as the wrong type by accident if the types are simple enough.

This is not just a theoretical issue. It happened in the [ballista codec], where an encoded Parquet file was decoded as CSV instead of Parquet. The type was encoded by the last encoder in the list but, by pure luck, decoded by the first one. (I guess I don't have to mention how hard this was to debug.)

To make the current implementation sound, we would need to record which encoder in the list was used and do a reverse lookup when decoding. In other words, we need to encode the tuple `(position, serialised_blob)`.

[ballista codec]: https://github.com/milenkovicm/arrow-ballista/blob/d1295f7d1ab5c40a433ab17a344494f39b18f0af/ballista/core/src/serde/mod.rs#L126-L127

### To Reproduce

I don't have a reproducer at the moment, but I believe one could be written very simply.

### Expected behavior

A type should never be decoded as a different type by accident.

### Additional context

_No response_
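A minimal, self-contained Rust sketch of both the failure described above and the proposed `(position, serialised_blob)` fix follows. It does not use DataFusion. `Codec`, `CsvCodec`, `ParquetCodec`, `CsvScan`, `ParquetScan` and `ComposedCodec` are hypothetical stand-ins, and the 4-byte little-endian position prefix is an assumed wire format, not what `ComposedPhysicalExtensionCodec` does today.

```rust
use std::any::Any;

/// Hypothetical, simplified stand-in for a physical extension codec
/// (not DataFusion's real `PhysicalExtensionCodec` trait).
trait Codec {
    /// Returns `Some(bytes)` if this codec knows how to encode `node`.
    fn try_encode(&self, node: &dyn Any) -> Option<Vec<u8>>;
    /// Tries to decode `buf`; may "succeed" on bytes it never produced.
    fn try_decode(&self, buf: &[u8]) -> Option<Box<dyn Any>>;
}

#[derive(Debug, PartialEq)]
struct CsvScan(String);
#[derive(Debug, PartialEq)]
struct ParquetScan(String);

struct CsvCodec;
struct ParquetCodec;

impl Codec for CsvCodec {
    fn try_encode(&self, node: &dyn Any) -> Option<Vec<u8>> {
        node.downcast_ref::<CsvScan>().map(|s| s.0.as_bytes().to_vec())
    }
    fn try_decode(&self, buf: &[u8]) -> Option<Box<dyn Any>> {
        // Any UTF-8 path is accepted, so a Parquet payload decodes "fine" as CSV.
        let path = String::from_utf8(buf.to_vec()).ok()?;
        Some(Box::new(CsvScan(path)) as Box<dyn Any>)
    }
}

impl Codec for ParquetCodec {
    fn try_encode(&self, node: &dyn Any) -> Option<Vec<u8>> {
        node.downcast_ref::<ParquetScan>().map(|s| s.0.as_bytes().to_vec())
    }
    fn try_decode(&self, buf: &[u8]) -> Option<Box<dyn Any>> {
        let path = String::from_utf8(buf.to_vec()).ok()?;
        Some(Box::new(ParquetScan(path)) as Box<dyn Any>)
    }
}

/// Hypothetical composed codec holding several codecs in a fixed order.
struct ComposedCodec {
    codecs: Vec<Box<dyn Codec>>,
}

impl ComposedCodec {
    /// Unsound: raw payload of the first codec that can encode `node`.
    fn encode_untagged(&self, node: &dyn Any) -> Option<Vec<u8>> {
        self.codecs.iter().find_map(|c| c.try_encode(node))
    }

    /// Unsound: returns whatever the first codec manages to decode.
    fn decode_first_match(&self, buf: &[u8]) -> Option<Box<dyn Any>> {
        self.codecs.iter().find_map(|c| c.try_decode(buf))
    }

    /// Encodes (position, serialised_blob): a little-endian u32 index of
    /// the codec that succeeded, followed by that codec's payload.
    fn encode(&self, node: &dyn Any) -> Option<Vec<u8>> {
        self.codecs.iter().enumerate().find_map(|(i, c)| {
            let blob = c.try_encode(node)?;
            let mut out = (i as u32).to_le_bytes().to_vec();
            out.extend_from_slice(&blob);
            Some(out)
        })
    }

    /// Decodes by reverse lookup: only the codec at the recorded position is tried.
    fn decode(&self, buf: &[u8]) -> Option<Box<dyn Any>> {
        if buf.len() < 4 {
            return None;
        }
        let (pos, blob) = buf.split_at(4);
        let idx = u32::from_le_bytes([pos[0], pos[1], pos[2], pos[3]]) as usize;
        self.codecs.get(idx)?.try_decode(blob)
    }
}

fn main() {
    let composed = ComposedCodec {
        codecs: vec![
            Box::new(CsvCodec) as Box<dyn Codec>,
            Box::new(ParquetCodec) as Box<dyn Codec>,
        ],
    };
    let node = ParquetScan("data.parquet".to_string());

    // Unsound round trip: encoded by the last codec, decoded by the first.
    let raw = composed.encode_untagged(&node).unwrap();
    let wrong = composed.decode_first_match(&raw).unwrap();
    assert!(wrong.downcast_ref::<CsvScan>().is_some());

    // Position-tagged round trip: only ParquetCodec (index 1) is asked to decode.
    let tagged = composed.encode(&node).unwrap();
    let right = composed.decode(&tagged).unwrap();
    assert_eq!(right.downcast_ref::<ParquetScan>(), Some(&node));
    println!("ok");
}
```

One consequence of this design: decoding now depends on the codec list having the same order on the encoding and decoding sides, so reordering or inserting codecs would invalidate previously encoded plans.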