tustvold commented on issue #5332: URL: https://github.com/apache/arrow-rs/issues/5332#issuecomment-2551785491
> Many data services accept Parquet as input. Being an open format it has become a de-facto interchange format between systems.

Different applications will have different threat models and should make their own judgements, but I would certainly encourage any application accepting truly untrusted parquet data to rewrite it in some sandboxed environment before handing it off to other systems. This is fairly standard practice for other media files, e.g. images and video, even where extremely mature and well-tested transcoders exist. However, many systems will instead be accepting files from other internal systems, at which point perhaps the threat model is different.

_Security concerns aside, I would recommend rewriting parquet files anyway because of the sheer variety of parquet implementations: two files with the same data may behave very differently depending on how they've been written._

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
