madhavajay commented on issue #11239: URL: https://github.com/apache/arrow/issues/11239#issuecomment-1054964568
@pitrou that's a great point: the attacker can simply serialize something that will trigger the pickle deserializer internally.

We currently use protobuf, but we're looking for something zero-copy, faster, and able to handle much larger payloads. I think the best option here is to re-implement our data structure using native Arrow datatypes, so that serialization is essentially just a memory copy, right? Am I correct in understanding that serializing normal Arrow types (tensor, table, etc.) will not go through the `pa.serialize` pathway or use `pickle` anywhere?

Is there any good documentation on implementing a custom object type built from Arrow primitives, for example multiple related `np.array`s (possibly of different underlying numeric types and sizes) plus some sidecar metadata?

Regarding the existing `pa.serialize` functionality, I think it would be worth adding a warning to the function or its docs that pickle is insecure, and either accelerating the deprecation or providing an optional parameter that ignores, or raises an exception for, any pickled types.