madhavajay commented on issue #11239:
URL: https://github.com/apache/arrow/issues/11239#issuecomment-1054964568


   @pitrou that's a great point: an attacker can simply serialize something that will use the pickle deserializer internally. We currently use protobuf, but we are looking for something zero-copy, faster, and able to handle much larger payloads. I think the best thing to do here is to re-implement our data structure on top of the native Arrow datatypes, so that serialization becomes essentially just a memory copy, right?
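   
   Just to make that concrete, here is a minimal sketch of what I am imagining: build the structure from native Arrow types and round-trip it through the Arrow IPC stream format instead of `pa.serialize` (this is just my assumption of the approach, not something we have running yet):
   
```python
import pyarrow as pa

# Build a Table from native Arrow types; nothing here touches pickle.
table = pa.table({
    "values": pa.array([1.0, 2.0, 3.0]),
    "labels": pa.array(["a", "b", "c"]),
})

# Serialize with the Arrow IPC stream format into an in-memory buffer.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
buf = sink.getvalue()

# Deserialize; reading back from the buffer is essentially a zero-copy view
# rather than a per-object decode step.
restored = pa.ipc.open_stream(buf).read_all()
assert restored.equals(table)
```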
   
   Am I correct in understanding that serializing normal Arrow types such as tensors and tables will not go through the `pa.serialize` pathway or use `pickle` anywhere? Is there any good documentation on implementing a custom object type built from Arrow primitives, for example several related np.arrays of data plus some sidecar metadata, possibly with different underlying numeric types and sizes?
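   
   To make the question less abstract, here is roughly the kind of composite object I mean; the one-row list-column layout and the JSON-in-schema-metadata trick below are just my own guesses at a reasonable encoding, not an official Arrow pattern:
   
```python
import json
import numpy as np
import pyarrow as pa

# Several related numpy arrays of different dtypes and lengths, plus some
# sidecar metadata, packed into a single one-row Table: one list<T> column
# per array, with the metadata carried in the schema's key/value metadata.
arrays = {
    "weights": np.arange(5, dtype=np.float64),
    "indices": np.arange(10, dtype=np.int32),
}
sidecar = {"version": 1, "name": "example"}

columns = {name: pa.array([arr]) for name, arr in arrays.items()}
table = pa.table(columns).replace_schema_metadata(
    {"sidecar": json.dumps(sidecar)}
)

# Round-trip through the IPC stream format, as above; no pickle involved.
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
restored = pa.ipc.open_stream(sink.getvalue()).read_all()

meta = json.loads(restored.schema.metadata[b"sidecar"])
weights = restored.column("weights").chunk(0).values.to_numpy()
```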
   
   Regarding the existing `pa.serialize` functionality, I think it would be worth adding a warning, either in the function itself or in the docs, that pickle is insecure, and then either accelerating the deprecation or providing an optional parameter that lets callers skip, or raise an exception on, any pickle-serialized types.
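   
   To illustrate what I mean by the optional parameter (the `allow_pickle` keyword below is purely hypothetical and does not exist in pyarrow today):
   
```python
import pyarrow as pa

data = {"weights": [1.0, 2.0, 3.0]}

# Today: pa.serialize can silently fall back to pickle for types it does
# not know how to handle natively.
buf = pa.serialize(data).to_buffer()

# Suggested (hypothetical): let callers opt out of the pickle fallback so
# an unknown type raises instead of being pickled/unpickled.
# buf = pa.serialize(data, allow_pickle=False).to_buffer()
```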

