On Mon, Jul 13, 2020 at 09:56:45PM +1000, Chris Angelico wrote:
> A pickle file (or equivalent blob in a database, or whatever) should
> be considered equally as trusted as your source code. If you're
> writing out a file that has the exact same access permissions as your
> own source code, and then reading it back, you shouldn't have to worry
> about pickle's safety any more than you worry about your code's safety
> - anyone who could maliciously craft something for you to unpickle
> could equally just edit the source code directly.

If I worry about the security of my source code, I can put a known good
copy on read-only media, or lock it down with more restrictive
permissions so that the user running the code cannot modify it. In
either case, if my code needs to write data out to a pickle file and
later read it back in, that file can't be written to the same location
as my source code. (As it is read-only.)

So it isn't correct that a malicious user having the ability to craft a
pickle file could just edit the source code. These are independent
threats.

There is a scenario where what you say is correct: as the application
developer, I create my data structures for my app and store them in
pickles *at build time*, distributing the pickles as part of my app. In
that case they can be read-only, and are effectively compiled source
code. I guess you were thinking of a similar scenario?

But where security is concerned, the safe scenarios really don't
matter. It doesn't matter if there are a million safe use-cases for
pickle ("what if I'm running on a single-user system with no internet, a
malicious user can only hurt themselves..."[1]) if the user mistakes
their actually unsafe scenario for a safe one.

And that's the risk: can I guarantee that there is no clever scheme by
which an attacker can fool me into unpickling malicious code? I need to
be smarter than the attacker, and more imaginative, and to have thought
as long and hard about the problem as they have.

They've probably been thinking about ways to exploit pickle for months.
I've spent three minutes reading the docs. Who is likely to win?
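
To make that concrete, here is a minimal sketch of why loading a pickle
amounts to running code. The class name Payload and the expression are
invented for illustration, and the expression is deliberately harmless;
a real attacker would hand-craft the bytes and pick something nastier
than arithmetic.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object (name invented for this sketch)."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call at load time. The attacker chooses both.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The victim only ever calls pickle.loads(), yet eval() runs on the
# attacker's string, and its result is what comes back.
result = pickle.loads(blob)
print(result)
```

Note that what comes back is not a Payload instance at all: it is
whatever the chosen callable returned.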

This is why an *inherently safe* serialization format is a necessary
thing. I don't want to spend even three minutes thinking about exploits,
I just want to write the data out and read it back in, no issues, no
worries, and not have to think about it.
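
As a rough illustration of "write the data out and read it back in",
here is a sketch using the stdlib json module. Choosing JSON is my
assumption for the example, not a claim that it is the right answer to
this thread; the variable names are made up. The point is only that the
loader builds plain dicts, lists, strings, numbers, booleans and None,
and never calls anything named in the input.

```python
import json

# Hypothetical application data, limited to types JSON can represent.
record = {"name": "spam", "count": 3, "tags": ["a", "b"], "ok": True}

text = json.dumps(record)      # write the data out
restored = json.loads(text)    # read it back in
print(restored == record)

# Hostile-looking input is still just data: the loader returns a dict
# of strings and lists, and nothing in it is ever executed.
hostile = '{"__reduce__": "eval", "args": ["6 * 7"]}'
loaded = json.loads(hostile)
print(type(loaded).__name__, loaded["args"])
```

The price is expressiveness: arbitrary objects need explicit conversion,
and tuples come back as lists.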

[1] Victims and authors of viruses and malware in the 1980s and 1990s
may disagree.

--
Steven
_______________________________________________
Python-ideas mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/DS6RWX3734E6ZKM67ILDCV2UTE5I3KEC/