Since all strings in Python are immutable, it is impossible to overwrite a string's value, or to do a "secure disposal" (overwrite-then-free) of it, using something like:

    >>> a = "something to hide"
    >>> a = "x" * len(a)

This leaves both "something to hide" and "x" repeated len(a) times in the process memory.

- Who cares? Why is this relevant?
Well, if you handle sensitive information like CC numbers, passwords, PINs, or similar data, you want to minimize the chance of leaking any of it.

- How can this "leak" happen?
If someone gets a core/memory dump of an app that handles sensitive information, everything in that dump is exposed!

- Well, so what can we do about this?
I propose changing the string objects to add an option to overwrite the underlying buffer. To do so:

* Add a read-only attribute, wiped, that is set when the string is overwritten.
* Add a wipe() method that overwrites the internal string buffer.

So it would work like this:

    >>> import crypt, getpass
    >>> pwd = getpass.getpass('Set your password:')  # could be other sensitive data.
    >>> encrypted_pwd = crypt.crypt(pwd)  # crypt() just as an example.
    >>> pwd.wiped  # Check if pwd was wiped.
    False
    >>> pwd.wipe()  # Overwrite the underlying buffer.
    >>> pwd.wiped  # Check if pwd was wiped.
    True
    >>> print(pwd)  # Prints noise (or an empty str?)
    >>> del pwd  # Now it is in the hands of the GC.

The wipe() method immediately overwrites the underlying string buffer and sets wiped to True. If the string is used later, the flag can be checked to confirm that the change was made by a wipe and not by some other procedure. Initially the idea is to overwrite the string with the Unicode NULL code point, but this could be changed to let the user choose the fill value through a parameter to wipe().

An alternative is to add a new exception, "WipedError", that could be raised whenever the string is accessed again, but I found this too disruptive for a normal/standard string workflow.

Quick & Dirty FAQ:

- You're doing it wrong! The correct code to do that in a secure way is:

    >>> pwd = crypt.crypt(getpass.getpass('Set your password'))

Don't you know that, fool?
Well, no: the code still generates a temporary string in memory to pass to crypt(). But now that string is lying there and can't be reached to overwrite it with wipe().

- Why not create a new type, like in C# or Java?
I see that this tends to disrupt the usual workflow of string usage. Also, the idea here is not to offer secure storage of strings in memory, because there are already a few mechanisms to achieve that with the current Python base. I just want to have the ability to overwrite the buffer.

- Why not use one of the standard algorithms to overwrite, like DoD5220 or MIL-STD-414?
Standards of this kind are usually oriented toward persistent storage, especially magnetic media, where the data could be "easily" recovered. But this could be an option, implemented by adding a way to plug in a function that does the overwrite work inside the wipe() method.

- This is far beyond the almost implementation-agnostic definition of the Python language. How about you make a module with this functionality and leave the language as is?
Well, I already did it:

https://github.com/qlixed/python-memwiper/

But I hit a lot of problems along the road. I have been working on it in my free time over the last year and made it "almost" work, but that is not relevant to the proposal.

I think this kind of security feature needs to be tackled from within the language itself, especially when the language has a GC. I firmly believe that security and protections need to be part of the "batteries included" offer of Python, and I think this is one little thing that could help a lot to secure our apps.

Let me know what you think!

~ Ezequiel (Ezekiel) Brizuela [ aka Qlixed ] ~
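
P.S. For illustration only: the wipe()/wiped behaviour described above can be approximated today on top of a mutable bytearray. This is a minimal sketch and not the proposed change to str. The class name WipeableSecret and the fill parameter are hypothetical, invented for this example. It also assumes the secret reaches the program as bytes; the bytes literal used below stays in memory as a constant, so real code would need to read the secret straight into a mutable buffer.

```python
class WipeableSecret:
    """Hypothetical stand-in for the proposed str.wipe() / str.wiped pair."""

    def __init__(self, data: bytes) -> None:
        # bytearray is mutable, so its buffer can be overwritten in place.
        self._buf = bytearray(data)
        self._wiped = False

    @property
    def wiped(self) -> bool:
        # Read-only flag, mirroring the proposed attribute.
        return self._wiped

    def wipe(self, fill: int = 0) -> None:
        # Overwrite every byte in place; the buffer keeps its length.
        for i in range(len(self._buf)):
            self._buf[i] = fill
        self._wiped = True

    def __len__(self) -> int:
        return len(self._buf)

    def __bytes__(self) -> bytes:
        # Caution: this returns an immutable copy that cannot be wiped.
        return bytes(self._buf)


secret = WipeableSecret(b"something to hide")
view = memoryview(secret._buf)  # a second view of the same buffer, no copy
secret.wipe()
```

After wipe(), both the object and the memoryview see only NUL bytes, because the overwrite happened in the one shared buffer instead of creating a new object. Any copy made earlier (for example through bytes(secret)) is not touched, which is the same limitation the FAQ above describes for temporary strings.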
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/