Re: [Python-Dev] Why does base64 return bytes?
Stephen J. Turnbull wrote:

> The RFC is unclear on this point, but I read it as specifying the
> ASCII coded character set, not the ASCII repertoire of (abstract)
> characters.

Well, I think you've misread it. Or at least there is a more general
reading possible that is entirely consistent with the stated purpose
and doesn't assume any particular output encoding.

> It's more subtle than that. *RFCs do not deal with text.*

That may be true of most RFCs, but I think this particular one really
*is* talking about text, even if the authors didn't realise it at the
time.

> It is also desirable that it be likely to pass unscathed through
> channels that ... *inadvertently* treat it as text. Both requirements
> are conveniently fulfilled by using appropriate ASCII subsets, and
> encoding on the wire using the usual bit patterns.

But only if the part that is (deliberately or inadvertently) treating
it as text is using ASCII as its encoding.

So, by your reading of the RFC, base64 is *only* intended for channels
that use ASCII encoding. Whereas if you drop the assumption of ASCII
and use whatever encoding the channel uses for text, then it works for
all channels.

RFC 4648 doesn't mention it, but an earlier RFC on base64 explicitly
said that characters were chosen that also exist in EBCDIC, so it seems
they were intending that base64 should work on EBCDIC-based systems as
well as ASCII-based ones.

> It's purely a matter of our convenience (as programmers *in* Python)
> whether we return str or bytes.

Yes, and it seems to me the decision has been made by people with their
noses stuck in low-level protocol implementations. Whenever *I've*
needed to base64 encode something, I've wanted the output as text,
because that's what I needed to feed into the next stage of the process.

Maybe there should be two versions of the base64 codec, one producing
bytes and one producing text?
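The point about non-ASCII channels can be illustrated with a small sketch. It assumes an EBCDIC channel can be stood in for by Python's `cp500` codec (one EBCDIC variant; the choice of variant is an assumption made here, not something the thread specifies). The base64 text survives the trip because every character of the base64 alphabet exists in that code page, even though the bytes on the wire differ from the ASCII ones.

```python
import base64

# Arbitrary binary payload covering every byte value.
payload = bytes(range(256))

# Treat the base64 result as text (str), independent of any wire encoding.
text = base64.b64encode(payload).decode("ascii")

# The same text put on an ASCII channel and on an EBCDIC channel.
ascii_wire = text.encode("ascii")
ebcdic_wire = text.encode("cp500")  # assumed stand-in for "an EBCDIC system"

# The receiver decodes with the channel's own text encoding, then base64.
recovered = base64.b64decode(ebcdic_wire.decode("cp500"))
```

The two wire representations are different byte strings, yet the payload round-trips, which is the sense in which base64 can be read as a bytes-to-text transformation.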
-- Greg ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Why does base64 return bytes?
Simon Cross wrote:

> If we only support one, I would prefer it to be bytes since (bytes ->
> bytes -> unicode) seems like less overhead and slightly conceptually
> clearer than (bytes -> unicode -> bytes),

Whereas bytes -> unicode, followed if needed by unicode -> bytes,
seems conceptually clearer to me. IOW, base64 is conceptually a
bytes-to-text transformation, and the usual way to represent
text in Python 3 is unicode.

--
Greg
Re: [Python-Dev] Why does base64 return bytes?
On Tue, Jun 14, 2016 at 09:40:51PM -0700, Guido van Rossum wrote:

> I'm officially on vacation, but I was surprised that people now assume
> RFCs, which specify internet protocols, would have a bearing on programming
> languages. (With perhaps an exception for RFCs that specifically specify
> how programming languages or their libraries should treat certain specific
> issues -- but I found no evidence that this RFC is doing that.)

Sorry to disturb your vacation!

I hoped that there might have been a nice simple answer, like "the
main use-case for Base64 is the email module, which needs bytes, and
thus it was decided". Or even "because backwards compatibility".

Thanks to everyone for their constructive comments, and especially Mark
for digging up the original discussion on the Python-3000 list. I'm
satisfied that the choice made by Python is the right choice, and that
it meets the spirit (if, arguably, not the letter) of the RFC.

--
Steve
Re: [Python-Dev] Why does base64 return bytes?
In that case could we just add a base64_text() method somewhere? Who would
like to measure whether it would be a win?
Re: [Python-Dev] Why does base64 return bytes?
On 15 June 2016 at 13:53, Daniel Holth wrote:
> In that case could we just add a base64_text() method somewhere? Who would
> like to measure whether it would be a win?

"Just adding" a method in the stdlib means we'd have to support it
long term (backward compatibility). So by the time such an experiment
determined whether it was worth it, it'd be too late.

Finding out whether users/projects typically write such a helper
function for themselves would be a better way of getting this
information. Personally, I suspect they don't, but facts beat
speculation.

Of course, "not every one-liner needs to be a stdlib function" applies
here too.

Paul
Re: [Python-Dev] Why does base64 return bytes?
On Wed, 15 Jun 2016, Greg Ewing wrote:

> Simon Cross wrote:
>> If we only support one, I would prefer it to be bytes since (bytes ->
>> bytes -> unicode) seems like less overhead and slightly conceptually
>> clearer than (bytes -> unicode -> bytes),
>
> Whereas bytes -> unicode, followed if needed by unicode -> bytes,
> seems conceptually clearer to me. IOW, base64 is conceptually a
> bytes-to-text transformation, and the usual way to represent
> text in Python 3 is unicode.

And in CPython, do I understand correctly that the output text would be
represented using one byte per character? If so, would there be a way of
encoding that into UTF-8 that re-used the raw memory that backs the
Unicode object? And, therefore, avoids almost all the inefficiency of
going via Unicode? If so, this would be a win - proper use of Unicode to
represent a text string, combined with instantaneous conversion into a
bytes object for the purpose of writing to the OS.

Isaac Morland CSCF Web Guru
DC 2619, x36650 WWW Software Specialist
Re: [Python-Dev] Why does base64 return bytes?
On Wed, Jun 15, 2016 at 12:53:15PM +, Daniel Holth wrote:
> In that case could we just add a base64_text() method somewhere? Who would
> like to measure whether it would be a win?
Just call .decode('ascii') on the output of base64.b64encode. Not every
one-liner needs to be a standard function.
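For readers who want to see the one-liner spelled out, here is a minimal sketch. The helper name `b64encode_text` is hypothetical and only wraps the two standard calls mentioned above.

```python
import base64


def b64encode_text(data):
    """Hypothetical helper: base64-encode bytes and return a str."""
    # b64encode() returns bytes restricted to the base64 alphabet, so
    # decoding them as ASCII cannot fail.
    return base64.b64encode(data).decode("ascii")


raw = b"hello world"
encoded = b64encode_text(raw)

# b64decode() accepts ASCII-only str as well as bytes.
round_trip = base64.b64decode(encoded)
```

Here `encoded` is the str `'aGVsbG8gd29ybGQ='` and `round_trip` equals the original bytes.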
--
Steve
[Python-Dev] Bug in the DELETE statement in sqlite3 module
Respected Developer(s),
while writing a database module for one of my applications in Python I
encountered something interesting. I had a username and password field in
my table and only one entry, which was "Admin" and "password". While
debugging I purposefully deleted that record. Then I ran the same statement
again. To my surprise, it got executed. Then I ran the statement to delete
the user "admin" (lowercase 'a'), which does not exist in the table.
Surprisingly, it again got executed even though the table was empty. What I
expected was an error popping up. But nothing happened. I hope this error
gets fixed soon. The code snippet is given below.
self.cursor.execute('''DELETE FROM Users WHERE username = ?''',
                    (self.username,))
Re: [Python-Dev] Why does base64 return bytes?
It would be a codec: base64_text in the codecs module. Probably one line
different from the existing codec. Very easy to use and maintain. Less
surprising and less error-prone for everyone who thinks base64 should
convert from bytes to text. Sounds like an obvious win to me.
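To make the idea concrete, here is a rough sketch of how a bytes-to-str codec could be registered today with `codecs.register()`. This is not an existing stdlib codec and not Daniel's actual patch: the name `base64_text` and the helper functions are assumptions for illustration only.

```python
import base64
import codecs


def _b64text_encode(data, errors="strict"):
    # bytes in, str out. Codec functions return (result, length consumed).
    return base64.b64encode(data).decode("ascii"), len(data)


def _b64text_decode(text, errors="strict"):
    # str (or ASCII bytes) in, bytes out.
    return base64.b64decode(text), len(text)


def _search(name):
    # Hypothetical codec name; return None so other lookups continue.
    if name == "base64_text":
        return codecs.CodecInfo(_b64text_encode, _b64text_decode,
                                name="base64_text")
    return None


codecs.register(_search)

encoded = codecs.encode(b"hi", "base64_text")
decoded = codecs.decode(encoded, "base64_text")
```

With this registration, `codecs.encode()` hands back a str and `codecs.decode()` turns it into bytes again, which is the direction of conversion the message argues for.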
Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module
This is not a bug; this is the correct behavior of any SQL database.
--
A black power hidden in the darkness
manipulates weak hearts
Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module
> On 2016-06-15, at 08:40 , ninostephen mathew wrote:
>
> Respected Developer(s),
> while writing a database module for one of my applications in python I
> encountered something interesting. I had a username and password field in my
> table and only one entry which was "Admin" and "password". While debugging I
> purposefully deleted that record. Then I ran the same statement again. To my
> surprise, it got executed. Then I ran the statement to delete the user "admin"
> (lowercase 'a') which does not exist in the table. Surprisingly again it got
> executed even though the table was empty. What I expected was an error
> popping up. But nothing happened. I hope this error gets fixed soon. The
> code snippet is given below.
>
> self.cursor.execute(''' DELETE FROM Users WHERE username =
> ?''',(self.username,))
Despite Python bundling SQLite, the Python mailing list is not
responsible for developing SQLite (only for the SQLite bindings
themselves), so this is the wrong mailing list.
That being said, the DELETE statement deletes whichever records in the
table match the provided predicate. If no record matches the predicate,
it simply deletes no records. That is not an error; it is the exact
expected and documented behaviour for the statement in SQL in general
and SQLite in particular.
See https://www.sqlite.org/lang_delete.html for the documentation of the
DELETE statement in SQLite.
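A small sketch of the behaviour described above, using an in-memory database. The table and values mirror the original post; treating "nothing deleted" as an application-level error via `cursor.rowcount` is a suggestion made here, not something the sqlite3 module does for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users (username TEXT, password TEXT)")
cur.execute("INSERT INTO Users VALUES (?, ?)", ("Admin", "password"))

# First DELETE matches one row.
cur.execute("DELETE FROM Users WHERE username = ?", ("Admin",))
first = cur.rowcount

# Same statement again: no row matches, no exception is raised.
cur.execute("DELETE FROM Users WHERE username = ?", ("Admin",))
second = cur.rowcount

# A name that never existed behaves the same way.
cur.execute("DELETE FROM Users WHERE username = ?", ("admin",))
third = cur.rowcount

conn.close()
```

An application that wants an error when nothing was deleted can check `rowcount` after the statement and raise its own exception when it is 0.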
While you should feel free to report your expectations to the SQLite
project or to the JTC1/SC32 technical committee (which is responsible
for SQL itself), I fear that's what you will be told there, and that you
are about 30 years too late to try to influence such a core statement of
the language.
Not that it would have worked, I'd think. I'm reasonably sure the
behaviour of the DELETE statement is a natural consequence of SQL's
set-theoretic foundations: DELETE applies to a set of records,
regardless of the set's cardinality.
Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module
On 15 June 2016 at 07:40, ninostephen mathew wrote:
> Respected Developer(s),
> while writing a database module for one of my applications in python I
> encountered something interesting. I had a username and password field in my
> table and only one entry which was "Admin" and "password". While debugging
> I purposefully deleted that record. Then I ran the same statement again. To
> my surprise, it got executed. Then I ran the statement to delete the user
> "admin" (lowercase 'a') which does not exist in the table. Surprisingly
> again it got executed even though the table was empty. What I expected was
> an error popping up. But nothing happened. I hope this error gets fixed
> soon. The code snippet is given below.
>
> self.cursor.execute(''' DELETE FROM Users WHERE username =
> ?''',(self.username,))
First of all, this list is for the discussions about the development
of Python itself, not for developing applications with Python. You
should probably be posting to python-list instead.
Having said that, this is how SQL works - a DELETE statement selects
all records matching the WHERE clause and deletes them. If the WHERE
clause doesn't match anything, nothing gets deleted. So your code is
working exactly as I would expect.
Paul
[Python-Dev] proposed os.fspath() change
I would like to make a change to os.fspath().

Specifically, os.fspath() currently raises an exception if something
besides str, bytes, or os.PathLike is passed in, but makes no checks
if an os.PathLike object returns something besides a str or bytes.

I would like to change that to the opposite: if a non-os.PathLike is
passed in, return it unchanged (so no change for str and bytes); if
an os.PathLike object returns something that is not a str nor bytes,
raise.

An example of the difference in the lzma file:

Current code (has not been upgraded to use os.fspath() yet)
---

    if isinstance(filename, (str, bytes)):
        if "b" not in mode:
            mode += "b"
        self._fp = builtins.open(filename, mode)
        self._closefp = True
        self._mode = mode_code
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        self._fp = filename
        self._mode = mode_code
    else:
        raise TypeError(
            "filename must be a str or bytes object, or a file"
        )

Code change if using upgraded os.fspath() (placed before above stanza):

    filename = os.fspath(filename)

Code change with current os.fspath() (ditto):

    if isinstance(filename, os.PathLike):
        filename = os.fspath(filename)

My intention with the os.fspath() function was to minimize boiler-plate
code and make PathLike objects easy and painless to support; having to
discover if any given parameter is PathLike before calling os.fspath()
on it is, IMHO, just the opposite.

There is also precedent for having a __dunder__ check the return type:

    --> class Huh:
    ...     def __int__(self):
    ...         return 'string'
    ...     def __index__(self):
    ...         return b'bytestring'
    ...     def __bool__(self):
    ...         return 'true-ish'
    ...
    --> h = Huh()

    --> int(h)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: __int__ returned non-int (type str)

    --> ''[h]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: __index__ returned non-int (type bytes)

    --> bool(h)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: __bool__ should return bool, returned str

Arguments in favor or against?

--
~Ethan~
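The proposed semantics can be sketched in pure Python as follows. This is only an illustration of the behaviour described in the message, not the actual os.fspath() implementation; the names `fspath_proposed` and `BadPath` are hypothetical, and `os.PathLike` requires Python 3.6 or later.

```python
import os
import pathlib


def fspath_proposed(path):
    """Hypothetical stand-in for the proposed os.fspath() behaviour."""
    if not isinstance(path, os.PathLike):
        # str, bytes, file objects, ints, ...: returned unchanged.
        return path
    result = type(path).__fspath__(path)
    if not isinstance(result, (str, bytes)):
        # Mirrors the __int__/__index__ precedent: check the return type.
        raise TypeError(
            "__fspath__ returned non-str/bytes (type %s)"
            % type(result).__name__
        )
    return result


class BadPath:
    """A broken PathLike whose __fspath__ returns the wrong type."""

    def __fspath__(self):
        return 42


as_str = fspath_proposed("spam.xz")
from_path = fspath_proposed(pathlib.PurePosixPath("dir/spam.xz"))
passthrough = fspath_proposed(3)

try:
    fspath_proposed(BadPath())
    raised = False
except TypeError:
    raised = True
```

Under these semantics the lzma stanza could call the function unconditionally, because file objects fall through untouched to the `hasattr(filename, "read")` branch.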
Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module
A point of order: it's not necessary to post three separate "this is the
wrong list" replies. In fact the optimal number is probably close to zero
-- I understand we all want to be helpful, and we don't want to send
duplicate replies, but someone who posts an inappropriate question is
likely to try another venue when they receive no replies, and three replies
to the list implies that some folks are a little too eager to appear
helpful (while reading the list with considerable delay). When the OP pings
the thread maybe one person, preferably someone who reads the list directly
via email from the list server, could post a standard "wrong list" response.
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] proposed os.fspath() change
These are really two separate proposals.

I'm okay with checking the return value of calling obj.__fspath__; that's
an error in the object anyways, and it doesn't matter much whether we do
this or not (though when approving the PEP I considered this and decided
not to insert a check for this). But it doesn't affect your example, does
it? I guess it's easier to raise now and change the API in the future to
avoid raising in this case (if we find that raising is undesirable) than
the other way around, so I'm +0 on this.

The other proposal (passing anything that's not understood right through)
is more interesting and your use case is somewhat compelling. Catching the
exception coming out of os.fspath() would certainly be much messier. The
question remaining is whether, when this behavior is not desired (e.g. when
the caller of os.fspath() just wants a string that it can pass to open()),
the condition of passing something that is neither a string nor supports
__fspath__ still produces an understandable error. I'm not sure that that's
the case. E.g. open() accepts file descriptors in addition to paths, but
I'm not sure that accepting an integer is a good idea in most cases -- it
either gives a mystery "Bad file descriptor" error or starts
reading/writing some random system file, which it then closes once the
stream is closed.

--
--Guido van Rossum (python.org/~guido)
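The file-descriptor concern can be seen in a short sketch: open() takes a plain integer as a descriptor and closes that descriptor when the stream is closed. The temporary file here is only scaffolding for the demonstration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()      # fd is a plain int
try:
    with open(fd, "w") as stream:  # open() accepts the integer as-is
        stream.write("hello")

    # Closing the stream also closed the underlying descriptor, so any
    # further use of the integer fails with "Bad file descriptor".
    try:
        os.fstat(fd)
        fd_still_open = True
    except OSError:
        fd_still_open = False
finally:
    os.remove(path)
```

An integer that slipped through a pass-through os.fspath() by accident would be treated the same way, which is why the error, if any, may be hard to relate to the original mistake.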
Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module
On 06/15/2016 12:33 PM, Guido van Rossum wrote:

> A point of order: it's not necessary to post three separate "this is
> the wrong list" replies. In fact the optimal number is probably close
> to zero -- I understand we all want to be helpful, and we don't want
> to send duplicate replies, but someone who posts an inappropriate
> question is likely to try another venue when they receive no replies,
> and three replies to the list implies that some folks are a little too
> eager to appear helpful (while reading the list with considerable
> delay). When the OP pings the thread maybe one person, preferably
> someone who reads the list directly via email from the list server,
> could post a standard "wrong list" response.

In addition, please don't undermine the "this is the wrong list" message
by responding substantively to the OP's query.

Tres.
--
Tres Seaver +1 540-429-0999 [email protected]
Palladion Software "Excellence by Design" http://palladion.com
Re: [Python-Dev] proposed os.fspath() change
On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:

> These are really two separate proposals.
>
> I'm okay with checking the return value of calling obj.__fspath__; that's
> an error in the object anyways, and it doesn't matter much whether we do
> this or not (though when approving the PEP I considered this and decided
> not to insert a check for this). But it doesn't affect your example, does
> it? I guess it's easier to raise now and change the API in the future to
> avoid raising in this case (if we find that raising is undesirable) than
> the other way around, so I'm +0 on this.

+0 from me as well. I know that for some ported stdlib code which was
explicitly checking for str/bytes before support was added, this will
eliminate its own checking (obviously not a motivating factor as it's
pretty minor).

> The other proposal (passing anything that's not understood right through)
> is more interesting and your use case is somewhat compelling. Catching the
> exception coming out of os.fspath() would certainly be much messier. The
> question remaining is whether, when this behavior is not desired (e.g. when
> the caller of os.fspath() just wants a string that it can pass to open()),
> the condition of passing something that is neither a string nor supports
> __fspath__ still produces an understandable error. I'm not sure that that's
> the case. E.g. open() accepts file descriptors in addition to paths, but
> I'm not sure that accepting an integer is a good idea in most cases -- it
> either gives a mystery "Bad file descriptor" error or starts
> reading/writing some random system file, which it then closes once the
> stream is closed.

The FD issue of magically passing through an int was also a concern when
Ethan brought this up in an issue on the tracker. My argument is that FDs
are not file paths and so shouldn't magically pass through if we're going
to type-check anything or claim os.fspath() only works with paths (FDs are
already open file objects). So in my view either we go ahead and type-check
the return value of __fspath__() and thus restrict everything coming out of
os.fspath() to Union[str, bytes], or we don't type-check anything and are
consistent that all os.fspath() does is call __fspath__() if present.

And just because I'm thinking about it, I would special-case the FDs, not
os.PathLike (clearer why you care and faster as it skips the override of
__subclasshook__):

    # Can be a single-line ternary operator if preferred.
    if not isinstance(filename, int):
        filename = os.fspath(filename)
Re: [Python-Dev] proposed os.fspath() change
My proposal at the point of the first PEP draft solved both of these
issues.

That version of the fspath function passed anything right through that
was an instance of the keyword-only `type_constraint`. If not, it would
ask __fspath__, and before returning the result, it would check that
__fspath__ returned an instance of `type_constraint` and otherwise raise
a TypeError. `type_constraint=object` would then have given the behavior
you want.

I always wanted fspath to spare the caller from all the instance
checking (most of which it does even now).

The main problem with setting type_constraint to something broader than
(str, bytes) is that then that parameter would affect the return type of
the function, which would at least complicate the type hinting issue.
Mypy might now support things like

    @overload
    def fspath(path: T, type_constraint: Type[T] = (str, bytes)) -> T: ...

but then again, isinstance and Union are not compatible (for a reason?),
and PEP 484 for a reason does not allow tuples like (str, bytes) in
place of Unions.

Anyway, if we were to go back to this behavior, we would need to decide
whether to officially allow a wider type constraint or whether to leave
that to Stack Overflow, so to speak.

-- Koos

On Wed, Jun 15, 2016 at 7:46 PM, Guido van Rossum wrote:
> These are really two separate proposals.
>
> I'm okay with checking the return value of calling obj.__fspath__;
> that's an error in the object anyways, and it doesn't matter much
> whether we do this or not (though when approving the PEP I considered
> this and decided not to insert a check for this). But it doesn't affect
> your example, does it? I guess it's easier to raise now and change the
> API in the future to avoid raising in this case (if we find that
> raising is undesirable) than the other way around, so I'm +0 on this.
>
> The other proposal (passing anything that's not understood right
> through) is more interesting and your use case is somewhat compelling.
> Catching the exception coming out of os.fspath() would certainly be
> much messier. The question remaining is whether, when this behavior is
> not desired (e.g. when the caller of os.fspath() just wants a string
> that it can pass to open()), the condition of passing something that's
> neither a string nor supports __fspath__ still produces an
> understandable error. I'm not sure that that's the case. E.g. open()
> accepts file descriptors in addition to paths, but I'm not sure that
> accepting an integer is a good idea in most cases -- it either gives a
> mystery "Bad file descriptor" error or starts reading/writing some
> random system file, which it then closes once the stream is closed.
>
> On Wed, Jun 15, 2016 at 9:12 AM, Ethan Furman wrote:
>> I would like to make a change to os.fspath().
>>
>> Specifically, os.fspath() currently raises an exception if something
>> besides str, bytes, or os.PathLike is passed in, but makes no checks
>> if an os.PathLike object returns something besides a str or bytes.
>>
>> I would like to change that to the opposite: if a non-os.PathLike is
>> passed in, return it unchanged (so no change for str and bytes); if
>> an os.PathLike object returns something that is neither a str nor
>> bytes, raise.
>>
>> An example of the difference in the lzma file:
>>
>> Current code (has not been upgraded to use os.fspath() yet)
>> ---
>>
>>     if isinstance(filename, (str, bytes)):
>>         if "b" not in mode:
>>             mode += "b"
>>         self._fp = builtins.open(filename, mode)
>>         self._closefp = True
>>         self._mode = mode_code
>>     elif hasattr(filename, "read") or hasattr(filename, "write"):
>>         self._fp = filename
>>         self._mode = mode_code
>>     else:
>>         raise TypeError(
>>             "filename must be a str or bytes object, or a file"
>>         )
>>
>> Code change if using upgraded os.fspath() (placed before above stanza):
>>
>>     filename = os.fspath(filename)
>>
>> Code change with current os.fspath() (ditto):
>>
>>     if isinstance(filename, os.PathLike):
>>         filename = os.fspath(filename)
>>
>> My intention with the os.fspath() function was to minimize boiler-plate
>> code and make PathLike objects easy and painless to support; having to
>> discover if any given parameter is PathLike before calling os.fspath()
>> on it is, IMHO, just the opposite.
>>
>> There is also precedent for having a __dunder__ check the return type:
>>
>> --> class Huh:
>> ...     def __int__(self):
>> ...         return 'string'
>> ...     def __index__(self):
>> ...         return b'bytestring'
>> ...     def __bool__(self):
>> ...         return 'true-ish'
>> ...
>> --> h = Huh()
>>
>> --> int(h)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: __int__ returned non-int (type str)
>>
>> --> ''[h]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> Typ
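Editorial sketch: the `type_constraint` idea Koos describes at the top of
this message could look roughly like the following. The function name
`fspath_draft` and the helper class `RichPath` are made up for
illustration; this is a reading of the prose description, not the code
from any PEP draft.

```python
def fspath_draft(path, *, type_constraint=(str, bytes)):
    """Hypothetical sketch of the first-draft behavior described above."""
    # Anything that already satisfies the constraint passes straight through.
    if isinstance(path, type_constraint):
        return path
    # Otherwise ask the object's type for __fspath__.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(
            "expected an instance of {!r} or an object with __fspath__, "
            "not {}".format(type_constraint, type(path).__name__))
    result = method(path)
    # The value returned by __fspath__ must satisfy the constraint as well.
    if not isinstance(result, type_constraint):
        raise TypeError(
            "__fspath__ returned {}, which does not satisfy "
            "type_constraint".format(type(result).__name__))
    return result


class RichPath:
    """Illustrative path object; returns whatever value it was given."""

    def __init__(self, value):
        self.value = value

    def __fspath__(self):
        return self.value
```

With the default constraint, `fspath_draft(3)` raises TypeError, while
`fspath_draft(3, type_constraint=object)` returns 3 unchanged, which is
the pass-through behavior Ethan asked for.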
Re: [Python-Dev] proposed os.fspath() change
On 15 June 2016 at 10:59, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:
>> These are really two separate proposals.
>>
>> I'm okay with checking the return value of calling obj.__fspath__;
>> that's an error in the object anyways, and it doesn't matter much
>> whether we do this or not (though when approving the PEP I considered
>> this and decided not to insert a check for this). But it doesn't
>> affect your example, does it? I guess it's easier to raise now and
>> change the API in the future to avoid raising in this case (if we find
>> that raising is undesirable) than the other way around, so I'm +0 on
>> this.
>
> +0 from me as well. I know in some code in the stdlib that has been
> ported which prior to adding support was explicitly checking for
> str/bytes this will eliminate its own checking (obviously not a
> motivating factor as it's pretty minor).

I'd like a strong assertion that the return value of os.fspath() is a
plausible filesystem path representation (so either bytes or str), and
*not* some other kind of object that can also be used for accessing
the filesystem (like a file descriptor or an IO stream).

>> The other proposal (passing anything that's not understood right
>> through) is more interesting and your use case is somewhat compelling.
>> Catching the exception coming out of os.fspath() would certainly be
>> much messier. The question remaining is whether, when this behavior is
>> not desired (e.g. when the caller of os.fspath() just wants a string
>> that it can pass to open()), the condition of passing something that's
>> neither a string nor supports __fspath__ still produces an
>> understandable error. I'm not sure that that's the case. E.g. open()
>> accepts file descriptors in addition to paths, but I'm not sure that
>> accepting an integer is a good idea in most cases -- it either gives a
>> mystery "Bad file descriptor" error or starts reading/writing some
>> random system file, which it then closes once the stream is closed.
>
> The FD issue of magically passing through an int was also a concern
> when Ethan brought this up in an issue on the tracker. My argument is
> that FDs are not file paths and so shouldn't magically pass through if
> we're going to type-check anything or claim os.fspath() only works with
> paths (FDs are already open file objects). So in my view either we go
> ahead and type-check the return value of __fspath__() and thus restrict
> everything coming out of os.fspath() to Union[str, bytes] or we don't
> type check anything and be consistent that all os.fspath() does is call
> __fspath__() if present.
>
> And just because I'm thinking about it, I would special-case the FDs,
> not os.PathLike (clearer why you care and faster as it skips the
> override of __subclasshook__):
>
>     # Can be a single-line ternary operator if preferred.
>     if not isinstance(filename, int):
>         filename = os.fspath(filename)

Note that the LZMA case Ethan cites is one where the code accepts
either an already opened file-like object *or* a path-like object, and
does different things based on which it receives.

In that scenario, rather than introducing an unconditional "filename =
os.fspath(filename)" before the current logic, it makes more sense to
me to change the current logic to use the new protocol check rather
than a strict typecheck on str/bytes:

    if isinstance(filename, os.PathLike):   # Changed line
        filename = os.fspath(filename)      # New line
        if "b" not in mode:
            mode += "b"
        self._fp = builtins.open(filename, mode)
        self._closefp = True
        self._mode = mode_code
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        self._fp = filename
        self._mode = mode_code
    else:
        raise TypeError(
            "filename must be a path-like or file-like object"
        )

I *don't* think it makes sense to weaken the guarantees on os.fspath
to let it propagate non-path-like objects.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] proposed os.fspath() change
On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote:
> OK, so let's add a check on the return of __fspath__() and keep the
> check on path-like or string/bytes.

I'll update the PEP.

Ethan, do you want to leave a note on the os.fspath() issue to update
the code and go through where we've used os.fspath() to see where we can
cut out redundant type checks?
Re: [Python-Dev] proposed os.fspath() change
OK, so let's add a check on the return of __fspath__() and keep the
check on path-like or string/bytes.

--Guido (mobile)
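Editorial sketch: the behavior Guido settles on in this message (keep
rejecting inputs that are not str, bytes, or path-like, and additionally
check what __fspath__() returns) can be written in pure Python roughly as
follows. The names `fspath_checked` and `BadPath` are invented for the
example, and the error messages are only approximations.

```python
import pathlib


def fspath_checked(path):
    """Sketch: str/bytes pass through; __fspath__ results are type-checked."""
    if isinstance(path, (str, bytes)):
        return path
    # Look the method up on the type, as is usual for special methods.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + type(path).__name__)
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError(
            "expected {}.__fspath__() to return str or bytes, not {}".format(
                type(path).__name__, type(result).__name__))
    return result


class BadPath:
    """A broken path-like object whose __fspath__ returns an int."""

    def __fspath__(self):
        return 42


converted = fspath_checked(pathlib.PurePosixPath("a/b"))
```

Under this sketch both `fspath_checked(0)` and `fspath_checked(BadPath())`
raise TypeError, so a file descriptor can never come out of the function.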
Re: [Python-Dev] proposed os.fspath() change
On 06/15/2016 10:59 AM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:
>> These are really two separate proposals.
>>
>> I'm okay with checking the return value of calling obj.__fspath__;
>> that's an error in the object anyways, and it doesn't matter much
>> whether we do this or not (though when approving the PEP I considered
>> this and decided not to insert a check for this). But it doesn't
>> affect your example, does it? I guess it's easier to raise now and
>> change the API in the future to avoid raising in this case (if we find
>> that raising is undesirable) than the other way around, so I'm +0 on
>> this.
>
> +0 from me as well. I know in some code in the stdlib that has been
> ported which prior to adding support was explicitly checking for
> str/bytes this will eliminate its own checking (obviously not a
> motivating factor as it's pretty minor).

If we accept both parts of this proposal the checking will have to stay
in place as the original argument may not have been bytes, str, nor
os.PathLike.

>> The other proposal (passing anything that's not understood right
>> through) is more interesting and your use case is somewhat compelling.
>> Catching the exception coming out of os.fspath() would certainly be
>> much messier. The question remaining is whether, when this behavior is
>> not desired (e.g. when the caller of os.fspath() just wants a string
>> that it can pass to open()), the condition of passing something that's
>> neither a string nor supports __fspath__ still produces an
>> understandable error.

This is no different than before os.fspath() existed -- if the function
wasn't checking that the "filename" was a str but just used it as-is,
then whatever strange, possibly-hard-to-debug error they would get now
is the same as what they would have gotten before.

>> I'm not sure that that's the case. E.g. open() accepts file
>> descriptors in addition to paths, but I'm not sure that accepting an
>> integer is a good idea in most cases -- it either gives a mystery "Bad
>> file descriptor" error or starts reading/writing some random system
>> file, which it then closes once the stream is closed.

My vision of os.fspath() is simply to reduce rich-path objects to their
component str or bytes representation, and pass anything else through.

The advantage:

- if os.open accepts str/bytes/fd it can prep the argument by calling
  os.fspath() and then do its argument checking all in one place;

- if lzma accepts bytes/str/filelike-obj it can prep its argument by
  calling os.fspath() and then do its argument checking all in one
  place;

- if Path accepts str/os.PathLike it can prep its argument(s) with
  os.fspath() and then do its argument checking all in one place.

> The FD issue of magically passing through an int was also a concern
> when Ethan brought this up in an issue on the tracker. My argument is
> that FDs are not file paths and so shouldn't magically pass through if
> we're going to type-check anything or claim os.fspath() only works with
> paths (FDs are already open file objects). So in my view either we go
> ahead and type-check the return value of __fspath__() and thus restrict
> everything coming out of os.fspath() to Union[str, bytes] or we don't
> type check anything and be consistent that all os.fspath() does is call
> __fspath__() if present.

This is better than what os.fspath() currently does as it has all the
advantages listed above, but why is checking the output of __fspath__
incompatible with not checking anything else?

> And just because I'm thinking about it, I would special-case the FDs,
> not os.PathLike (clearer why you care and faster as it skips the
> override of __subclasshook__):
>
>     # Can be a single-line ternary operator if preferred.
>     if not isinstance(filename, int):
>         filename = os.fspath(filename)

That example will not do the right thing in the lzma case.

--
~Ethan~
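Editorial sketch: Ethan's pass-through vision, combined with an
lzma-style dispatch that does all its argument checking in one place
after the conversion, might look like this. `fspath_passthrough` and
`classify` are hypothetical names, and the function deliberately does
NOT match what os.fspath() ended up doing.

```python
import io
import pathlib


def fspath_passthrough(path):
    """Hypothetical variant: reduce path-like objects, pass the rest through."""
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        # str, bytes, ints, file objects: returned unchanged.
        return path
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ returned non-path (type {})".format(
            type(result).__name__))
    return result


def classify(filename):
    """Unconditional conversion first, then one block of argument checks."""
    filename = fspath_passthrough(filename)
    if isinstance(filename, (str, bytes)):
        return "path"
    if hasattr(filename, "read") or hasattr(filename, "write"):
        return "file object"
    raise TypeError("filename must be a str or bytes object, or a file")


kinds = [classify("data.xz"),
         classify(pathlib.PurePosixPath("data.xz")),
         classify(io.BytesIO())]
```

The concern raised by Guido and Brett is visible here too:
`fspath_passthrough(0)` quietly returns 0, so a caller that skips its own
checks would hand a file descriptor to whatever consumes the result.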
Re: [Python-Dev] proposed os.fspath() change
On Wed, Jun 15, 2016 at 9:29 PM, Nick Coghlan wrote:
> On 15 June 2016 at 10:59, Brett Cannon wrote:
>> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:
>>> These are really two separate proposals.
>>>
>>> I'm okay with checking the return value of calling obj.__fspath__;
>>> that's an error in the object anyways, and it doesn't matter much
>>> whether we do this or not (though when approving the PEP I considered
>>> this and decided not to insert a check for this). But it doesn't
>>> affect your example, does it? I guess it's easier to raise now and
>>> change the API in the future to avoid raising in this case (if we
>>> find that raising is undesirable) than the other way around, so I'm
>>> +0 on this.
>>
>> +0 from me as well. I know in some code in the stdlib that has been
>> ported which prior to adding support was explicitly checking for
>> str/bytes this will eliminate its own checking (obviously not a
>> motivating factor as it's pretty minor).
>
> I'd like a strong assertion that the return value of os.fspath() is a
> plausible filesystem path representation (so either bytes or str), and
> *not* some other kind of object that can also be used for accessing
> the filesystem (like a file descriptor or an IO stream)

I agree, so I'm -0.5 on passing through any object (at least by
default).

>>> The other proposal (passing anything that's not understood right
>>> through) is more interesting and your use case is somewhat
>>> compelling. Catching the exception coming out of os.fspath() would
>>> certainly be much messier. The question remaining is whether, when
>>> this behavior is not desired (e.g. when the caller of os.fspath()
>>> just wants a string that it can pass to open()), the condition of
>>> passing something that's neither a string nor supports __fspath__
>>> still produces an understandable error. I'm not sure that that's the
>>> case. E.g. open() accepts file descriptors in addition to paths, but
>>> I'm not sure that accepting an integer is a good idea in most cases
>>> -- it either gives a mystery "Bad file descriptor" error or starts
>>> reading/writing some random system file, which it then closes once
>>> the stream is closed.
>>
>> The FD issue of magically passing through an int was also a concern
>> when Ethan brought this up in an issue on the tracker. My argument is
>> that FDs are not file paths and so shouldn't magically pass through if
>> we're going to type-check anything or claim os.fspath() only works
>> with paths (FDs are already open file objects). So in my view either
>> we go ahead and type-check the return value of __fspath__() and thus
>> restrict everything coming out of os.fspath() to Union[str, bytes] or
>> we don't type check anything and be consistent that all os.fspath()
>> does is call __fspath__() if present.
>>
>> And just because I'm thinking about it, I would special-case the FDs,
>> not os.PathLike (clearer why you care and faster as it skips the
>> override of __subclasshook__):
>>
>>     # Can be a single-line ternary operator if preferred.
>>     if not isinstance(filename, int):
>>         filename = os.fspath(filename)
>
> Note that the LZMA case Ethan cites is one where the code accepts
> either an already opened file-like object *or* a path-like object, and
> does different things based on which it receives.
>
> In that scenario, rather than introducing an unconditional "filename =
> os.fspath(filename)" before the current logic, it makes more sense to
> me to change the current logic to use the new protocol check rather
> than a strict typecheck on str/bytes:
>
>     if isinstance(filename, os.PathLike):   # Changed line
>         filename = os.fspath(filename)      # New line

You are making one of my earlier points here, thanks ;). The point is
that the name PathLike sounds like it would mean anything path-like,
except that os.PathLike does not include str and bytes. And I still
think the naming should be a little different.

So that would be (os.PathLike, str, bytes) instead of just os.PathLike.

>         if "b" not in mode:
>             mode += "b"
>         self._fp = builtins.open(filename, mode)
>         self._closefp = True
>         self._mode = mode_code
>     elif hasattr(filename, "read") or hasattr(filename, "write"):
>         self._fp = filename
>         self._mode = mode_code
>     else:
>         raise TypeError(
>             "filename must be a path-like or file-like object"
>         )
>
> I *don't* think it makes sense to weaken the guarantees on os.fspath
> to let it propagate non-path-like objects.
>
> Cheers,
> Nick.

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
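Editorial sketch: Koos's point that os.PathLike does not cover plain str
and bytes can be checked directly. The tuple name `PATH_TYPES` and the
helper `looks_like_path` are assumptions made for this example only.

```python
import io
import os
import pathlib

# The wider check Koos suggests: str, bytes, and anything with __fspath__.
PATH_TYPES = (os.PathLike, str, bytes)


def looks_like_path(obj):
    """Return True for str, bytes, and objects implementing __fspath__."""
    return isinstance(obj, PATH_TYPES)


str_is_pathlike = isinstance("data.xz", os.PathLike)
results = {
    "str": looks_like_path("data.xz"),
    "bytes": looks_like_path(b"data.xz"),
    "PurePosixPath": looks_like_path(pathlib.PurePosixPath("data.xz")),
    "BytesIO": looks_like_path(io.BytesIO()),
}
```

Here `str_is_pathlike` is False, which is why a check against
os.PathLike alone would send a plain string filename down the
"file-like object" branch or into the TypeError.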
Re: [Python-Dev] Smoothing the transition from Python 2 to 3
On 10 June 2016 at 16:36, Neil Schemenauer wrote:
> Nick Coghlan wrote:
>> It could be very interesting to add an "ascii-warn" codec to Python
>> 2.7, and then set that as the default encoding when the -3 flag is
>> set.
>
> I don't think that can work. The library code in Python would spew
> out warnings even in the cases when nothing is wrong with the
> application code. I think warnings have to be added to a Python
> where str and bytes have been properly separated. Without extreme
> backporting efforts, that means 3.x.
>
> We don't want to saddle 3.x with a bunch of backwards compatibility
> cruft. Maybe some of my runtime warning changes could be merged
> using a command line flag to enable them. It would be nice to have
> the stepping stone version just be normal 3.x with a command line
> option. However, for the sanity of people maintaining 3.x, I think
> perhaps we don't want to do it.

Right, my initial negative reactions were mainly to the idea of having
these kinds of capabilities in the mainline 3.x codebase (where we'd
then have to support them for everyone, not just the folks that
genuinely need them to help in migration from Python 2).

The standard porting instructions currently assume code bases that are
*mostly* bytes/unicode clean, with perhaps a few oversights where
Python 3 rejects ambiguity that Python 2 tolerates. In that context,
"run your test suite, address the test failures" should generally be
sufficient, without needing to use a custom Python build.

However, there are a couple of cases those standard instructions still
don't cover:

- if there's no test suite, exploratory discovery is problematic when
  the app falls over at the first type ambiguity
- even if there is a test suite, sufficiently pervasive type ambiguity
  may make it difficult to use for fault isolation

That's where I now agree your proposal for a variant build
specifically aimed at compatibility testing is potentially
interesting:

- the tool would become an escalation path for folks that aren't in a
  position to use their own test suite to isolate type ambiguity
  problems under Python 3
- using Python 3 as a basis means you get a clean standard library
  that shouldn't emit any false alarms
- the necessary feature set is defined by the common subset of Python
  2.7 and a chosen minimum Python 3 version, not any future 3.x
  release, so you should be able to maintain the changes as a stable
  patch set without needing to chase CPython trunk (with the attendant
  risk of merge conflicts)

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] proposed os.fspath() change
On 06/15/2016 11:44 AM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote:
>> OK, so let's add a check on the return of __fspath__() and keep the
>> check on path-like or string/bytes.
>
> I'll update the PEP.
>
> Ethan, do you want to leave a note on the os.fspath() issue to update
> the code and go through where we've used os.fspath() to see where we
> can cut out redundant type checks?

Will do.

I didn't see this subthread before my last post, so unless you agree
with those other changes feel free to ignore it. ;)

--
~Ethan~
Re: [Python-Dev] proposed os.fspath() change
>> if isinstance(filename, os.PathLike):

By the way, regarding the line of code above, is there a convention
regarding whether implementing some protocol/interface requires
registering with (or inheriting from) the appropriate ABC for it to
work in all situations? IOW, in this case, is it sufficient to
implement __fspath__ to make your type pathlike? Is there a conscious
trend towards requiring the ABC?

-- Koos
Re: [Python-Dev] proposed os.fspath() change
On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven wrote:
> >> if isinstance(filename, os.PathLike):
>
> By the way, regarding the line of code above, is there a convention
> regarding whether implementing some protocol/interface requires
> registering with (or inheriting from) the appropriate ABC for it to
> work in all situations. IOW, in this case, is it sufficient to
> implement __fspath__ to make your type pathlike? Is there a conscious
> trend towards requiring the ABC?

ABCs like os.PathLike can override __subclasshook__ so that registration
isn't required (see
https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So
registration is definitely good to do to be explicit that you're trying
to meet an ABC, but it isn't strictly required.
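Editorial sketch: Brett's remark about __subclasshook__ can be observed
with a class that defines __fspath__ but neither inherits from nor
registers with os.PathLike. The class name `Unregistered` is made up for
this example; the os calls are the standard ones available since
Python 3.6.

```python
import os


class Unregistered:
    """Has __fspath__ but no relationship to os.PathLike declared."""

    def __fspath__(self):
        return "some/file.txt"


obj = Unregistered()
# Both checks are answered by os.PathLike.__subclasshook__.
is_instance = isinstance(obj, os.PathLike)
is_subclass = issubclass(Unregistered, os.PathLike)
as_text = os.fspath(obj)
```

Explicit registration or inheritance remains useful as documentation of
intent, as Brett notes, even though the structural check already
succeeds.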
Re: [Python-Dev] proposed os.fspath() change
On 06/15/2016 12:10 PM, Koos Zevenhoven wrote:
>> if isinstance(filename, os.PathLike):
>
> By the way, regarding the line of code above, is there a convention
> regarding whether implementing some protocol/interface requires
> registering with (or inheriting from) the appropriate ABC for it to
> work in all situations. IOW, in this case, is it sufficient to
> implement __fspath__ to make your type pathlike? Is there a conscious
> trend towards requiring the ABC?

The ABC is not required, simply having the __fspath__ attribute is
enough. Of course, to actually work that attribute should be a function
that returns a str or bytes object. ;)

--
~Ethan~
Re: [Python-Dev] proposed os.fspath() change
On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven wrote:
>> >> if isinstance(filename, os.PathLike):
>>
>> By the way, regarding the line of code above, is there a convention
>> regarding whether implementing some protocol/interface requires
>> registering with (or inheriting from) the appropriate ABC for it to
>> work in all situations. IOW, in this case, is it sufficient to
>> implement __fspath__ to make your type pathlike? Is there a conscious
>> trend towards requiring the ABC?
>
> ABCs like os.PathLike can override __subclasshook__ so that
> registration isn't required (see
> https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So
> registration is definitely good to do to be explicit that you're trying
> to meet an ABC, but it isn't strictly required.

Ok I suppose that's fine, so I propose we update the ABC part in the
PEP with __subclasshook__.

And the other question could be turned into whether to make str and
bytes also PathLike in __subclasshook__.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
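Editorial sketch: the second idea Koos floats here, an ABC whose
__subclasshook__ also accepts str and bytes, could be prototyped as
below. `AnyPath` and `Custom` are hypothetical names; nothing like this
is claimed to exist in the os module.

```python
import abc


class AnyPath(abc.ABC):
    """Hypothetical ABC: str, bytes, or any type defining __fspath__."""

    @abc.abstractmethod
    def __fspath__(self):
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is not AnyPath:
            return NotImplemented
        # Treat the plain path representations as path-like too.
        if issubclass(subclass, (str, bytes)):
            return True
        return hasattr(subclass, "__fspath__")


class Custom:
    """Illustrative type that only implements the protocol method."""

    def __fspath__(self):
        return "custom/path"


checks = [isinstance("a.txt", AnyPath),
          isinstance(b"a.txt", AnyPath),
          isinstance(Custom(), AnyPath),
          isinstance(3, AnyPath)]
```

With such an ABC the lzma-style test `isinstance(filename, AnyPath)`
would accept plain strings as well, at the cost of the ABC no longer
implying that the object has an __fspath__ method.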
Re: [Python-Dev] proposed os.fspath() change
PEP 519 updated: https://hg.python.org/peps/rev/92feff129ee4

On Wed, 15 Jun 2016 at 11:44 Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote:
>
>> OK, so let's add a check on the return of __fspath__() and keep the check
>> on path-like or string/bytes.
>
> I'll update the PEP.
>
> Ethan, do you want to leave a note on the os.fspath() issue to update the
> code and go through where we've used os.fspath() to see where we can cut
> out redundant type checks?
>
>> --Guido (mobile)
>> On Jun 15, 2016 11:29 AM, "Nick Coghlan" wrote:
>>
>>> On 15 June 2016 at 10:59, Brett Cannon wrote:
>>> > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:
>>> >>
>>> >> These are really two separate proposals.
>>> >>
>>> >> I'm okay with checking the return value of calling obj.__fspath__;
>>> >> that's an error in the object anyways, and it doesn't matter much
>>> >> whether we do this or not (though when approving the PEP I considered
>>> >> this and decided not to insert a check for this). But it doesn't
>>> >> affect your example, does it? I guess it's easier to raise now and
>>> >> change the API in the future to avoid raising in this case (if we find
>>> >> that raising is undesirable) than the other way around, so I'm +0 on
>>> >> this.
>>> >
>>> > +0 from me as well. I know in some code in the stdlib that has been
>>> > ported which prior to adding support was explicitly checking for
>>> > str/bytes this will eliminate its own checking (obviously not a
>>> > motivating factor as it's pretty minor).
>>>
>>> I'd like a strong assertion that the return value of os.fspath() is a
>>> plausible filesystem path representation (so either bytes or str), and
>>> *not* some other kind of object that can also be used for accessing
>>> the filesystem (like a file descriptor or an IO stream)
>>>
>>> >> The other proposal (passing anything that's not understood right
>>> >> through) is more interesting and your use case is somewhat compelling.
>>> >> Catching the exception coming out of os.fspath() would certainly be
>>> >> much messier. The question remaining is whether, when this behavior is
>>> >> not desired (e.g. when the caller of os.fspath() just wants a string
>>> >> that it can pass to open()), the condition of passing something that's
>>> >> neither a string nor supports __fspath__ still produces an
>>> >> understandable error. I'm not sure that that's the case. E.g. open()
>>> >> accepts file descriptors in addition to paths, but I'm not sure that
>>> >> accepting an integer is a good idea in most cases -- it either gives a
>>> >> mystery "Bad file descriptor" error or starts reading/writing some
>>> >> random system file, which it then closes once the stream is closed.
>>> >
>>> > The FD issue of magically passing through an int was also a concern
>>> > when Ethan brought this up in an issue on the tracker. My argument is
>>> > that FDs are not file paths and so shouldn't magically pass through if
>>> > we're going to type-check anything or claim os.fspath() only works with
>>> > paths (FDs are already open file objects). So in my view either we go
>>> > ahead and type-check the return value of __fspath__() and thus restrict
>>> > everything coming out of os.fspath() to Union[str, bytes] or we don't
>>> > type check anything and be consistent that all os.fspath() does is call
>>> > __fspath__() if present.
>>> >
>>> > And just because I'm thinking about it, I would special-case the FDs,
>>> > not os.PathLike (clearer why you care and faster as it skips the
>>> > override of __subclasshook__):
>>> >
>>> > # Can be a single-line ternary operator if preferred.
>>> > if not isinstance(filename, int):
>>> >     filename = os.fspath(filename)
>>>
>>> Note that the LZMA case Ethan cites is one where the code accepts
>>> either an already opened file-like object *or* a path-like object, and
>>> does different things based on which it receives.
>>>
>>> In that scenario, rather than introducing an unconditional "filename =
>>> os.fspath(filename)" before the current logic, it makes more sense to
>>> me to change the current logic to use the new protocol check rather
>>> than a strict typecheck on str/bytes:
>>>
>>>     if isinstance(filename, os.PathLike):  # Changed line
>>>         filename = os.fspath(filename)     # New line
>>>         if "b" not in mode:
>>>             mode += "b"
>>>         self._fp = builtins.open(filename, mode)
>>>         self._closefp = True
>>>         self._mode = mode_code
>>>     elif hasattr(filename, "read") or hasattr(filename, "write"):
>>>         self._fp = filename
>>>         self._mode = mode_code
>>>     else:
>>>         raise TypeError(
>>>             "filename must be a path-like or file-like object"
>>>         )
>>>
>>> I *don't* think it makes sense to weaken the guarantees on os.fspath
>>> to let it propagate non-path-like objects.
>>>
>>> C
Re: [Python-Dev] proposed os.fspath() change
On 06/15/2016 12:24 PM, Koos Zevenhoven wrote:
> On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon wrote:
>> ABCs like os.PathLike can override __subclasshook__ so that registration
>> isn't required (see
>> https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So
>> registration is definitely good to do to be explicit that you're trying
>> to meet an ABC, but it isn't strictly required.
>
> And the other question could be turned into whether to make str and
> bytes also PathLike in __subclasshook__.

No, for two reasons.

- most str's and bytes' are not paths;
- PathLike indicates a rich-path object, which str's and bytes' are not.

--
~Ethan~
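A short sketch of how this distinction plays out, assuming the behaviour of os.fspath() and os.PathLike in Python 3.6 and later. The helper is_path_argument() is a hypothetical name used only for illustration.

```python
import os

# str and bytes are accepted by os.fspath() and returned unchanged...
text_path = os.fspath("data/archive.xz")
bytes_path = os.fspath(b"data/archive.xz")

# ...but they are not instances of os.PathLike.
str_is_path_like = isinstance("data/archive.xz", os.PathLike)
bytes_is_path_like = isinstance(b"data/archive.xz", os.PathLike)


def is_path_argument(obj):
    """Hypothetical helper: True for anything os.fspath() would accept."""
    return isinstance(obj, (str, bytes, os.PathLike))


# A file descriptor is not a path, so os.fspath() refuses it.
try:
    os.fspath(3)
    fd_rejected = False
except TypeError:
    fd_rejected = True
```

One consequence is that code dispatching between path arguments and file-like objects needs to test for str and bytes as well as os.PathLike, as the helper does; checking os.PathLike alone would send plain string paths down the wrong branch.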
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
[whew, actually read the whole thread]

On 11 June 2016 at 10:28, Terry Reedy wrote:
> On 6/11/2016 11:34 AM, Guido van Rossum wrote:
>>
>> In terms of API design, I'd prefer a flag to os.urandom() indicating a
>> preference for
>> - blocking
>> - raising an exception
>> - weaker random bits
>
> +100 ;-)
>
> I proposed exactly this 2 days ago, 5 hours after Larry's initial post.

No, this is a bad idea. Asking novice developers to make security
decisions they're not yet qualified to make when it's genuinely
possible for us to do the right thing by default is the antithesis of
good security API design, and os.urandom() *is* a security API
(whether we like it or not - third party documentation written by the
cryptographic software development community has made it so, since
it's part of their guidelines for writing security sensitive code in
pure Python).

Adding *new* APIs is also a bad idea, since "os.urandom() is the right
answer on every OS except Linux, and also the best currently available
answer on Linux" has been the standard security advice for generating
cryptographic secrets in pure Python code for years now, so we should
only change that guidance if we have extraordinarily compelling
reasons to do so, and we don't.

Instead, we have Ted Ts'o himself chiming in to say: "My preference
would be that os.[u]random should block, because the odds that people
would be trying to generate long-term cryptographic secrets within
seconds after boot is very small, and if you *do* block for a second
or two, it's not the end of the world."

The *actual bug* that triggered this latest firestorm of commentary
(from experts and non-experts alike) had *nothing* to do with user
code calling os.urandom, and instead was a combination of:

- CPython startup requesting cryptographically secure randomness when
  it didn't need it
- a systemd init script written in Python running before the kernel
  RNG was fully initialised

That created a deadlock between CPython startup and the rest of the
Linux init process, so the latter only continued when the systemd
watchdog timed out and killed the offending script. As others have
noted, this kind of deadlock scenario is generally impossible on other
operating systems, as the operating system doesn't provide a way to
run Python code before the random number generator is ready.

The change Victor made in 3.5.2 to fall back to reading /dev/urandom
directly if the getrandom() syscall returns EAGAIN (effectively
reverting to the Python 3.4 behaviour) was the simplest possible fix
for that problem (and an approach I thoroughly endorse, both for 3.5.2
and for the life of the 3.5 series), but that doesn't make it the
right answer for 3.6+.

To repeat: the problem encountered was NOT due to user code calling
os.urandom(), but rather due to the way CPython initialises its own
internal hash algorithm at interpreter startup. However, due to the
way CPython is currently implemented, fixing the regression in that
not only changed the behaviour of CPython startup, it *also* changed
the behaviour of every call to os.urandom() in Python 3.5.2+.

For 3.6+, we can instead make it so that the only things that actually
rely on cryptographic quality randomness being available are:

- calling a secrets module API
- calling a random.SystemRandom method
- calling os.urandom directly

These are all APIs that were either created specifically for use in
security sensitive situations (secrets module), or have long been
documented (both within our own documentation, and in third party
documentation, books and Q&A sites) as being an appropriate choice for
use in security sensitive situations (os.urandom and
random.SystemRandom).

However, we don't need to make those block waiting for randomness to
be available - we can update them to raise BlockingIOError instead
(which makes it trivial for people to decide for themselves how they
want to handle that case).

Along with that change, we can make it so that starting the
interpreter will never block waiting for cryptographic randomness to
be available (since it doesn't need it), and importing the random
module won't block waiting for it either.

To the best of our knowledge, on all operating systems other than
Linux, encountering the new exception will still be impossible in
practice, as there is no known opportunity to run Python code before
the kernel random number generator is ready.

On Linux, init scripts may still run before the kernel random number
generator is ready, but will now throw an immediate BlockingIOError if
they access an API that relies on cryptographic randomness being
available, rather than potentially deadlocking the init process. Folks
encountering that situation will then need to make an explicit
decision:

- loop until the exception is no longer thrown
- switch to reading from /dev/urandom directly instead of calling
  os.urandom()
- switch to using a cross-platform non-cryptographic API (probably the
  random module)

Victor has some additional
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
On 06/15/2016 01:01 PM, Nick Coghlan wrote:
> For 3.6+, we can instead make it so that the only things that actually
> rely on cryptographic quality randomness being available are:
>
> - calling a secrets module API
> - calling a random.SystemRandom method
> - calling os.urandom directly
>
> However, we don't need to make those block waiting for randomness to
> be available - we can update them to raise BlockingIOError instead
> (which makes it trivial for people to decide for themselves how they
> want to handle that case).
>
> Along with that change, we can make it so that starting the
> interpreter will never block waiting for cryptographic randomness to
> be available (since it doesn't need it), and importing the random
> module won't block waiting for it either.

+1

--
~Ethan~
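The "loop until the exception is no longer thrown" option from the proposal might look roughly like the sketch below. It is only an illustration of the proposed behaviour: whether os.urandom() or any other API actually raises BlockingIOError is an assumption taken from the proposal, so the code takes the entropy source as a parameter. The names read_secret_bytes and NotReadyYet are hypothetical, and the fake source returns zero bytes purely so the example can run anywhere.

```python
import time


def read_secret_bytes(source, size, attempts=5, delay=0.1):
    """Call source(size), retrying while it raises BlockingIOError.

    `source` stands in for an os.urandom-style callable that raises
    BlockingIOError when the kernel RNG is not ready (the behaviour
    proposed in the thread, assumed here rather than guaranteed).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return source(size)
        except BlockingIOError:
            if attempt == attempts - 1:
                raise  # Give up and let the caller decide what to do.
            time.sleep(delay)


class NotReadyYet:
    """Fake entropy source for demonstration: fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, size):
        self.calls += 1
        if self.calls <= self.failures:
            raise BlockingIOError("entropy pool not initialised")
        return bytes(size)  # Placeholder zero bytes, NOT random data.


fake_source = NotReadyYet(failures=2)
data = read_secret_bytes(fake_source, 16, delay=0)
```

Bounding the number of attempts keeps an init script from recreating the deadlock the proposal is trying to avoid; once the limit is reached the exception propagates and the caller has to pick one of the other options.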
Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace
On 14 June 2016 at 02:41, Nikita Nemkin wrote:
> Is there any rationale for rejecting alternatives like:

Good questions - Eric, it's likely worth capturing answers to these in
the PEP for the benefit of future readers.

> 1. Adding standard metaclass with ordered namespace.

Adding metaclasses to an existing class can break compatibility with
third party subclasses, so making it possible for people to avoid that
while still gaining the ability to implicitly expose attribute
ordering to class decorators and other potentially interested parties
is a recurring theme behind this PEP and also PEPs 422 and 487.

> 2. Adding `namespace` or `ordered` args to the default metaclass.

See below (as it relates to your own complexity argument)

> 3. Making compiler fill in __definition_order__ for every class
> (just like __qualname__) without touching the runtime.
> ?

Class scopes support conditionals and loops, so we can't necessarily
be sure what names will be assigned without running the code. It's
also possible to make attribute assignments via locals() that are
entirely opaque to the compiler, but visible to the interpreter at
runtime.

> To me, any of the above seems preferred to complicating
> the core part of the language forever.
>
> The vast majority of Python classes don't care about their member
> order, this is minority use case receiving majority treatment.
>
> Also, wiring OrderedDict into class creation means elevating it
> from a peripheral utility to indispensable built-in type.

Right, that's one of the key reasons this is a PEP, rather than just
an item on the issue tracker.

The rationale for "Why not make this configurable, rather than
switching it unilaterally?" is that it's actually *simpler* overall to
just make it the default - we can then change the documentation to say
"class bodies are evaluated in a collections.OrderedDict instance by
default" and record the consequences of that, rather than having to
document yet another class customisation mechanism.

It also eliminates boilerplate from class decorator usage
instructions, where people have to write "to use this class decorator,
you must also specify 'namespace=collections.OrderedDict' in your
class header"

Folks that don't need the ordering information do end up paying a
slight import time and memory cost, which is another key reason for
handling the proposal as a PEP rather than just as a tracker issue.

Aside from the boilerplate reduction when used in conjunction with a
class decorator, a further possible category of consumers would be
documentation generators like pydoc and Sphinx apidoc, which may be
able to switch to displaying methods in definition order, rather than
the current approach of always listing them in alphabetical order.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
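For context, alternative 1 (an explicit metaclass with an ordered namespace) can be sketched with the existing __prepare__ hook. This shows the opt-in approach the PEP wants to make unnecessary, not the PEP's own mechanism; OrderedMeta, Record, INCLUDE_EXTRA and the attribute name _member_order are hypothetical names chosen for the example.

```python
from collections import OrderedDict


class OrderedMeta(type):
    """Hypothetical metaclass that records class-body assignment order."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes in whatever mapping is returned here.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwargs)
        # `_member_order` is an illustrative name, not the PEP's attribute.
        cls._member_order = tuple(
            key for key in namespace if not key.startswith("__")
        )
        return cls


INCLUDE_EXTRA = True


class Record(metaclass=OrderedMeta):
    name = "example"
    if INCLUDE_EXTRA:  # Only known once the class body actually runs.
        extra = 1
    size = 0
    locals()["injected"] = True  # Opaque to the compiler, visible at runtime.
```

The conditional and the locals() assignment in Record are the two cases mentioned above that a compile-time approach could not resolve. The cost of the metaclass route is that every subclass of Record inherits OrderedMeta, which is the compatibility concern raised against alternative 1.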
Re: [Python-Dev] proposed os.fspath() change
On Wed, Jun 15, 2016 at 11:00 PM, Ethan Furman wrote:
> On 06/15/2016 12:24 PM, Koos Zevenhoven wrote:
>>
>> And the other question could be turned into whether to make str and
>> bytes also PathLike in __subclasshook__.
>
> No, for two reasons.
>
> - most str's and bytes' are not paths;

True. Well, at least most str and bytes objects are not *meant* to be
used as paths, even if they could be.

> - PathLike indicates a rich-path object, which str's and bytes' are not.

This does not count as a reason.

If this were called pathlib.PathABC, I would definitely agree [1]. But
since this is called os.PathLike, I'm not quite as sure. Anyway,
including str and bytes is more of a type hinting issue. And since
type hints will also act as documentation, the naming of types is
becoming more important.

-- Koos

[1] No, I'm not proposing moving this to pathlib

> --
> ~Ethan~

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
Re: [Python-Dev] Why does base64 return bytes?
Paul Moore:
> Finding out whether users/projects typically write such a helper
> function for themselves would be a better way of getting this
> information. Personally, I suspect they don't, but facts beat
> speculation.
Well, I did. It was necessary to get 2to3 conversion to work(*). I turned every
occurrence of
E.encode('base-64')
and
E.decode('base-64')
into helper function calls that for Python 3 did:
b64encode(E).decode('ascii')
and
b64decode(E.encode('ascii'))
(Or something similar, I don't have the code in front of me.)
Leaving out .decode/.encode('ascii') would simply not have worked. That would
just be asking for TypeError's.
regards, Anders
(*) Yes, I use 2to3, believe it or not. Maintaining Python 2 code and doing an
automated conversion to Python 3 as needed.
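A minimal sketch of the kind of helper pair described in this message, for Python 3. The function names b64encode_text and b64decode_text are hypothetical; the message does not give the real names, and says the actual code was only "something similar".

```python
import base64


def b64encode_text(data: bytes) -> str:
    """Base64-encode bytes and return the result as str."""
    # b64encode() returns bytes; its output is pure ASCII, so decoding
    # it as ASCII cannot fail.
    return base64.b64encode(data).decode("ascii")


def b64decode_text(text: str) -> bytes:
    """Decode a Base64 str back to the original bytes."""
    # Non-ASCII input raises UnicodeEncodeError at the encode step.
    return base64.b64decode(text.encode("ascii"))


encoded = b64encode_text(b"hello world")
decoded = b64decode_text(encoded)
```

With these helpers the caller works with str on the Base64 side and bytes on the binary side, so the result of encoding can be concatenated with other text without a TypeError.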
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan wrote:
[...]
> For 3.6+, we can instead make it so that the only things that actually
> rely on cryptographic quality randomness being available are:
>
> - calling a secrets module API
> - calling a random.SystemRandom method
> - calling os.urandom directly
>
> These are all APIs that were either created specifically for use in
> security sensitive situations (secrets module), or have long been
> documented (both within our own documentation, and in third party
> documentation, books and Q&A sites) as being an appropriate choice for
> use in security sensitive situations (os.urandom and
> random.SystemRandom).
>
> However, we don't need to make those block waiting for randomness to
> be available - we can update them to raise BlockingIOError instead
> (which makes it trivial for people to decide for themselves how they
> want to handle that case).
>
> Along with that change, we can make it so that starting the
> interpreter will never block waiting for cryptographic randomness to
> be available (since it doesn't need it), and importing the random
> module won't block waiting for it either.

This all seems exactly right to me, to the point that I've been
dreading having to find the time to write pretty much this exact
email. So thank you :-)

> To the best of our knowledge, on all operating systems other than
> Linux, encountering the new exception will still be impossible in
> practice, as there is no known opportunity to run Python code before
> the kernel random number generator is ready.
>
> On Linux, init scripts may still run before the kernel random number
> generator is ready, but will now throw an immediate BlockingIOError if
> they access an API that relies on cryptographic randomness being
> available, rather than potentially deadlocking the init process. Folks
> encountering that situation will then need to make an explicit
> decision:
>
> - loop until the exception is no longer thrown
> - switch to reading from /dev/urandom directly instead of calling
>   os.urandom()
> - switch to using a cross-platform non-cryptographic API (probably the
>   random module)
>
> Victor has some additional technical details written up at
> http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to
> formalise this proposed approach as a PEP (the current reference is
> http://bugs.python.org/issue27282 )

I'd make two additional suggestions:

- one person did chime in on the thread to say that they've used
  os.urandom for non-security-sensitive purposes, simply because it
  provided a convenient "give me a random byte-string" API that is
  missing from random. I think we should go ahead and add a .randbytes
  method to random.Random that simply returns a random bytestring using
  the regular RNG, to give these users a nice drop-in replacement for
  os.urandom.

  Rationale: I don't think the existence of these users should block
  making os.urandom appropriate for generating secrets, because (1) a
  glance at github shows that this is very unusual -- if you skim
  through this search you get page after page of functions with names
  like "generate_secret_key"

  https://github.com/search?l=python&p=2&q=urandom&ref=searchresults&type=Code&utf8=%E2%9C%93

  and (2) for the minority of people who are using os.urandom for
  non-security-sensitive purposes, if they find os.urandom raising an
  error, then this is just a regular bug that they will notice
  immediately and fix, and anyway it's basically never going to happen.
  (As far as we can tell, this has never yet happened in the wild, even
  once.) OTOH if os.urandom is allowed to fail silently, then people
  who are using it to generate secrets will get silent catastrophic
  failures, plus those users can't assume it will never happen because
  they have to worry about active attackers trying to drive systems
  into unusual states. So I'd much rather ask the
  non-security-sensitive users to switch to using something in random,
  than force the cryptographic users to switch to using secrets. But it
  does seem like it would be good to give those non-security-sensitive
  users something to switch to :-).

- It's not exactly true that the Python interpreter doesn't need
  cryptographic randomness to initialize SipHash -- it's more that
  *some* Python invocations need unguessable randomness (to first
  approximation: all those which are exposed to hostile input), and some
  don't. And since the Python interpreter has no idea which case it's
  in, and since it's unacceptable for it to break invocations that don't
  need unguessable hashes, then it has to err on the side of continuing
  without randomness. All that's fine.

  But, given that the interpreter doesn't know which state it's in,
  there's also the possibility that this invocation *will* be exposed to
  hostile input, and the 3.5.2+ behavior gives absolutely no warning
  that this is what's happening. So instead of letting this potential
  error pass silently, I propose that if SipHash fails to acquire real
  randomness at star
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
On 15 June 2016 at 16:12, Nathaniel Smith wrote:
> On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan wrote:
>> Victor has some additional technical details written up at
>> http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to
>> formalise this proposed approach as a PEP (the current reference is
>> http://bugs.python.org/issue27282 )
>
> I'd make two additional suggestions:
>
> - one person did chime in on the thread to say that they've used
> os.urandom for non-security-sensitive purposes, simply because it
> provided a convenient "give me a random byte-string" API that is
> missing from random. I think we should go ahead and add a .randbytes
> method to random.Random that simply returns a random bytestring using
> the regular RNG, to give these users a nice drop-in replacement for
> os.urandom.

That seems reasonable.

> - It's not exactly true that the Python interpreter doesn't need
> cryptographic randomness to initialize SipHash -- it's more that
> *some* Python invocations need unguessable randomness (to first
> approximation: all those which are exposed to hostile input), and some
> don't. And since the Python interpreter has no idea which case it's
> in, and since it's unacceptable for it to break invocations that don't
> need unguessable hashes, then it has to err on the side of continuing
> without randomness. All that's fine.
>
> But, given that the interpreter doesn't know which state it's in,
> there's also the possibility that this invocation *will* be exposed to
> hostile input, and the 3.5.2+ behavior gives absolutely no warning
> that this is what's happening. So instead of letting this potential
> error pass silently, I propose that if SipHash fails to acquire real
> randomness at startup, then it should issue a warning. In practice,
> this will almost never happen. But in the rare cases it does, it at
> least gives the user a fighting chance to realize that their system is
> in a potentially dangerous state. And by using the warnings module, we
> automatically get quite a bit of flexibility.
>
> If some particular invocation (e.g. systemd-cron) has audited their
> code and decided that they don't care about this issue, they can make
> the message go away:
>
>     PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning
>
> OTOH if some particular invocation knows that they do process
> potentially hostile input early on (e.g. cloud-init, maybe?), then
> they can explicitly promote the warning to an error:
>
>     PYTHONWARNINGS=error::NoEntropyAtStartupWarning
>
> (I guess the way to implement this would be for the SipHash
> initialization code -- which runs very early -- to set some flag, and
> then we expose that flag in sys._something, and later in the startup
> sequence check for it after the warnings module is functional.
> Exposing the flag at the Python level would also make it possible for
> code like cloud-init to do its own explicit check and respond
> appropriately.)

A Python level warning/flag seems overly elaborate to me, but we can
easily emit a warning on stderr when SipHash is initialised via the
fallback rather than the operating system's RNG.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
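The filtering flexibility referred to in the quoted proposal can be sketched in-process with the standard warnings module. NoEntropyAtStartupWarning is the name used in the proposal and is not an existing Python category, so it is defined here as a hypothetical class; report_weak_hash_seed() is likewise an invented stand-in for whatever startup check would emit the warning.

```python
import warnings


class NoEntropyAtStartupWarning(RuntimeWarning):
    """Hypothetical category taken from the proposal; not part of Python."""


def report_weak_hash_seed():
    """Stand-in for the startup check sketched in the thread."""
    warnings.warn(
        "hash randomization was seeded without OS entropy",
        NoEntropyAtStartupWarning,
    )


# A caller that has audited its code can silence the warning...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore", NoEntropyAtStartupWarning)
    report_weak_hash_seed()
silenced = len(caught) == 0

# ...while a caller that handles hostile input can promote it to an error.
with warnings.catch_warnings():
    warnings.simplefilter("error", NoEntropyAtStartupWarning)
    try:
        report_weak_hash_seed()
        promoted = False
    except NoEntropyAtStartupWarning:
        promoted = True
```

The two simplefilter() calls play the same role as the ignore and error settings of PYTHONWARNINGS shown in the quoted text, applied from inside the program instead of from the environment.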
[Python-Dev] Final round of the Python Language Summit coverage at LWN
Hola python-dev,

The final batch of articles from the Python Language Summit is now
ready. The starting point is here: https://lwn.net/Articles/688969/

I have added the final six sessions (with SubscriberLinks for those
without a subscription):

Python 3 in Fedora:
https://lwn.net/Articles/690676/
https://lwn.net/SubscriberLink/690676/cdf118081ac0ffd5/

The Python JITs are coming:
https://lwn.net/Articles/691070/
https://lwn.net/SubscriberLink/691070/2714fd6a4934f016/

Pyjion:
https://lwn.net/Articles/691152/
https://lwn.net/SubscriberLink/691152/6334fd8d5a9992c0/

Why is Python slow?:
https://lwn.net/Articles/691243/
https://lwn.net/SubscriberLink/691243/669cb2bf2fe220c4/

Automated testing of CPython patches:
https://lwn.net/Articles/691307/
https://lwn.net/SubscriberLink/691307/89feefecfe425f58/

The Python security response team:
https://lwn.net/Articles/691308/
https://lwn.net/SubscriberLink/691308/432ff50e0f9b794f/

The articles will be freely available (without using the
SubscriberLink) to the world at large in a week ... until then, feel
free to share the SubscriberLinks.

Hopefully I have captured things reasonably well. If there are
corrections or clarifications needed, though, I recommend posting them
as comments on the article.

With luck, I will be able to sit in on the summit again next year ...

enjoy!

jake

--
Jake Edge - LWN - [email protected] - http://lwn.net
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
On Wed, Jun 15, 2016 at 04:12:57PM -0700, Nathaniel Smith wrote:
> - It's not exactly true that the Python interpreter doesn't need
> cryptographic randomness to initialize SipHash -- it's more that
> *some* Python invocations need unguessable randomness (to first
> approximation: all those which are exposed to hostile input), and some
> don't. And since the Python interpreter has no idea which case it's
> in, and since it's unacceptable for it to break invocations that don't
> need unguessable hashes, then it has to err on the side of continuing
> without randomness. All that's fine.

In practice, those Python invocations which are exposed to hostile
input are those that are started while the network is up. The vast
majority of the time, they are launched by the web browser --- and if
this happens after a second or so of the system getting networking
interrupts, (a) getrandom won't block, and (b) /dev/urandom and
getrandom will be initialized.

Also, I wish people wouldn't say that this is only an issue on Linux.
Again, FreeBSD's /dev/urandom will block as well if it is
uninitialized. It's just that in practice, for both Linux and FreeBSD,
we try very hard to make sure /dev/urandom is fully initialized by the
time it matters. It's just that so far, it's only on Linux that there
was an attempt to use Python in the early init scripts, in a VM, and in
a system where everything is modularized, such that the deadlock became
visible.

> (I guess the way to implement this would be for the SipHash
> initialization code -- which runs very early -- to set some flag, and
> then we expose that flag in sys._something, and later in the startup
> sequence check for it after the warnings module is functional.
> Exposing the flag at the Python level would also make it possible for
> code like cloud-init to do its own explicit check and respond
> appropriately.)

I really don't think it's that big of a deal in *practice*, but if you
really are concerned about the very remote possibility that a Python
invocation could start in early boot, and *then* also stick around for
the long term, and *then* be exposed to hostile input --- what if you
set the flag, and then later on, after N minutes, either automatically
or via some trigger such as cloud-init --- try and see if /dev/urandom
is initialized (even a few seconds later, so long as the init scripts
are hanging, it should be initialized) and, if so, have Python rehash
all of its dicts, or maybe just the non-system dicts (since those are
presumably the ones most likely to be exposed to hostile input).

- Ted
Re: [Python-Dev] Why does base64 return bytes?
Steven D'Aprano wrote:
> I'm satisfied that the choice made by Python is the right choice, and
> that it meets the spirit (if, arguably, not the letter) of the RFC.

IMO it meets the letter (if you read it a certain way) but *not* the
spirit.

--
Greg
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote:
>No, this is a bad idea. Asking novice developers to make security
>decisions they're not yet qualified to make when it's genuinely
>possible for us to do the right thing by default is the antithesis of
>good security API design, and os.urandom() *is* a security API
>(whether we like it or not - third party documentation written by the
>cryptographic software development community has made it so, since
>it's part of their guidelines for writing security sensitive code in
>pure Python).
Regardless of what third parties have said about os.urandom(), let's look at
what *we* have said about it. Going back to pre-churn 3.4 documentation:
os.urandom(n)
Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness
source. The returned data should be unpredictable enough for cryptographic
applications, though its exact quality depends on the OS
implementation. On a Unix-like system this will query /dev/urandom, and on
Windows it will use CryptGenRandom(). If a randomness source is not found,
NotImplementedError will be raised.
For an easy-to-use interface to the random number generator provided by
your platform, please see random.SystemRandom.
So we very clearly provided platform-dependent caveats on the cryptographic
quality of os.urandom(). We also made a strong claim that there's a direct
connection between os.urandom() and /dev/urandom on "Unix-like system(s)".
We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.
>Adding *new* APIs is also a bad idea, since "os.urandom() is the right
>answer on every OS except Linux, and also the best currently available
>answer on Linux" has been the standard security advice for generating
>cryptographic secrets in pure Python code for years now, so we should
>only change that guidance if we have extraordinarily compelling
>reasons to do so, and we don't.
Disagree.
We have broken one long-term promise on os.urandom() ("On a Unix-like system
this will query /dev/urandom") and changed another ("should be unpredictable
enough for cryptographic applications, though its exact quality depends on OS
implementations").
We broke the experienced Linux developer's natural and long-standing link
between the API called os.urandom() and /dev/urandom. This breaks pre-3.5
code that assumes read-from-/dev/urandom semantics for os.urandom().
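For illustration only, here is a minimal sketch of what "read-from-/dev/urandom semantics" look like when spelled out by hand. The helper names read_dev_urandom and random_bytes are hypothetical, and the fallback to os.urandom() on systems without the device file is an assumption of this sketch.

```python
import os


def read_dev_urandom(n, path="/dev/urandom"):
    """Read exactly n bytes directly from the device file."""
    data = b""
    # Unbuffered binary read, looping because a read may return fewer bytes.
    with open(path, "rb", buffering=0) as f:
        while len(data) < n:
            chunk = f.read(n - len(data))
            if not chunk:
                raise OSError("unexpected end of file on %s" % path)
            data += chunk
    return data


def random_bytes(n):
    """Use the device file where it exists, otherwise defer to os.urandom()."""
    if os.path.exists("/dev/urandom"):
        return read_dev_urandom(n)
    return os.urandom(n)
```

Code written against the 3.4 documentation would expect os.urandom(n) to behave like read_dev_urandom(n) on Linux, including never blocking.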
We have introduced churn. Predicting a future SO question such as "Can
os.urandom() block on Linux?" the answer is "No in Python 3.4 and earlier, yes
possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of the
3.5.x series, and yes possibly in Python 3.6 and beyond".
We have a better answer for "cryptographically appropriate" use cases in
Python 3.6 - the secrets module. Trying to make os.urandom() "the right
answer on every OS" weakens the promotion of secrets as *the* module to use
for cryptographically appropriate use cases.
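As a small illustration of that recommendation, assuming Python 3.6 or later where the secrets module is available, typical use might look like the sketch below. The variable names and the tokens_match helper are made up for the example.

```python
import secrets

# 16 random bytes, e.g. for a key or a salt.
raw_token = secrets.token_bytes(16)

# The same amount of randomness rendered as text (32 hex digits).
hex_token = secrets.token_hex(16)

# A URL-safe text token, e.g. for a password-reset link.
url_token = secrets.token_urlsafe(16)


def tokens_match(expected, supplied):
    """Compare two tokens in a way designed to resist timing attacks."""
    return secrets.compare_digest(expected, supplied)
```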
IMHO it would be better to leave os.urandom() well enough alone, except for
the documentation which should effectively say, a la 3.4:
os.urandom(n)
Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness
source. The returned data should be unpredictable enough for cryptographic
applications, though its exact quality depends on the OS
implementation. On a Unix-like system this will query /dev/urandom, and on
Windows it will use CryptGenRandom(). If a randomness source is not found,
NotImplementedError will be raised.
Cryptographic applications should use the secrets module for stronger
guaranteed sources of randomness.
For an easy-to-use interface to the random number generator provided by
your platform, please see random.SystemRandom.
Cheers,
-Barry
Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?
On 06/15/2016 11:45 PM, Barry Warsaw wrote:
>So we very clearly provided platform-dependent caveats on the cryptographic
>quality of os.urandom(). We also made a strong claim that there's a direct
>connection between os.urandom() and /dev/urandom on "Unix-like system(s)".
>We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.
Well, 3.5.2 hasn't happened yet. So if you see it as still being broken,
please speak up now.
Why do you call it only "semi-fixed"? As far as I understand it, the
semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading from
/dev/urandom directly, except it may not need to use a file handle.
//arry/
