date:20160615

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Greg Ewing


Stephen J. Turnbull wrote:
> The RFC is unclear on this point, but I read it as specifying the
> ASCII coded character set, not the ASCII repertoire of (abstract)
> characters.


Well, I think you've misread it. Or at least there is a
more general reading possible that is entirely consistent
with the stated purpose and doesn't assume any particular
output encoding.


> It's more subtle than that.  *RFCs do not deal with text.*


That may be true of most RFCs, but I think this particular
one really *is* talking about text, even if the authors
didn't realise it at the time.


> It is also desirable that it be likely to pass unscathed through channels
> that ... *inadvertently* treat it as text.  Both requirements are
> conveniently fulfilled by using appropriate ASCII subsets, and encoding on
> the wire using the usual bit patterns.


But only if the part that is (deliberately or inadvertently)
treating it as text is using ASCII as its encoding. So, by
your reading of the RFC, base64 is *only* intended for
channels that use ASCII encoding.

Whereas if you drop the assumption of ASCII and use whatever
encoding the channel uses for text, then it works for all
channels.

RFC 4648 doesn't mention it, but an earlier RFC on base64
explicitly said that characters were chosen that also exist
in EBCDIC, so it seems they were intending that base64
should work on EBCDIC-based systems as well as ASCII-based
ones.
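Greg's argument can be sketched with Python's own codecs. This is a minimal sketch, assuming the cp037 codec (one of Python's EBCDIC code pages) stands in for an EBCDIC-based channel. The same base64 *text* produces different bytes on each channel, yet still decodes back to the original data, because every character of the base64 alphabet exists in both character sets.

```python
import base64

data = bytes(range(256))  # arbitrary binary payload

# Step 1: base64 as a bytes-to-text transformation (every output character is ASCII).
text = base64.b64encode(data).decode("ascii")

# Step 2: put the same text on two different channels. cp037 is one of Python's
# EBCDIC code pages and stands in for an EBCDIC-based system here.
ascii_wire = text.encode("ascii")
ebcdic_wire = text.encode("cp037")
print(ascii_wire == ebcdic_wire)  # False: different bit patterns on the wire

# Step 3: the receiver decodes with its own channel encoding and recovers the data.
received = ebcdic_wire.decode("cp037")
print(base64.b64decode(received) == data)  # True
```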


> It's purely a matter of our convenience
> (as programmers *in* Python) whether we return str or bytes.


Yes, and it seems to me the decision has been made by people
with their noses stuck in low-level protocol implementations.
Whenever *I've* needed to base64 encode something, I've wanted
the output as text, because that's what I needed to feed into
the next stage of the process.
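A typical "next stage" of the kind Greg describes is a text format such as JSON. Here is a small sketch; the payload and field names are purely illustrative. json.dumps() rejects bytes, so the output of base64.b64encode() needs an explicit decode step before it can be embedded.

```python
import base64
import json

payload = b"\x00\xff some binary blob"

# json.dumps() cannot serialize bytes, so the bytes from b64encode() need a decode step.
try:
    json.dumps({"data": base64.b64encode(payload)})
except TypeError:
    print("bytes are not JSON serializable")

encoded = base64.b64encode(payload).decode("ascii")
document = json.dumps({"name": "blob.bin", "data": encoded})

# On the way back, b64decode() accepts the ASCII-only str directly.
restored = base64.b64decode(json.loads(document)["data"])
print(restored == payload)  # True
```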

Maybe there should be two versions of the base64 codec, one
producing bytes and one producing text?

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Greg Ewing


Simon Cross wrote:
> If we only support one, I would prefer it to be bytes since (bytes ->
> bytes -> unicode) seems like less overhead and slightly conceptually
> clearer than (bytes -> unicode -> bytes),


Whereas bytes -> unicode, followed if needed by unicode -> bytes,
seems conceptually clearer to me. IOW, base64 is conceptually a
bytes-to-text transformation, and the usual way to represent
text in Python 3 is unicode.

--
Greg

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Steven D'Aprano

On Tue, Jun 14, 2016 at 09:40:51PM -0700, Guido van Rossum wrote:
> I'm officially on vacation, but I was surprised that people now assume
> RFCs, which specify internet protocols, would have a bearing on programming
> languages. (With perhaps an exception for RFCs that specifically specify
> how programming languages or their libraries should treat certain specific
> issues -- but I found no evidence that this RFC is doing that.)

Sorry to disturb your vacation!

I hoped that there might have been a nice simple answer, like "the 
main use-case for Base64 is the email module, which needs bytes, and 
thus it was decided". Or even "because backwards compatibility".

Thanks to everyone for their constructive comments, and especially Mark 
for digging up the original discussion on the Python-3000 list. I'm 
satisfied that the choice made by Python is the right choice, and that 
it meets the spirit (if, arguably, not the letter) of the RFC.

-- 
Steve

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Daniel Holth

In that case could we just add a base64_text() method somewhere? Who would
like to measure whether it would be a win?


Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Paul Moore

On 15 June 2016 at 13:53, Daniel Holth  wrote:
> In that case could we just add a base64_text() method somewhere? Who would
> like to measure whether it would be a win?

"Just adding" a method to the stdlib means we'd have to support it
long term (backward compatibility). So by the time such an experiment
determined whether it was worth it, it'd be too late.

Finding out whether users/projects typically write such a helper
function for themselves would be a better way of getting this
information. Personally, I suspect they don't, but facts beat
speculation.

Of course, "not every one-liner needs to be a stdlib function" applies here too.

Paul

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Isaac Morland


On Wed, 15 Jun 2016, Greg Ewing wrote:

> Simon Cross wrote:
>> If we only support one, I would prefer it to be bytes since (bytes ->
>> bytes -> unicode) seems like less overhead and slightly conceptually
>> clearer than (bytes -> unicode -> bytes),
>
> Whereas bytes -> unicode, followed if needed by unicode -> bytes,
> seems conceptually clearer to me. IOW, base64 is conceptually a
> bytes-to-text transformation, and the usual way to represent
> text in Python 3 is unicode.


And in CPython, do I understand correctly that the output text would be 
represented using one byte per character?  If so, would there be a way of 
encoding that into UTF-8 that re-used the raw memory that backs the 
Unicode object?  And, therefore, avoids almost all the inefficiency of 
going via Unicode?  If so, this would be a win - proper use of Unicode to 
represent a text string, combined with instantaneous conversion into a 
bytes object for the purpose of writing to the OS.
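On the first question: since PEP 393 (Python 3.3), CPython stores a string whose characters are all ASCII at one byte per character. The sketch below checks this empirically with sys.getsizeof(). Those numbers are a CPython implementation detail, so treat the expected output as an assumption about CPython rather than a language guarantee. The sketch does not answer the memory-reuse question: str.encode() as used here still builds a new bytes object.

```python
import base64
import sys

text = base64.b64encode(b"x" * 300).decode("ascii")  # 400 characters
assert len(text) == 400 and all(ord(ch) < 128 for ch in text)

# Per-character storage cost, measured against the empty string (CPython-specific).
per_char = (sys.getsizeof(text) - sys.getsizeof("")) / len(text)
print(per_char)  # 1.0 on CPython: one byte per ASCII character

# For ASCII-only text the UTF-8 encoding is byte-for-byte the ASCII encoding,
# though str.encode() still returns a newly allocated bytes object.
print(text.encode("utf-8") == text.encode("ascii"))  # True
```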


Isaac Morland   CSCF Web Guru
DC 2619, x36650 WWW Software Specialist

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Steven D'Aprano

On Wed, Jun 15, 2016 at 12:53:15PM +, Daniel Holth wrote:
> In that case could we just add a base64_text() method somewhere? Who would
> like to measure whether it would be a win?

Just call .decode('ascii') on the output of base64.b64encode. Not every 
one-liner needs to be a standard function.
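Here is the one-liner Steven describes, wrapped as a hypothetical helper (b64encode_text is not a stdlib name) to show the two return types side by side:

```python
import base64

def b64encode_text(data: bytes) -> str:
    """Hypothetical helper: base64-encode *data* and return the result as str."""
    # The base64 alphabet is pure ASCII, so this decode step cannot fail.
    return base64.b64encode(data).decode("ascii")

print(base64.b64encode(b"hello"))  # b'aGVsbG8='
print(b64encode_text(b"hello"))    # aGVsbG8=
```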


-- 
Steve

[Python-Dev] Bug in the DELETE statement in sqlite3 module

2016-06-15 Thread ninostephen mathew

Respected Developer(s),
While writing a database module for one of my applications in Python, I
encountered something interesting. I had a username and password field in
my table and only one entry, which was "Admin" and "password". While
debugging, I purposefully deleted that record. Then I ran the same statement
again. To my surprise, it executed. Then I ran the statement to delete the
user "admin" (lowercase 'a'), which does not exist in the table.
Surprisingly, it executed again even though the table was empty. What I
expected was an error popping up, but nothing happened. I hope this error
gets fixed soon. The code snippet is given below.

self.cursor.execute(''' DELETE FROM Users WHERE username =
?''',(self.username,))

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Daniel Holth

It would be a codec: base64_text in the codecs module. Probably one line
different from the existing codec. Very easy to use and maintain. Less
surprising and less error-prone for everyone who thinks base64 should
convert bytes to text. Sounds like an obvious win to me.
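For comparison, the codec that already exists is bytes-to-bytes. Here is a short sketch of its current behaviour. base64_text is only the name proposed in this message, and the lookup at the end merely confirms that no such codec is registered today.

```python
import codecs

data = b"hello"

# The existing codec maps bytes to bytes and, like base64.encodebytes(),
# terminates each output line with a newline.
encoded = codecs.encode(data, "base64")
print(encoded)  # b'aGVsbG8=\n'
print(codecs.decode(encoded, "base64") == data)  # True

# "base64_text" is only the name proposed in this thread; no such codec is registered.
try:
    codecs.lookup("base64_text")
except LookupError:
    print("no base64_text codec")
```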


Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module

2016-06-15 Thread Piotr Duda

This is not a bug, this is correct behavior of any sql database.

-- 
A black power hidden in the darkness
manipulates weak hearts

Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module

2016-06-15 Thread Xavier Morel

> On 2016-06-15, at 08:40 , ninostephen mathew  wrote:
> 
> Respected Developer(s),
> While writing a database module for one of my applications in Python, I
> encountered something interesting. I had a username and password field in
> my table and only one entry, which was "Admin" and "password". While
> debugging, I purposefully deleted that record. Then I ran the same statement
> again. To my surprise, it executed. Then I ran the statement to delete the
> user "admin" (lowercase 'a'), which does not exist in the table.
> Surprisingly, it executed again even though the table was empty. What I
> expected was an error popping up, but nothing happened. I hope this error
> gets fixed soon. The code snippet is given below.
> 
> self.cursor.execute(''' DELETE FROM Users WHERE username = 
> ?''',(self.username,))

Despite Python bundling sqlite, the Python mailing list is not
responsible for developing SQLite (only for the SQLite bindings
themselves) so this is the wrong mailing list.

That being said, the DELETE statement deletes whichever records in the
table match the provided predicate. If no record matches the predicate,
it will simply delete no record, that is not an error, it is the exact
expected and documented behaviour for the statement in SQL in general
and SQLite in particular.

See https://www.sqlite.org/lang_delete.html for the documentation of the
DELETE statement in SQLite.
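The behaviour Xavier describes can be observed from Python through cursor.rowcount, which reports how many rows a DELETE removed. This is a minimal sketch, assuming an in-memory database and a hypothetical Users table shaped like the one in the original post:

```python
import sqlite3

# Hypothetical Users table mirroring the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (username TEXT, password TEXT)")
conn.execute("INSERT INTO Users VALUES (?, ?)", ("Admin", "password"))

deleted = []
for name in ("Admin", "Admin", "admin"):
    cur = conn.execute("DELETE FROM Users WHERE username = ?", (name,))
    deleted.append(cur.rowcount)  # number of rows this statement removed

# 1: the row existed; 0: already gone; 0: '=' compares case-sensitively by default.
print(deleted)  # [1, 0, 0]
```

The third call also matches nothing because SQLite's `=` compares text case-sensitively unless a different collation (for example COLLATE NOCASE) is requested.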

While you should feel free to report your expectations to the SQLite
project or to the JTC1/SC32 technical committee (which is responsible
for SQL itself), I fear that's what you will be told there, and that you
are about 30 years too late to try to influence such a core statement of
the language.

Not that it would have worked anyway, I'd think. I'm reasonably sure the
behaviour of the DELETE statement is a natural consequence of SQL's
set-theoretic foundations: DELETE applies to a set of records, regardless
of the set's cardinality.

Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module

2016-06-15 Thread Paul Moore

On 15 June 2016 at 07:40, ninostephen mathew  wrote:
> Respected Developer(s),
> While writing a database module for one of my applications in Python, I
> encountered something interesting. I had a username and password field in
> my table and only one entry, which was "Admin" and "password". While
> debugging, I purposefully deleted that record. Then I ran the same statement
> again. To my surprise, it executed. Then I ran the statement to delete the
> user "admin" (lowercase 'a'), which does not exist in the table.
> Surprisingly, it executed again even though the table was empty. What I
> expected was an error popping up, but nothing happened. I hope this error
> gets fixed soon. The code snippet is given below.
>
> self.cursor.execute(''' DELETE FROM Users WHERE username =
> ?''',(self.username,))

First of all, this list is for the discussions about the development
of Python itself, not for developing applications with Python. You
should probably be posting to python-list instead.

Having said that, this is how SQL works - a DELETE statement selects
all records matching the WHERE clause and deletes them. If the WHERE
clause doesn't match anything, nothing gets deleted. So your code is
working exactly as I would expect.
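If an application does want an error when nothing was deleted, it has to raise one itself, for example by checking rowcount. This is a minimal sketch; NoSuchUser and delete_user are hypothetical names, not part of the sqlite3 module.

```python
import sqlite3

class NoSuchUser(LookupError):
    """Hypothetical application-level error; SQLite itself does not raise one."""

def delete_user(conn: sqlite3.Connection, username: str) -> int:
    """Delete *username*, raising NoSuchUser if no row matched (hypothetical helper)."""
    cur = conn.execute("DELETE FROM Users WHERE username = ?", (username,))
    if cur.rowcount == 0:
        raise NoSuchUser(username)
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (username TEXT, password TEXT)")
conn.execute("INSERT INTO Users VALUES (?, ?)", ("Admin", "password"))

print(delete_user(conn, "Admin"))  # 1
try:
    delete_user(conn, "Admin")
except NoSuchUser as exc:
    print("no such user:", exc)  # no such user: Admin
```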

Paul

[Python-Dev] proposed os.fspath() change

2016-06-15 Thread Ethan Furman


I would like to make a change to os.fspath().

Specifically, os.fspath() currently raises an exception if something
besides str, bytes, or os.PathLike is passed in, but makes no checks
if an os.PathLike object returns something besides a str or bytes.

I would like to change that to the opposite: if a non-os.PathLike is
passed in, return it unchanged (so no change for str and bytes); if
an os.PathLike object returns something that is neither str nor bytes,
raise.
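The two halves of the proposal can be sketched as a pure-Python function. fspath_proposed and BadPath are hypothetical names used only for illustration; this does not replace the real os.fspath(). For reference, the os.fspath() that eventually shipped in Python 3.6 raises TypeError both for arguments that are not str, bytes, or path-like and for __fspath__() results that are not str or bytes.

```python
import pathlib

def fspath_proposed(path):
    """Hypothetical sketch of the proposed semantics; not the stdlib os.fspath()."""
    if isinstance(path, (str, bytes)):
        return path
    fspath_method = getattr(type(path), "__fspath__", None)
    if fspath_method is None:
        # Proposed change 1: anything that is not path-like passes through unchanged.
        return path
    result = fspath_method(path)
    if not isinstance(result, (str, bytes)):
        # Proposed change 2: a bad __fspath__() return value is an error.
        raise TypeError(
            f"{type(path).__name__}.__fspath__() returned non-str/bytes "
            f"(type {type(result).__name__})"
        )
    return result

class BadPath:
    def __fspath__(self):
        return 42

print(fspath_proposed(pathlib.PurePosixPath("a/b")))  # a/b
print(fspath_proposed(42))  # 42 (passed through)
try:
    fspath_proposed(BadPath())
except TypeError as exc:
    print(exc)  # BadPath.__fspath__() returned non-str/bytes (type int)
```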

An example of the difference in the lzma file:

Current code (has not been upgraded to use os.fspath() yet)
---

if isinstance(filename, (str, bytes)):
    if "b" not in mode:
        mode += "b"
    self._fp = builtins.open(filename, mode)
    self._closefp = True
    self._mode = mode_code
elif hasattr(filename, "read") or hasattr(filename, "write"):
    self._fp = filename
    self._mode = mode_code
else:
    raise TypeError(
        "filename must be a str or bytes object, or a file"
    )

Code change if using upgraded os.fspath() (placed before above stanza):

filename = os.fspath(filename)

Code change with current os.fspath() (ditto):

if isinstance(filename, os.PathLike):
    filename = os.fspath(filename)

My intention with the os.fspath() function was to minimize boiler-plate
code and make PathLike objects easy and painless to support; having to
discover if any given parameter is PathLike before calling os.fspath()
on it is, IMHO, just the opposite.

There is also precedent for having a __dunder__ check the return type:

--> class Huh:
...   def __int__(self):
...     return 'string'
...   def __index__(self):
...     return b'bytestring'
...   def __bool__(self):
...     return 'true-ish'
...
--> h = Huh()

--> int(h)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __int__ returned non-int (type str)

--> ''[h]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __index__ returned non-int (type bytes)

--> bool(h)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __bool__ should return bool, returned str

Arguments in favor or against?

--
~Ethan~

Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module

2016-06-15 Thread Guido van Rossum

A point of order: it's not necessary to post three separate "this is the
wrong list" replies. In fact the optimal number is probably close to zero
-- I understand we all want to be helpful, and we don't want to send
duplicate replies, but someone who posts an inappropriate question is
likely to try another venue when they receive no replies, and three replies
to the list implies that some folks are a little too eager to appear
helpful (while reading the list with considerable delay). When the OP pings
the thread maybe one person, preferably someone who reads the list directly
via email from the list server, could post a standard "wrong list" response.

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Guido van Rossum

These are really two separate proposals.

I'm okay with checking the return value of calling obj.__fspath__; that's
an error in the object anyways, and it doesn't matter much whether we do
this or not (though when approving the PEP I considered this and decided
not to insert a check for this). But it doesn't affect your example, does
it? I guess it's easier to raise now and change the API in the future to
avoid raising in this case (if we find that raising is undesirable) than
the other way around, so I'm +0 on this.

The other proposal (passing anything that's not understood right through)
is more interesting and your use case is somewhat compelling. Catching the
exception coming out of os.fspath() would certainly be much messier. The
question remaining is whether, when this behavior is not desired (e.g. when
the caller of os.fspath() just wants a string that it can pass to open()),
the condition of passing something that's neither a string nor an object supporting __fspath__
still produces an understandable error. I'm not sure that that's the case.
E.g. open() accepts file descriptors in addition to paths, but I'm not sure
that accepting an integer is a good idea in most cases -- it either gives a
mystery "Bad file descriptor" error or starts reading/writing some random
system file, which it then closes once the stream is closed.
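The file-descriptor concern can be checked against the existing function. This is a sketch, assuming Python 3.6 or later where os.fspath() is available. That function rejects a plain int, whereas open() would accept the same int as a file descriptor, so a stray integer only reaches open() if fspath() passes unknown objects through. fspath_or_error is a hypothetical demonstration helper.

```python
import os
import pathlib

def fspath_or_error(obj):
    """Return os.fspath(obj), or the TypeError it raised (demonstration helper)."""
    try:
        return os.fspath(obj)
    except TypeError as exc:
        return exc

print(fspath_or_error(pathlib.PurePosixPath("logs/app.log")))  # logs/app.log
print(type(fspath_or_error(3)).__name__)  # TypeError
```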

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] Bug in the DELETE statement in sqlite3 module

2016-06-15 Thread Tres Seaver

On 06/15/2016 12:33 PM, Guido van Rossum wrote:

> A point of order: it's not necessary to post three separate "this is
> the wrong list" replies. In fact the optimal number is probably close
> to zero -- I understand we all want to be helpful, and we don't want
> to send duplicate replies, but someone who posts an inappropriate
> question is likely to try another venue when they receive no replies,
> and three replies to the list implies that some folks are a little too
> eager to appear helpful (while reading the list with considerable
> delay). When the OP pings the thread maybe one person, preferably
> someone who reads the list directly via email from the list server,
> could post a standard "wrong list" response.

In addition, please don't undermine the "this is the wrong list" message
by responding substantively to the OP's query.


Tres.
-- 
===
Tres Seaver  +1 540-429-0999  [email protected]
Palladion Software   "Excellence by Design"http://palladion.com

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Brett Cannon

On Wed, 15 Jun 2016 at 09:48 Guido van Rossum  wrote:

> These are really two separate proposals.
>
> I'm okay with checking the return value of calling obj.__fspath__; that's
> an error in the object anyways, and it doesn't matter much whether we do
> this or not (though when approving the PEP I considered this and decided
> not to insert a check for this). But it doesn't affect your example, does
> it? I guess it's easier to raise now and change the API in the future to
> avoid raising in this case (if we find that raising is undesirable) than
> the other way around, so I'm +0 on this.
>

+0 from me as well. I know of some code in the stdlib that, before
PathLike support was added, explicitly checked for str/bytes; this change
will let it eliminate its own checking (obviously not a motivating factor,
as it's pretty minor).


>
> The other proposal (passing anything that's not understood right through)
> is more interesting and your use case is somewhat compelling. Catching the
> exception coming out of os.fspath() would certainly be much messier. The
> question remaining is whether, when this behavior is not desired (e.g. when
> the caller of os.fspath() just wants a string that it can pass to open()),
> the condition of passing something that's neither a string nor an object supporting __fspath__
> still produces an understandable error. I'm not sure that that's the case.
> E.g. open() accepts file descriptors in addition to paths, but I'm not sure
> that accepting an integer is a good idea in most cases -- it either gives a
> mystery "Bad file descriptor" error or starts reading/writing some random
> system file, which it then closes once the stream is closed.
>

The FD issue of magically passing through an int was also a concern when
Ethan brought this up in an issue on the tracker. My argument is that FDs
are not file paths and so shouldn't magically pass through if we're going
to type-check anything or claim os.fspath() only works with paths (FDs are
already open file objects). So in my view, either we go ahead and
type-check the return value of __fspath__() and thus restrict everything
coming out of os.fspath() to Union[str, bytes], or we don't type-check
anything and are consistent that all os.fspath() does is call
__fspath__() if present.

And just because I'm thinking about it, I would special-case the FDs, not
os.PathLike (clearer why you care and faster as it skips the override of
__subclasshook__):

# Can be a single-line ternary operator if preferred.
if not isinstance(filename, int):
    filename = os.fspath(filename)


> On Wed, Jun 15, 2016 at 9:12 AM, Ethan Furman  wrote:
>
>> I would like to make a change to os.fspath().
>>
>> Specifically, os.fspath() currently raises an exception if something
>> besides str, bytes, or os.PathLike is passed in, but makes no checks
>> if an os.PathLike object returns something besides a str or bytes.
>>
>> I would like to change that to the opposite: if a non-os.PathLike is
>> passed in, return it unchanged (so no change for str and bytes); if
>> an os.PathLike object returns something that is not a str nor bytes,
>> raise.
>>
>> An example of the difference in the lzma file:
>>
>> Current code (has not been upgraded to use os.fspath() yet)
>> ---
>>
>> if isinstance(filename, (str, bytes)):
>> if "b" not in mode:
>> mode += "b"
>> self._fp = builtins.open(filename, mode)
>> self._closefp = True
>> self._mode = mode_code
>> elif hasattr(filename, "read") or hasattr(filename, "write"):
>> self._fp = filename
>> self._mode = mode_code
>> else:
>> raise TypeError(
>>  "filename must be a str or bytes object, or a file"
>>   )
>>
>> Code change if using upgraded os.fspath() (placed before above stanza):
>>
>> filename = os.fspath(filename)
>>
>> Code change with current os.fspath() (ditto):
>>
>> if isinstance(filename, os.PathLike):
>> filename = os.fspath(filename)
>>
>> My intention with the os.fspath() function was to minimize boiler-plate
>> code and make PathLike objects easy and painless to support; having to
>> discover if any given parameter is PathLike before calling os.fspath()
>> on it is, IMHO, just the opposite.
>>
>> There is also precedent for having a __dunder__ check the return type:
>>
>> --> class Huh:
>> ...     def __int__(self):
>> ...         return 'string'
>> ...     def __index__(self):
>> ...         return b'bytestring'
>> ...     def __bool__(self):
>> ...         return 'true-ish'
>> ...
>> --> h = Huh()
>>
>> --> int(h)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: __int__ returned non-int (type str)
>>
>> --> ''[h]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: __index__ returned non-int (type bytes)
>>
>> --> bool(h)
>> Traceback (most recent call last)
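The change Ethan proposes (pass non-path-like objects through unchanged, but type-check what __fspath__() returns, in the spirit of the __int__/__index__ precedent) can be modelled in a few lines of pure Python. This is only an illustrative sketch under that reading of the proposal: fspath_passthrough() and BadPath are hypothetical names, and this is not how the real os.fspath() behaves.

```python
import pathlib

def fspath_passthrough(obj):
    """Hypothetical model of the proposed os.fspath() variant (not the real function)."""
    path_type = type(obj)
    try:
        # Look the hook up on the type, as Python does for other special methods.
        fspath_method = path_type.__fspath__
    except AttributeError:
        # str, bytes, ints, file objects, ... are returned unchanged.
        return obj
    result = fspath_method(obj)
    # Only the value produced by __fspath__() is type-checked.
    if not isinstance(result, (str, bytes)):
        raise TypeError("{}.__fspath__() returned non-str/bytes (type {})".format(
            path_type.__name__, type(result).__name__))
    return result

class BadPath:
    def __fspath__(self):
        return 42   # wrong type, analogous to __int__ returning a str

print(fspath_passthrough("spam.xz"))                         # spam.xz
print(fspath_passthrough(pathlib.PurePosixPath("spam.xz")))  # spam.xz
print(fspath_passthrough(7))                                 # 7
try:
    fspath_passthrough(BadPath())
except TypeError as exc:
    print(exc)   # BadPath.__fspath__() returned non-str/bytes (type int)
```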

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Koos Zevenhoven

My proposal at the point of the first PEP draft solved both of these issues.

That version of the fspath function passed anything right through that
was an instance of the keyword-only `type_constraint`. If not, it
would ask __fspath__, and before returning the result, it would check
that __fspath__ returned an instance of `type_constraint` and
otherwise raise a TypeError. `type_constraint=object` would then have
given the behavior you want. I always wanted fspath to spare the
caller from all the instance checking (most of which it does even
now).
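A rough pure-Python reading of the draft behaviour described above, with a keyword-only type_constraint that defaults to (str, bytes). The function name fspath_constrained() and the error messages are assumptions for illustration; the draft's actual code is not shown in this thread.

```python
import pathlib

def fspath_constrained(path, *, type_constraint=(str, bytes)):
    """Hypothetical sketch of a draft fspath() with a keyword-only type_constraint."""
    # Anything that already satisfies the constraint is passed straight through.
    if isinstance(path, type_constraint):
        return path
    # Otherwise ask the type for __fspath__.
    try:
        fspath_method = type(path).__fspath__
    except AttributeError:
        raise TypeError("expected an instance of {!r} or an object with __fspath__(), "
                        "not {}".format(type_constraint, type(path).__name__)) from None
    result = fspath_method(path)
    # The result of __fspath__() must satisfy the same constraint.
    if not isinstance(result, type_constraint):
        raise TypeError("__fspath__() returned {}, not an instance of {!r}".format(
            type(result).__name__, type_constraint))
    return result

print(fspath_constrained(pathlib.PurePosixPath("notes.txt")))    # notes.txt
print(fspath_constrained(3, type_constraint=(str, bytes, int)))  # 3
```

Widening the constraint to (str, bytes, int) would let file descriptors through while still reducing path objects; with the default, an int raises TypeError.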

The main problem with setting type_constraint to something broader
than (str, bytes) is that then that parameter would affect the return
type of the function, which would at least complicate the type hinting
issue. Mypy might now support things like

@overload
def fspath(path: T, type_constraint: Type[T] = (str, bytes)) -> T: ...

but then again, isinstance and Union are not compatible (for a
reason?), and PEP 484 for a reason does not allow tuples like (str,
bytes) in place of Unions.

Anyway, if we were to go back to this behavior, we would need to
decide whether to officially allow a wider type constraint or whether
to leave that to Stack Overflow, so to speak.

-- Koos


On Wed, Jun 15, 2016 at 7:46 PM, Guido van Rossum  wrote:
> These are really two separate proposals.
>
> I'm okay with checking the return value of calling obj.__fspath__; that's an
> error in the object anyways, and it doesn't matter much whether we do this
> or not (though when approving the PEP I considered this and decided not to
> insert a check for this). But it doesn't affect your example, does it? I
> guess it's easier to raise now and change the API in the future to avoid
> raising in this case (if we find that raising is undesirable) than the other
> way around, so I'm +0 on this.
>
> The other proposal (passing anything that's not understood right through) is
> more interesting and your use case is somewhat compelling. Catching the
> exception coming out of os.fspath() would certainly be much messier. The
> question remaining is whether, when this behavior is not desired (e.g. when
> the caller of os.fspath() just wants a string that it can pass to open()),
> the condition of passing something that's neither a string nor supports __fspath__
> still produces an understandable error. I'm not sure that that's the case.
> E.g. open() accepts file descriptors in addition to paths, but I'm not sure
> that accepting an integer is a good idea in most cases -- it either gives a
> mystery "Bad file descriptor" error or starts reading/writing some random
> system file, which it then closes once the stream is closed.
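Guido's point about integers can be seen with a pipe: open() treats an int as an already-open file descriptor and, with the default closefd=True, closes that descriptor when the stream is closed. fd_is_open() is a hypothetical helper written for this sketch; it simply probes the descriptor with os.fstat().

```python
import os

def fd_is_open(fd):
    """Hypothetical helper: True if fd still refers to an open descriptor."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True

read_fd, write_fd = os.pipe()
stream = open(read_fd, "rb")        # an int is accepted as an existing file descriptor
print(fd_is_open(read_fd))          # True
stream.close()                      # closefd=True by default, so read_fd is closed as well
reader_still_open = fd_is_open(read_fd)
print(reader_still_open)            # False
os.close(write_fd)
```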

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Nick Coghlan

On 15 June 2016 at 10:59, Brett Cannon  wrote:
>
>
> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum  wrote:
>>
>> These are really two separate proposals.
>>
>> I'm okay with checking the return value of calling obj.__fspath__; that's
>> an error in the object anyways, and it doesn't matter much whether we do
>> this or not (though when approving the PEP I considered this and decided not
>> to insert a check for this). But it doesn't affect your example, does it? I
>> guess it's easier to raise now and change the API in the future to avoid
>> raising in this case (if we find that raising is undesirable) than the other
>> way around, so I'm +0 on this.
>
> +0 from me as well. I know in some code in the stdlib that has been ported
> which prior to adding support was explicitly checking for str/bytes this
> will eliminate its own checking (obviously not a motivating factor as it's
> pretty minor).

I'd like a strong assertion that the return value of os.fspath() is a
plausible filesystem path representation (so either bytes or str), and
*not* some other kind of object that can also be used for accessing
the filesystem (like a file descriptor or an IO stream).

>> The other proposal (passing anything that's not understood right through)
>> is more interesting and your use case is somewhat compelling. Catching the
>> exception coming out of os.fspath() would certainly be much messier. The
>> question remaining is whether, when this behavior is not desired (e.g. when
>> the caller of os.fspath() just wants a string that it can pass to open()),
>> the condition of passing something that's neither a string nor supports __fspath__
>> still produces an understandable error. I'm not sure that that's the case.
>> E.g. open() accepts file descriptors in addition to paths, but I'm not sure
>> that accepting an integer is a good idea in most cases -- it either gives a
>> mystery "Bad file descriptor" error or starts reading/writing some random
>> system file, which it then closes once the stream is closed.
>
> The FD issue of magically passing through an int was also a concern when
> Ethan brought this up in an issue on the tracker. My argument is that FDs
> are not file paths and so shouldn't magically pass through if we're going to
> type-check anything or claim os.fspath() only works with paths (FDs are
> already open file objects). So in my view either we go ahead and type-check
> the return value of __fspath__() and thus restrict everything coming out of
> os.fspath() to Union[str, bytes] or we don't type check anything and be
> consistent that all os.fspath() does is call __fspath__() if present.
>
> And just because I'm thinking about it, I would special-case the FDs, not
> os.PathLike (clearer why you care and faster as it skips the override of
> __subclasshook__):
>
> # Can be a single-line ternary operator if preferred.
> if not isinstance(filename, int):
>     filename = os.fspath(filename)

Note that the LZMA case Ethan cites is one where the code accepts
either an already opened file-like object *or* a path-like object, and
does different things based on which it receives.

In that scenario, rather than introducing an unconditional "filename =
os.fspath(filename)" before the current logic, it makes more sense to
me to change the current logic to use the new protocol check rather
than a strict typecheck on str/bytes:

if isinstance(filename, os.PathLike):  # Changed line
    filename = os.fspath(filename)     # New line
    if "b" not in mode:
        mode += "b"
    self._fp = builtins.open(filename, mode)
    self._closefp = True
    self._mode = mode_code
elif hasattr(filename, "read") or hasattr(filename, "write"):
    self._fp = filename
    self._mode = mode_code
else:
    raise TypeError(
        "filename must be a path-like or file-like object"
        )

I *don't* think it makes sense to weaken the guarantees on os.fspath
to let it propagate non-path-like objects.
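A self-contained version of the dispatch Nick describes, pulled out into a hypothetical open_source() function that returns the file object plus a flag mirroring self._closefp in the quoted code (the names are illustrative, not lzma's internals). It tests against (str, bytes, os.PathLike) rather than os.PathLike alone, because os.PathLike only matches objects that define __fspath__(), so plain str and bytes paths would otherwise fall through to the TypeError branch; Koos makes the same point later in the thread.

```python
import io
import os

def open_source(filename, mode="rb"):
    """Hypothetical lzma-style dispatch: returns (file_object, close_when_done)."""
    if isinstance(filename, (str, bytes, os.PathLike)):
        # A path: reduce rich path objects to str/bytes, then open it here.
        filename = os.fspath(filename)
        if "b" not in mode:
            mode += "b"
        return open(filename, mode), True
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        # Already a file-like object: use it as-is; we did not open it.
        return filename, False
    else:
        raise TypeError("filename must be a path-like or file-like object")

buffer = io.BytesIO(b"payload")
fp, close_when_done = open_source(buffer)
print(fp is buffer, close_when_done)   # True False
```

An int file descriptor is neither path-like nor file-like here, so it is rejected with TypeError instead of being passed to open().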

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Brett Cannon

On Wed, 15 Jun 2016 at 11:39 Guido van Rossum  wrote:

> OK, so let's add a check on the return of __fspath__() and keep the check
> on path-like or string/bytes.
>

I'll update the PEP.

Ethan, do you want to leave a note on the os.fspath() issue to update the
code and go through where we've used os.fspath() to see where we can cut
out redundant type checks?

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Guido van Rossum

OK, so let's add a check on the return of __fspath__() and keep the check
on path-like or string/bytes.

--Guido (mobile)
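Assuming Python 3.6 or later, where os.fspath() behaves as decided here, the following sketch shows the resulting contract: str and bytes come back unchanged, __fspath__() is called and its result checked, and anything else (including an int file descriptor) raises TypeError. BadPath and fspath_or_error() are hypothetical names used only for the demonstration.

```python
import os
import pathlib

class BadPath:
    """Hypothetical class whose __fspath__() returns the wrong type."""
    def __fspath__(self):
        return 42

def fspath_or_error(obj):
    """Return os.fspath(obj), or the exception class name if it raises TypeError."""
    try:
        return os.fspath(obj)
    except TypeError as exc:
        return type(exc).__name__

print(fspath_or_error("data.xz"))                         # data.xz
print(fspath_or_error(pathlib.PurePosixPath("data.xz")))  # data.xz
print(fspath_or_error(3))                                 # TypeError (not a path)
print(fspath_or_error(BadPath()))                         # TypeError (return value is checked)
```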

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Ethan Furman

On 06/15/2016 10:59 AM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:
>>
>> These are really two separate proposals.
>>
>> I'm okay with checking the return value of calling obj.__fspath__;
>> that's an error in the object anyways, and it doesn't matter much
>> whether we do this or not (though when approving the PEP I
>> considered this and decided not to insert a check for this). But it
>> doesn't affect your example, does it? I guess it's easier to raise
>> now and change the API in the future to avoid raising in this case
>> (if we find that raising is undesirable) than the other way around,
>> so I'm +0 on this.
>
> +0 from me as well. I know in some code in the stdlib that has been
> ported which prior to adding support was explicitly checking for
> str/bytes this will eliminate its own checking (obviously not a
> motivating factor as it's pretty minor).

If we accept both parts of this proposal the checking will have to stay
in place as the original argument may not have been bytes, str, or
os.PathLike.

>> The other proposal (passing anything that's not understood right
>> through) is more interesting and your use case is somewhat
>> compelling. Catching the exception coming out of os.fspath() would
>> certainly be much messier. The question remaining is whether, when
>> this behavior is not desired (e.g. when the caller of os.fspath()
>> just wants a string that it can pass to open()), the condition of
>> passing something that's neither a string nor supports __fspath__
>> still produces an understandable error.

This is no different than before os.fspath() existed -- if the function
wasn't checking that the "filename" was a str but just used it as-is,
then whatever strange, possibly-hard-to-debug error they would get now
is the same as what they would have gotten before.

>> I'm not sure that that's the case.
>> E.g. open() accepts file descriptors in addition to paths, but I'm
>> not sure that accepting an integer is a good idea in most cases --
>> it either gives a mystery "Bad file descriptor" error or starts
>> reading/writing some random system file, which it then closes once
>> the stream is closed.

My vision of os.fspath() is simply to reduce rich-path objects to their
component str or bytes representation, and pass anything else through.

The advantages:

- if os.open accepts str/bytes/fd it can prep the argument by
  calling os.fspath() and then do its argument checking all
  in one place;

- if lzma accepts bytes/str/filelike-obj it can prep its argument
  by calling os.fspath() and then do its argument checking all in
  one place;

- if Path accepts str/os.PathLike it can prep its argument(s)
  with os.fspath() and then do its argument checking all in one
  place.

> The FD issue of magically passing through an int was also a concern when
> Ethan brought this up in an issue on the tracker. My argument is that
> FDs are not file paths and so shouldn't magically pass through if we're
> going to type-check anything or claim os.fspath() only works with paths
> (FDs are already open file objects). So in my view either we go ahead
> and type-check the return value of __fspath__() and thus restrict
> everything coming out of os.fspath() to Union[str, bytes] or we don't
> type check anything and be consistent that all os.fspath() does is
> call __fspath__() if present.

This is better than what os.fspath() currently does as it has all the
advantages listed above, but why is checking the output of __fspath__
incompatible with not checking anything else?

> And just because I'm thinking about it, I would special-case the FDs,
> not os.PathLike (clearer why you care and faster as it skips the
> override of __subclasshook__):
>
> # Can be a single-line ternary operator if preferred.
> if not isinstance(filename, int):
>     filename = os.fspath(filename)


That example will not do the right thing in the lzma case.

--
~Ethan~

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Koos Zevenhoven

On Wed, Jun 15, 2016 at 9:29 PM, Nick Coghlan  wrote:
> On 15 June 2016 at 10:59, Brett Cannon  wrote:
>>
>>
>> On Wed, 15 Jun 2016 at 09:48 Guido van Rossum  wrote:
>>>
>>> These are really two separate proposals.
>>>
>>> I'm okay with checking the return value of calling obj.__fspath__; that's
>>> an error in the object anyways, and it doesn't matter much whether we do
>>> this or not (though when approving the PEP I considered this and decided not
>>> to insert a check for this). But it doesn't affect your example, does it? I
>>> guess it's easier to raise now and change the API in the future to avoid
>>> raising in this case (if we find that raising is undesirable) than the other
>>> way around, so I'm +0 on this.
>>
>> +0 from me as well. I know in some code in the stdlib that has been ported
>> which prior to adding support was explicitly checking for str/bytes this
>> will eliminate its own checking (obviously not a motivating factor as it's
>> pretty minor).
>
> I'd like a strong assertion that the return value of os.fspath() is a
> plausible filesystem path representation (so either bytes or str), and
> *not* some other kind of object that can also be used for accessing
> the filesystem (like a file descriptor or an IO stream)

I agree, so I'm -0.5 on passing through any object (at least by default).

>>> The other proposal (passing anything that's not understood right through)
>>> is more interesting and your use case is somewhat compelling. Catching the
>>> exception coming out of os.fspath() would certainly be much messier. The
>>> question remaining is whether, when this behavior is not desired (e.g. when
>>> the caller of os.fspath() just wants a string that it can pass to open()),
>>> the condition of passing something that's neither a string nor supports __fspath__
>>> still produces an understandable error. I'm not sure that that's the case.
>>> E.g. open() accepts file descriptors in addition to paths, but I'm not sure
>>> that accepting an integer is a good idea in most cases -- it either gives a
>>> mystery "Bad file descriptor" error or starts reading/writing some random
>>> system file, which it then closes once the stream is closed.
>>
>> The FD issue of magically passing through an int was also a concern when
>> Ethan brought this up in an issue on the tracker. My argument is that FDs
>> are not file paths and so shouldn't magically pass through if we're going to
>> type-check anything or claim os.fspath() only works with paths (FDs are
>> already open file objects). So in my view either we go ahead and type-check
>> the return value of __fspath__() and thus restrict everything coming out of
>> os.fspath() to Union[str, bytes] or we don't type check anything and be
>> consistent that all os.fspath() does is call __fspath__() if present.
>>
>> And just because I'm thinking about it, I would special-case the FDs, not
>> os.PathLike (clearer why you care and faster as it skips the override of
>> __subclasshook__):
>>
>> # Can be a single-line ternary operator if preferred.
>> if not isinstance(filename, int):
>>     filename = os.fspath(filename)
>
> Note that the LZMA case Ethan cites is one where the code accepts
> either an already opened file-like object *or* a path-like object, and
> does different things based on which it receives.
>
> In that scenario, rather than introducing an unconditional "filename =
> os.fspath(filename)" before the current logic, it makes more sense to
> me to change the current logic to use the new protocol check rather
> than a strict typecheck on str/bytes:
>
> if isinstance(filename, os.PathLike):  # Changed line
>     filename = os.fspath(filename)     # New line

You are making one of my earlier points here, thanks ;). The point is
that the name PathLike sounds like it would mean anything path-like,
except that os.PathLike does not include str and bytes. And I still
think the naming should be a little different.

So that would be (os.PathLike, str, bytes) instead of just os.PathLike.

>     if "b" not in mode:
>         mode += "b"
>     self._fp = builtins.open(filename, mode)
>     self._closefp = True
>     self._mode = mode_code
> elif hasattr(filename, "read") or hasattr(filename, "write"):
>     self._fp = filename
>     self._mode = mode_code
> else:
>     raise TypeError(
>         "filename must be a path-like or file-like object"
>         )
>
> I *don't* think it makes sense to weaken the guarantees on os.fspath
> to let it propagate non-path-like objects.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   [email protected]   |   Brisbane, Australia

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
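Koos's observation that os.PathLike does not cover str and bytes is easy to check directly. A small sketch, assuming Python 3.6 or later:

```python
import os
import pathlib

samples = ["spam.txt", b"spam.txt", pathlib.PurePosixPath("spam.txt")]

# os.PathLike matches only objects that provide __fspath__() ...
pathlike_only = [isinstance(s, os.PathLike) for s in samples]
# ... so a check meant to accept every path representation has to name str and bytes too.
any_path = [isinstance(s, (os.PathLike, str, bytes)) for s in samples]

print(pathlike_only)   # [False, False, True]
print(any_path)        # [True, True, True]
```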

Re: [Python-Dev] Smoothing the transition from Python 2 to 3

2016-06-15 Thread Nick Coghlan

On 10 June 2016 at 16:36, Neil Schemenauer  wrote:
> Nick Coghlan  wrote:
>> It could be very interesting to add an "ascii-warn" codec to Python
>> 2.7, and then set that as the default encoding when the -3 flag is
>> set.
>
> I don't think that can work.  The library code in Python would spew
> out warnings even in the cases when nothing is wrong with the
> application code.  I think warnings have to be added to a Python
> where str and bytes have been properly separated.  Without extreme
> backporting efforts, that means 3.x.
>
> We don't want to saddle 3.x with a bunch of backwards compatibility
> cruft.  Maybe some of my runtime warning changes could be merged
> using a command line flag to enable them.  It would be nice to have
> the stepping stone version just be normal 3.x with a command line
> option.  However, for the sanity of people maintaining 3.x, I think
> perhaps we don't want to do it.

Right, my initial negative reactions were mainly to the idea of having
these kinds of capabilities in the mainline 3.x codebase (where we'd
then have to support them for everyone, not just the folks that
genuinely need them to help in migration from Python 2).

The standard porting instructions currently assume code bases that are
*mostly* bytes/unicode clean, with perhaps a few oversights where
Python 3 rejects ambiguity that Python 2 tolerates. In that context,
"run your test suite, address the test failures" should generally be
sufficient, without needing to use a custom Python build.

However, there are a couple of cases those standard instructions still
don't cover:

- if there's no test suite, exploratory discovery is problematic when
the app falls over at the first type ambiguity
- even if there is a test suite, sufficiently pervasive type ambiguity
may make it difficult to use for fault isolation

That's where I now agree your proposal for a variant build
specifically aimed at compatibility testing is potentially
interesting:

- the tool would become an escalation path for folks that aren't in a
position to use their own test suite to isolate type ambiguity
problems under Python 3
- using Python 3 as a basis means you get a clean standard library
that shouldn't emit any false alarms
- the necessary feature set is defined by the common subset of Python
2.7 and a chosen minimum Python 3 version, not any future 3.x release,
so you should be able to maintain the changes as a stable patch set
without needing to chase CPython trunk (with the attendant risk of
merge conflicts)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Ethan Furman


On 06/15/2016 11:44 AM, Brett Cannon wrote:
> On Wed, 15 Jun 2016 at 11:39 Guido van Rossum wrote:
>>
>> OK, so let's add a check on the return of __fspath__() and keep the
>> check on path-like or string/bytes.
>
> I'll update the PEP.
>
> Ethan, do you want to leave a note on the os.fspath() issue to update
> the code and go through where we've used os.fspath() to see where we can
> cut out redundant type checks?

Will do.

I didn't see this subthread before my last post, so unless you agree 
with those other changes feel free to ignore it.  ;)


--
~Ethan~

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Koos Zevenhoven

>> if isinstance(filename, os.PathLike):

By the way, regarding the line of code above, is there a convention
regarding whether implementing some protocol/interface requires
registering with (or inheriting from) the appropriate ABC for it to
work in all situations? IOW, in this case, is it sufficient to
implement __fspath__ to make your type pathlike? Is there a conscious
trend towards requiring the ABC?

-- Koos

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Brett Cannon

On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven  wrote:

> >> if isinstance(filename, os.PathLike):
>
> By the way, regarding the line of code above, is there a convention
> regarding whether implementing some protocol/interface requires
> registering with (or inheriting from) the appropriate ABC for it to
> work in all situations? IOW, in this case, is it sufficient to
> implement __fspath__ to make your type pathlike? Is there a conscious
> trend towards requiring the ABC?
>

ABCs like os.PathLike can override __subclasshook__ so that registration
isn't required (see
https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So
registration is definitely good to do to be explicit that you're trying to
meet an ABC, but it isn't strictly required.
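
To make the __subclasshook__ point concrete, here is a minimal sketch of an
ABC that treats any class defining __fspath__ as a virtual subclass, with no
registration or inheritance needed. PathLikeSketch and TempFileName are
hypothetical names made up for illustration; the real hook lives in
Lib/os.py and may differ in detail.

```python
import abc


class PathLikeSketch(abc.ABC):
    """Hypothetical stand-in for os.PathLike (not the stdlib class)."""

    @abc.abstractmethod
    def __fspath__(self):
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, subclass):
        # Only apply the structural check to this ABC itself, not to
        # classes that inherit from it.
        if cls is PathLikeSketch:
            # Look for __fspath__ anywhere in the candidate's MRO.
            if any("__fspath__" in vars(base) for base in subclass.__mro__):
                return True
        # Fall back to normal inheritance and explicit register() calls.
        return NotImplemented


class TempFileName:
    """Implements the protocol without registering or inheriting."""

    def __fspath__(self):
        return "/tmp/example.txt"


print(isinstance(TempFileName(), PathLikeSketch))  # True, via the hook
print(isinstance("not/a/path", PathLikeSketch))    # False
```

Returning NotImplemented (rather than False) when __fspath__ is missing keeps
explicit PathLikeSketch.register(...) calls working.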

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Ethan Furman


On 06/15/2016 12:10 PM, Koos Zevenhoven wrote:

 if isinstance(filename, os.PathLike):


By the way, regarding the line of code above, is there a convention
regarding whether implementing some protocol/interface requires
registering with (or inheriting from) the appropriate ABC for it to
work in all situations? IOW, in this case, is it sufficient to
implement __fspath__ to make your type pathlike? Is there a conscious
trend towards requiring the ABC?


The ABC is not required, simply having the __fspath__ attribute is 
enough.  Of course, to actually work that attribute should be a function 
that returns a str or bytes object.  ;)
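
Ethan's description can be checked against os.fspath() as it eventually
shipped (Python 3.6 or later is assumed here): a class that merely defines
__fspath__ is accepted, str is passed through but is not itself PathLike,
and a return value that is neither str nor bytes is rejected with TypeError.
ReportPath and BrokenPath are hypothetical example classes.

```python
import os


class ReportPath:
    """Duck-typed path object: no ABC registration, no inheritance."""

    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        return "reports/" + self.name


class BrokenPath:
    """Has __fspath__, but returns the wrong type."""

    def __fspath__(self):
        return 42


print(os.fspath(ReportPath("q2.txt")))                # reports/q2.txt
print(isinstance(ReportPath("q2.txt"), os.PathLike))  # True
print(isinstance("reports/q2.txt", os.PathLike))      # False

try:
    os.fspath(BrokenPath())
except TypeError as exc:
    print("rejected:", exc)
```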


--
~Ethan~


Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Koos Zevenhoven

On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon  wrote:
>
>
> On Wed, 15 Jun 2016 at 12:12 Koos Zevenhoven  wrote:
>>
>> >> if isinstance(filename, os.PathLike):
>>
>> By the way, regarding the line of code above, is there a convention
>> regarding whether implementing some protocol/interface requires
>> registering with (or inheriting from) the appropriate ABC for it to
>> work in all situations? IOW, in this case, is it sufficient to
>> implement __fspath__ to make your type pathlike? Is there a conscious
>> trend towards requiring the ABC?
>
>
> ABCs like os.PathLike can override __subclasshook__ so that registration
> isn't required (see
> https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So registration
> is definitely good to do to be explicit that you're trying to meet an ABC,
> but it isn't strictly required.

Ok I suppose that's fine, so I propose we update the ABC part in the
PEP with __subclasshook__.

And the other question could be turned into whether to make str and
bytes also PathLike in __subclasshook__.

-- Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Brett Cannon

PEP 519 updated: https://hg.python.org/peps/rev/92feff129ee4

On Wed, 15 Jun 2016 at 11:44 Brett Cannon  wrote:

> On Wed, 15 Jun 2016 at 11:39 Guido van Rossum  wrote:
>
>> OK, so let's add a check on the return of __fspath__() and keep the check
>> on path-like or string/bytes.
>>
>
> I'll update the PEP.
>
> Ethan, do you want to leave a note on the os.fspath() issue to update the
> code and go through where we've used os.fspath() to see where we can cut
> out redundant type checks?
>
>
>> --Guido (mobile)
>> On Jun 15, 2016 11:29 AM, "Nick Coghlan"  wrote:
>>
>>> On 15 June 2016 at 10:59, Brett Cannon  wrote:
>>> >
>>> >
>>> > On Wed, 15 Jun 2016 at 09:48 Guido van Rossum wrote:
>>> >>
>>> >> These are really two separate proposals.
>>> >>
>>> >> I'm okay with checking the return value of calling obj.__fspath__;
>>> >> that's an error in the object anyways, and it doesn't matter much
>>> >> whether we do this or not (though when approving the PEP I considered
>>> >> this and decided not to insert a check for this). But it doesn't
>>> >> affect your example, does it? I guess it's easier to raise now and
>>> >> change the API in the future to avoid raising in this case (if we
>>> >> find that raising is undesirable) than the other way around, so I'm
>>> >> +0 on this.
>>> >
>>> > +0 from me as well. I know of some ported code in the stdlib that,
>>> > prior to adding support, was explicitly checking for str/bytes; this
>>> > will eliminate its own checking (obviously not a motivating factor
>>> > as it's pretty minor).
>>>
>>> I'd like a strong assertion that the return value of os.fspath() is a
>>> plausible filesystem path representation (so either bytes or str), and
>>> *not* some other kind of object that can also be used for accessing
>>> the filesystem (like a file descriptor or an IO stream).
>>>
>>> >> The other proposal (passing anything that's not understood right
>>> >> through) is more interesting and your use case is somewhat
>>> >> compelling. Catching the exception coming out of os.fspath() would
>>> >> certainly be much messier. The question remaining is whether, when
>>> >> this behavior is not desired (e.g. when the caller of os.fspath()
>>> >> just wants a string that it can pass to open()), the condition of
>>> >> passing something that's neither a string nor supports __fspath__
>>> >> still produces an understandable error. I'm not sure that that's
>>> >> the case. E.g. open() accepts file descriptors in addition to paths,
>>> >> but I'm not sure that accepting an integer is a good idea in most
>>> >> cases -- it either gives a mystery "Bad file descriptor" error or
>>> >> starts reading/writing some random system file, which it then closes
>>> >> once the stream is closed.
>>> >
>>> > The FD issue of magically passing through an int was also a concern
>>> > when Ethan brought this up in an issue on the tracker. My argument is
>>> > that FDs are not file paths and so shouldn't magically pass through
>>> > if we're going to type-check anything or claim os.fspath() only
>>> > works with paths (FDs are already open file objects). So in my view
>>> > either we go ahead and type-check the return value of __fspath__()
>>> > and thus restrict everything coming out of os.fspath() to
>>> > Union[str, bytes] or we don't type check anything and be consistent
>>> > that all os.fspath() does is call __fspath__() if present.
>>> >
>>> > And just because I'm thinking about it, I would special-case the FDs,
>>> > not os.PathLike (clearer why you care and faster as it skips the
>>> > override of __subclasshook__):
>>> >
>>> >     # Can be a single-line ternary operator if preferred.
>>> >     if not isinstance(filename, int):
>>> >         filename = os.fspath(filename)
>>>
>>> Note that the LZMA case Ethan cites is one where the code accepts
>>> either an already opened file-like object *or* a path-like object, and
>>> does different things based on which it receives.
>>>
>>> In that scenario, rather than introducing an unconditional "filename =
>>> os.fspath(filename)" before the current logic, it makes more sense to
>>> me to change the current logic to use the new protocol check rather
>>> than a strict typecheck on str/bytes:
>>>
>>>     if isinstance(filename, os.PathLike):  # Changed line
>>>         filename = os.fspath(filename)     # New line
>>>         if "b" not in mode:
>>>             mode += "b"
>>>         self._fp = builtins.open(filename, mode)
>>>         self._closefp = True
>>>         self._mode = mode_code
>>>     elif hasattr(filename, "read") or hasattr(filename, "write"):
>>>         self._fp = filename
>>>         self._mode = mode_code
>>>     else:
>>>         raise TypeError(
>>>             "filename must be a path-like or file-like object"
>>>         )
>>>
>>> I *don't* think it makes sense to weaken the guarantees on os.fspath
>>> to let it propagate non-path-like objects.
>>>
>>> C
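
Nick's suggested dispatch can be turned into a self-contained function for
experimentation. This is a sketch, not the lzma module's code: open_binary
is a hypothetical name, and it checks (str, bytes, os.PathLike) rather than
os.PathLike alone because, as Ethan notes elsewhere in the thread, str and
bytes are not PathLike. Python 3.6+ is assumed for os.PathLike.

```python
import io
import os


def open_binary(filename, mode="rb"):
    """Hypothetical helper modelled on the LZMAFile dispatch quoted above.

    Returns a (file_object, close_when_done) pair.
    """
    # os.PathLike does not match plain str/bytes, so they are listed
    # explicitly alongside it.
    if isinstance(filename, (str, bytes, os.PathLike)):
        if "b" not in mode:
            mode += "b"
        return open(filename, mode), True   # we opened it, so we close it
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        return filename, False              # the caller owns the stream
    else:
        # Integers (file descriptors) and other objects land here instead
        # of being passed straight through to open().
        raise TypeError("filename must be a path-like or file-like object")


stream, owned = open_binary(io.BytesIO(b"payload"))
print(owned)  # False
```

Because an int matches neither branch, a stray file descriptor produces the
TypeError up front rather than the "Bad file descriptor" surprises Guido
describes.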

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Ethan Furman


On 06/15/2016 12:24 PM, Koos Zevenhoven wrote:

On Wed, Jun 15, 2016 at 10:15 PM, Brett Cannon wrote:



ABCs like os.PathLike can override __subclasshook__ so that registration
isn't required (see
https://hg.python.org/cpython/file/default/Lib/os.py#l1136). So registration
is definitely good to do to be explicit that you're trying to meet an ABC,
but it isn't strictly required.



And the other question could be turned into whether to make str and
bytes also PathLike in __subclasshook__.


No, for two reasons.

- most str's and bytes' are not paths;
- PathLike indicates a rich-path object, which str's and bytes' are not.

--
~Ethan~

Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Nick Coghlan

[whew, actually read the whole thread]

On 11 June 2016 at 10:28, Terry Reedy  wrote:
> On 6/11/2016 11:34 AM, Guido van Rossum wrote:
>>
>> In terms of API design, I'd prefer a flag to os.urandom() indicating a
>> preference for
>> - blocking
>> - raising an exception
>> - weaker random bits
>
>
> +100 ;-)
>
> I proposed exactly this 2 days ago, 5 hours after Larry's initial post.

No, this is a bad idea. Asking novice developers to make security
decisions they're not yet qualified to make when it's genuinely
possible for us to do the right thing by default is the antithesis of
good security API design, and os.urandom() *is* a security API
(whether we like it or not - third party documentation written by the
cryptographic software development community has made it so, since
it's part of their guidelines for writing security sensitive code in
pure Python).

Adding *new* APIs is also a bad idea, since "os.urandom() is the right
answer on every OS except Linux, and also the best currently available
answer on Linux" has been the standard security advice for generating
cryptographic secrets in pure Python code for years now, so we should
only change that guidance if we have extraordinarily compelling
reasons to do so, and we don't. Instead, we have Ted Ts'o himself
chiming in to say: "My preference would be that os.[u]random should
block, because the odds that people would be trying to generate
long-term cryptographic secrets within seconds after boot is very
small, and if you *do* block for a second or two, it's not the end of
the world."

The *actual bug* that triggered this latest firestorm of commentary
(from experts and non-experts alike) had *nothing* to do with user
code calling os.urandom, and instead was a combination of:

- CPython startup requesting cryptographically secure randomness when
it didn't need it
- a systemd init script written in Python running before the kernel
RNG was fully initialised

That created a deadlock between CPython startup and the rest of the
Linux init process, so the latter only continued when the systemd
watchdog timed out and killed the offending script. As others have
noted, this kind of deadlock scenario is generally impossible on other
operating systems, as the operating system doesn't provide a way to
run Python code before the random number generator is ready.

The change Victor made in 3.5.2 to fall back to reading /dev/urandom
directly if the getrandom() syscall returns EAGAIN (effectively
reverting to the Python 3.4 behaviour) was the simplest possible fix
for that problem (and an approach I thoroughly endorse, both for 3.5.2
and for the life of the 3.5 series), but that doesn't make it the
right answer for 3.6+.

To repeat: the problem encountered was NOT due to user code calling
os.urandom(), but rather due to the way CPython initialises its own
internal hash algorithm at interpreter startup. However, due to the
way CPython is currently implemented, fixing the regression in that
not only changed the behaviour of CPython startup, it *also* changed
the behaviour of every call to os.urandom() in Python 3.5.2+.

For 3.6+, we can instead make it so that the only things that actually
rely on cryptographic quality randomness being available are:

- calling a secrets module API
- calling a random.SystemRandom method
- calling os.urandom directly

These are all APIs that were either created specifically for use in
security sensitive situations (secrets module), or have long been
documented (both within our own documentation, and in third party
documentation, books and Q&A sites) as being an appropriate choice for
use in security sensitive situations (os.urandom and
random.SystemRandom).

However, we don't need to make those block waiting for randomness to
be available - we can update them to raise BlockingIOError instead
(which makes it trivial for people to decide for themselves how they
want to handle that case).

Along with that change, we can make it so that starting the
interpreter will never block waiting for cryptographic randomness to
be available (since it doesn't need it), and importing the random
module won't block waiting for it either.

To the best of our knowledge, on all operating systems other than
Linux, encountering the new exception will still be impossible in
practice, as there is no known opportunity to run Python code before
the kernel random number generator is ready.

On Linux, init scripts may still run before the kernel random number
generator is ready, but will now throw an immediate BlockingIOError if
they access an API that relies on cryptographic randomness being
available, rather than potentially deadlocking the init process. Folks
encountering that situation will then need to make an explicit
decision:

- loop until the exception is no longer thrown
- switch to reading from /dev/urandom directly instead of calling os.urandom()
- switch to using a cross-platform non-cryptographic API (probably the
random module)
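
Under this proposal, handling the new exception is ordinary Python. The
sketch below expresses the three choices as a policy argument. The names
get_random_bytes, policy and source are hypothetical, and source defaults
to os.urandom only as a stand-in for an API that may raise BlockingIOError
while the kernel RNG is not yet initialised.

```python
import os
import random
import time


def get_random_bytes(n, source=os.urandom, policy="retry",
                     retries=50, delay=0.1):
    """Hypothetical helper illustrating the three choices listed above."""
    attempts = 0
    while True:
        try:
            return source(n)
        except BlockingIOError:
            if policy == "retry" and attempts < retries:
                # Choice 1: loop until the exception is no longer thrown.
                attempts += 1
                time.sleep(delay)
                continue
            if policy == "dev-urandom":
                # Choice 2: read /dev/urandom directly (Unix-like only).
                with open("/dev/urandom", "rb") as f:
                    return f.read(n)
            if policy == "insecure":
                # Choice 3: non-cryptographic fallback; never for secrets.
                return bytes(random.getrandbits(8) for _ in range(n))
            # Retries exhausted or unknown policy: let the caller decide.
            raise
```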

Victor has some additional

Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Ethan Furman


On 06/15/2016 01:01 PM, Nick Coghlan wrote:


For 3.6+, we can instead make it so that the only things that actually
rely on cryptographic quality randomness being available are:

- calling a secrets module API
- calling a random.SystemRandom method
- calling os.urandom directly




However, we don't need to make those block waiting for randomness to
be available - we can update them to raise BlockingIOError instead
(which makes it trivial for people to decide for themselves how they
want to handle that case).

Along with that change, we can make it so that starting the
interpreter will never block waiting for cryptographic randomness to
be available (since it doesn't need it), and importing the random
module won't block waiting for it either.


+1

--
~Ethan~


Re: [Python-Dev] PEP 520: Ordered Class Definition Namespace

2016-06-15 Thread Nick Coghlan

On 14 June 2016 at 02:41, Nikita Nemkin  wrote:
> Is there any rationale for rejecting alternatives like:

Good questions - Eric, it's likely worth capturing answers to these in
the PEP for the benefit of future readers.

> 1. Adding standard metaclass with ordered namespace.

Adding metaclasses to an existing class can break compatibility with
third party subclasses, so making it possible for people to avoid that
while still gaining the ability to implicitly expose attribute
ordering to class decorators and other potentially interested parties
is a recurring theme behind this PEP and also PEPs 422 and 487.

> 2. Adding `namespace` or `ordered` args to the default metaclass.

See below (as it relates to your own complexity argument)

> 3. Making compiler fill in __definition_order__ for every class
> (just like __qualname__) without touching the runtime.
> ?

Class scopes support conditionals and loops, so we can't necessarily
be sure what names will be assigned without running the code. It's
also possible to make attribute assignments via locals() that are
entirely opaque to the compiler, but visible to the interpreter at
runtime.
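
A small illustration of why the compiler cannot fill in the order itself:
the names below depend on a runtime conditional, a loop and a locals()
write. This assumes CPython 3.6+, where class namespace order is preserved
in the class __dict__, and relies on the CPython behaviour that locals() in
a class body is the namespace being populated. Settings is a hypothetical
class.

```python
import sys


class Settings:
    name = "demo"
    # A conditional: which branch runs is only known at runtime.
    if sys.platform.startswith("win"):
        config_dir = "C:\\demo"
    else:
        config_dir = "/etc/demo"
    # A loop plus a locals() write: these names never appear as
    # assignment targets the compiler could list statically.
    for _i in range(3):
        locals()["slot_%d" % _i] = _i
    del _i
    timeout = 30


order = [k for k in vars(Settings) if not k.startswith("_")]
print(order)  # ['name', 'config_dir', 'slot_0', 'slot_1', 'slot_2', 'timeout']
```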

> To me, any of the above seems preferred to complicating
> the core part of the language forever.
>
> The vast majority of Python classes don't care about their member
> order, this is minority use case receiving majority treatment.
>
> Also, wiring OrderedDict into class creation means elevating it
> from a peripheral utility to indispensable built-in type.

Right, that's one of the key reasons this is a PEP, rather than just
an item on the issue tracker.

The rationale for "Why not make this configurable, rather than
switching it unilaterally?" is that it's actually *simpler* overall to
just make it the default - we can then change the documentation to say
"class bodies are evaluated in a collections.OrderedDict instance by
default" and record the consequences of that, rather than having to
document yet another class customisation mechanism.

It also eliminates boilerplate from class decorator usage
instructions, where people have to write "to use this class decorator,
you must also specify 'namespace=collections.OrderedDict' in your
class header".

Folks that don't need the ordering information do end up paying a
slight import time and memory cost, which is another key reason for
handling the proposal as a PEP rather than just as a tracker issue.

Aside from the boilerplate reduction when used in conjunction with a
class decorator, a further possible category of consumers would be
documentation generators like pydoc and Sphinx apidoc, which may be
able to switch to displaying methods in definition order, rather than
the current approach of always listing them in alphabetical order.
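
As a sketch of the boilerplate reduction: once class bodies preserve
definition order (guaranteed from Python 3.6 on, assumed here), a class
decorator can read that order directly, with no namespace= argument in the
class header. record_field_order and Point are hypothetical names.

```python
def record_field_order(cls):
    """Hypothetical class decorator: store public attribute names in the
    order they were defined, read straight from the class namespace."""
    cls._field_order = tuple(
        name for name in vars(cls) if not name.startswith("_")
    )
    return cls


@record_field_order
class Point:
    z = 0
    x = 0
    y = 0

    def norm(self):
        return (self.x ** 2 + self.y ** 2 + self.z ** 2) ** 0.5


print(Point._field_order)          # ('z', 'x', 'y', 'norm')
print(sorted(Point._field_order))  # ['norm', 'x', 'y', 'z']
```

A documentation generator could use the first ordering instead of the
second, alphabetical one.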

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] proposed os.fspath() change

2016-06-15 Thread Koos Zevenhoven

On Wed, Jun 15, 2016 at 11:00 PM, Ethan Furman  wrote:
> On 06/15/2016 12:24 PM, Koos Zevenhoven wrote:
>>
>> And the other question could be turned into whether to make str and
>> bytes also PathLike in __subclasshook__.
>
> No, for two reasons.
>
> - most str's and bytes' are not paths;

True. Well, at least most str and bytes objects are not *meant* to be
used as paths, even if they could be.

> - PathLike indicates a rich-path object, which str's and bytes' are not.

This does not count as a reason.

If this were called pathlib.PathABC, I would definitely agree [1]. But
since this is called os.PathLike, I'm not quite as sure. Anyway,
including str and bytes is more of a type hinting issue. And since
type hints will also act as documentation, the naming of types is
becoming more important.

-- Koos

[1] No, I'm not proposing moving this to pathlib

> --
> ~Ethan~

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Anders J. Munch


Paul Moore:
> Finding out whether users/projects typically write such a helper
> function for themselves would be a better way of getting this
> information. Personally, I suspect they don't, but facts beat
> speculation.

Well, I did. It was necessary to get 2to3 conversion to work(*). I turned every
occurrence of

   E.encode('base-64')
and
   E.decode('base-64')
into helper function calls that for Python 3 did:
   b64encode(E).decode('ascii')
and
   b64decode(E.encode('ascii'))
(Or something similar, I don't have the code in front of me.)

Leaving out .decode/.encode('ascii') would simply not have worked. That would
just be asking for TypeErrors.
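
A runnable version of the helper pair described above might look like this.
The function names are hypothetical; only base64.b64encode and
base64.b64decode are real stdlib calls.

```python
import base64


def b64encode_text(data):
    """Hypothetical helper: bytes in, base64 *str* out."""
    # b64encode() returns bytes; its alphabet is pure ASCII, so decoding
    # with 'ascii' cannot fail.
    return base64.b64encode(data).decode("ascii")


def b64decode_text(text):
    """Hypothetical helper: base64 str in, original bytes out."""
    return base64.b64decode(text.encode("ascii"))


encoded = b64encode_text(b"hello")
print(encoded)                  # aGVsbG8=
print(b64decode_text(encoded))  # b'hello'
```

Since Python 3.3, b64decode() also accepts an ASCII-only str directly, so
the .encode('ascii') on the decoding side mainly mirrors the helper as
described; the encoding side still needs the explicit .decode('ascii') to
get text.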


regards, Anders

(*) Yes, I use 2to3, believe it or not.  Maintaining Python 2 code and doing an 
automated conversion to Python 3 as needed.



Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Nathaniel Smith

On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan  wrote:
[...]
> For 3.6+, we can instead make it so that the only things that actually
> rely on cryptographic quality randomness being available are:
>
> - calling a secrets module API
> - calling a random.SystemRandom method
> - calling os.urandom directly
>
> These are all APIs that were either created specifically for use in
> security sensitive situations (secrets module), or have long been
> documented (both within our own documentation, and in third party
> documentation, books and Q&A sites) as being an appropriate choice for
> use in security sensitive situations (os.urandom and
> random.SystemRandom).
>
> However, we don't need to make those block waiting for randomness to
> be available - we can update them to raise BlockingIOError instead
> (which makes it trivial for people to decide for themselves how they
> want to handle that case).
>
> Along with that change, we can make it so that starting the
> interpreter will never block waiting for cryptographic randomness to
> be available (since it doesn't need it), and importing the random
> module won't block waiting for it either.

This all seems exactly right to me, to the point that I've been
dreading having to find the time to write pretty much this exact
email. So thank you :-)

> To the best of our knowledge, on all operating systems other than
> Linux, encountering the new exception will still be impossible in
> practice, as there is no known opportunity to run Python code before
> the kernel random number generator is ready.
>
> On Linux, init scripts may still run before the kernel random number
> generator is ready, but will now throw an immediate BlockingIOError if
> they access an API that relies on cryptographic randomness being
> available, rather than potentially deadlocking the init process. Folks
> encountering that situation will then need to make an explicit
> decision:
>
> - loop until the exception is no longer thrown
> - switch to reading from /dev/urandom directly instead of calling os.urandom()
> - switch to using a cross-platform non-cryptographic API (probably the
> random module)
>
> Victor has some additional technical details written up at
> http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to
> formalise this proposed approach as a PEP (the current reference is
> http://bugs.python.org/issue27282 )

I'd make two additional suggestions:

- one person did chime in on the thread to say that they've used
os.urandom for non-security-sensitive purposes, simply because it
provided a convenient "give me a random byte-string" API that is
missing from random. I think we should go ahead and add a .randbytes
method to random.Random that simply returns a random bytestring using
the regular RNG, to give these users a nice drop-in replacement for
os.urandom.

Rationale: I don't think the existence of these users should block
making os.urandom appropriate for generating secrets, because (1) a
glance at github shows that this is very unusual -- if you skim
through this search you get page after page of functions with names
like "generate_secret_key"

  
https://github.com/search?l=python&p=2&q=urandom&ref=searchresults&type=Code&utf8=%E2%9C%93

and (2) for the minority of people who are using os.urandom for
non-security-sensitive purposes, if they find os.urandom raising an
error, then this is just a regular bug that they will notice
immediately and fix, and anyway it's basically never going to happen.
(As far as we can tell, this has never yet happened in the wild, even
once.) OTOH if os.urandom is allowed to fail silently, then people who
are using it to generate secrets will get silent catastrophic
failures, plus those users can't assume it will never happen because
they have to worry about active attackers trying to drive systems into
unusual states. So I'd much rather ask the non-security-sensitive
users to switch to using something in random, than force the
cryptographic users to switch to using secrets. But it does seem like
it would be good to give those non-security-sensitive users something
to switch to :-).

- It's not exactly true that the Python interpreter doesn't need
cryptographic randomness to initialize SipHash -- it's more that
*some* Python invocations need unguessable randomness (to first
approximation: all those which are exposed to hostile input), and some
don't. And since the Python interpreter has no idea which case it's
in, and since it's unacceptable for it to break invocations that don't
need unguessable hashes, then it has to err on the side of continuing
without randomness. All that's fine.

But, given that the interpreter doesn't know which state it's in,
there's also the possibility that this invocation *will* be exposed to
hostile input, and the 3.5.2+ behavior gives absolutely no warning
that this is what's happening. So instead of letting this potential
error pass silently, I propose that if SipHash fails to acquire real
randomness at star

Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Nick Coghlan

On 15 June 2016 at 16:12, Nathaniel Smith  wrote:
> On Wed, Jun 15, 2016 at 1:01 PM, Nick Coghlan  wrote:
>> Victor has some additional technical details written up at
>> http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to
>> formalise this proposed approach as a PEP (the current reference is
>> http://bugs.python.org/issue27282 )
>
> I'd make two additional suggestions:
>
> - one person did chime in on the thread to say that they've used
> os.urandom for non-security-sensitive purposes, simply because it
> provided a convenient "give me a random byte-string" API that is
> missing from random. I think we should go ahead and add a .randbytes
> method to random.Random that simply returns a random bytestring using
> the regular RNG, to give these users a nice drop-in replacement for
> os.urandom.

That seems reasonable.
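
For illustration, the suggested method could be sketched as a subclass of
random.Random. RandomWithBytes is a hypothetical name; Python 3.9 later
added a randbytes() method to random.Random, and this sketch is an
independent illustration, not that implementation.

```python
import random


class RandomWithBytes(random.Random):
    """Hypothetical subclass sketching the suggested randbytes() method."""

    def randbytes(self, n):
        # Draws from the ordinary seeded Mersenne Twister state, so the
        # output is reproducible and must not be used for secrets.
        if n == 0:
            return b""  # older getrandbits() rejects k == 0
        return self.getrandbits(n * 8).to_bytes(n, "little")


rng = RandomWithBytes(1234)
data = rng.randbytes(16)
print(len(data))  # 16
```

Seeding two instances identically yields identical byte strings, which is
exactly the property that makes this unsuitable as an os.urandom()
replacement for secrets.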

> - It's not exactly true that the Python interpreter doesn't need
> cryptographic randomness to initialize SipHash -- it's more that
> *some* Python invocations need unguessable randomness (to first
> approximation: all those which are exposed to hostile input), and some
> don't. And since the Python interpreter has no idea which case it's
> in, and since it's unacceptable for it to break invocations that don't
> need unguessable hashes, then it has to err on the side of continuing
> without randomness. All that's fine.
>
> But, given that the interpreter doesn't know which state it's in,
> there's also the possibility that this invocation *will* be exposed to
> hostile input, and the 3.5.2+ behavior gives absolutely no warning
> that this is what's happening. So instead of letting this potential
> error pass silently, I propose that if SipHash fails to acquire real
> randomness at startup, then it should issue a warning. In practice,
> this will almost never happen. But in the rare cases it does, it at
> least gives the user a fighting chance to realize that their system is
> in a potentially dangerous state. And by using the warnings module, we
> automatically get quite a bit of flexibility.
>
> If some particular
> invocation (e.g. systemd-cron) has audited their code and decided that
> they don't care about this issue, they can make the message go away:
>
>   PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning
>
> OTOH if some particular invocation knows that they do process
> potentially hostile input early on (e.g. cloud-init, maybe?), then
> they can explicitly promote the warning to an error:
>
>   PYTHONWARNINGS=error::NoEntropyAtStartupWarning
>
> (I guess the way to implement this would be for the SipHash
> initialization code -- which runs very early -- to set some flag, and
> then we expose that flag in sys._something, and later in the startup
> sequence check for it after the warnings module is functional.
> Exposing the flag at the Python level would also make it possible for
> code like cloud-init to do its own explicit check and respond
> appropriately.)

A Python level warning/flag seems overly elaborate to me, but we can
easily emit a warning on stderr when SipHash is initialised via the
fallback rather than the operating system's RNG.
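
For comparison, the warnings-module variant quoted above can be sketched in
a few lines. NoEntropyAtStartupWarning comes from the proposal;
report_hash_seed_source and its used_os_rng argument are hypothetical
stand-ins for the late-startup check and the flag the SipHash code would
record. The filters are applied in-process here; the PYTHONWARNINGS form
would additionally need the category to be resolvable by the warnings
machinery, which this sketch does not attempt.

```python
import warnings


class NoEntropyAtStartupWarning(RuntimeWarning):
    """Hypothetical category named after the one proposed above."""


def report_hash_seed_source(used_os_rng):
    """Hypothetical late-startup check of a recorded entropy flag."""
    if not used_os_rng:
        warnings.warn(
            "hash randomization was seeded without OS entropy",
            NoEntropyAtStartupWarning,
            stacklevel=2,
        )


# Default behaviour: the warning is merely reported.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_hash_seed_source(used_os_rng=False)
print(caught[0].category.__name__)  # NoEntropyAtStartupWarning

# An invocation that handles hostile input could promote it to an error.
with warnings.catch_warnings():
    warnings.simplefilter("error", NoEntropyAtStartupWarning)
    try:
        report_hash_seed_source(used_os_rng=False)
    except NoEntropyAtStartupWarning:
        print("refusing to continue without entropy")
```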

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

[Python-Dev] Final round of the Python Language Summit coverage at LWN

2016-06-15 Thread Jake Edge


Hola python-dev,

The final batch of articles from the Python Language Summit is now
ready.

The starting point is here: https://lwn.net/Articles/688969/

I have added the final six sessions (with SubscriberLinks for those
without a subscription):

Python 3 in Fedora: https://lwn.net/Articles/690676/
https://lwn.net/SubscriberLink/690676/cdf118081ac0ffd5/

The Python JITs are coming: https://lwn.net/Articles/691070/
https://lwn.net/SubscriberLink/691070/2714fd6a4934f016/

Pyjion: https://lwn.net/Articles/691152/
https://lwn.net/SubscriberLink/691152/6334fd8d5a9992c0/

Why is Python slow?: https://lwn.net/Articles/691243/
https://lwn.net/SubscriberLink/691243/669cb2bf2fe220c4/

Automated testing of CPython patches: https://lwn.net/Articles/691307/
https://lwn.net/SubscriberLink/691307/89feefecfe425f58/

The Python security response team: https://lwn.net/Articles/691308/
https://lwn.net/SubscriberLink/691308/432ff50e0f9b794f/

The articles will be freely available (without using the
SubscriberLink) to the world at large in a week ... until then, feel
free to share the SubscriberLinks.

Hopefully I have captured things reasonably well.  If there are
corrections or clarifications needed, though, I recommend posting them
as comments on the article.

With luck, I will be able to sit in on the summit again next year ...

enjoy!

jake

-- 
Jake Edge - LWN - [email protected] - http://lwn.net

Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Theodore Ts'o

On Wed, Jun 15, 2016 at 04:12:57PM -0700, Nathaniel Smith wrote:
> - It's not exactly true that the Python interpreter doesn't need
> cryptographic randomness to initialize SipHash -- it's more that
> *some* Python invocations need unguessable randomness (to first
> approximation: all those which are exposed to hostile input), and some
> don't. And since the Python interpreter has no idea which case it's
> in, and since it's unacceptable for it to break invocations that don't
> need unguessable hashes, then it has to err on the side of continuing
> without randomness. All that's fine.

In practice, the Python invocations that are exposed to hostile input
are those started while the network is up.  The vast majority of the
time, they are launched by the web browser, and if this happens after a
second or so of the system receiving networking interrupts, (a)
getrandom won't block, and (b) /dev/urandom and getrandom will be
initialized.

Also, I wish people would stop saying that this is only an issue on
Linux.  Again, FreeBSD's /dev/urandom will block as well if it is
uninitialized.  It's just that in practice, for both Linux and
FreeBSD, we try very hard to make sure /dev/urandom is fully
initialized by the time it matters.  So far, Linux just happens to be
the only place where someone tried to use Python in the early init
scripts, in a VM, on a system modularized in such a way that the
deadlock became visible.

> (I guess the way to implement this would be for the SipHash
> initialization code -- which runs very early -- to set some flag, and
> then we expose that flag in sys._something, and later in the startup
> sequence check for it after the warnings module is functional.
> Exposing the flag at the Python level would also make it possible for
> code like cloud-init to do its own explicit check and respond
> appropriately.)

I really don't think it's that big of a deal in *practice*, but if
you really are concerned about the very remote possibility that a
Python invocation could start in early boot, and *then* also stick
around for the long term, and *then* be exposed to hostile input, what
if you set the flag and then, N minutes later (either automatically or
via some trigger such as cloud-init), check whether /dev/urandom is
initialized (even a few seconds later, so long as the init scripts are
hanging, it should be initialized) and, if so, have Python rehash all
of its dicts, or maybe just the non-system dicts (since those are
presumably the ones most likely to be exposed to hostile input)?
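The "check whether /dev/urandom is initialized" step suggested above can be sketched without risking a blocking read. This is a minimal sketch assuming Linux and Python 3.6 or later, where os.getrandom() and os.GRND_NONBLOCK exist (neither was available when this thread was written); the function name urandom_is_initialized is hypothetical, and the dict-rehashing step itself is not shown.

```python
import os


def urandom_is_initialized():
    """Hypothetical helper: report whether the kernel CSPRNG is seeded.

    Returns True if seeded, False if not yet initialized, or None if this
    platform/Python offers no non-blocking way to tell.
    Assumes Linux with Python 3.6+ (os.getrandom and os.GRND_NONBLOCK).
    """
    getrandom = getattr(os, "getrandom", None)
    nonblock = getattr(os, "GRND_NONBLOCK", None)
    if getrandom is None or nonblock is None:
        return None  # cannot check without risking a blocking read
    try:
        # With GRND_NONBLOCK, getrandom() raises BlockingIOError (EAGAIN)
        # instead of blocking while the pool is still uninitialized.
        getrandom(1, nonblock)
    except BlockingIOError:
        return False
    except OSError:
        # e.g. ENOSYS on kernels without the syscall, or EPERM under seccomp
        return None
    return True


if __name__ == "__main__":
    print(urandom_is_initialized())
```

A deferred check like this could be polled from a timer or triggered externally (the cloud-init case above) before deciding whether any reseeding is worthwhile.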

  - Ted
___

Re: [Python-Dev] Why does base64 return bytes?

2016-06-15 Thread Greg Ewing


Steven D'Aprano wrote:
I'm satisfied that the choice made by Python is the right choice, and that
it meets the spirit (if, arguably, not the letter) of the RFC.


IMO it meets the letter (if you read it a certain way)
but *not* the spirit.
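For reference, the behaviour this sub-thread is arguing about is easy to see with the standard base64 module: b64encode() returns bytes, so getting text means decoding explicitly. ASCII is enough for that decode because the base64 alphabet is a subset of ASCII. The helper b64encode_text is hypothetical, loosely in the spirit of the text-producing variant suggested elsewhere in the thread; it is not part of the standard library.

```python
import base64


def b64encode_text(data: bytes) -> str:
    """Hypothetical helper: base64-encode data and return str, not bytes."""
    # Every character b64encode() emits is in the base64 alphabet, a subset
    # of ASCII, so this decode cannot fail.
    return base64.b64encode(data).decode("ascii")


encoded = base64.b64encode(b"hello")
print(type(encoded).__name__, encoded)   # bytes b'aGVsbG8='
print(b64encode_text(b"hello"))          # aGVsbG8=
```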

--
Greg
___

Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Barry Warsaw

On Jun 15, 2016, at 01:01 PM, Nick Coghlan wrote:

>No, this is a bad idea. Asking novice developers to make security
>decisions they're not yet qualified to make when it's genuinely
>possible for us to do the right thing by default is the antithesis of
>good security API design, and os.urandom() *is* a security API
>(whether we like it or not - third party documentation written by the
>cryptographic software development community has made it so, since
>it's part of their guidelines for writing security sensitive code in
>pure Python).

Regardless of what third parties have said about os.urandom(), let's look at
what *we* have said about it.  Going back to pre-churn 3.4 documentation:

os.urandom(n)
Return a string of n random bytes suitable for cryptographic use.

This function returns random bytes from an OS-specific randomness
source. The returned data should be unpredictable enough for cryptographic
applications, though its exact quality depends on the OS
implementation. On a Unix-like system this will query /dev/urandom, and on
Windows it will use CryptGenRandom(). If a randomness source is not found,
NotImplementedError will be raised.

For an easy-to-use interface to the random number generator provided by
your platform, please see random.SystemRandom.
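The documentation quoted above points to random.SystemRandom as the easy-to-use interface. A minimal sketch of that usage follows; note that SystemRandom draws its values from os.urandom(), so it inherits whatever blocking behaviour os.urandom() has on the running Python version.

```python
import random

# SystemRandom exposes the familiar random.Random API, but every value is
# drawn from os.urandom() rather than the seedable Mersenne Twister.
rng = random.SystemRandom()

pin = "".join(rng.choice("0123456789") for _ in range(6))
roll = rng.randint(1, 6)             # inclusive on both ends
sample = rng.sample(range(100), 3)   # three distinct values

print(pin, roll, sample)
```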

So we very clearly provided platform-dependent caveats on the cryptographic
quality of os.urandom().  We also made a strong claim that there's a direct
connection between os.urandom() and /dev/urandom on "Unix-like system(s)".

We broke that particular promise in 3.5 and semi-fixed it in 3.5.2.

>Adding *new* APIs is also a bad idea, since "os.urandom() is the right
>answer on every OS except Linux, and also the best currently available
>answer on Linux" has been the standard security advice for generating
>cryptographic secrets in pure Python code for years now, so we should
>only change that guidance if we have extraordinarily compelling
>reasons to do so, and we don't.

Disagree.

We have broken one long-term promise on os.urandom() ("On a Unix-like system
this will query /dev/urandom") and changed another ("should be unpredictable
enough for cryptographic applications, though its exact quality depends on OS
implementations").

We broke the experienced Linux developer's natural and long-standing link
between the API called os.urandom() and /dev/urandom.  This breaks pre-3.5
code that assumes read-from-/dev/urandom semantics for os.urandom().

We have introduced churn.  Predicting a future SO question such as "Can
os.urandom() block on Linux?", the answer is: "No in Python 3.4 and earlier,
yes possibly in Python 3.5.0 and 3.5.1, no in Python 3.5.2 and the rest of
the 3.5.x series, and yes possibly in Python 3.6 and beyond".

We have a better answer for "cryptographically appropriate" use cases in
Python 3.6 - the secrets module.  Trying to make os.urandom() "the right
answer on every OS" weakens the promotion of secrets as *the* module to use
for cryptographically appropriate use cases.
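For comparison, typical use of the secrets module (added in Python 3.6) looks like the minimal sketch below; all calls are real functions in that module. Token sizes are given in bytes, so the hex form is twice as long. In CPython these helpers are themselves built on os.urandom() via random.SystemRandom, so they share its behaviour on a given version.

```python
import secrets

# Raw random bytes, e.g. for a symmetric key; the argument is a byte count.
key = secrets.token_bytes(32)

# Text-safe tokens: hex uses 2 characters per byte.
session_id = secrets.token_hex(16)
reset_token = secrets.token_urlsafe(16)

# Constant-time comparison, to avoid leaking information through timing.
ok = secrets.compare_digest(session_id, session_id)

print(len(key), len(session_id), ok)   # 32 32 True
```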

IMHO it would be better to leave os.urandom() well enough alone, except for
the documentation, which should effectively say, à la 3.4:

os.urandom(n)
Return a string of n random bytes suitable for cryptographic use.

This function returns random bytes from an OS-specific randomness
source. The returned data should be unpredictable enough for cryptographic
applications, though its exact quality depends on the OS
implementation. On a Unix-like system this will query /dev/urandom, and on
Windows it will use CryptGenRandom(). If a randomness source is not found,
NotImplementedError will be raised.

Cryptographic applications should use the secrets module for more strongly
guaranteed sources of randomness.

For an easy-to-use interface to the random number generator provided by
your platform, please see random.SystemRandom.

Cheers,
-Barry
___

Re: [Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

2016-06-15 Thread Larry Hastings



On 06/15/2016 11:45 PM, Barry Warsaw wrote:

So we very clearly provided platform-dependent caveats on the cryptographic
quality of os.urandom().  We also made a strong claim that there's a direct
connection between os.urandom() and /dev/urandom on "Unix-like system(s)".

We broke that particular promise in 3.5. and semi-fixed it 3.5.2.


Well, 3.5.2 hasn't happened yet.  So if you see it as still being 
broken, please speak up now.


Why do you call it only "semi-fixed"?  As far as I understand it, the 
semantics of os.urandom() in 3.5.2rc1 are indistinguishable from reading 
from /dev/urandom directly, except it may not need to use a file handle.
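To make the comparison concrete, "reading from /dev/urandom directly" amounts to something like the minimal sketch below, shown next to os.urandom(). It assumes a Unix-like system where /dev/urandom exists; read_dev_urandom is a hypothetical helper name. The bytes differ on every call, so only lengths and types can be compared.

```python
import os


def read_dev_urandom(n):
    """Hypothetical helper: read n bytes straight from /dev/urandom.

    Assumes a Unix-like system. Unlike os.urandom(), this always opens
    (and closes) a file descriptor.
    """
    chunks = []
    remaining = n
    with open("/dev/urandom", "rb", buffering=0) as f:
        while remaining:
            # An unbuffered read may return fewer bytes than requested.
            chunk = f.read(remaining)
            if not chunk:
                raise OSError("unexpected EOF on /dev/urandom")
            chunks.append(chunk)
            remaining -= len(chunk)
    return b"".join(chunks)


direct = read_dev_urandom(16)
via_os = os.urandom(16)
print(len(direct), len(via_os))   # 16 16
```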



//arry/
___
