Re: [Python-Dev] Python 3.x and bytes
On May 19, 2011, at 7:40 PM, Ethan Furman wrote:

> Several folk have said that objects that compare equal must hash equal...

And so do the docs: http://docs.python.org/dev/reference/datamodel.html#object.__hash__ , "the only required property is that objects which compare equal have the same hash value".

Raymond
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python 3.x and bytes
2011/5/19 Ethan Furman:
> If anybody has a link to or an explanation why equal values must be equal
> hashes I'm all ears. My apologies in advance if this is an incredibly naive
> question.

https://secure.wikimedia.org/wikipedia/en/wiki/Hash_table

-- 
Regards,
Benjamin
Re: [Python-Dev] Python 3.x and bytes
Nick Coghlan wrote:
> On Thu, May 19, 2011 at 6:43 PM, Nick Coghlan wrote:
>> For point 2, I'm personally +0 on the idea of having 1-element bytes
>> and bytearray objects delegate hashing and comparison operations to
>> the corresponding integer object. We have the power to make the
>> obvious code correct code, so let's do that. However, the implications
>> of the additional key collisions in value based containers may need to
>> be explored further.

Several folk have said that objects that compare equal must hash equal...

Why?

It's an honest question. Here's what I have tried:

--> class Wierd():
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return self.value == other
...     def __hash__(self):
...         return hash((self.value + 13) ** 3)
...
--> one = Wierd(1)
--> two = Wierd(2)
--> three = Wierd(3)
--> one
<__main__.Wierd object at 0x...>
--> one == 1
True
--> one == 2
False
--> two == 2
True
--> three == 3
True
--> d = dict()
--> d[one] = '1'
--> d[two] = '2'
--> d[three] = '3'
--> d
{<__main__.Wierd object at 0x...>: '1', <__main__.Wierd object at 0x...>: '3', <__main__.Wierd object at 0x...>: '2'}
--> d[1] = '1.0'
--> d[2] = '2.0'
--> d[3] = '3.0'
--> d
{<__main__.Wierd object at 0x...>: '3', 1: '1.0', 2: '2.0', 3: '3.0', <__main__.Wierd object at 0x...>: '2', <__main__.Wierd object at 0x...>: '1'}
--> d[2]
'2.0'
--> d[two]
'2'

This behavior matches what I was imagining for having b'a' == 97. They compare equal, yet remain distinct objects for all other purposes.

If anybody has a link to or an explanation why equal values must be equal hashes I'm all ears. My apologies in advance if this is an incredibly naive question.

~Ethan~
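The session above can be condensed into a small sketch of what goes wrong when equal objects hash differently. It assumes CPython's dict, which compares stored hash values before it ever calls __eq__; the class name Weird and the variable names are illustrative only.

```python
class Weird:
    """Compares equal to its wrapped value but hashes differently (illustrative only)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other

    def __hash__(self):
        # Deliberately breaks the rule "equal objects have equal hashes".
        return hash((self.value + 13) ** 3)


one = Weird(1)
d = {one: "stored under Weird(1)"}

# The two objects compare equal ...
equal = (one == 1)

# ... but the dict searches using hash(1), which differs from hash(one),
# so the "equal" key is never found.
found_by_int = 1 in d
found_by_obj = one in d

print(equal, found_by_int, found_by_obj)
```

In other words, the container ends up treating two "equal" keys as distinct, which is the behaviour Ethan observed and the reason the data model documentation states the rule.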
Re: [Python-Dev] [Python-checkins] cpython: Issue #12120, Issue #12119: tests were missing a sys.dont_write_bytecode check
Python 3.3 is not supposed to create .pyc files in the same directory as the .py files. So I don't understand the following code.

On Thursday, 19 May 2011 at 19:56 +0200, tarek.ziade wrote:
> http://hg.python.org/cpython/rev/9d1fb6a9104b
> changeset:   70207:9d1fb6a9104b
> user:        Tarek Ziade
> date:        Thu May 19 19:56:12 2011 +0200
> summary:
>   Issue #12120, Issue #12119: tests were missing a sys.dont_write_bytecode check
>
> files:
>   Lib/distutils/tests/test_build_py.py         |  3 ++-
>   Lib/packaging/tests/test_command_build_py.py |  3 ++-
>   Misc/NEWS                                    |  3 +++
>   3 files changed, 7 insertions(+), 2 deletions(-)
>
>
> diff --git a/Lib/distutils/tests/test_build_py.py b/Lib/distutils/tests/test_build_py.py
> --- a/Lib/distutils/tests/test_build_py.py
> +++ b/Lib/distutils/tests/test_build_py.py
> @@ -58,7 +58,8 @@
>          pkgdest = os.path.join(destination, "pkg")
>          files = os.listdir(pkgdest)
>          self.assertTrue("__init__.py" in files)
> -        self.assertTrue("__init__.pyc" in files)
> +        if not sys.dont_write_bytecode:
> +            self.assertTrue("__init__.pyc" in files)
>          self.assertTrue("README.txt" in files)
>
>      def test_empty_package_dir (self):
> diff --git a/Lib/packaging/tests/test_command_build_py.py b/Lib/packaging/tests/test_command_build_py.py
> --- a/Lib/packaging/tests/test_command_build_py.py
> +++ b/Lib/packaging/tests/test_command_build_py.py
> @@ -61,7 +61,8 @@
>          pkgdest = os.path.join(destination, "pkg")
>          files = os.listdir(pkgdest)
>          self.assertIn("__init__.py", files)
> -        self.assertIn("__init__.pyc", files)
> +        if not sys.dont_write_bytecode:
> +            self.assertIn("__init__.pyc", files)
>          self.assertIn("README.txt", files)
>
>      def test_empty_package_dir(self):
> diff --git a/Misc/NEWS b/Misc/NEWS
> --- a/Misc/NEWS
> +++ b/Misc/NEWS
> @@ -153,6 +153,9 @@
>  Library
>  ---
>
> +- Issue #12120, #12119: skip a test in packaging and distutils
> +  if sys.dont_write_bytecode is set to True.
> +
>  - Issue #12065: connect_ex() on an SSL socket now returns the original errno
>    when the socket's timeout expires (it used to return None).
Re: [Python-Dev] Don't set local variable in a list comprehension or generator
On 5/18/2011 10:19 AM, Nadeem Vawda wrote:
> I'm not sure why you would encounter code like that in the first place.
> Surely any code of the form:
>
>     ''.join(c for c in my_string)
>
> would just return my_string? Or am I missing something?

You might more-or-less legitimately encounter it if the generator expression originally contained a condition which got removed.

Skip
Re: [Python-Dev] Python 3.x and bytes
On 5/19/2011 3:49 AM, Nick Coghlan wrote:
> It's a mental model problem. People try to think of bytes as
> equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
> closer to array.array('c').

Or like C char arrays.

> Strings are basically *unique* in
> returning a length 1 instance of themselves for indexing operations.

I still remember having to work that out and get used to it.

-- 
Terry Jan Reedy
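The point about strings being the odd one out can be checked directly. The following is a small sketch for Python 3; array.array('B') is used as a stand-in because the 'c' typecode mentioned in the thread does not exist in Python 3, and the helper name index_and_slice_types is made up for the example.

```python
from array import array

samples = [
    "abc",
    b"abc",
    bytearray(b"abc"),
    [1, 2, 3],
    (1, 2, 3),
    array("B", b"abc"),  # stand-in for the array.array('c') of the thread
]


def index_and_slice_types(seq):
    """Return the type names produced by indexing and by slicing seq."""
    return type(seq[0]).__name__, type(seq[0:1]).__name__


results = {type(s).__name__: index_and_slice_types(s) for s in samples}

for name, (item_type, slice_type) in results.items():
    print(f"{name:10} index -> {item_type:5} slice -> {slice_type}")
```

Only str hands back an instance of its own type when indexed; every other sequence here, bytes included, returns an element of a different type, while slicing always preserves the container type.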
Re: [Python-Dev] Python 3.x and bytes
On 19.05.2011 10:37, Stefan Behnel wrote:
> Xavier Morel, 19.05.2011 09:41:
>> On 2011-05-19, at 07:28 , Georg Brandl wrote:
>>> On 19.05.2011 00:39, Greg Ewing wrote:
>>>> If someone sees that some_var[3] == b'd' is true, and that
>>>> some_var[3] == 100 is also true, they might expect to be able
>>>> to do things like
>>>>
>>>>     n = b'd' + 1
>>>>
>>>> and get 101... or maybe b'e'...
>>>
>>> Maybe they should :)
>>
>> But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If
>> a 1-byte bytes is equivalent to an integer, why not an arbitrary one as
>> well?
>
> The result of this must obviously be b"de1".

To clarify my original one-liner: if bytes objects (but only one-char bytes objects) equal integers, you should rightly expect to treat them as integers. This is obviously *not* desirable from a strong-typing POV.

Georg
Re: [Python-Dev] packaging landed in stdlib
On 19.05.2011 13:35, Tarek Ziadé wrote:
> Hey
>
> I've pushed packaging in stdlib. There are a few buildbot errors
> we're fixing right now.
>
> We will continue our work in there directly from now on.

Rock on!

Georg
Re: [Python-Dev] Python 3.x and bytes
On May 19, 2011, at 1:43 PM, Guido van Rossum wrote:

> -1; the result is not a *character* but an integer.

Well, really the result ought to be an octet, but I suppose adding an 'octet' type is beyond the scope of even this sprawling discussion :).

> I'm personally favoring using b'a'[0] and possibly hiding this in a constant
> definition.

As someone who spends a frankly unfortunate amount of time handling protocols where things like this are necessary, I agree with this recommendation. In protocols where one needs to compare network data with one-byte type identifiers or packet prefixes, more (documented) constants and less inscrutable junk like

    if p == 'c':
        ...
    elif p == 'j':
        ...
    elif p == 'J': # for compatibility
        ...

would definitely be a good thing. Of course, I realize that this sort of programmer will most likely replace those constants with 99, 106, 74 rather than take a moment to document what they mean, but at least they'll have to pause for a moment and realize that they have now lost _all_ mnemonics...

In fact, I feel like I would want to push in the opposite direction: don't treat one-byte bytes slices less like integers; I wish I could more easily treat n-byte sequences _more_ like integers! :). More protocols have 2-byte or 4-byte network-endian packed integers embedded in them than have individual tag bytes that I want to examine.

For the typical ASCII-ish protocol where you want to look at command names and CRLF-separated messages, you'd never want to look at an individual octet; stringish operations like split() will give you what you want.
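Both halves of this message (named one-byte tags, and n-byte sequences read as integers) can be sketched with what the standard library already offers. The packet layout, the constant names and parse_packet below are invented for illustration and do not describe any real protocol; struct.unpack_from and int.from_bytes are the standard calls.

```python
import struct

# Hypothetical one-byte packet tags, named once so the mnemonic survives.
TYPE_CONNECT = b'c'[0]       # 99
TYPE_JOIN = b'j'[0]          # 106
TYPE_JOIN_COMPAT = b'J'[0]   # 74, kept for compatibility


def parse_packet(packet):
    """Parse a made-up packet: 1 tag byte, 2-byte big-endian length, payload."""
    tag = packet[0]                                   # indexing bytes gives an int
    (length,) = struct.unpack_from('!H', packet, 1)   # network-endian unsigned short
    payload = packet[3:3 + length]
    if tag == TYPE_CONNECT:
        kind = 'connect'
    elif tag in (TYPE_JOIN, TYPE_JOIN_COMPAT):
        kind = 'join'
    else:
        kind = 'unknown'
    return kind, payload


kind, payload = parse_packet(b'j\x00\x05hello')

# int.from_bytes reads an n-byte sequence as one integer without struct.
value = int.from_bytes(b'\x00\x00\x01\x00', 'big')

print(kind, payload, value)
```

The constants keep the readable b'c' spelling next to the integer actually compared against, which is the compromise the thread converges on.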
Re: [Python-Dev] Don't set local variable in a list comprehension or generator
On Wed, May 18, 2011 at 2:34 PM, Victor Stinner wrote:
> On Wednesday, 18 May 2011 at 16:19 +0200, Nadeem Vawda wrote:
>> I'm not sure why you would encounter code like that in the first place.
>
> Well, I found the STORE_FAST/LOAD_FAST "issue" while trying to optimize
> the this module which reimplements rot13 using a dict in Python 3:
>
>     d = {}
>     for c in (65, 97):
>         for i in range(26):
>             d[chr(i+c)] = chr((i+13) % 26 + c)
>
> I tried:
>
>     d = {chr(i+c): chr((i+13) % 26 + c)
>          for i in range(26)
>          for c in (65, 97)}
>
> But it is slower whereas I read somewhere that generators are faster
> than loops.

I'm curious where you read that. The explicit loop should be faster or equally fast *except* when you can avoid a loop in bytecode by applying map() to a built-in function. However map() with a lambda is significantly slower. Maybe what you recall actually (correctly) said that a comprehension is faster than map+lambda?

> By the way, (c for c in ...) is slower than [c for c
> in ...]. I suppose that a generator is slower because it exits/reenters
> PyEval_EvalFrameEx() at each step, whereas [c for c ...] uses
> BUILD_LIST in a dummy (but fast) loop.

Did you test this in Python 2 or 3? In 2 the genexpr is definitely slower than the comprehension; in 3 I'm not sure there's much difference any more.

> (c for c in ...) and [c for c in ...] is stupid, but I used a simplified
> example to explain the problem. A more realistic example would be:
>
>     squares = (x*x for x in range(1))
>
> You don't really need the "x" variable, you just want the square.
> Another example is the syntax using an if to filter the data set:
>
>     (x for x in ... if condition(x))
>
>>> I heard about optimization in the AST tree instead of working on the
>>> bytecode. What is the status of this project?
>>
>> Are you referring to issue11549? There was some related discussion [1] on
>> python-dev about six weeks ago, but I haven't seen anything on the topic
>> since then.
>
> Ah yes, it looks to be this issue. I didn't know that there was an
> issue.

Hm, probably.

-- 
--Guido van Rossum (python.org/~guido)
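Claims like these are easy to measure on whatever interpreter is at hand. The sketch below wraps Victor's two spellings in functions (the function names are invented here) and times them with timeit; it prints the timings without asserting which one wins, since that depends on the Python version and machine.

```python
import timeit


def build_with_loop():
    """rot13 table built with explicit nested loops, as in the quoted code."""
    d = {}
    for c in (65, 97):
        for i in range(26):
            d[chr(i + c)] = chr((i + 13) % 26 + c)
    return d


def build_with_comprehension():
    """Same table built with a dict comprehension."""
    return {chr(i + c): chr((i + 13) % 26 + c)
            for i in range(26)
            for c in (65, 97)}


# Both spellings must produce the same mapping before timing means anything.
same = build_with_loop() == build_with_comprehension()

loop_time = timeit.timeit(build_with_loop, number=2000)
comp_time = timeit.timeit(build_with_comprehension, number=2000)

print(f"same result: {same}")
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```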
Re: [Python-Dev] Python 3.x and bytes
On Thu, May 19, 2011 at 10:50 AM, Ethan Furman wrote:
> Last thought I have for a possible 'solution' -- when a bytes object is
> tested for equality against an int raise TypeError. Precedent being sum()
> raising a TypeError when passed a list of strings because performance is so
> poor. Reason here being that the intuitive behavior will never work and
> will always produce silent bugs.

Not the same thing at all. The == operator is special, and should not raise exceptions; too many things would start randomly failing (e.g. membership tests for a dict that has both ints and bytes as keys, or for a list containing a variety of types).

-- 
--Guido van Rossum (python.org/~guido)
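A short sketch of the kind of code that relies on == never raising, assuming the interpreter runs with default flags (the -b command-line option deliberately changes how bytes comparisons are reported). The list and dict contents are made up for the example.

```python
mixed = [b'a', 'a', 97, 97.0, None]

# Mismatched types simply compare unequal instead of raising.
bytes_vs_int = (b'a' == 97)
bytes_vs_str = (b'a' == 'a')

# Membership tests compare the probe against every element in turn,
# so they depend on == being safe across arbitrary types.
found_int = 97 in mixed
found_bytes = b'b' in mixed

# A dict with both int and bytes keys needs the same guarantee
# whenever two keys land in the same slot.
keys = {97: 'int key', b'a': 'bytes key'}
lookup = keys.get(b'a')

print(bytes_vs_int, bytes_vs_str, found_int, found_bytes, lookup)
```

If bytes == int raised TypeError, the membership test for b'b' above would fail as soon as it reached the integer 97 in the list.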
Re: [Python-Dev] Python 3.x and bytes
On Thu, May 19, 2011 at 1:43 AM, Nick Coghlan wrote:
> OK, summarising the thread so far from my point of view.
>
> 1. There are some aspects of the behavior of bytes() objects that
> tempt people to think of them as string-like objects (primarily the
> b'' literals and their use in repr(), along with the fact that they
> fill roles that were filled by str in its "arbitrary binary data"
> incarnation in Python 2.x). The mental model this creates in the
> reader is incorrect, as bytes() are far closer to array.array('c') in
> their underlying behaviour (and deliberately so - cf. PEP 358, 3112,
> 3137).

I think most of this "wrong mental model" is actually due to people not having completely internalized the Python 3 way.

> One proposal for addressing this is to add a x'deadbeef' literal and
> using that in repr() rather than the bytestring. Another would be to
> escape all characters, even printable ASCII, in the bytes()
> representation. Both of these are undesirable, as they miss the
> original purpose of this behaviour: making it easier to work with the
> many ASCII based wire protocols that are in widespread use.

Indeed, -1 on both.

> To be honest, I don't think there is a lot we can do here except to
> further emphasise in the documentation and elsewhere that *bytes is
> not a string type* (regardless of any API similarities retained to
> ease transition from the 2.x series). For example, if we have any
> lingering references to "byte strings" they should be replaced with
> "byte sequences" or "bytes objects" (depending on context, as the
> former phrasing also encompasses bytearray objects).

+1

> 2. As a concrete usability issue, it is awkward to programmatically
> check the value of a specific byte when working with an ASCII based
> protocol:
>
>     data[i] == b'a' # Intuitive, but always False due to type mismatch
>     data[i:i+1] == b'a' # Works, but clumsy
>     data[i] == b'a'[0] # Ditto (but at least susceptible to compiler
>                        # const-expression optimisation)
>     data[i] == ord('a') # Clumsy and slow
>     data[i] == 97 # Hard to read
>
> Proposals to address this include:
> - introduce a "character" literal to allow c'a' as an alternative to ord('a')

-1; the result is not a *character* but an integer. I'm personally favoring using b'a'[0] and possibly hiding this in a constant definition.

>   Potentially workable, but leaves the intuitive answer above
>   silently producing an unexpected answer

I'm not convinced that that problem is any worse than other comparison-related problems. E.g. b'a' == 'a' also always returns False (most likely it'll be disguised by at least one operand being a variable of course.)

> - allow 1-element byte sequences to compare equal to the corresponding
> integer values.
>   - would require reworking of bytes.__hash__ to use the hash of the
>     contained element when the data length is exactly 1
>   - transitivity of equality would recommend also supporting
>     equivalences such as b'a' == 97.0
>   - backwards compatibility concerns arise due to introduction of
>     new key collisions in dictionaries and sets and other value based
>     containers
>   - yet more string-like behaviour in a type that is *not* a string
>     (further reinforcing the mistaken impression from point 1)
>   - One thing that *isn't* a concern from my point of view is the
>     fact that we have ample precedent in decimal.Decimal for supporting
>     implicit coercion in comparison operations while disallowing them in
>     arithmetic operations (Decimal("1") == 1.0 is allowed, but
>     Decimal("1") + 1.0 will raise TypeError).
>
> For point 2, I'm personally +0 on the idea of having 1-element bytes
> and bytearray objects delegate hashing and comparison operations to
> the corresponding integer object. We have the power to make the
> obvious code correct code, so let's do that. However, the implications
> of the additional key collisions in value based containers may need to
> be explored further.

My gut feeling about this is that this will probably introduce some confusing or unintended side effect elsewhere, and I am -1 on this change.

-- 
--Guido van Rossum (python.org/~guido)
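The five spellings from Nick's summary, the constant-definition approach favoured here, and the decimal.Decimal precedent can all be run as written on Python 3.2 or later. In this sketch the sample data, the index and the constant name A are made up for the example.

```python
from decimal import Decimal

data = b'spam'
i = 2  # data[2] is the byte for 'a'

checks = {
    "data[i] == b'a'": data[i] == b'a',          # int vs bytes: always False
    "data[i:i+1] == b'a'": data[i:i+1] == b'a',  # bytes vs bytes
    "data[i] == b'a'[0]": data[i] == b'a'[0],    # int vs int
    "data[i] == ord('a')": data[i] == ord('a'),  # int vs int
    "data[i] == 97": data[i] == 97,              # int vs int
}

# Hiding b'a'[0] behind a named constant.
A = b'a'[0]
with_constant = data[i] == A

# The Decimal precedent: comparison with a float is allowed,
# arithmetic with a float is not.
decimal_equal = Decimal("1") == 1.0
try:
    Decimal("1") + 1.0
    decimal_add_raises = False
except TypeError:
    decimal_add_raises = True

for expr, result in checks.items():
    print(f"{expr:22} -> {result}")
print(with_constant, decimal_equal, decimal_add_raises)
```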
Re: [Python-Dev] Python 3.x and bytes
Nick Coghlan wrote:
> OK, summarising the thread so far from my point of view.

[snip]

> To be honest, I don't think there is a lot we can do here except to
> further emphasise in the documentation and elsewhere that *bytes is
> not a string type* (regardless of any API similarities retained to
> ease transition from the 2.x series). For example, if we have any
> lingering references to "byte strings" they should be replaced with
> "byte sequences" or "bytes objects" (depending on context, as the
> former phrasing also encompasses bytearray objects).

I think this would be a big help.

> 2. As a concrete usability issue, it is awkward to programmatically
> check the value of a specific byte when working with an ASCII based
> protocol:
>
>     data[i] == b'a' # Intuitive, but always False due to type mismatch
>     data[i:i+1] == b'a' # Works, but clumsy
>     data[i] == b'a'[0] # Ditto (but at least susceptible to compiler
>                        # const-expression optimisation)
>     data[i] == ord('a') # Clumsy and slow
>     data[i] == 97 # Hard to read
>
> Proposals to address this include:
> - introduce a "character" literal to allow c'a' as an alternative to ord('a')
>   Potentially workable, but leaves the intuitive answer above
>   silently producing an unexpected answer

[snip]

> For point 2, I'm personally +0 on the idea of having 1-element bytes
> and bytearray objects delegate hashing and comparison operations to
> the corresponding integer object. We have the power to make the
> obvious code correct code, so let's do that. However, the implications
> of the additional key collisions in value based containers may need to
> be explored further.

Nick Coghlan also wrote:
> On further reflection, the key collision and semantics blurring
> problems mean I am at best -0 on this particular solution to the
> problem (and heading fairly rapidly in the direction of -1).

Last thought I have for a possible 'solution' -- when a bytes object is tested for equality against an int raise TypeError. Precedent being sum() raising a TypeError when passed a list of strings because performance is so poor. Reason here being that the intuitive behavior will never work and will always produce silent bugs.

~Ethan~
Re: [Python-Dev] packaging landed in stdlib
On Thu, May 19, 2011 at 1:35 PM, Tarek Ziadé wrote:
> Hey
>
> I've pushed packaging in stdlib. There are a few buildbots errors
> we're fixing right now.

FYI, there are still some failures we're fixing. Thanks for your patience and thanks to the folks that are helping me on this :)

I expect the bbots to be back on track later today.

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org
Re: [Python-Dev] [RELEASED] Python 3.2.1 rc 1
On 05/18/2011 10:46 PM, anatoly techtonik wrote:
> On Wed, May 18, 2011 at 10:37 PM, Georg Brandl wrote:
>> On 18.05.2011 21:09, "Martin v. Löwis" wrote:
>>> On 18.05.2011 20:39, Hagen Fürstenau wrote:
>>>>> On behalf of the Python development team, I am pleased to announce the
>>>>> first release candidate of Python 3.2.1.
>>>>
>>>> Shouldn't there be a tag "v3.2.1rc1" in the hg repo?
>>>
>>> http://hg.python.org/releasing/3.2.1/
>>>
>>> Regards,
>>> Martin
>>>
>>> P.S. "Shouldn't" makes it sound as if there was a mistake.
>>
>> To clarify: once the final is done, the repo Martin mentioned will be
>> merged back to main and then vanish.
>
> Can't this work be done in the branch of main repo, so that everybody
> can track the progress in place? Is there any picture of the process
> similar to http://nvie.com/posts/a-successful-git-branching-model/ ?

Note that in that writeup, 'release-*' (and 'hotfix-*') branches are not shown as pushed to the 'origin' repository.

Tres.
-- 
Tres Seaver +1 540-429-0999 tsea...@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
[Python-Dev] looking for a contact at Google on the Blogger team
Several of the PSF blogs hosted on Google's Blogger platform are experiencing issues as fallout from the recent maintenance problems they had. We have already had to recreate at least one of the translations for Python Insider in order to be able to publish to it, and now we can't edit posts on Python Insider itself.

Can anyone put me in contact with someone at Google from the Blogger team? I would at least like to know whether the "bX-qpvq7q" problem is being worked on, so I can decide whether to take a hiatus or start moving us to another platform. There are a lot of posts about the error on the support forums, but no obvious response from Google.

Thanks,
Doug

-- 
Doug Hellmann
Communications Director
Python Software Foundation
http://python.org/psf/
Re: [Python-Dev] Don't set local variable in a list comprehension or generator
Victor Stinner wrote:
> I suppose that you have the current value of range(1) on the stack:
> DUP_TOP; BINARY_MULTIPLY; gives you the square. You don't need the x
> variable (LOAD_FAST/STORE_FAST).

That seems far too special-purpose to be worth it to me.

-- 
Greg
[Python-Dev] packaging landed in stdlib
Hey

I've pushed packaging in stdlib. There are a few buildbot errors we're fixing right now.

We will continue our work in there directly from now on. The next "big" commit will be for the documentation.

Cheers
Tarek

-- 
Tarek Ziadé | http://ziade.org
Re: [Python-Dev] Python 3.x and bytes
On 19/05/2011 10:25, Łukasz Langa wrote:
> Message written by Stefan Behnel on 2011-05-19, at 10:37:
>>> But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If
>>> a 1-byte bytes is equivalent to an integer, why not an arbitrary one as well?
>>
>> The result of this must obviously be b"de1".
>
> I hope you're joking. At best, the result should be b"de\x01".

The behaviour Stefan suggests is what some "weakly typed" languages like perl (and possibly php?) do, which masks errors and is rightly abhorred by Python programmers (although semantically not *so* different from 1 + 1.0 == 2.0).

I think it's safe to say that Stefan was joking.

Michael

> But I don't think such construct should be allowed. Just like you can't do
> `[1, 2, 3] + 4`. I wouldn't ever expect that a single byte behaves like a
> sequence of bytes. In the case of bytes b'a' is obviously still a sequence
> of bytes, just happening to store a single one. Indexing should return a
> byte so I'm not surprised it returns a number. Slicing on the other hand
> returns a sub-sequence.
>
> However inconvenient, I find the current behaviour logical and predictable.
> A shortcut for b'a'[0] would obviously be nice but that's for python-ideas.

-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Re: [Python-Dev] Python 3.x and bytes
On Thu, May 19, 2011 at 6:43 PM, Nick Coghlan wrote:
> For point 2, I'm personally +0 on the idea of having 1-element bytes
> and bytearray objects delegate hashing and comparison operations to
> the corresponding integer object. We have the power to make the
> obvious code correct code, so let's do that. However, the implications
> of the additional key collisions in value based containers may need to
> be explored further.

On further reflection, the key collision and semantics blurring problems mean I am at best -0 on this particular solution to the problem (and heading fairly rapidly in the direction of -1).

Best to just go with b'a'[0] and let the optimiser sort it out (PyPy should handle it automatically, CPython would need work).

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Don't set local variable in a list comprehension or generator
On Wednesday, 18 May 2011 at 21:44 -0400, Terry Reedy wrote:
> On 5/18/2011 5:34 PM, Victor Stinner wrote:
>
> Your initial example gave me the impression that the issue has something
> to do with join in particular, or even comprehensions in particular. It
> is really about for loops.
>
> >>> dis('for x in range(3): y = x*x')
> ...
>     >>   13 FOR_ITER                16 (to 32)
>          16 STORE_NAME               1 (x)
>          19 LOAD_NAME                1 (x)
>          22 LOAD_NAME                1 (x)
>          25 BINARY_MULTIPLY
>          26 STORE_NAME               2 (y)
> ...

Yeah, "STORE_NAME; LOAD_NAME; LOAD_NAME" can be replaced by a single opcode: DUP_TOP. But the user expects x to be defined outside the loop:

>>> for x in range(3): y = x*x
...
>>> x
2

Well, it is possible to detect if x is used or not after the loop, but it is a little more complex to optimize than list comprehension/generator :-)

> .. you cannot get that with Python code without a much smarter optimizer.

Yes, I would like to write a smarter optimizer. But I first asked if it would be accepted to avoid the temporary loop variable because it changes the Python language: the user can expect a loop variable using introspection or a debugger. That's why I suggested to only enable the optimization if Python is running in optimized mode (python -O or python -OO).

Victor
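The difference in visibility that makes the for-loop case harder to optimize than the comprehension case can be shown directly on Python 3. The two function names below are made up for the example.

```python
def loop_then_read():
    """A for loop leaves its loop variable bound after the loop ends."""
    for x in range(3):
        y = x * x
    return x, y


def comprehension_then_read():
    """A comprehension's loop variable is not visible afterwards."""
    squares = [x * x for x in range(3)]
    return squares, 'x' in locals()


print(loop_then_read())
print(comprehension_then_read())
```

Because code after a for loop may legitimately read x, an optimizer has to prove that nothing does before it can drop the store, whereas a comprehension variable cannot be observed from the enclosing code once the comprehension has finished.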
Re: [Python-Dev] Python 3.x and bytes
On Thu, 19 May 2011 17:49:47 +1000 Nick Coghlan wrote:
>
> It's a mental model problem. People try to think of bytes as
> equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
> closer to array.array('c'). Strings are basically *unique* in
> returning a length 1 instance of themselves for indexing operations.
> For every other sequence type, including tuples, lists and arrays,
> slicing returns a new instance of the same type, while indexing will
> typically return something different.
>
> Now, we definitely didn't *help* matters by keeping so many of the
> default behaviours of bytes() and bytearray() coupled to ASCII-encoded
> text, but that was a matter of practicality beating purity: there
> really *are* a lot of wire protocols out there that are ASCII based.

I think "practicality beating purity" should have been extended to __getitem__ as well. I have almost never had a use for treating a bytestring as a sequence of integers, while treating a bytestring as a sequence of one-byte strings is *very* common.

(and, as you say, if you want a sequence of integers you can already use array.array() which gives you more flexibility as to the width and signedness of integers)

Regards

Antoine.
Re: [Python-Dev] Don't set local variable in a list comprehension or generator
On Thursday, 19 May 2011 at 10:47 +1200, Greg Ewing wrote:
> Victor Stinner wrote:
>
> >    squares = (x*x for x in range(1))
>
> What bytecode would you optimise that into?

I suppose that you have the current value of range(1) on the stack: DUP_TOP; BINARY_MULTIPLY; gives you the square. You don't need the x variable (LOAD_FAST/STORE_FAST).

Full example using a function (instead of loop, so I need to load x):
---
import dis, opcode, struct

def f(x): return x*x

def patch_bytecode(f, bytecode):
    # Rebuild the code object of f with a new bytecode string,
    # copying every other field unchanged.
    fcode = f.__code__
    code_type = type(f.__code__)
    new_code = code_type(
        fcode.co_argcount,
        fcode.co_kwonlyargcount,
        fcode.co_nlocals,
        fcode.co_stacksize,
        fcode.co_flags,
        bytecode,
        fcode.co_consts,
        fcode.co_names,
        fcode.co_varnames,
        fcode.co_filename,
        fcode.co_name,
        fcode.co_firstlineno,
        fcode.co_lnotab,
    )
    f.__code__ = new_code

print("Original:")
print("f(4) = %s" % f(4))
dis.dis(f)
print()

LOAD_FAST = opcode.opmap['LOAD_FAST']
DUP_TOP = opcode.opmap['DUP_TOP']
BINARY_MULTIPLY = opcode.opmap['BINARY_MULTIPLY']
RETURN_VALUE = opcode.opmap['RETURN_VALUE']

# LOAD_FAST takes a 2-byte argument (local variable index 0, i.e. x);
# the other three opcodes take no argument.
bytecode = struct.pack(
    '=BHBBB',
    LOAD_FAST, 0,
    DUP_TOP,
    BINARY_MULTIPLY,
    RETURN_VALUE)

print("Patched:")
patch_bytecode(f, bytecode)
print("f(4) patched = %s" % f(4))
dis.dis(f)
---

Output:
---
$ python3 square.py
Original:
f(4) = 16
  3           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE

Patched:
f(4) patched = 16
  3           0 LOAD_FAST                0 (x)
              3 DUP_TOP
              4 BINARY_MULTIPLY
              5 RETURN_VALUE
---

Victor
Re: [Python-Dev] Python 3.x and bytes
On 2011-05-19, at 11:25 , Łukasz Langa wrote:
> Message written by Stefan Behnel on 2011-05-19, at 10:37:
>
>>> But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If
>>> a 1-byte bytes is equivalent to an integer, why not an arbitrary one as
>>> well?
>>
>> The result of this must obviously be b"de1".
> I hope you're joking. At best, the result should be b"de\x01".

Actually, if `b'd'+1` returns `b'e'` an equivalent behavior should be that `b'de'+1` returns `b'df'`.
Re: [Python-Dev] Python 3.x and bytes
Łukasz Langa, 19.05.2011 11:25:
> Message written by Stefan Behnel on 2011-05-19, at 10:37:
>>> But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If
>>> a 1-byte bytes is equivalent to an integer, why not an arbitrary one as well?
>>
>> The result of this must obviously be b"de1".
>
> I hope you're joking.

I "obviously" was. My point is that expectations and "obvious behaviour" may not be obvious to everyone. Nick summed it up very nicely IMHO.

Stefan
Re: [Python-Dev] Python 3.x and bytes
Message written by Stefan Behnel on 2011-05-19, at 10:37:

>> But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If
>> a 1-byte bytes is equivalent to an integer, why not an arbitrary one as well?
>
> The result of this must obviously be b"de1".

I hope you're joking. At best, the result should be b"de\x01". But I don't think such a construct should be allowed, just as you can't do `[1, 2, 3] + 4`. I wouldn't ever expect a single byte to behave like a sequence of bytes.

In the case of bytes, b'a' is obviously still a sequence of bytes that just happens to store a single one. Indexing should return a byte, so I'm not surprised it returns a number. Slicing, on the other hand, returns a sub-sequence.

However inconvenient, I find the current behaviour logical and predictable. A shortcut for b'a'[0] would obviously be nice but that's for python-ideas.

--
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer
IT Infrastructure Department
Grupa Allegro Sp. z o.o.
Re: [Python-Dev] Don't set local variable in a list comprehension or generator
On Thu, May 19, 2011 at 7:34 AM, Victor Stinner wrote:
> But it is slower whereas I read somewhere than generators are faster
> than loops.

Are you sure it wasn't that generator expressions can be faster than list comprehensions (if the memory savings are significant)? Or that a reduction function with a generator expression can be faster than a module-level explicit loop (due to the replacement of dict-based variable assignment with fast locals in the generator and C looping in the reduction function)?

In general, as long as both are using fast locals and looping in Python, I would expect inline looping code to be faster than the equivalent generator (but often harder to maintain due to lack of reusability).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
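For anyone who wants to check this on their own machine, the comparison can be sketched as below. The function names and the workload (a sum of squares) are assumptions chosen for illustration, and the timings depend heavily on the interpreter version, so no expected numbers are given.

```python
import timeit


def sum_squares_inline(n):
    # Inline loop: fast locals, looping in Python.
    total = 0
    for x in range(n):
        total += x * x
    return total


def squares(n):
    # Reusable generator producing the same values.
    for x in range(n):
        yield x * x


def sum_squares_python_loop_over_generator(n):
    # Looping in Python over a generator: adds a resume per item.
    total = 0
    for value in squares(n):
        total += value
    return total


def sum_squares_reduction(n):
    # Generator expression fed to a reduction function that loops in C.
    return sum(x * x for x in range(n))


if __name__ == "__main__":
    for fn in (sum_squares_inline,
               sum_squares_python_loop_over_generator,
               sum_squares_reduction):
        elapsed = timeit.timeit(lambda: fn(10000), number=50)
        print("%-40s %.4fs" % (fn.__name__, elapsed))
```

All three return the same value; only the place where the looping happens differs.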
Re: [Python-Dev] [Python-checkins] cpython: Skip some tests in the absence of multiprocessing.
On Thu, May 19, 2011 at 2:51 AM, Éric Araujo wrote:
> Isn’t support.import_module or somesuch useful for this kind of checks?

You have to restructure your tests into the appropriate files for that to work, as support.import_module() throws SkipTest if the module isn't available.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
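A rough sketch of why the restructuring is needed. `import_or_skip` below is a hypothetical stand-in written for this note, not the real helper from test.support; it only mimics the behaviour described above (raise SkipTest when the import fails).

```python
import importlib
import unittest


def import_or_skip(name):
    """Hypothetical stand-in: import a module or raise unittest.SkipTest."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise unittest.SkipTest("%s is not available: %s" % (name, exc))


# Called at the top of a test file, a failed import raises before any
# test class is defined, so every test in that file is affected, including
# tests that never needed the missing module.
json_module = import_or_skip("json")
```

Hence the advice: tests that depend on the optional module go in their own file, and tests that don't stay elsewhere.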
Re: [Python-Dev] Python 3.x and bytes
OK, summarising the thread so far from my point of view.

1. There are some aspects of the behavior of bytes() objects that tempt people to think of them as string-like objects (primarily the b'' literals and their use in repr(), along with the fact that they fill roles that were filled by str in its "arbitrary binary data" incarnation in Python 2.x). The mental model this creates in the reader is incorrect, as bytes() are far closer to array.array('c') in their underlying behaviour (and deliberately so - cf. PEP 358, 3112, 3137).

One proposal for addressing this is to add a x'deadbeef' literal and use that in repr() rather than the bytestring. Another would be to escape all characters, even printable ASCII, in the bytes() representation. Both of these are undesirable, as they miss the original purpose of this behaviour: making it easier to work with the many ASCII based wire protocols that are in widespread use.

To be honest, I don't think there is a lot we can do here except to further emphasise in the documentation and elsewhere that *bytes is not a string type* (regardless of any API similarities retained to ease transition from the 2.x series). For example, if we have any lingering references to "byte strings" they should be replaced with "byte sequences" or "bytes objects" (depending on context, as the former phrasing also encompasses bytearray objects).

2. As a concrete usability issue, it is awkward to programmatically check the value of a specific byte when working with an ASCII based protocol:

  data[i] == b'a'      # Intuitive, but always False due to type mismatch
  data[i:i+1] == b'a'  # Works, but clumsy
  data[i] == b'a'[0]   # Ditto (but at least susceptible to compiler const-expression optimisation)
  data[i] == ord('a')  # Clumsy and slow
  data[i] == 97        # Hard to read

Proposals to address this include:

- introduce a "character" literal to allow c'a' as an alternative to ord('a')
  Potentially workable, but leaves the intuitive answer above silently producing an unexpected answer
- allow 1-element byte sequences to compare equal to the corresponding integer values.
  - would require reworking of bytes.__hash__ to use the hash of the contained element when the data length is exactly 1
  - transitivity of equality would recommend also supporting equivalences such as b'a' == 97.0
  - backwards compatibility concerns arise due to introduction of new key collisions in dictionaries and sets and other value based containers
  - yet more string-like behaviour in a type that is *not* a string (further reinforcing the mistaken impression from point 1)
  - One thing that *isn't* a concern from my point of view is the fact that we have ample precedent in decimal.Decimal for supporting implicit coercion in comparison operations while disallowing it in arithmetic operations (Decimal("1") == 1.0 is allowed, but Decimal("1") + 1.0 will raise TypeError).

For point 2, I'm personally +0 on the idea of having 1-element bytes and bytearray objects delegate hashing and comparison operations to the corresponding integer object. We have the power to make the obvious code correct code, so let's do that. However, the implications of the additional key collisions in value based containers may need to be explored further.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
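The five spellings from point 2, and the Decimal precedent, can be checked directly on a current Python 3. A small sketch; `data`, `i` and the variable names are arbitrary choices for the demonstration.

```python
from decimal import Decimal

data = b"abc"
i = 0

# The five ways of testing one byte listed in point 2.
checks = [
    ("data[i] == b'a'", data[i] == b"a"),          # int vs bytes
    ("data[i:i+1] == b'a'", data[i:i + 1] == b"a"),
    ("data[i] == b'a'[0]", data[i] == b"a"[0]),
    ("data[i] == ord('a')", data[i] == ord("a")),
    ("data[i] == 97", data[i] == 97),
]
for label, result in checks:
    print("%-22s -> %s" % (label, result))

# The Decimal precedent: comparison with a float is accepted,
# arithmetic with a float is rejected.
decimal_equal = Decimal("1") == 1.0
try:
    Decimal("1") + 1.0
except TypeError:
    decimal_add_rejected = True
else:
    decimal_add_rejected = False
print(decimal_equal, decimal_add_rejected)
```

Only the first spelling evaluates to False, and it does so silently, which is the usability complaint.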
Re: [Python-Dev] Python 3.x and bytes
Xavier Morel, 19.05.2011 09:41:
> On 2011-05-19, at 07:28 , Georg Brandl wrote:
>> On 19.05.2011 00:39, Greg Ewing wrote:
>>> If someone sees that
>>>
>>>    some_var[3] == b'd'
>>>
>>> is true, and that
>>>
>>>    some_var[3] == 100
>>>
>>> is also true, they might expect to be able to do things
>>> like
>>>
>>>    n = b'd' + 1
>>>
>>> and get 101... or maybe b'e'...
>>
>> Maybe they should :)
>
> But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If
> a 1-byte bytes is equivalent to an integer, why not an arbitrary one as well?

The result of this must obviously be b"de1".

Stefan
Re: [Python-Dev] Python 3.x and bytes
On 2011-05-19, at 09:49 , Nick Coghlan wrote:
> On Thu, May 19, 2011 at 5:10 AM, Eric Smith wrote:
>> On 05/18/2011 12:16 PM, Stephen J. Turnbull wrote:
>>> Robert Collins writes:
>>>
>>> > Its probably too late to change, but please don't try to argue that
>>> > its correct: the continued confusion of folk running into this is
>>> > evidence that confusion *is happening*. Treat that as evidence and
>>> > think about how to fix it going forward.
>>>
>>> Sorry, Rob, but you're just wrong here, and Nick is right. It's
>>> possible to improve Python 3, but not to "fix" it in this respect.
>>> The Python 3 solution is correct, the Python 2 approach is not.
>>> There's no way to avoid discontinuity and confusion here.
>>
>> I don't think there's any connection between the way 2.x confused text
>> strings and binary data (which certainly needed addressing) with the way
>> that 3.x returns a different type for byte_str[i] than it does for
>> byte_str[i:i+1]. I think it's the latter that's confusing to people.
>> There's no particular requirement for different types that's needed to
>> fix the byte/str problem.
>
> It's a mental model problem. People try to think of bytes as
> equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
> closer to array.array('c'). Strings are basically *unique* in
> returning a length 1 instance of themselves for indexing operations.
> For every other sequence type, including tuples, lists and arrays,
> slicing returns a new instance of the same type, while indexing will
> typically return something different.
>
> Now, we definitely didn't *help* matters by keeping so many of the
> default behaviours of bytes() and bytearray() coupled to ASCII-encoded
> text, but that was a matter of practicality beating purity: there
> really *are* a lot of wire protocols out there that are ASCII based.
> In hindsight, perhaps we should have gone further in breaking things
> to try to make the point about the mental model shift more forcefully.
> (However, that idea carries with it its own problems).

For what it's worth, Erlang's approach to the subject is, in my opinion, excellent: binaries (whose literals are called "bit syntax" there) are quite distinct from strings in both syntax and API, but you can put chunks of strings within binaries (the bit syntax acts as a container, in which you can put a literal or non-literal string).

This simultaneously impresses upon the user that binaries are *not* strings and that they can still easily create binaries from strings.
Re: [Python-Dev] Python 3.x and bytes
Robert Collins writes:

 > Thats separate to the implementation issues I have mentioned in this
 > thread and previous.

Oops, sorry.

Nevertheless, I personally think that b'a'[0] == 97 is a good idea, and consistent with everything else in Python. It's Unicode (str) that is weird: str is surprising when first encountered by a C or Lisp programmer, but not enough to cause a heart attack given how weird natural language is. But I don't see why that weirdness (an element of LIST of TYPE is a LIST of TYPE, hey, young man, you're very smart but *it's turtles all the way down!*) should be replicated elsewhere.

If you want your bytes object to behave like a str, it's very easy to get that (.decode('latin1')), and nobody has yet demonstrated that this is too time-inefficient for real work, given the other overhead imposed by Python. The space inefficiency could be dealt with as Greg points out (by internally having a Unicode representation using 1 byte instead of 2 or 4).

But if you want your bytes object to *be* a string, then you're confused. It isn't (any more). Even if it's just a matter of flipping one bit in the type field, a str-with-unibyte-representation is not equal to a bytes object with the same bytes.

For example, you write:

 > urlparse converting bytes to 'str' to operate on them is at best a
 > kludge - you're forcing 5 times the storage (the original bytes + 4
 > bytes-per-byte when its decoded into unicode) to work on something
 > which is defined as a BNF * that uses ascii *.

Indeed it (RFC 3986) does *use* ASCII. But I think there is confusion in your words. This is what the RFC says about that use of ASCII:

    2.  Characters

    The URI syntax provides a method of encoding data, presumably for
    the sake of identifying a resource, as a sequence of characters.
    [...]
    The ABNF notation defines its terminal values to be non-negative
    integers (codepoints) based on the US-ASCII coded character set
    [ASCII].  Because a URI is a sequence of characters, we must invert
    that relation in order to understand the URI syntax.  Therefore, the
    integer values used by the ABNF must be mapped back to their
    corresponding characters via US-ASCII in order to complete the
    syntax rules.

I.e., ASCII is *irrelevant* to (the modern definition of) URLs except as it is a convenient and familiar way to refer to a certain familiar and rather small set of *characters*. There are reasons for this (that I'm not going to rehash here), and they are the *same* reasons why Python 3's behavior is "correct" IMHO (modulo the issue about the type of a list element, which I discuss above).

It is true that one might like there to be a literal that expresses `ord(bytes-object-of-length-one)', i.e., something like o'a' == 97. (This is different from Greg's x'6465616462656566' == b'deadbeef', which I don't think helps solve the confusion problem although it would definitely be convenient.)
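The `.decode('latin1')` route mentioned above is lossless for every possible byte value, which is what makes it usable as a "behave like a str" escape hatch. A short sketch (variable names are arbitrary):

```python
# Every byte value 0..255 maps to the code point with the same number.
raw = bytes(range(256))
text = raw.decode("latin-1")
round_tripped = text.encode("latin-1")

print(len(text), round_tripped == raw)

# Indexing shows the difference in element type between the two views.
print(raw[97], repr(text[97]))

# Same underlying byte, but the two objects still do not compare equal.
print(b"a" == "a")
```

The last line is the "is not equal" point in miniature: a str holding one-byte code points and a bytes object holding the same bytes remain different values.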
Re: [Python-Dev] Python 3.x and bytes
On Thu, May 19, 2011 at 5:10 AM, Eric Smith wrote:
> On 05/18/2011 12:16 PM, Stephen J. Turnbull wrote:
>> Robert Collins writes:
>>
>> > Its probably too late to change, but please don't try to argue that
>> > its correct: the continued confusion of folk running into this is
>> > evidence that confusion *is happening*. Treat that as evidence and
>> > think about how to fix it going forward.
>>
>> Sorry, Rob, but you're just wrong here, and Nick is right. It's
>> possible to improve Python 3, but not to "fix" it in this respect.
>> The Python 3 solution is correct, the Python 2 approach is not.
>> There's no way to avoid discontinuity and confusion here.
>
> I don't think there's any connection between the way 2.x confused text
> strings and binary data (which certainly needed addressing) with the way
> that 3.x returns a different type for byte_str[i] than it does for
> byte_str[i:i+1]. I think it's the latter that's confusing to people.
> There's no particular requirement for different types that's needed to
> fix the byte/str problem.

It's a mental model problem. People try to think of bytes as equivalent to 2.x str and that's just wrong, wrong, wrong. It's far closer to array.array('c'). Strings are basically *unique* in returning a length 1 instance of themselves for indexing operations. For every other sequence type, including tuples, lists and arrays, slicing returns a new instance of the same type, while indexing will typically return something different.

Now, we definitely didn't *help* matters by keeping so many of the default behaviours of bytes() and bytearray() coupled to ASCII-encoded text, but that was a matter of practicality beating purity: there really *are* a lot of wire protocols out there that are ASCII based. In hindsight, perhaps we should have gone further in breaking things to try to make the point about the mental model shift more forcefully. (However, that idea carries with it its own problems).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
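The indexing-versus-slicing claim is easy to tabulate. A sketch, with one stated substitution: Python 3's array module has no 'c' typecode, so the example assumes 'B' (unsigned byte) as the closest equivalent. The helper name is made up for this note.

```python
import array


def index_and_slice_types(seq):
    """Return the type names of seq[0] and seq[0:1]."""
    return type(seq[0]).__name__, type(seq[0:1]).__name__


samples = [
    "abc",
    b"abc",
    bytearray(b"abc"),
    [97, 98, 99],
    (97, 98, 99),
    array.array("B", b"abc"),  # 'B' stands in for the 2.x 'c' typecode
]

for seq in samples:
    item_type, slice_type = index_and_slice_types(seq)
    print("%-10s index -> %-4s slice -> %s"
          % (type(seq).__name__, item_type, slice_type))
```

Only str reports the same type in both columns; bytes lines up with list, tuple and array instead.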
Re: [Python-Dev] Python 3.x and bytes
On 2011-05-19, at 07:28 , Georg Brandl wrote:
> On 19.05.2011 00:39, Greg Ewing wrote:
>> Ethan Furman wrote:
>>
>>> some_var[3] == b'd'
>>>
>>> 1) a check to see if the bytes instance is length 1
>>> 2) a check to see if
>>>    i) the other object is an int, and
>>>    2) 0 <= other_obj < 256
>>> 3) if 1 and 2, make the comparison instead of returning NotImplemented?
>>
>> It might seem convenient, but I'd worry that it would lead to
>> even more confusion in other ways. If someone sees that
>>
>>    some_var[3] == b'd'
>>
>> is true, and that
>>
>>    some_var[3] == 100
>>
>> is also true, they might expect to be able to do things
>> like
>>
>>    n = b'd' + 1
>>
>> and get 101... or maybe b'e'...
>
> Maybe they should :)

But why wouldn't "they" expect `b'de' + 1` to work as well in this case? If a 1-byte bytes is equivalent to an integer, why not an arbitrary one as well?
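Ethan's quoted steps can be prototyped as a bytes subclass to see how the proposal would behave. `ByteLike` is a hypothetical name invented for this sketch and is not a proposed implementation; the hash override follows the suggestion elsewhere in the thread that a 1-element sequence would hash like its integer.

```python
class ByteLike(bytes):
    """Hypothetical bytes subclass: a length-1 instance equals its integer."""

    def __eq__(self, other):
        # Steps 1-3 of the quoted proposal.
        if len(self) == 1 and isinstance(other, int) and 0 <= other < 256:
            return self[0] == other
        return super().__eq__(other)

    def __ne__(self, other):
        # bytes defines its own __ne__, so it must be kept in sync by hand.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects need equal hashes to work as dict keys.
        if len(self) == 1:
            return hash(self[0])
        return super().__hash__()


d = ByteLike(b"d")
print(d == 100, 100 == d, d != 100)
print({100: "found"}[d])

# The catch: d still equals plain b"d", yet their hashes now differ,
# so the hash rule is broken against ordinary bytes objects.
print(d == b"d", hash(d) == hash(b"d"))
```

The last line is why a subclass is not enough: the change would have to go into bytes.__hash__ itself, and addition (`d + 1`) still raises TypeError, which is the inconsistency Greg is worried about.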
Re: [Python-Dev] [RELEASED] Python 3.2.1 rc 1
> 3.2.1b1 was already merged back. (And 3.2.1rc1 will also be merged back
> soon, since there will be a 3.2.1rc2.)

Thanks for the clarification! :-)

Cheers,
Hagen