Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Alexander Belopolsky wrote:
> On Wed, Feb 23, 2011 at 6:32 PM, M.-A. Lemburg m...@egenix.com wrote:
>> Alexander Belopolsky wrote:
>> ..
>>> In what sense is Latin-1 the official name? The IANA charset
>>> registry has the following listing
>>>
>>> Name: ISO_8859-1:1987 [RFC1345,KXS2]
>>> MIBenum: 4
>>> Source: ECMA registry
>>> Alias: iso-ir-100
>>> Alias: ISO_8859-1
>>> Alias: ISO-8859-1 (preferred MIME name)
>>> Alias: latin1
>> ..
>> Latin-1 is short for Latin Alphabet No. 1 and started out as ECMA-94
>> in 1985 and 1986:
>
> This does not explain your preference of Latin-1 over Latin1.

This is not my preference. See e.g. Wikipedia:

http://en.wikipedia.org/wiki/ISO/IEC_8859-1

It is common practice to replace spaces in descriptive names with a hyphen to come up with an identifier string (even Google does or undoes this when searching the net). Replacing spaces with an empty string is also an option, but doesn't read as well.

> Both are perfectly valid abbreviations for Latin Alphabet No. 1. The
> spelling without - has the advantage of being a valid Python
> identifier and a module name.

The hyphens are converted to underscores by the lookup function in the encodings package. That turns the name into a valid Python module name.

> The IANA registration for latin1 and lack of that for latin-1 most
> likely indicates that the former is more commonly found in machine
> readable metadata.

I don't know why you put so much emphasis on machine readable metadata. Python source code is machine readable, the Internet is machine readable, all documents found there are machine readable.

As I said earlier on: the IANA registry is just that, a registry of names with the purpose of avoiding name clashes in the respective namespace. As such, it is not a standard, but merely a tool to map various aliases to a canonical name.

The fact that an alias is registered doesn't imply anything about whether it's in widespread use or not, e.g. csISOLatin1 gives me 6810 hits on Google.

I get 788,000 hits for 'latin1 -latin-1' on Google, 'latin-1' gives 2,600,000 hits. Looks like it's still the preferred way to write that encoding name.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 24 2011)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! :::

eGenix.com Software, Skills and Services GmbH, Pastor-Loeh-Str.48, D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
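The name normalization described in this message (hyphens turned into underscores, then an alias lookup) can be observed from Python itself. The following is only a small sketch using the standard `encodings` and `codecs` modules; the helper name `resolve` is made up for illustration, and the exact contents of the alias table may differ between Python versions.

```python
import codecs
import encodings
from encodings import aliases


def resolve(name):
    """Return (normalized name, codec module name, canonical codec name).

    `resolve` is a hypothetical helper for this sketch, not a stdlib function.
    """
    # Hyphens and other punctuation are collapsed into underscores.
    normalized = encodings.normalize_encoding(name.lower())
    # The alias table maps known spellings to a codec module name;
    # a name that is already a module name is not listed as a key.
    module = aliases.aliases.get(normalized, normalized)
    return normalized, module, codecs.lookup(name).name


for spelling in ("latin-1", "latin1", "iso-8859-1", "l1"):
    print(spelling, resolve(spelling))
```

Both spellings discussed in the thread end up at the same codec module; the only difference is whether the alias table is consulted along the way.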
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On 2/24/2011 4:02 AM, M.-A. Lemburg wrote:
> I get 788,000 hits for 'latin1 -latin-1' on Google, 'latin-1' gives
> 2,600,000 hits. Looks like it's still the preferred way to write that
> encoding name.

That's bogus. You can't search for "latin-1" on Google, it isn't strict enough. The third hit is a url containing "latin1" and a title of "Latin 1". And it picks up things like "Latin 1: The Easy Way", which is a book on Latin.

However, you *can* search much more strictly on Google Code Search, which gives 4,014 (latin-1) to 13,597 (latin1).

http://www.google.com/codesearch?hl=enlr=q=%28\%22latin1\%22|\%27latin1\%27%29sbtn=Search
http://www.google.com/codesearch?hl=enlr=q=%28\%22latin-1\%22|\%27latin-1\%27%29sbtn=Search

So, no, I don't think the development world aligns with your pedantry. That's not to say this is a popularity contest, but then let's not cite google hit counts as proof.

-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 3:51 PM, Stephen J. Turnbull step...@xemacs.org wrote:
> Jesus Cea writes:
>
> > Every time I read a message from [long, incomplete<wink> list]
> > and so many others python-devs (not an exhaustive list, if you are
> > not there, you probably should, sorry :), I feel I am faking my
> > knowledge of Python :-). I am a pretender :).
>
> Sure. I suspect even some of those *on* the list feel that way
> sometimes. That's what's so great about the list, the people who
> contribute!

I personally feel that way every time I realise just how much of the standard library I have never even seriously looked at (let alone used) and how much more there is to the Python ecosystem than just CPython (subscribing to Planet Python and the python tag on Stack Overflow has been truly eye opening in that regard).

Heck, some day I may even get around to learning how to build a proper Python package ;)

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Tue, Feb 22, 2011 at 11:34 PM, Jesus Cea j...@jcea.es wrote:
..
> Issue filed. It already has a patch. That was fast!. Now I can sit back
> waiting for 3.2.1 before touching my project again :). Mixed feelings
> about the waiting. I hope it is short.

It looks like you don't need to delay your project: if you spell the encoding as "latin-1", your pickle loads just fine:

    >>> with open("z.pickle", mode="rb") as f: pickle.load(f, encoding="latin-1")
    ...
    {'ya_volcados': {'comment': ''}}

With encoding="latin1", it does fail:

    >>> with open("z.pickle", mode="rb") as f: pickle.load(f, encoding="latin1")
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: operation forbidden on released memoryview object

(For those not following the tracker issue, the z.pickle file was posted at http://bugs.python.org/file20839/z.pickle.)

There is still a bug, which is best demonstrated by

    >>> pickle.loads(b'\x80\x02U\x00.', encoding='latin1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: operation forbidden on released memoryview object

The fact that the above works with encoding='latin-1',

    >>> pickle.loads(b'\x80\x02U\x00.', encoding='latin-1')
    ''

shows that there is probably more than one bug.
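For anyone who wants to check their own interpreter, the two calls above can be wrapped in a small script. This is a minimal sketch rather than a test from the tracker issue; the second byte string is an extra example added here (a two-byte Python 2 str with non-ASCII bytes), and on interpreters where the bug is fixed both spellings are expected to behave identically.

```python
import pickle

# Protocol-2 pickle of an empty Python 2 str, taken from the message above.
EMPTY = b'\x80\x02U\x00.'
# Assumed extra example: a 2-byte Python 2 str holding bytes 0xE9 0xE8.
NON_ASCII = b'\x80\x02U\x02\xe9\xe8.'


def try_load(data, encoding):
    """Unpickle data, returning the exception instead of raising it."""
    try:
        return pickle.loads(data, encoding=encoding)
    except ValueError as exc:  # the error reported for 3.2 was a ValueError
        return exc


for name in ("latin-1", "latin1"):
    for data in (EMPTY, NON_ASCII):
        print(name, data, repr(try_load(data, name)))
```

Because Latin-1 maps each byte to the code point with the same number, encoding the loaded string back with the same codec recovers the original bytes.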
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
I'm guessing that one of these encoding names is recognized by the C code while the other one takes the slow path via the aliasing code.

On Wed, Feb 23, 2011 at 11:16 AM, Alexander Belopolsky alexander.belopol...@gmail.com wrote:
> On Tue, Feb 22, 2011 at 11:34 PM, Jesus Cea j...@jcea.es wrote:
> ..
>> Issue filed. It already has a patch. That was fast!. Now I can sit back
>> waiting for 3.2.1 before touching my project again :). Mixed feelings
>> about the waiting. I hope it is short.
>
> It looks like you don't need to delay your project: if you spell the
> encoding as "latin-1", your pickle loads just fine:
>
>     >>> with open("z.pickle", mode="rb") as f: pickle.load(f, encoding="latin-1")
>     ...
>     {'ya_volcados': {'comment': ''}}
>
> With encoding="latin1", it does fail:
>
>     >>> with open("z.pickle", mode="rb") as f: pickle.load(f, encoding="latin1")
>     ...
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: operation forbidden on released memoryview object
>
> (For those not following the tracker issue, the z.pickle file was
> posted at http://bugs.python.org/file20839/z.pickle.)
>
> There is still a bug, which is best demonstrated by
>
>     >>> pickle.loads(b'\x80\x02U\x00.', encoding='latin1')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: operation forbidden on released memoryview object
>
> The fact that the above works with encoding='latin-1',
>
>     >>> pickle.loads(b'\x80\x02U\x00.', encoding='latin-1')
>     ''
>
> shows that there is probably more than one bug.

-- 
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 4:07 PM, Guido van Rossum gu...@python.org wrote:
> I'm guessing that one of these encoding names is recognized by the C
> code while the other one takes the slow path via the aliasing code.

This is absolutely right. In fact I am going to propose adding strcmp(lower, "latin1") to the following test in PyUnicode_AsEncodedString():

    else if ((strcmp(lower, "latin-1") == 0) ||
             (strcmp(lower, "iso-8859-1") == 0))
        return PyUnicode_EncodeLatin1(...

I'll open a separate issue for that. In Python's own stdlib and tests "latin1" is a more common spelling than "latin-1", so it makes sense to optimize it.
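The fast-path idea can be modelled in a few lines of Python. This is only an illustrative sketch of the technique, not CPython's implementation: the set `_LATIN1_FAST` and the function `encode_with_fast_path` are hypothetical names, and the direct `str.encode` call merely stands in for the C-level PyUnicode_EncodeLatin1() shortcut.

```python
import codecs

# Spellings recognized without a registry lookup (hypothetical set for this
# sketch; the proposal above is to add "latin1" to the two existing names).
_LATIN1_FAST = {"latin-1", "latin1", "iso-8859-1"}


def encode_with_fast_path(text, encoding):
    """Encode text, skipping the codec registry for well-known names."""
    lower = encoding.lower()
    if lower in _LATIN1_FAST:
        # Fast path: stand-in for calling the Latin-1 encoder directly.
        return text.encode("latin-1")
    # Slow path: full registry lookup, including alias resolution.
    encoded, _consumed = codecs.lookup(encoding).encode(text)
    return encoded


print(encode_with_fast_path("\xe9", "latin1"))  # takes the fast path
print(encode_with_fast_path("\xe9", "L1"))      # resolved via the alias table
```

Both paths produce the same bytes; the point of the shortcut is only to avoid the cost of the lookup for common spellings.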
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Alexander Belopolsky wrote:
> On Wed, Feb 23, 2011 at 4:07 PM, Guido van Rossum gu...@python.org wrote:
>> I'm guessing that one of these encoding names is recognized by the C
>> code while the other one takes the slow path via the aliasing code.
>
> This is absolutely right. In fact I am going to propose adding
> strcmp(lower, "latin1") to the following test in
> PyUnicode_AsEncodedString():
>
>     else if ((strcmp(lower, "latin-1") == 0) ||
>              (strcmp(lower, "iso-8859-1") == 0))
>         return PyUnicode_EncodeLatin1(...
>
> I'll open a separate issue for that. In Python's own stdlib and tests
> "latin1" is a more common spelling than "latin-1", so it makes sense
> to optimize it.

Latin-1 is the official name and the one used internally by Python, so it would be good to have the test suite and Python code in general use that variant of the name (just as utf-8 is preferred over utf8).

Instead of adding more aliases to the C code, please change the encoding names in the stdlib and test suite.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com (#1, Feb 23 2011)
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 4:23 PM, M.-A. Lemburg m...@egenix.com wrote:
..
> Latin-1 is the official name and the one used internally by Python,
> so it would be good to have the test suite and Python code in general
> to use that variant of the name (just as utf-8 is preferred over
> utf8).
>
> Instead of adding more aliases to the C code, please change the
> encoding names in the stdlib and test suite.

I cannot agree with you on this one. Official or not, "latin-1" is much less commonly used than "latin1". Currently decode("latin1") is 10x slower than decode("latin-1") on short strings. We already have a check for the "iso-8859-1" alias in PyUnicode_AsEncodedString(). Adding "latin1" (and possibly "utf8" as well) is likely to speed up many applications at minimal cost.
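The relative cost of the spellings on a given interpreter can be measured with the standard `timeit` module. The following is a rough sketch only: the helper `time_decode`, the payload and the iteration count are arbitrary choices for illustration, and the 10x figure quoted above applies to the interpreter of that time, so other versions may show little or no difference.

```python
import timeit

# A short payload, since the claim above is about short strings.
PAYLOAD = b"abc"


def time_decode(name, number=100_000):
    """Return the total seconds taken by `number` decodes using codec `name`."""
    return timeit.timeit(lambda: PAYLOAD.decode(name), number=number)


for name in ("latin-1", "latin1", "iso-8859-1"):
    print(f"{name!r}: {time_decode(name):.4f} s")
```

The absolute numbers depend on the machine; only the ratio between spellings on the same run is meaningful.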
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Alexander Belopolsky wrote:
> On Wed, Feb 23, 2011 at 4:23 PM, M.-A. Lemburg m...@egenix.com wrote:
> ..
>> Latin-1 is the official name and the one used internally by Python,
>> so it would be good to have the test suite and Python code in general
>> to use that variant of the name (just as utf-8 is preferred over
>> utf8).
>>
>> Instead of adding more aliases to the C code, please change the
>> encoding names in the stdlib and test suite.
>
> I cannot agree with you on this one. Official or not, "latin-1" is
> much less commonly used than "latin1". Currently decode("latin1") is
> 10x slower than decode("latin-1") on short strings. We already have a
> check for the "iso-8859-1" alias in PyUnicode_AsEncodedString().
> Adding "latin1" (and possibly "utf8" as well) is likely to speed up
> many applications at minimal cost.

Fair enough, then add "latin1" and "utf8" to both PyUnicode_Decode() and PyUnicode_AsEncodedString().

Still, the stdlib and test suite should be examples of using the correct names.

I only found these few cases where the wrong Latin-1 name is used in the stdlib:

./distutils/command/bdist_wininst.py:
-- # convert back to bytes. "latin1" simply avoids any possible
-- encoding="latin1") as script:
-- script_data = script.read().encode("latin1")
./urllib/request.py:
-- data = base64.decodebytes(data.encode('ascii')).decode('latin1')
./asynchat.py:
-- encoding= 'latin1'
./ftplib.py:
-- encoding = "latin1"
./sre_parse.py:
-- encode = lambda x: x.encode('latin1')

I get 12 hits for the test suite.

Yet 108 for the correct name, so I can't follow your statement that the wrong variant is used more often.

-- 
Marc-Andre Lemburg
eGenix.com (#1, Feb 23 2011)
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 4:54 PM, M.-A. Lemburg m...@egenix.com wrote:
..
> Yet 108 for the correct name, so I can't follow your statement
> that the wrong variant is used more often.

Hmm, your grepping skills are probably better than mine. I get

    $ grep -iw latin-1 Lib/*.py | wc -l
    24

and

    $ grep -iw latin1 Lib/test/*.py | wc -l
    25

(I did get spurious hits with naive grep "latin1", so I retract my "more often" claim and just say that both spellings are equally common.)
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 4:23 PM, M.-A. Lemburg m...@egenix.com wrote:
..
> Latin-1 is the official name and the one used internally by Python,

In what sense is Latin-1 the official name? The IANA charset registry has the following listing

    Name: ISO_8859-1:1987 [RFC1345,KXS2]
    MIBenum: 4
    Source: ECMA registry
    Alias: iso-ir-100
    Alias: ISO_8859-1
    Alias: ISO-8859-1 (preferred MIME name)
    Alias: latin1
    Alias: l1
    Alias: IBM819
    Alias: CP819
    Alias: csISOLatin1

(See http://www.iana.org/assignments/character-sets)

The Latin-1 spelling does appear in various unicode.org documents, but not in machine readable files as far as I can tell.
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Alexander Belopolsky wrote:
> On Wed, Feb 23, 2011 at 4:54 PM, M.-A. Lemburg m...@egenix.com wrote:
> ..
>> Yet 108 for the correct name, so I can't follow your statement
>> that the wrong variant is used more often.
>
> Hmm, your grepping skills are probably better than mine. I get
>
>     $ grep -iw latin-1 Lib/*.py | wc -l
>     24
>
> and
>
>     $ grep -iw latin1 Lib/test/*.py | wc -l
>     25
>
> (I did get spurious hits with naive grep "latin1", so I retract my
> "more often" claim and just say that both spellings are equally
> common.)

I used a Python script based on re, perhaps that's why :-)

grep only counts lines, not multiple instances on a single line. Looking through the hits I found, there are a few false positives such as 'latin-10' or 'iso-latin-1'. Without those, I get 83 hits.

If you open a ticket for this, I'll add the list of hits to that ticket.

-- 
Marc-Andre Lemburg
eGenix.com (#1, Feb 23 2011)
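The difference between counting lines and counting occurrences is easy to reproduce. The script used for the numbers above was not posted, so the following is only a guess at the approach: `count_occurrences` and `count_lines` are hypothetical helpers, and the lookaround pattern is one possible way to reject false positives such as 'latin-10' or 'iso-latin-1'.

```python
import re


def _pattern(spelling):
    # Reject matches that are glued to a word character or a hyphen on
    # either side, e.g. 'iso-latin-1', 'latin-10' or 'Latin1ClassModel'.
    return re.compile(r"(?<![\w-])" + re.escape(spelling) + r"(?![\w-])",
                      re.IGNORECASE)


def count_occurrences(text, spelling):
    """Count every standalone occurrence, even several on one line."""
    return len(_pattern(spelling).findall(text))


def count_lines(text, spelling):
    """Count matching lines, which is what `grep ... | wc -l` reports."""
    pattern = _pattern(spelling)
    return sum(1 for line in text.splitlines() if pattern.search(line))


sample = "x.encode('latin-1'); y.decode('latin-1')  # not iso-latin-1 or latin-10"
print(count_occurrences(sample, "latin-1"), count_lines(sample, "latin-1"))
```

On the single sample line the two helpers disagree, which is the same effect that separates the grep totals from the re-based totals in this exchange.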
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 5:21 PM, M.-A. Lemburg m...@egenix.com wrote:
..
> If you open a ticket for this, I'll add the list of hits to that
> ticket.

http://bugs.python.org/issue11303
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
M.-A. Lemburg wrote:
> Still, the stdlib and test suite should be examples of using the
> correct names.

I won't argue with the stdlib portion of your argument, but I would think that the best example of test code would be a complete and thorough check of all options.

~Ethan~
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Alexander Belopolsky wrote:
> On Wed, Feb 23, 2011 at 4:23 PM, M.-A. Lemburg m...@egenix.com wrote:
> ..
>> Latin-1 is the official name and the one used internally by Python,
>
> In what sense is Latin-1 the official name? The IANA charset
> registry has the following listing
>
>     Name: ISO_8859-1:1987 [RFC1345,KXS2]
>     MIBenum: 4
>     Source: ECMA registry
>     Alias: iso-ir-100
>     Alias: ISO_8859-1
>     Alias: ISO-8859-1 (preferred MIME name)
>     Alias: latin1
>     Alias: l1
>     Alias: IBM819
>     Alias: CP819
>     Alias: csISOLatin1
>
> (See http://www.iana.org/assignments/character-sets)

Those are registered character set names, not necessarily standard names. Anyone can apply for new aliases to get added to that list.

> Latin-1 spelling does appear in various unicode.org documents, but not
> in machine readable files as far as I can tell.

Latin-1 is short for Latin Alphabet No. 1 and started out as ECMA-94 in 1985 and 1986:

http://www.ecma-international.org/publications/standards/Ecma-094.htm

ISO then applied their numbering scheme for the character set standard ISO-8859 in 1987, where Latin-1 became ISO-8859-1. Note that this was before the Internet took off.

I assume that since the HTML standard used the more popular name Latin-1 for its definition of the default character set and also made use of the term throughout the spec, it became the de-facto standard name for that character set at the time. I only learned about the term ISO-8859-1 when starting to dive into the Unicode world late in the 1990s.

Latin-1 is also sometimes written as ISO Latin-1, e.g.

http://msdn.microsoft.com/en-us/library/ms537495(v=vs.85).aspx

For much the same reasons, ISO-10646 never really became popular, but Unicode eventually did. ECMA-262 or ISO/IEC 16262 just doesn't sound as good as JavaScript either :-)

-- 
Marc-Andre Lemburg
eGenix.com (#1, Feb 23 2011)
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 6:32 PM, M.-A. Lemburg m...@egenix.com wrote:
> Alexander Belopolsky wrote:
> ..
>> In what sense is Latin-1 the official name? The IANA charset
>> registry has the following listing
>>
>>     Name: ISO_8859-1:1987 [RFC1345,KXS2]
>>     MIBenum: 4
>>     Source: ECMA registry
>>     Alias: iso-ir-100
>>     Alias: ISO_8859-1
>>     Alias: ISO-8859-1 (preferred MIME name)
>>     Alias: latin1
> ..
> Latin-1 is short for Latin Alphabet No. 1 and started out as ECMA-94
> in 1985 and 1986:

This does not explain your preference of Latin-1 over Latin1. Both are perfectly valid abbreviations for Latin Alphabet No. 1. The spelling without - has the advantage of being a valid Python identifier and a module name. The IANA registration for latin1 and lack of that for latin-1 most likely indicates that the former is more commonly found in machine readable metadata.
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
M.-A. Lemburg writes:

 > Latin-1 is short for Latin Alphabet No. 1 [...].

 > I assume that since the HTML standard used the more popular name
 > Latin-1 for its definition of the default character set and also
 > made use of the term throughout the spec, it became the de-facto
 > standard name for that character set at the time.

As usual with de facto standards, it got embraced and extended. I've seen people seriously contend that Windows-1252 is an implementation or (conformant) extension of Latin-1, and that the EURO SIGN is now a member of Latin-1. It's just too ambiguous for my taste; I avoid it in discussions of character sets, preferring to be thought idiosyncratic and pedantic.

As for the spelling, I think "Latin-1" is slightly more readable than "Latin1", but the latter is in the same degree more typable.<wink>

 > For much the same reasons, ISO-10646 never really became popular,
 > but Unicode eventually did.

No, there are much more important reasons why Unicode became popular. IMHO, as an encoding standard ISO-10646 had a slight edge over Unicode in the early 1990s, before the two were unified as coded character sets. However, as a text processing system there simply was no comparison. Unicode provided a large number of standard facilities, and was clearly set to add to those, that were way outside of the scope of ISO 10646. Claiming Unicode conformance was a much bigger deal than ISO 10646 (not to mention having the advantage that you could *optionally* save Intel shorts to disk without swabbing them first).
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Google Code search limited to python

latin1: 3,489
http://www.google.com/codesearch?hl=enlr=q=latin1+lang%3Apythonsbtn=Search
latin-1: 5,604
http://www.google.com/codesearch?hl=enlr=q=latin-1+lang%3Apythonsbtn=Search

utf8: 25,341
http://www.google.com/codesearch?hl=enlr=q=utf8+lang%3Apythonsbtn=Search
utf-8: 179,806
http://www.google.com/codesearch?hl=enlr=q=utf-8+lang%3Apythonsbtn=Search

Interesting that for Latin-1 the split of wrong/right is 40/60 and the split for utf8 is 15/85
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Wed, Feb 23, 2011 at 8:48 PM, Dj Gilcrease digitalx...@gmail.com wrote:
> Google Code search limited to python
>
> latin1: 3,489
> http://www.google.com/codesearch?hl=enlr=q=latin1+lang%3Apythonsbtn=Search
> latin-1: 5,604
> http://www.google.com/codesearch?hl=enlr=q=latin-1+lang%3Apythonsbtn=Search
>
> utf8: 25,341
> http://www.google.com/codesearch?hl=enlr=q=utf8+lang%3Apythonsbtn=Search
> utf-8: 179,806
> http://www.google.com/codesearch?hl=enlr=q=utf-8+lang%3Apythonsbtn=Search
>
> Interesting that for Latin-1 the split of wrong/right is 40/60 and
> the split for utf8 is 15/85

Your search is invalid. You hit things such as Latin1ClassModel which have no relevance to the issue at hand.
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On 2/23/2011 9:19 PM, Alexander Belopolsky wrote:
> On Wed, Feb 23, 2011 at 8:48 PM, Dj Gilcrease digitalx...@gmail.com wrote:
>> Google Code search limited to python
>>
>> latin1: 3,489
>> http://www.google.com/codesearch?hl=enlr=q=latin1+lang%3Apythonsbtn=Search
>> latin-1: 5,604
>> http://www.google.com/codesearch?hl=enlr=q=latin-1+lang%3Apythonsbtn=Search

"latin1": 1,618
http://www.google.com/codesearch?hl=enlr=q=%28\%22latin1\%22|\%27latin1\%27%29+lang%3Apythonsbtn=Search
"latin-1": 2,241
http://www.google.com/codesearch?hl=enlr=q=%28\%22latin-1\%22|\%27latin-1\%27%29+lang%3Apythonsbtn=Search

>> utf8: 25,341
>> http://www.google.com/codesearch?hl=enlr=q=utf8+lang%3Apythonsbtn=Search
>> utf-8: 179,806
>> http://www.google.com/codesearch?hl=enlr=q=utf-8+lang%3Apythonsbtn=Search

"utf8": 9,676
http://www.google.com/codesearch?hl=enlr=q=%28\%22utf8\%22|\%27utf8\%27%29+lang%3Apythonsbtn=Search
"utf-8": 44,795
http://www.google.com/codesearch?hl=enlr=q=%28\%22utf-8\%22|\%27utf-8\%27%29+lang%3Apythonsbtn=Search

> Your search is invalid. You hit things such as Latin1ClassModel which
> have no relevance to the issue at hand.

You get about the same ratio if you filter out only the quoted strings.

-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu
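The "about the same ratio" remark can be checked with a little arithmetic on the hit counts quoted in this message. The sketch below only restates those numbers; the labels and the helper `share` are made up for illustration.

```python
# (unhyphenated hits, hyphenated hits) as quoted in the messages above.
counts = {
    "latin1 vs latin-1, all hits": (3489, 5604),
    "latin1 vs latin-1, quoted strings only": (1618, 2241),
    "utf8 vs utf-8, all hits": (25341, 179806),
    "utf8 vs utf-8, quoted strings only": (9676, 44795),
}


def share(unhyphenated, hyphenated):
    """Fraction of all hits that use the unhyphenated spelling."""
    return unhyphenated / (unhyphenated + hyphenated)


for label, pair in counts.items():
    print(f"{label}: {share(*pair):.1%} unhyphenated")
```

For Latin-1 the unhyphenated share moves only a few percentage points between the two kinds of search, which is the sense in which the ratio stays about the same.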
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On 22/02/2011 12:14, Jesus Cea wrote:
> I have a 10MB pickled structure generated in Python 2.7. I only use
> basic types (no classes) like sets, dictionaries, lists, strings, etc.
>
> The pickle stores a lot of strings. Some of them should be bytes,
> while others should be unicode. My idea is to import ALL the strings
> as bytes in Python 3.2 and navigate the data structure to convert the
> appropriate values to unicode, in a one-time operation (I version the
> structure, so I can know if this conversion is already done, simply
> storing a new version value).
>
> But I get this error:
>
>     Python 3.2 (r32:88445, Feb 21 2011, 13:34:07)
>     [GCC 4.4.3] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> f=open("file.pickle", mode="rb").read()
>     >>> len(f)
>     9847316
>     >>> b=pickle.loads(f,encoding="latin1")
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: operation forbidden on released memoryview object

That seems like an odd error, but the decision was made that Python 2 byte-strings would be unpickled on Python 3 as Unicode strings. See the discussion here:

http://bugs.python.org/issue6784

This is basically because many people do the wrong thing and use Python 2 byte strings for storing text. What it means though is that people who do the right thing and store binary data in Python 2 byte strings can't use Python 2 pickles from Python 3. It also means that only ascii data can be unpickled.

A custom pickler / unpickler is suggested as the solution in this issue.

All the best,

Michael Foord

> I use the encoding "latin1" for transparent byte/unicode conversion
> (do not touch the values!).
>
> This seems to be a bug in Python 3.2. Any suggestion?.
>
> PS: The bytestream is protocol 2.
>
> PPS: If there is consensus that this is a real bug, I would create an
> issue in the tracker and try to get a minimal testcase.
>
> --
> Jesus Cea Avion
> j...@jcea.es - http://www.jcea.es/
> jabber / xmpp:j...@jabber.org

-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Tue, 22 Feb 2011 13:14:18 +0100 Jesus Cea j...@jcea.es wrote:
>
> This seems to be a bug in Python 3.2. Any suggestion?.

Report an issue and investigate :)

Antoine.
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On 22/02/11 13:20, Michael Foord wrote:
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: operation forbidden on released memoryview object
>
> That seems like an odd error, but the decision was made that Python 2
> byte-strings would be unpickled on Python 3 as Unicode strings.

This problem seems NOT related to unicode. In fact, when saying 'encoding="latin1"', my binary strings should be converted to unicode without any kind of issue (my plan was, then, to scan the data structure and convert them to native bytes).

The fact is that I get a strange error: "ValueError: operation forbidden on released memoryview object". Seems like a bug to me. Google shows no hits. I want to discard any obvious overlooked point.

PS: Just checked... Python 3.1.3 imports the pickle just fine.

So busy migrating my projects to 3.2 (it was my compromise two years ago :), I don't have time to debug this :).

-- 
Jesus Cea Avion
j...@jcea.es - http://www.jcea.es/
jabber / xmpp:j...@jabber.org
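The plan described here (load everything through Latin-1, then turn the values that were really binary back into bytes) could look roughly like the sketch below. It is an illustration under assumptions, not the poster's code: `latin1_to_bytes` is a hypothetical helper, it converts every str it finds (dictionary keys included), and a real migration would choose which values to convert based on where they sit in the structure.

```python
def latin1_to_bytes(obj):
    """Recursively rebuild obj with every str re-encoded as Latin-1 bytes.

    Latin-1 maps code points 0-255 one-to-one onto byte values, so a
    Python 2 str unpickled with encoding="latin-1" round-trips exactly.
    """
    if isinstance(obj, str):
        return obj.encode("latin-1")
    if isinstance(obj, dict):
        return {latin1_to_bytes(k): latin1_to_bytes(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        # Rebuild the container with the same type as the original.
        return type(obj)(latin1_to_bytes(item) for item in obj)
    return obj  # numbers, None, bytes, ... are left untouched


loaded = {"blob": ["\xe9\xe8", 1], "ids": ("a", "b")}
print(latin1_to_bytes(loaded))
```

The version marker mentioned earlier in the thread would then record that the one-time conversion has already been applied.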
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
> PS: Just checked... Python 3.1.3 imports the pickle just fine. So busy migrating my projects to 3.2 (it was my compromise two years ago :), I don't have time to debug this :).

I hope you do have time to open an issue, though :-)

Eli
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On 22/02/11 15:32, Eli Bendersky wrote:

> > PS: Just checked... Python 3.1.3 imports the pickle just fine. So busy migrating my projects to 3.2 (it was my compromise two years ago :), I don't have time to debug this :).
>
> I hope you do have time to open an issue, though :-)
>
> Eli

Bugs bug me a lot :).

I spent a couple of hours trying to reduce my pickle to something I could post (the original pickle has tons of proprietary information):

http://bugs.python.org/issue11286

I got a reproducible pickle testcase in only 41 bytes. Seems to be a SERIOUS regression in 3.2.

I can't progress further. Pickle internals are outside my expertise.

--
Jesus Cea Avion
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On Tue, Feb 22, 2011 at 9:31 PM, Stephen J. Turnbull step...@xemacs.org wrote:

> Jesus Cea writes:
>
> > PPS: If there is consensus that this is a real bug, I would create an issue in the tracker and try to get a minimal testcase.
>
> All bugs are issues, but not all issues are bugs. Please don't wait for consensus or even a second opinion to file the issue.

AFAICT, it was not mentioned in this thread, but the issue has been created on the tracker:

http://bugs.python.org/issue11286
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
On 23/02/11 03:31, Stephen J. Turnbull wrote:

> Please don't wait for consensus or even a second opinion to file the issue. It's reasonable for a new Python user to ask whether something is a bug or not, but if somebody with your experience and contribution level to Python doesn't understand something, at the very least we have to suspect a doc bug.

Every time I read a message from Antoine Pitrou, Brett Cannon, Nick Coghlan, Martin Löwis, Eric Smith, Steve Holden, Benjamin Peterson, Victor Stinner, Georg Brandl, Raymond Hettinger, Guido and so many other python-devs (not an exhaustive list, if you are not there, you probably should be, sorry :), I feel I am faking my knowledge of Python :-). I am a pretender :).

BTW, this project is my first real Python 3 code (I promised myself a year ago to move after the 3.2 release), for a real/big project, and I was thinking that maybe I was overlooking something obvious to any seasoned real Python programmer.

I overcame my fear of being seen as a fool last millennium, so I am not afraid of asking. Sometimes I even ask too much.

> OTOH, the testcase might require a lot of effort on your part. Of course it's reasonable for you to check whether it's a simple misunderstanding before exerting that effort.

In fact, I invested *hours* trying to reduce my multimegabyte problematic pickle to 41 bytes, but by that time I was already convinced that I had hit an ugly and serious bug.

Issue filed. It already has a patch. That was fast! Now I can sit back waiting for 3.2.1 before touching my project again :). Mixed feelings about the waiting. I hope it is short. Life sucks sometimes :).

Thank $DEITY there are quite a few better python-devs than me :).

--
Jesus Cea Avion
Re: [Python-Dev] Strange error importing a Pickle from 2.7 to 3.2
Jesus Cea writes:

> Every time I read a message from [long, incomplete <wink> list] and so many other python-devs (not an exhaustive list, if you are not there, you probably should be, sorry :), I feel I am faking my knowledge of Python :-). I am a pretender :).

Sure. I suspect even some of those *on* the list feel that way sometimes. That's what's so great about the list, the people who contribute!