Re: Pound sign problem
On 10 April 2017 at 15:17, David Shi via Python-list wrote:
> In the data set, pound sign escape appears:
> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> When using table.to_csv after importing pandas as pd, an error message
> persists as follows:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position
> 0: ordinal not in range(128)

The error clearly indicates that you have a character which is not part of the standard ASCII range, hence the message "ordinal not in range(128)".

To understand it better, think of characters as numbers; basic ASCII defines characters only in the range 0 to 127. See http://www.ascii-code.com/

So the pound character is outside this range: its ordinal is being read by your program as 0xA3 in hex (163 in decimal). So your data was *probably* originally in Latin-1 encoding.

First, you should find out where the data comes from: is it a text file or some other input, and in which application and encoding was it created?

To get rid of the errors, I'd say there are two common strategies: ensure that all source data is saved in Unicode (save as UTF-8), or replace the pound sign with something which is representable in standard ASCII, e.g. replace the pound sign with "GBP" in the sources.

Otherwise, you must find out which encoding is used in the source data and re-encode according to the input/output format specification.

--
https://mail.python.org/mailman/listinfo/python-list
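The numbers mentioned in this reply can be checked directly. The following is a minimal sketch, not the poster's code; the variable names and the choice of "GBP " (with a trailing space) as the replacement text are assumptions made for illustration. It should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: names and the "GBP " replacement text are assumptions.
pound = u"\xa3"  # the pound sign from the original post

# Its code point is 163 (0xA3), which lies outside the ASCII range 0..127.
code_point = ord(pound)

# Encoding to ASCII fails for the reason given in the traceback.
try:
    pound.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Latin-1 stores the character as the single byte 0xA3; UTF-8 uses two bytes.
latin1_bytes = pound.encode("latin-1")
utf8_bytes = pound.encode("utf-8")

# The "replace with something ASCII" strategy applied to the sample value.
ascii_safe = u"\xa3525,000".replace(pound, u"GBP ").encode("ascii")

print(code_point, hex(code_point), ascii_ok)
print(repr(latin1_bytes), repr(utf8_bytes), repr(ascii_safe))
```

Replacing the character loses information, so it is probably only worth doing when the consumer of the file really cannot accept anything beyond ASCII.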
Re: Pound sign problem
On 2017-04-12 02:29, Steve D'Aprano wrote:
> >> In 2017, unless you are reading from old legacy files created
> >> using a non-Unicode encoding, you should just use UTF-8.
> >
> > Thanks for your opinion. My opinion differs.
>
> What would you suggest then, if not UTF-8?
>
> My personal favourite legacy encoding is MacRoman, but I wouldn't
> recommend anyone use it except to interoperate with legacy Mac
> applications and/or data from the 80s and 90s.
>
> What's your recommendation? "Anything but ASCII"?

Heh, how about "Unicode as ASCII-compatible-Python-strings"? ;-)

Got this from Peter Otten a while back in response to my request for functionality something like this.

http://www.mail-archive.com/python-list@python.org/msg420100.html

-tkc

$ cat codecs_mynamereplace.py
# -*- coding: utf-8 -*-
import codecs
import unicodedata

try:
    codecs.namereplace_errors
except AttributeError:
    print("using mynamereplace")
    def mynamereplace(exc):
        return u"".join(
            "\\N{%s}" % unicodedata.name(c)
            for c in exc.object[exc.start:exc.end]
        ), exc.end
    codecs.register_error("namereplace", mynamereplace)
print(u"mañana".encode("ascii", "namereplace").decode())

$ python3.5 codecs_mynamereplace.py
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python3.4 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python2.7 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
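For comparison with the custom handler in this message, the standard codecs machinery also ships several error handlers that need no registration. The sketch below is only an illustration (the `results` dictionary is a hypothetical name, not part of the original post); it applies four of the built-in handlers to the same sample word.

```python
# Sketch only: compares standard error handlers on the sample word from the post.
# "results" is a name chosen for this illustration.
word = u"ma\xf1ana"  # "manana" with n-tilde, written as an escape

results = {
    "ignore": word.encode("ascii", "ignore"),                        # drop the character
    "replace": word.encode("ascii", "replace"),                      # substitute "?"
    "backslashreplace": word.encode("ascii", "backslashreplace"),    # Python escape
    "xmlcharrefreplace": word.encode("ascii", "xmlcharrefreplace"),  # XML character reference
}

for handler in sorted(results):
    print("%-18s %r" % (handler, results[handler]))
```

All of these produce pure ASCII output, but only the escape-style handlers keep enough information to recover the original character later.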
Re: Pound sign problem
On Wed, 12 Apr 2017 02:23 am, Lew Pitcher wrote:

> I recommend whatever encoding is appropriate for the output.

There are multiple encodings that are appropriate for ASCII + pound sign. How should the OP choose between them without guidance? If he understood the issue well enough to make an informed decision, he wouldn't have needed to ask for help.

> That's not up
> to you or me to decide; that's a question that only the OP can answer.

Nobody is asking you to *decide*. But you can make a recommendation. Do you really think that the OP is capable of making an informed decision about this issue on his own? If he was, he wouldn't have needed to ask for help solving this problem in the first place.

If you're going to help, actually *help*, and don't just pretend to help:

"Hi, I'm a stranger in town and I'm trying to get to the post office. What's the best way for me to get there please?"

"Well, that depends on whether you're flying the Space Shuttle, travelling by sailing ship, dog sled, or advanced alien hyperdrive. You should take whatever route is most appropriate for your transportation. You're welcome."

I'm sorry to be so negative when you're only trying to be helpful, but I too have been on the receiving end of poor-quality "advice" that leaves me just as much in the dark as before I asked the question, so I'm quite sensitive to it.

"What should I do here?"

"Do whatever you see fit."

(I'm not specifically referring to this community, just making a general observation.)

> (Imagine, python on an IBM Zseries running ZOS;

I can imagine many unlikely things that have come to pass, but that's not one of them. The OP is using Pandas, which requires Python 2.7 or better.

https://pypi.python.org/pypi/pandas

There is an unofficial, unmaintained(?), third-party port of Python 2.4 to Z/OS, which appears to have had no attention for more than a decade.

http://www.teaser.fr/~jymengant/mvspython/mvsPythonPort.html

I suppose it is just barely within the realm of possibility that the OP has hacked together his own port of Python 2.7 and Pandas to Z/OS. If so, he'd have already had to deal with some much bigger problems relating to ASCII versus EBCDIC, and if he managed to solve that, it's unlikely that he'd be puzzled by a pound sign in his data.

But... even if I grant you your scenario that he's running on Big Iron, that is irrelevant! Using Unicode for his data files is still the better idea.

> the "native" characterset
> is one of the EBCDIC variants. Would UTF-8 be a better choice there? )

Yes it would. The OP is using Unicode strings so regardless of the OS's native character set, it is better to use Unicode rather than some 8-bit encoding.

Today the OP needs a pound sign. Tomorrow he may need a Greek Σ, yen sign, CJK ideograph, or Arabic character. Possibly all in the same document. Using legacy encodings, whether based on EBCDIC or ASCII, should be avoided.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
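The "all in the same document" point can be illustrated with a short sketch. The sample string and variable names below are assumptions chosen for the example: it mixes a pound sign, a Greek capital sigma, a yen sign, one CJK ideograph and one Arabic letter, then tries UTF-8 and one 8-bit legacy encoding (Latin-1).

```python
# Sketch under stated assumptions: the sample characters are arbitrary picks
# from the categories named in the message.
text = u"\xa3 \u03a3 \xa5 \u4e2d \u0639"  # pound, Sigma, yen, CJK ideograph, Arabic letter

# UTF-8 can encode every Unicode code point, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Latin-1 has the pound and yen signs but none of the other three characters.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(round_trip == text, latin1_ok)
```

Any single-byte encoding has at most 256 slots, so some mix of scripts will always fall outside it; that is the practical argument being made for UTF-8 here.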
Re: Pound sign problem
On Wed, Apr 12, 2017 at 2:23 AM, Lew Pitcher wrote:
> Chris Angelico wrote:
>
>> On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher
>> wrote:
>>>
>>> What in "Try changing your target encoding to something other than ASCII"
>>> is encouragement to use "old legacy encodings"?
>>>
>>>> In 2017, unless you are reading from old legacy files created using a
>>>> non-Unicode encoding, you should just use UTF-8.
>>>
>>> Thanks for your opinion. My opinion differs.
>>
>> So what encoding *do* you recommend, and why is it better than UTF-8?
>
> I recommend whatever encoding is appropriate for the output. That's not up
> to you or me to decide; that's a question that only the OP can answer.
>
> (Imagine, python on an IBM Zseries running ZOS; the "native" characterset is
> one of the EBCDIC variants. Would UTF-8 be a better choice there? )

So if the OP needed to print out a number, would you take a similarly spineless approach and say that only the OP can decide what numeric base to use? Does every fledgeling programmer need to understand about archaic systems where you needed to use BCD for your numbers? EBCDIC derives from BCD, where a single decimal digit was encoded in four bits... and I'm sure you could name systems even less popular, used on important systems back in the 1960s or so. Does a modern Python programmer need to look through all of those possible ways to represent numbers?

NO.

Today's programmer should need to know about very few ways to represent numbers, in priority order:

1) Decimal digits represented in ASCII
2) Packed binary, network byte order
3) Packed binary, little-endian.

A new programmer shouldn't need to worry about anything other than decimal digits, in fact. Of course other systems do exist, like the MIDI "variable length integer" that packs seven bits into a byte and then uses the high bit as a continuation marker; or IEEE 80-bit floating point, or a multi-limb format like GMP uses, but until you actually need to work with it, you don't need to know about it.

Just use the one most obvious encoding. UTF-8 for all text.

ChrisA
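For readers who have not met the number representations listed in this message, here is a small sketch of the three prioritised forms plus the MIDI-style variable-length integer. The function name `midi_varlen` and the sample value are assumptions for illustration, and the packed forms are shown as 32-bit unsigned integers, which is just one possible width.

```python
import struct

n = 525000  # arbitrary sample value

# 1) Decimal digits represented in ASCII.
as_decimal_ascii = str(n).encode("ascii")

# 2) Packed binary, network (big-endian) byte order, 32-bit unsigned.
as_big_endian = struct.pack(">I", n)

# 3) Packed binary, little-endian, 32-bit unsigned.
as_little_endian = struct.pack("<I", n)


def midi_varlen(value):
    """Encode a non-negative int as a MIDI-style variable-length quantity.

    Seven data bits go in each byte; the high bit is set on every byte
    except the last, marking that more bytes follow.
    """
    groups = [value & 0x7F]          # last byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(bytearray(reversed(groups)))


print(repr(as_decimal_ascii), repr(as_big_endian), repr(as_little_endian))
print(repr(midi_varlen(n)))
```

The analogy in the message is that the decimal-digits form is the default nearly everyone should reach for first, much as UTF-8 is for text.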
Re: Pound sign problem
On Wed, 12 Apr 2017 01:24 am, Lew Pitcher wrote:

[...]
>>> There is no "pound sign" in ASCII[1]. Try changing your target encoding
>>> to something other than ASCII.
>>
>> Please don't encourage the use of old legacy encodings.
>
> I wonder if you actually read my reply.

Of course I did.

> What in "Try changing your target encoding to something other than ASCII"
> is encouragement to use "old legacy encodings"?

The fact that "something other than ASCII" includes dozens of old legacy encodings, including the most obvious one for Western Europeans coming from a Windows environment: Latin-1.

There are only three practical choices for text: ASCII, Unicode, and legacy encodings (or "code pages", as many people know them). TRON is effectively only available in Japan, and even there hardly anyone uses it. (And besides, Python doesn't support TRON.)

You've (rightly) eliminated ASCII, as the pound sign isn't available. Python doesn't support TRON, so your instruction to the OP is logically equivalent to "use Unicode or a legacy encoding". It's the second half of that which I am objecting to.

>> In 2017, unless you are reading from old legacy files created using a
>> non-Unicode encoding, you should just use UTF-8.
>
> Thanks for your opinion. My opinion differs.

What would you suggest then, if not UTF-8?

My personal favourite legacy encoding is MacRoman, but I wouldn't recommend anyone use it except to interoperate with legacy Mac applications and/or data from the 80s and 90s.

What's your recommendation? "Anything but ASCII"?

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
Re: Pound sign problem
Chris Angelico wrote:

> On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher
> wrote:
>>
>> What in "Try changing your target encoding to something other than ASCII"
>> is encouragement to use "old legacy encodings"?
>>
>>> In 2017, unless you are reading from old legacy files created using a
>>> non-Unicode encoding, you should just use UTF-8.
>>
>> Thanks for your opinion. My opinion differs.
>
> So what encoding *do* you recommend, and why is it better than UTF-8?

I recommend whatever encoding is appropriate for the output. That's not up to you or me to decide; that's a question that only the OP can answer.

(Imagine, python on an IBM Zseries running ZOS; the "native" characterset is one of the EBCDIC variants. Would UTF-8 be a better choice there? )

--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request
Re: Pound sign problem
On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher wrote:
>
> What in "Try changing your target encoding to something other than ASCII" is
> encouragement to use "old legacy encodings"?
>
>> In 2017, unless you are reading from old legacy files created using a
>> non-Unicode encoding, you should just use UTF-8.
>
> Thanks for your opinion. My opinion differs.

So what encoding *do* you recommend, and why is it better than UTF-8?

ChrisA
Re: Pound sign problem
Steve D'Aprano wrote:

> On Tue, 11 Apr 2017 12:50 am, Lew Pitcher wrote:
>
>> David Shi wrote:
>>
>>> In the data set, pound sign escape appears:
>>> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
>
> That looks like David is using Python 2.
>
>>> When using table.to_csv after importing pandas as pd, an error message
>>> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
>>> character u'\xa3' in position 0: ordinal not in range(128)
>>
>> There is no "pound sign" in ASCII[1]. Try changing your target encoding
>> to something other than ASCII.
>
> Please don't encourage the use of old legacy encodings.

I wonder if you actually read my reply.

What in "Try changing your target encoding to something other than ASCII" is encouragement to use "old legacy encodings"?

> In 2017, unless you are reading from old legacy files created using a
> non-Unicode encoding, you should just use UTF-8.

Thanks for your opinion. My opinion differs.

--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request
Re: Pound sign problem
On Tue, 11 Apr 2017 12:50 am, Lew Pitcher wrote:

> David Shi wrote:
>
>> In the data set, pound sign escape appears:
>> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',

That looks like David is using Python 2.

>> When using table.to_csv after importing pandas as pd, an error message
>> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
>> character u'\xa3' in position 0: ordinal not in range(128)
>
> There is no "pound sign" in ASCII[1]. Try changing your target encoding to
> something other than ASCII.

Please don't encourage the use of old legacy encodings.

In 2017, unless you are reading from old legacy files created using a non-Unicode encoding, you should just use UTF-8.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
Re: Pound sign problem
David Shi wrote:

> In the data set, pound sign escape appears:
> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> When using table.to_csv after importing pandas as pd, an error message
> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
> character u'\xa3' in position 0: ordinal not in range(128)

There is no "pound sign" in ASCII[1]. Try changing your target encoding to something other than ASCII.

[1]: See http://std.dkuug.dk/i18n/charmaps/ascii for a list of valid ASCII values.

--
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request
Re: Pound sign problem
David Shi via Python-list wrote:

> In the data set, pound sign escape appears:
> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> When using table.to_csv after importing pandas as pd, an error message
> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
> character u'\xa3' in position 0: ordinal not in range(128)

The default encoding in Python 2 is ascii, and the pound sign is not part of that.

> Can anyone help?

Specify an alternative encoding, preferably UTF-8:

>>> import pandas
>>> df = pandas.DataFrame([[u"\xa3123"], [u"\xa3321"]], columns=["Price"])
>>> df
  Price
0  £123
1  £321

[2 rows x 1 columns]
>>> df.to_csv("tmp.csv", encoding="utf-8")
>>>
$ cat tmp.csv
,Price
0,£123
1,£321
$
Re: Pound sign problem
David Shi via Python-list writes:

> When using table.to_csv after importing pandas as pd

I don't know much about that library. What does its documentation say for the ‘table.to_csv’ function?

Can you write a *very short* complete example, that we can run to demonstrate the same behaviour you are seeing?

> an error message persists as follows:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position
> 0: ordinal not in range(128)

This means the function has been told (or is assuming, in the absence of better information) that the input data is in the ‘ascii’ text encoding.

That assumption turns out to be incorrect, for the actual data you have. So that error occurs.

You will need to:

* Find out exactly what text encoding was used to write the file. Don't guess, because there are many ways to be wrong.

* Specify that encoding to the ‘table.to_csv’ function, or to whatever function opens the file. (This might be the Python built-in ‘open’ function, but we'd need to see your short example to know.)

--
 \     “Most people, I think, don't even know what a rootkit is, so |
  `\    why should they care about it?” —Thomas Hesse, Sony BMG, 2006 |
_o__)                                                                |
Ben Finney
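The two steps in this message (know the source encoding, then state an encoding explicitly wherever a file is opened) can be sketched with the standard library alone. Everything here is an assumption for illustration: the sample bytes, the claim that they are Latin-1, and the file name are invented, and `io.open` is used so the same code should work on Python 2.7 and Python 3. Note that a UnicodeEncodeError such as the one quoted is raised when text is turned into bytes for output, so the explicit encoding matters most on the writing side.

```python
import io
import os
import shutil
import tempfile

# Hypothetical legacy input: bytes that are known (not guessed) to be Latin-1.
raw = b"price_formatted\n\xa3525,000\n"

# Step 1: decode with the encoding the data was actually written in.
text = raw.decode("latin-1")

# Step 2: name the encoding explicitly when writing and when reading back.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "prices.csv")  # illustrative file name
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, "r", encoding="utf-8") as f:
        read_back = f.read()
finally:
    shutil.rmtree(workdir)  # remove the temporary directory again

print(read_back == text)
```

Relying on a default encoding instead of passing one is what tends to produce errors like the one in the original post, because the default can differ between Python versions and platforms.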
Pound sign problem
In the data set, pound sign escape appears:
u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',

When using table.to_csv after importing pandas as pd, an error message persists as follows:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)

Can anyone help?

Regards.

David