[issue1290] xml.dom.minidom not able to handle utf-8 data
Changes by Facundo Batista: Removed file: http://bugs.python.org/file8559/unnamed __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1290 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
Changes by Facundo Batista: Removed file: http://bugs.python.org/file8560/unnamed
Facundo Batista added the comment:

CharacterData.__repr__ was constructing its return value as a string that could still contain a non-ASCII character. Fixed in rev 58641.

--
resolution: works for me -> fixed
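To illustrate the kind of change this implies, here is a minimal Python 3 sketch of an ASCII-safe __repr__ for a text-node-like class. FakeCharacterData is a hypothetical stand-in, not minidom's class, and the actual patch from rev 58641 is not reproduced here. The sketch assumes the goal is a repr that can never contain non-ASCII characters, so it passes the data slice through ascii(), which escapes characters such as U+2022 as \u2022.

```python
class FakeCharacterData(object):
    """Hypothetical stand-in for a minidom text node (not the real class)."""

    def __init__(self, data):
        self.data = data

    def __repr__(self):
        data = self.data
        dotdotdot = "..." if len(data) > 10 else ""
        # ascii() escapes non-ASCII characters (U+2022 becomes \u2022), so the
        # returned string can always be encoded with the 'ascii' codec.
        return '<DOM %s node %s%s>' % (
            type(self).__name__, ascii(data[:10]), dotdotdot)


node = FakeCharacterData("\u2022 first item")
print(repr(node))  # prints <DOM FakeCharacterData node '\u2022 first it'...>
```

The trade-off is that the repr shows escaped code points instead of the glyphs, but it can be echoed to any terminal. On Python 2, applying %r to a unicode slice gives a similarly escaped u'...' form.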
Raghuram Devarakonda added the comment:

The fact that the problem occurs only at the interactive prompt, and not when the code is run as a script, indicates that the real issue is in printing the object. Sure enough, if you modify the script to call repr(mydom.firstChild.childNodes), it fails the same way. So the issue probably has something to do with how the string returned by repr() is constructed. I don't have time right now to dig deeper, but the parser itself may not have any encoding/decoding issues (apart from the ability to print these high-level objects).
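This diagnosis can be checked with a small Python 3 sketch: parse UTF-8 bytes, confirm the text node holds the correctly decoded character, and then test whether repr() of the child node list could survive the 'ascii' codec. That last check approximates the implicit conversion Python 2 applied when a __repr__ returned a unicode object and the default encoding was 'ascii'. The sample document is made up (testdata.txt is not reproduced here), and repr_is_ascii_safe is a hypothetical helper.

```python
import xml.dom.minidom as dom

# Hypothetical sample: a root element whose only child is a text node that
# starts with U+2022 (BULLET), the character named in the traceback.
raw = "<root>\u2022 first item</root>".encode("utf-8")


def repr_is_ascii_safe(obj):
    """Return True if repr(obj) could be encoded with the 'ascii' codec."""
    try:
        repr(obj).encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


mydom = dom.parseString(raw)          # parsing the UTF-8 bytes succeeds
nodes = mydom.firstChild.childNodes   # a NodeList holding one Text node

print(nodes[0].data == "\u2022 first item")  # True: the data was decoded correctly
print(repr_is_ascii_safe(mydom))             # True: the Document repr carries no text
print(repr_is_ascii_safe(nodes))             # False: the Text repr embeds the bullet
```

This also explains why echoing mydom worked in the report below while echoing mydom.firstChild.childNodes did not.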
Raghuram Devarakonda added the comment:

I forgot to show dom.py source.

    marvin:cpython$ cat dom.py
    import xml.dom.minidom as dom
    data = open('testdata.txt','r').read()
    mydom = dom.parseString(data)
    mydom.firstChild.childNodes
Raghuram Devarakonda added the comment:

When I run the code in a script, I don't get the error.

***
    marvin:cpython$ python
    Python 2.5 (r25:51908, Jan 24 2007, 12:48:15)
    [GCC 4.1.0 (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import xml.dom.minidom as dom
    >>> data = open('testdata.txt','r').read()
    >>> mydom = dom.parseString(data)
    >>> mydom.firstChild.childNodes
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2022' in position 18: ordinal not in range(128)
    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'
    marvin:cpython$ python dom.py
    marvin:cpython$
***

Can you try running it from a script too?

--
nosy: +draghuram
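The script/interactive difference comes from the interpreter's echo: at the >>> prompt, each expression result is passed to sys.displayhook, which writes its repr(), while a bare expression statement in a script (like the last line of dom.py) is evaluated and discarded. A minimal Python 3 sketch, using a hypothetical ReprCounter class that counts repr() calls:

```python
import contextlib
import io
import sys


class ReprCounter(object):
    """Hypothetical object that records how often repr() is called on it."""

    def __init__(self):
        self.repr_calls = 0

    def __repr__(self):
        self.repr_calls += 1
        return "<ReprCounter>"


obj = ReprCounter()

obj  # bare expression statement, as in dom.py: evaluated, never printed
print(obj.repr_calls)  # 0

echoed = io.StringIO()
with contextlib.redirect_stdout(echoed):
    sys.displayhook(obj)  # what the >>> prompt does with an expression result
print(obj.repr_calls)     # 1
print(echoed.getvalue())  # <ReprCounter>
```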
Facundo Batista added the comment:

I downloaded the testdata.txt file, and yes, it's UTF-8:

    [EMAIL PROTECTED]:~/devel$ file testdata.txt
    testdata.txt: UTF-8 Unicode text

But I opened it perfectly!

    Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import xml.dom.minidom as dom
    >>> data = open('testdata.txt','r').read()
    >>> mydom = dom.parseString(data)
    >>> mydom
    <xml.dom.minidom.Document instance at 0xb7c03b0c>
    >>>

Which platform are you working on?

And yes, you have absolute permission to fix it; patches are always welcome!

--
nosy: +facundobatista
resolution: -> works for me
status: open -> closed
Sharmila Sivakumar added the comment:

Thanks for your quick response, Facundo.

I'm working on Ubuntu 7.04, Python 2.5.1:

    Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2

This error occurs when the default encoding is 'ascii'. When I change the default encoding to 'utf-8', it works for me too. Is your default encoding 'utf-8', by any chance?

On 10/18/07, Facundo Batista [EMAIL PROTECTED] wrote:
> Downloaded the testdata.txt file, and yes, it's UTF-8:
> [...]

Added file: http://bugs.python.org/file8559/unnamed
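Changing the default encoding hides the error because the same Unicode text encodes cleanly as UTF-8 but not as ASCII. A Python 3 sketch of that codec behaviour; the string is a hypothetical stand-in for a repr() result containing U+2022, not the exact text from the traceback, and try_encode is a hypothetical helper:

```python
# Hypothetical stand-in for a repr() result that embeds U+2022 (BULLET).
text = '<DOM Text node "\u2022 item">'


def try_encode(s, codec):
    """Return the encoded bytes, or None if the codec cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError as exc:
        print("%s failed at position %d" % (codec, exc.start))
        return None


print(try_encode(text, "utf-8"))  # b'<DOM Text node "\xe2\x80\xa2 item">'
print(try_encode(text, "ascii"))  # "ascii failed at position 16", then None
```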
Sharmila Sivakumar added the comment:

Oops, Facundo, that will work. It actually fails *after the DOM construction*, when you do

    mydom.firstChild.childNodes

Please try it again.

The problem is that some encoding and decoding is done within the parser, and it uses the default encoding, 'ascii'. This fails for UTF-8 data.

On 10/18/07, Sharmila Sivakumar [EMAIL PROTECTED] wrote:
> Thanks for your quick response Facundo.
> [...]

Added file: http://bugs.python.org/file8560/unnamed