On Mar 24, 10:30 am, Scott David Daniels <scott.dani...@acm.org> wrote:
> CaptainMcCrank wrote:
> > Hi list,
> >
> > I'm struggling with a problem analyzing large amounts of unicode data
> > in an http wireshark capture.
> > I've solved the problem with the interpreter, but I'm not sure how to
> > do this in an automated fashion.
> >
> > I'd like to grab a line from a text file & translate the unicode
> > sections of it to ascii. So, for example
> > I'd like to take
> > "\u003cb\u003eMar 17\u003c/b\u003e"
> >
> > and turn it into
> >
> > "<b>Mar 17</b>"
> >
> > I can handle this from the interpreter as follows:
> >
> > >>> import unicodedata
> > >>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
> > >>> print mystring
> > <b>Mar 17</b>
> >
> > But I don't know what I need to do to automate this! The data that is
> > in the quotes from line 2 will have to come from a variable. I am
> > unable to figure out how to do this using a variable rather than a
> > literal string.
> >
> > Please help!
>
> You really need to say what version of Python you are working with,
> show the code you tried, and the results you got.

Always very good advice, not often taken :-)

> Using Python 3.1, I get:
> >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
> True

Using Python 2.1.3 I get:
>>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
0
>>> u"\u003cb\u003eMar 17\u003c/b\u003e" == u'<b>Mar 17</b>'
1

But so what? AFAICT from the OP's description and his joyous response
to Peter's suggestion, what he has (in 3.0 syntax) is not
"\u003cb\u003e etc"
it's
b"\u003cb\u003e etc"

HTH,
John
--
http://mail.python.org/mailman/listinfo/python-list
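
For illustration only, here is a minimal sketch of one way to handle that
case, assuming the line read from the capture file really does contain
literal backslash-u sequences as bytes. The helper name decode_escapes is
made up for this example; the only library feature it relies on is the
standard "unicode_escape" codec.

```python
def decode_escapes(raw):
    """Turn literal backslash-u escapes in a byte string into real characters."""
    # The unicode_escape codec interprets \uXXXX sequences; any other
    # bytes are read as Latin-1, so this suits ASCII-plus-escapes input.
    return raw.decode("unicode_escape")


# The doubled backslashes below stand for the single literal backslashes
# that would be present in a line read from the file in binary mode.
line = b"\\u003cb\\u003eMar 17\\u003c/b\\u003e"
print(decode_escapes(line))
```

The value comes from a variable rather than a literal, which is the part
the OP was stuck on. If the file contains non-ASCII bytes outside the
escapes, this codec may not give the result you want.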