Re: newbie - HTML character codes

Roberto Bonvallet Wed, 13 Dec 2006 06:20:57 -0800

ardief wrote:
[...]
> And I want the HTML char codes to turn into their equivalent plain
> text. I've looked at the newsgroup archives, the cookbook, the web in
> general and can't manage to sort it out. I thought doing something like
> this -
> 
> file = open('filename', 'r')


It's not a good idea to use 'file' as a variable name, since you are
shadowing the builtin type of the same name.

> ofile = open('otherfile', 'w')
> 
> done = 0
> 
> while not done:
>    line = file.readline()
>    if 'THE END' in line:
>        done = 1
>    elif '&mdash;' in line:
>        line.replace('&mdash;', '--')

The replace method doesn't modify the 'line' string; it returns a new string,
which you have to assign or use directly.
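
A quick sketch of the difference, using a made-up sample string (the names
'unchanged' and the sample text are only for illustration):

```python
# Strings are immutable: replace() returns a new string and
# leaves the original object untouched.
line = 'before &mdash; after'

line.replace('&mdash;', '--')         # result is thrown away
unchanged = line                      # still contains '&mdash;'

line = line.replace('&mdash;', '--')  # rebind the name to the new string
```

Only the last statement actually changes what 'line' refers to.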

>        ofile.write(line)
>    else:
>        ofile.write(line)

This should work (untested):

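    # The with statement closes both files, even if an error occurs.
    with open('filename', 'r') as infile, open('otherfile', 'w') as outfile:
        for line in infile:
            # replace() returns a new string, so write its result directly.
            outfile.write(line.replace('&mdash;', '--'))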
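
Note that the loop above copies the whole file, while your original loop
stopped at the 'THE END' line without writing it. If that marker still
matters, something along these lines might do. It is only a sketch:
copy_until_marker is a name I made up, and the io.StringIO objects stand in
for real files so the snippet runs on its own.

```python
import io

def copy_until_marker(infile, outfile, marker='THE END'):
    """Copy lines to outfile, replacing &mdash;, and stop at the marker line."""
    for line in infile:
        if marker in line:
            break  # the marker line itself is not written
        outfile.write(line.replace('&mdash;', '--'))

# In-memory stand-ins for the input and output files.
src = io.StringIO('one &mdash; two\nTHE END\nignored\n')
dst = io.StringIO()
copy_until_marker(src, dst)
result = dst.getvalue()
```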

But I think the best approach is to use an existing application or library
that solves the problem.  recode(1) can easily convert to and from HTML
entities:

    recode html..utf-8 filename
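
If you would rather stay in Python, the standard library can also decode
entities. The sketch below assumes Python 3.4 or later, where html.unescape
is available; decode_entities and the sample string are made up for
illustration.

```python
import html

def decode_entities(text):
    """Replace named and numeric HTML character references with their characters."""
    return html.unescape(text)

# Sample input mixing named and numeric references.
sample = 'caf&eacute; &amp; cr&egrave;me &mdash; &#8364;5'
decoded = decode_entities(sample)
```

Note that this turns '&mdash;' into the real Unicode dash character (U+2014),
not '--'. If you want plain ASCII output, you would still need a replace()
call on the result.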

Best regards.
-- 
Roberto Bonvallet
-- 
http://mail.python.org/mailman/listinfo/python-list
