Thank you both. In the end I used recode, which I wasn't aware of.
Fredrik, I had come across your script while googling for solutions,
but failed to make it work.

On Dec 13, 2:21 pm, "Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> "ardief" wrote:
> > sorry if I'm asking something very obvious but I'm stumped. I have a
> > text that looks like this:
>
> > Sentence 401
> > 4.00pm  &mdash; We set off again; this time via Tony's home to collect
> > a variety of possessions, finally arriving at hospital no.3.
> > Sentence 402
> > 4.55pm  &mdash; Tony is ushered into a side ward with three doctors and
> > I stay outside with Mum.
>
> > And I want the HTML char codes to turn into their equivalent plain
> > text. I've looked at the newsgroup archives, the cookbook, the web in
> > general and can't manage to sort it out.
> > file = open('filename', 'r')
> > ofile = open('otherfile', 'w')
>
> > done = 0
>
> > while not done:
> >    line = file.readline()
> >    if 'THE END' in line:
> >        done = 1
> >    elif '&mdash;' in line:
> >        line.replace('&mdash;', '--')
>
> this returns a new line; it doesn't update the line in place.
>
> >        ofile.write(line)
> >    else:
> >        ofile.write(line)
>
> for a more general solution to the actual replace problem, see:
>
>    http://effbot.org/zone/re-sub.htm#unescape-html
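
For readers on Python 3.4 or later, the general case can probably be handled with the standard library's html.unescape rather than a hand-written regular expression. The sketch below is not the linked recipe; the helper name to_plain_text and the choice to map the em dash back to "--" are assumptions made for illustration.

```python
import html

def to_plain_text(line):
    """Decode HTML character references, then map the em dash to ASCII."""
    decoded = html.unescape(line)           # '&mdash;' becomes U+2014
    return decoded.replace('\u2014', '--')  # optional ASCII fallback

print(to_plain_text("4.00pm  &mdash; We set off again"))
```

Dropping the final replace keeps the real em dash character, which may be preferable if the output file is written as UTF-8.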
>
> you may also want to look up the "fileinput" module in the library reference
> manual.
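
As a rough illustration of what fileinput offers here, the sketch below rewrites a file in place instead of writing to a second file. It assumes Python 3; the function name replace_in_place is hypothetical, and editing in place overwrites the original, so it is worth trying on a copy first.

```python
import fileinput

def replace_in_place(path, old, new):
    """Rewrite the file at `path`, replacing `old` with `new` on every line."""
    # With inplace=True, anything printed inside the loop is written
    # back to the file instead of going to the terminal.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            print(line.replace(old, new), end='')
```

A call such as replace_in_place('filename', '&mdash;', '--') would then update the file directly.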
> 
> </F>
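
For the record, the point about replace returning a new string suggests a fix along these lines. This is only a sketch of how the quoted loop might be corrected, not what was actually run; the function name copy_with_replacement is made up, and the in-memory StringIO objects stand in for the real input and output files.

```python
import io

def copy_with_replacement(src, dst, marker='THE END'):
    """Copy lines from src to dst, replacing '&mdash;', until marker is seen."""
    for line in src:              # iterating also stops cleanly at end of file
        if marker in line:
            break
        # str.replace returns a new string, so the result must be used
        dst.write(line.replace('&mdash;', '--'))

src = io.StringIO("4.00pm  &mdash; We set off\nTHE END\nignored\n")
dst = io.StringIO()
copy_with_replacement(src, dst)
print(dst.getvalue(), end='')
```

Objects returned by open() can be passed in the same way. Looping over the file directly also avoids the endless loop the original would hit if 'THE END' never appears, since readline() keeps returning an empty string at end of file.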

-- 
http://mail.python.org/mailman/listinfo/python-list
