Good points...

I have a plain text file containing the html fragments and words
(keywords) that I want removed from the html file. After processing,
the result would be saved as a plain text file.

So the program would import the keywords, remove them from the html
file, and save the result as something.txt.
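
A minimal sketch of that idea, assuming the keyword file holds one
keyword per line, that blank lines in it are ignored, and that each
keyword is removed as an exact string match (not a regex). The function
names are made up for illustration:

```python
def strip_keywords(text, keywords):
    """Return text with every exact occurrence of each keyword removed."""
    # Remove longer keywords first so a short keyword cannot break up
    # a longer one before it has had a chance to match.
    for keyword in sorted(keywords, key=len, reverse=True):
        text = text.replace(keyword, "")
    return text


def load_keywords(path):
    """Read one keyword per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def process(html_path, keywords_path, output_path):
    """Strip the keywords from the html file and save the rest as text."""
    with open(html_path, encoding="utf-8") as handle:
        html = handle.read()
    cleaned = strip_keywords(html, load_keywords(keywords_path))
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(cleaned)
```

Calling process("index.html", "replace.txt", "something.txt") would then
write the stripped text. Anything not listed in the keyword file, such as
entities like &quot;, would be left in the output untouched.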

I would post the data but it's secret. I can post an example:

index.html (html page)

<div><p><em>&quot;Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
<p>-- Peter Norvig, <a class="reference"

replace.txt (keywords)
<div id="quote" class="homepage-box">



<p>-- Peter Norvig, <a class="reference"


something.txt (file after editing)


Python has been an important part of Google since the beginning, and
remains so as the system grows and evolves.


I've looked into using BeautifulSoup but came to the conclusion that my
idea would work better in the end.

Thanks for the help.

Anthra Norell wrote:
> DH,
>       Could you be more specific describing what you have and what you want? 
> You are addressing people, many of whom are good at
> stripping useless junk once you tell them what 'useless junk' is.
>       Also it helps to post some of your data that you need to process and a 
> sample of the same data as it should look once it is
> processed.
> Frederic
> ----- Original Message -----
> Newsgroups: comp.lang.python
> To: <>
> Sent: Thursday, August 24, 2006 2:11 AM
> Subject: Taking data from a text file to parse html page
> > Hi,
> >
> > I'm trying to strip the html and other useless junk from a html page.
> > I'd like to create something like an automated text editor, where it
> > takes the keywords from a txt file and removes them from the html page
> > (replace the words in the html page with blank space) I'm new to python
> > and could use a little push in the right direction, any ideas on how to
> > implement this?
> >
> > Thanks!
> >
> > --
> >

