Re: Taking data from a text file to parse html page

2006-08-26 Thread Anthra Norell

No, I am not running Linux to any extent. But I am very strict about case. 
There is not a single instance of se.py or sel.py
anywhere on my system. You'll have to find out where lower case sneaks in on 
yours. The zip file preserves case and in the zip file
the names are upper case. I am baffled. But I believe that an import tripping 
up on the wrong case can't be a hard nut to crack.

Frederic

- Original Message -
From: DH [EMAIL PROTECTED]
Newsgroups: comp.lang.python
To: python-list@python.org
Sent: Saturday, August 26, 2006 5:47 AM
Subject: Re: Taking data from a text file to parse html page


 Yes I know how to import modules... I think I found the problem, Linux
 handles upper and lower case differently, so for some reason you can't
 import SE but if you rename it to se it gives you the error that it
 can't find SEL which if you rename it will complain that SEL isn't
 defined... Are you running Linux? Have you tested it with Linux?



-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Taking data from a text file to parse html page

2006-08-26 Thread Georg Brandl

Anthra Norell wrote:
 No, I am not running Linux to any extent. But I am very strict about case. 
 There is not a single instance of se.py or sel.py
 anywhere on my system. You'll have to find out where lower case sneaks in on 
 yours. The zip file preserves case and in the zip file
 the names are upper case. I am baffled. But I believe that an import tripping 
 up on the wrong case can't be a hard nut to crack.

The problem is the extension:

SE.py is acceptable, while SE.PY is not.
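
If the unpacked files really do carry an upper-case extension, one way to fix them in bulk is sketched below. This is only an illustration in current Python 3, not part of the SE package: the function name `lowercase_py_extensions` is made up here, and the demo runs in a throwaway temporary directory rather than a real SE-2.2 folder.

```python
import os
import tempfile


def lowercase_py_extensions(directory):
    """Rename files such as SE.PY to SE.py and return the new names."""
    renamed = []
    # Take a snapshot of the listing first, since we rename while looping.
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        # Only the extension changes; the module name keeps its case.
        if ext.lower() == ".py" and ext != ".py":
            new_name = stem + ".py"
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed


# Demo in a temporary directory (hypothetical file set).
with tempfile.TemporaryDirectory() as demo_dir:
    for file_name in ("SE.PY", "SEL.PY", "notes.txt"):
        open(os.path.join(demo_dir, file_name), "w").close()
    print(lowercase_py_extensions(demo_dir))  # ['SE.py', 'SEL.py']
```

The module name itself is left alone, so `import SE` keeps working on a case-sensitive file system once the extension is lower case.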

Georg

Re: Taking data from a text file to parse html page

2006-08-26 Thread Anthra Norell

Yes! It just occurred to me that this could be the problem. I have to change 
that. Thanks for the hint.

Frederic


Re: Taking data from a text file to parse html page

2006-08-25 Thread Anthra Norell

Surely you write your own programs. (program_name.py). You import and run them. 
You may put SE.PY and SEL.PY into the same
directory. That's all.
  Or if you prefer to keep other people's stuff in a different directory, 
just make sure that directory is in sys.path,
because that is where import looks. Check for that directory's presence in the 
sys.path list:

>>> sys.path
['C:\\Python24\\Lib\\idlelib', 'C:\\', 'C:\\PYTHON24\\DLLs',
'C:\\PYTHON24\\lib', 'C:\\PYTHON24\\lib\\plat-win',
'C:\\PYTHON24\\lib\\lib-tk' (... etc)]

Supposing it isn't there, add it:

>>> sys.path.append ('/python/code/other_peoples_stuff')
>>> import SE

That should do it. Let me know if it works. Else just keep asking.
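
The check-then-append step described above can be wrapped in a small helper. This is a sketch in current Python 3; `ensure_on_path` is a name invented for illustration, and the directory is the example path from this message, so substitute wherever SE.py and SEL.py actually live.

```python
import sys


def ensure_on_path(directory):
    """Append directory to sys.path unless it is already listed."""
    if directory not in sys.path:
        sys.path.append(directory)
    return directory in sys.path


# Example path taken from the message above; replace it with your own.
ensure_on_path("/python/code/other_peoples_stuff")

# After this, "import SE" is searched for in that directory as well.
```

Calling the helper twice is harmless, because the directory is only appended when it is missing.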

Frederic


- Original Message -
From: DH [EMAIL PROTECTED]
Newsgroups: comp.lang.python
To: python-list@python.org
Sent: Friday, August 25, 2006 4:40 AM
Subject: Re: Taking data from a text file to parse html page


 SE looks very helpful... I'm having a hell of a time installing it
 though:

 -

 [EMAIL PROTECTED]:~/Desktop/SE-2.2$ sudo python SETUP.PY install
 running install
 running build
 running build_py
 file SEL.py (for module SEL) not found
 file SE.py (for module SE) not found
 file SEL.py (for module SEL) not found
 file SE.py (for module SE) not found


Re: Taking data from a text file to parse html page

2006-08-25 Thread DH

Yes I know how to import modules... I think I found the problem, Linux
handles upper and lower case differently, so for some reason you can't
import SE but if you rename it to se it gives you the error that it
can't find SEL which if you rename it will complain that SEL isn't
defined... Are you running Linux? Have you tested it with Linux?


Re: Taking data from a text file to parse html page

2006-08-24 Thread Anthra Norell

DH,
  Could you be more specific describing what you have and what you want? 
You are addressing people, many of whom are good at
stripping useless junk once you tell them what 'useless junk' is.
  Also it helps to post some of your data that you need to process and a 
sample of the same data as it should look once it is
processed.

Frederic

- Original Message -
From: DH [EMAIL PROTECTED]
Newsgroups: comp.lang.python
To: python-list@python.org
Sent: Thursday, August 24, 2006 2:11 AM
Subject: Taking data from a text file to parse html page


 Hi,

 I'm trying to strip the html and other useless junk from a html page..
 I'd like to create something like an automated text editor, where it
 takes the keywords from a txt file and removes them from the html page
 (replace the words in the html page with blank space) I'm new to python
 and could use a little push in the right direction, any ideas on how to
 implement this?

 Thanks!


Re: Taking data from a text file to parse html page

2006-08-24 Thread Larry Bates

DH wrote:
 Hi,
 
 I'm trying to strip the html and other useless junk from a html page..
 I'd like to create something like an automated text editor, where it
 takes the keywords from a txt file and removes them from the html page
 (replace the words in the html page with blank space) I'm new to python
 and could use a little push in the right direction, any ideas on how to
 implement this?
 
 Thanks!
 
See Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/
It will parse even badly formed HTML and allow you to extract or change
information as you wish.

-Larry Bates

Re: Taking data from a text file to parse html page

2006-08-24 Thread DH

Frederic,
Good points...

I have a plain text file containing the html fragments and words (keywords)
that I want removed from the html file. After processing the html file, the
program would save the result as a plain text file.

So the program would import the keywords, remove them from the html
file and save the result as something.txt.

I would post the data but it's secret. I can post an example:

index.html (html page)


<div><p><em>&quot;Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
&quot;</em></p>
<p>-- Peter Norvig, <a class="reference"



replace.txt (keywords)

<div id="quote" class="homepage-box">

<div><p><em>&quot;

&quot;</em></p>

<p>-- Peter Norvig, <a class="reference"



something.txt (file after editing)



Python has been an important part of Google since the beginning, and
remains so as the system grows and evolves.



Larry,

I've looked into using BeautifulSoup but came to the conclusion that my
idea would work better in the end.


Thanks for the help.



Re: Taking data from a text file to parse html page

2006-08-24 Thread Roberto Bonvallet

DH wrote:
  I'm trying to strip the html and other useless junk from a html page..
  I'd like to create something like an automated text editor, where it
  takes the keywords from a txt file and removes them from the html page
  (replace the words in the html page with blank space)
[...]
 I've looked into using BeautifulSoup but came to the conclusion that my
 idea would work better in the end.

You could use BeautifulSoup anyway for the junk-removal part and then do
your magic.  Even if it is not exactly what you want, it is a good idea to
try to reuse modules that are good at what they do.

-- 
Roberto Bonvallet

Re: Taking data from a text file to parse html page

2006-08-24 Thread Fredrik Lundh

DH wrote:

 I have a plain text file containing the html and words that I want
 removed(keywords) from the html file, after processing the html file it
 would save it as a plain text file.
 
 So the program would import the keywords, remove them from the html
 file and save the html  file as something.txt.
 
 I would post the data but it's secret. I can post an example:
 
 index.html (html page)
 
 
 <div><p><em>&quot;Python has been an important part of Google since the
 beginning, and remains so as the system grows and evolves.
 &quot;</em></p>
 <p>-- Peter Norvig, <a class="reference"
 
  
 replace.txt (keywords)
 
 <div id="quote" class="homepage-box">
 
 <div><p><em>&quot;
 
 &quot;</em></p>
 
 <p>-- Peter Norvig, <a class="reference"
 
 
 
 something.txt(file after editing)
 
 
 
 Python has been an important part of Google since the beginning, and
 remains so as the system grows and evolves.
 

reading and writing files is described in the tutorial; see

 http://pytut.infogami.com/node9.html

(scroll down to Reading and Writing Files)

to do the replacement, you can use repeated calls to the replace method

 http://pyref.infogami.com/str.replace

but that may cause problems if the replacement text contains things that 
should be replaced.  for an efficient way to do a parallel replace, see:

 http://effbot.org/zone/python-replace.htm#multiple
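
To make the difference between repeated and parallel replacement concrete, here is a minimal sketch in current Python 3. It is not the code from the linked page; the function names `replace_sequential` and `replace_parallel` and the tiny swap table are made up for illustration. The parallel version builds one regular expression from all keys, so text produced by one replacement is never replaced again.

```python
import re


def replace_sequential(text, table):
    """Apply each replacement in turn; later ones see earlier results."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text


def replace_parallel(text, table):
    """Replace all keys in a single pass over the original text."""
    if not table:
        return text
    # Longest keys first, so overlapping keys prefer the longer match.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: table[match.group(0)], text)


swap = {"a": "b", "b": "a"}
print(replace_sequential("ab", swap))  # aa
print(replace_parallel("ab", swap))    # ba
```

For pure deletion, where every replacement is the empty string, the two approaches differ less, but the single-pass version still avoids rescanning the text once per keyword.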


/F


Re: Taking data from a text file to parse html page

2006-08-24 Thread DH

 I found this
http://groups.google.com/group/comp.lang.python/browse_thread/thread/d1bda6ebcfb060f9/ad0ac6b1ac8cff51?lnk=gst&q=replace+text+file&rnum=8#ad0ac6b1ac8cff51

Credit Jeremy Moles
---

finds = ("{", "}", "(", ")")
lines = file("foo.txt", "r").readlines()

for line in lines:
    for find in finds:
        if find in line:
            line.replace(find, "")

print lines

---

I want something like
---

finds = file("replace.txt")
lines = file("foo.txt", "r").readlines()

for line in lines:
    for find in finds:
        if find in line:
            line.replace(find, "")

print lines

---
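
A note on the two snippets above: strings are immutable, so `line.replace(...)` returns a new string and the result is thrown away unless it is assigned. Keywords read line by line from a file also keep their trailing newline, and a file object can only be iterated once. A minimal working sketch in current Python 3 follows; the helper names are invented for illustration, and the file names in the usage note are the ones from this thread.

```python
def remove_keywords(text, keywords):
    """Return text with every occurrence of every keyword deleted."""
    # Longest first, so a short keyword cannot break up a longer one.
    for keyword in sorted(keywords, key=len, reverse=True):
        # str.replace returns a new string, so the result must be kept.
        text = text.replace(keyword, "")
    return text


def load_keywords(path):
    """Read one keyword per line, dropping line endings and blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def strip_file(source_path, keyword_path, output_path):
    """Remove the keywords from source_path and write the result."""
    with open(source_path, encoding="utf-8") as handle:
        text = handle.read()
    cleaned = remove_keywords(text, load_keywords(keyword_path))
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(cleaned)
```

With the files from the example, the call would be `strip_file("index.html", "replace.txt", "something.txt")`. Working on the whole text instead of line by line also lets a keyword match across a line break only if the keyword itself contains one, so multi-line keywords would need different handling.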




Re: Taking data from a text file to parse html page

2006-08-24 Thread Anthra Norell

You may also want to look at this stream editor:

http://cheeseshop.python.org/pypi/SE/2.2%20beta

It allows multiple replacements in a definition format of utmost simplicity:

>>> your_example = '''
<div><p><em>&quot;Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
&quot;</em></p>
<p>-- Peter Norvig, <a class="reference"
'''
>>> import SE
>>> Tag_Stripper = SE.SE ('''
    "~<(.|\n)*?>~="       # This pattern finds all tags and deletes them (replaces with nothing)
    "~<!--(.|\n)*?-->~="  # This pattern deletes comments entirely even if they nest tags
    ''')
>>> print Tag_Stripper (your_example)

&quot;Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
&quot;
-- Peter Norvig, <a class="reference"

Now you see a tag fragment. So you add another deletion to the Tag_Stripper 
(***):

>>> Tag_Stripper = SE.SE ('''
    "~<(.|\n)*?>~="       # This pattern finds all tags and deletes them (replaces with nothing)
    "~<!--(.|\n)*?-->~="  # This pattern deletes comments entirely even if they nest tags
    "<a class\="reference"="   # *** This deletes the fragment
    # "-- Peter Norvig, <a class\="reference"="  # Or like this if Peter Norvig has to go too
    ''')
>>> print Tag_Stripper (your_example)

&quot;Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
&quot;
-- Peter Norvig,

&quot; you can either translate or delete:

>>> Tag_Stripper = SE.SE ('''
    "~<(.|\n)*?>~="       # This pattern finds all tags and deletes them (replaces with nothing)
    "~<!--(.|\n)*?-->~="  # This pattern deletes comments entirely even if they nest tags
    "<a class\="reference"="   # This deletes the fragment
    # "-- Peter Norvig, <a class=\\"reference\\"="  # Or like this if Peter Norvig has to go too
    htm2iso.se   # This is a file (contained in the SE package) that translates all ampersand codes.
                 # Naming the file is all you need to do to include the replacements which it defines.
    ''')

>>> print Tag_Stripper (your_example)

'Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.
'
-- Peter Norvig,

If instead of htm2iso.se you write "&quot;=" you delete it and your output 
will be:

Python has been an important part of Google since the
beginning, and remains so as the system grows and evolves.

-- Peter Norvig,

Your Tag_Stripper also does files:

>>> print Tag_Stripper ('my_file.htm', 'my_file_without_tags')
'my_file_without_tags'


A stream editor is not a substitute for a parser. It does, however, handle 
simple translation jobs like this one more economically, where a
parser would do a lot of work that you don't need.
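
For readers who do not have SE installed, roughly the same deletions can be approximated with the standard library alone. The sketch below is written for current Python 3 and is not SE's implementation: it uses the `re` module for the tag and comment patterns and `html.unescape` in place of the htm2iso.se translation file, and the name `strip_tags` and the sample string are made up for illustration.

```python
import html
import re

# Comments are removed first so that tags nested inside them go too.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Non-greedy match from "<" to the next ">", newlines included.
TAG = re.compile(r"<.*?>", re.DOTALL)


def strip_tags(text):
    """Delete comments and complete tags, then decode ampersand codes."""
    text = COMMENT.sub("", text)
    text = TAG.sub("", text)
    return html.unescape(text)


sample = '<div><p><em>&quot;Python&quot;</em></p><!-- <b>note</b> -->'
print(strip_tags(sample))  # "Python"
```

As with the SE version, an unterminated fragment such as an opening tag with no closing `>` is left in place and needs its own rule. The same caveat applies here too: this is text substitution, not parsing.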

Regards

Frederic



Re: Taking data from a text file to parse html page

2006-08-24 Thread DH

SE looks very helpful... I'm having a hell of a time installing it
though:

-

[EMAIL PROTECTED]:~/Desktop/SE-2.2$ sudo python SETUP.PY install
running install
running build
running build_py
file SEL.py (for module SEL) not found
file SE.py (for module SE) not found
file SEL.py (for module SEL) not found
file SE.py (for module SE) not found

