smac2...@comcast.net wrote:
>
>For scraping purposes, I am having a bit of trouble writing a block
>of code to define, and find, the relative position (line number) of a
>string of HTML code. I can pull out one string that I want, and then
>there is always a line of code, directly beneath the one I can pull
>out, that begins with the following:
><td align="left" valign="top" class="body_cols_middle">
>
>However, because this string of HTML code above is not unique to just
>the information I need (which I cannot currently pull out), I was
>hoping there is a way to effectively say "if you find the html string
>_____ in the line of HTML code above, and the string <td align="left"
>valign="top" class="body_cols_middle"> in the line immediately
>following, then pull everything that follows this second string."

Regular expression-based screen scraping is extremely delicate. All it
takes is one tweak to the HTML, and your scraping fails although the
page continues to look the same.

A much better plan is to use sgmllib to write yourself a mini HTML
parser. You can handle "td" tags with the attributes you want, and
count down until you get to the "td" tag you want.
-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.
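
A minimal sketch of the mini-parser idea, under a few assumptions.
sgmllib exists only in Python 2 (it was removed in Python 3.0), so this
sketch swaps in html.parser.HTMLParser from the Python 3 standard
library, which has the same handler-method style. The names
CellAfterMarker and extract_after, and the "Price:" marker in the
sample, are hypothetical and chosen only for illustration. It collects
the text of the first <td class="body_cols_middle"> that opens after
the marker text has been seen, which may be looser than "the line
immediately following" if other cells sit in between.

```python
from html.parser import HTMLParser


class CellAfterMarker(HTMLParser):
    """Collect the text of the first matching <td> opened after a marker string."""

    def __init__(self, marker, target_class="body_cols_middle"):
        super().__init__()
        self.marker = marker              # text that identifies the preceding cell
        self.target_class = target_class  # class attribute of the cell we want
        self.seen_marker = False
        self.depth = 0                    # > 0 while inside the target <td>
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Track nested <td> tags so we stop at the right closing tag.
            if tag == "td":
                self.depth += 1
            return
        if (tag == "td" and self.seen_marker
                and dict(attrs).get("class") == self.target_class):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "td":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)
        elif not self.done and self.marker in data:
            self.seen_marker = True


def extract_after(html, marker):
    """Return the stripped text of the target cell, or '' if none was found."""
    parser = CellAfterMarker(marker)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Made-up sample page: the first and last cells share the class but are skipped.
sample = """
<table>
<tr><td align="left" valign="top" class="body_cols_middle">skip me</td></tr>
<tr><td class="label">Price:</td></tr>
<tr><td align="left" valign="top" class="body_cols_middle">42 <b>USD</b></td></tr>
<tr><td align="left" valign="top" class="body_cols_middle">too late</td></tr>
</table>
"""
print(extract_after(sample, "Price:"))  # prints: 42 USD
```

Because the parser keys on tag names and attribute values rather than
on exact spacing or attribute order, small changes to the page markup
are less likely to break it than they would a regular expression.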