> Hello,
>
> For scraping purposes, I am having a bit of trouble writing a block
> of code to define, and find, the relative position (line number) of a
> string of HTML code. I can pull out one string that I want, and then
> there is always a line of code, directly beneath the one I can pull
> out, that begins with the following:
>
>     <td align="left" valign="top" class="body_cols_middle">
>
> However, because this string of HTML code is not unique to just
> the information I need (which I cannot currently pull out), I was
> hoping there is a way to effectively say "if you find the HTML string
> _____ in the line of HTML code above, and the string
> <td align="left" valign="top" class="body_cols_middle"> in the line
> immediately following, then pull everything that follows this second
> string."
>
> Any thoughts on how to define a function to do this, or to do this
> some other way? All insight is much appreciated! Thanks.
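One way to package this as the function you asked about is to pair each line with the line right after it and test both conditions at once. This is a minimal sketch, assuming the whole page is already in a string; `extract_after_marker` and the sample rows are hypothetical, and `marker` stands in for your `_____` string. It reads "everything that follows" as the rest of the second line after the tag.

```python
TD_TAG = '<td align="left" valign="top" class="body_cols_middle">'


def extract_after_marker(html, marker, tag=TD_TAG):
    """Hypothetical helper: for every line that contains `tag` and directly
    follows a line containing `marker`, return the text after `tag`."""
    lines = html.splitlines()
    results = []
    # zip() pairs each line with the one immediately below it
    for previous_line, line in zip(lines, lines[1:]):
        if marker in previous_line and tag in line:
            # keep only what comes after the first occurrence of the tag
            results.append(line.split(tag, 1)[1].strip())
    return results


# Made-up sample page, for illustration only
page = """<tr><td>Price</td>
<td align="left" valign="top" class="body_cols_middle">42.50</td></tr>
<tr><td>Volume</td>
<td align="left" valign="top" class="body_cols_middle">1000</td></tr>"""

print(extract_after_marker(page, "<td>Price</td>"))  # ['42.50</td></tr>']
```

Because the tag alone appears on both data rows, only the marker on the line above decides which row is returned; any trailing closing tags would still need to be stripped separately.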
You may have more long-term success in scraping by using an HTML parser like Beautiful Soup.

Alternatively, keep track of the current line and the previous line while looping, and do something like the following. The string you can already find goes with the previous line, and the body_cols_middle tag goes with the current line:

    TD_TAG = '<td align="left" valign="top" class="body_cols_middle">'
    MARKER = '_____'  # the string you can already find in the line above

    results = []
    found = False
    previous_line = ''
    for line in html.splitlines():  # html holds the page source
        if found:
            # everything after the match is collected
            results.append(line)
            continue
        criteria1 = MARKER in previous_line
        criteria2 = TD_TAG in line
        if criteria1 and criteria2:
            found = True
            # add the rest of this line, after the tag, to the results
            results.append(line.split(TD_TAG, 1)[1])
        previous_line = line

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology