[EMAIL PROTECTED] wrote:
> Great, thanks so much for posting that. It's worked a treat and I'm
> getting HTML files with the list of h2 tags I was looking for. Here's
> the code just to share, what a relief :) :
> ...............................
> from BeautifulSoup import BeautifulSoup
> import re
>
> page = open("soup_test/tomatoandcream.html", 'r')
> soup = BeautifulSoup(page)
>
> myTagSearch = str(soup.findAll('h2'))
>
> myFile = open('Soup_Results.html', 'w')
> myFile.write(myTagSearch)
> myFile.close()
>
> del myTagSearch
> ...............................
>
> I do have two other small queries that I wonder if anyone can help
> with.
>
> Firstly, I'm getting the following character: "[" at the start, "]" at
> the end of the code. Along with "," in between each tag line listing.
> This seems like normal behaviour but I can't find the way to strip
> them out.

Ah.  What you want is more like this:

    from BeautifulSoup import BeautifulSoup

    page = open("soup_test/tomatoandcream.html", 'r')
    soup = BeautifulSoup(page)

    # get all H2 tags, both cases
    htags = soup.findAll({'h2':True, 'H2' : True})

    myFile = open('Soup_Results.html', 'w')
    for htag in htags :                       # for each H2 tag
        texts = htag.findAll(text=True)       # find all text items within this h2
        s = ' '.join(texts).strip() + '\n'    # combine text items into clean string
        myFile.write(s)                       # write each text from an H2 element on a line.
    myFile.close()

                                John Nagle
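
For what it's worth, the "[", "]" and "," are not in the HTML at all. findAll returns a list, and calling str() on a list gives Python's list display. Below is a minimal sketch of the same effect using only the standard library, with plain strings standing in for the tag objects. The sample strings are invented for illustration, and the extra quotes in the first output come from using plain strings here; the original post reported only the brackets and commas.

```python
# Stand-ins for the objects returned by soup.findAll('h2');
# these sample strings are invented for illustration.
tags = ['<h2>Tomato</h2>', '<h2>Cream</h2>']

# str() on a list gives the list display: brackets, commas, and the
# repr() of each element (hence the quotes around plain strings).
as_list = str(tags)

# Converting each item separately and joining them avoids all of that.
as_lines = '\n'.join(str(t) for t in tags) + '\n'

print(as_list)
print(as_lines)
```

So the fix is not to strip the characters afterwards but to avoid turning the whole list into one string in the first place, which is what the loop above does.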