I wanted to pull the quotes from IMDB quote pages, just to start learning Python. Quotes are not nested, so I grab the anchor links that precede them. I thought I could walk down from each anchor until I hit an HR tag, grabbing people and quotes along the way via hits on <b> and <br>. But when I tried to walk down from my hit on the anchor link and pull the name of the next element, I kept getting a NavigableString instead of a Tag, so asking for the .name attribute gave an error.
Any idea why this might happen?

This is the relevant chunk of IMDB code:

<a name="qt0210620"></a>
<b><a href="/name/nm0629454/">Bill</a></b>: You're supposed to wear the blue dress when I wear this.
<br>
<b><a href="/name/nm0707043/">Mary</a></b>: I don't want to dress like twins anymore.
<br>
<b><a href="/name/nm0629454/">Bill</a></b>: We're not twins. We're a trio.
<br>
<hr width="30%">

---

And this is what I wrote (and if there are other awful things about this, I would be happy to know):

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup
import re

# stubs --------------------------
movietitle_stub = "Nashville"  # later search and pull first result (if movie?)
movieurl_stub = "http://imdb.com/title/tt0073440/"  # and get this

def soupifyPage(target):
    """ grab html from a page
    probably need real method of checking for failure, huh """
    codeReq = urllib2.Request(target)
    response = urllib2.urlopen(codeReq)
    soupyhtml = BeautifulSoup(response)
    return soupyhtml

def pullQuote(curTag):
    # character is in bold
    print curTag.nextSibling.name
    '''
    if curTag.nextSibling.name == 'hr': #are done
        return quoteBlock
    print "seeing" + curTag.nextSibling.name
    quoteBlock = quoteBlock + " - " + curTag.nextSibling.name
    curTag = curTag.nextSibling
    '''

quotepage = movieurl_stub + "quotes"
print "Getting this:" + quotepage
print "---------------"
quotebag = soupifyPage(quotepage)

# each quote is preceded by anchorlink, begins with qt : example <a name="qt0229419"></a>
# they end with an HR tag
# they are not nested
quotations = quotebag.findAll(attrs = {'name' : re.compile("^qt")})
for q in quotations:
    #pullQuote(q)
    print q.nextSibling.name  # attribute error: "'NavigableString' object has no attribute 'name'"
    print "next!"

Thanks,
Clay

- - - - - - -
Clay S. Wiedemann

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
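
One plausible explanation, offered as an illustration rather than a confirmed diagnosis: the text between the closing </a> of the anchor and the following <b> (even a bare newline) is itself a node in the parsed tree, so the anchor's next sibling is that piece of text and not the <b> tag. The sketch below shows the effect using Python 3 and the standard library's html.parser instead of BeautifulSoup, which is a substitution made only so the example runs without extra installs. The names NodeLister, list_nodes and first_tag_after are hypothetical helpers, and the newlines in SNIPPET are an assumption about how the page source was laid out.

```python
from html.parser import HTMLParser

# HTML chunk from the post. The line breaks between tags are an assumption
# about the layout of the real page source.
SNIPPET = """<a name="qt0210620"></a>
<b><a href="/name/nm0629454/">Bill</a></b>: You're supposed to wear the blue dress when I wear this.
<br>
<b><a href="/name/nm0707043/">Mary</a></b>: I don't want to dress like twins anymore.
<br>
<b><a href="/name/nm0629454/">Bill</a></b>: We're not twins. We're a trio.
<br>
<hr width="30%">
"""


class NodeLister(HTMLParser):
    """Hypothetical helper: record start tags and text runs in document order."""

    def __init__(self):
        super().__init__()
        self.nodes = []  # entries are ("tag", name) or ("text", data)

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("tag", tag))

    def handle_data(self, data):
        # Whitespace between tags arrives here too; it is a node like any other.
        self.nodes.append(("text", data))


def list_nodes(html):
    """Parse html and return the flat list of recorded nodes."""
    parser = NodeLister()
    parser.feed(html)
    parser.close()
    return parser.nodes


def first_tag_after(nodes, start):
    """Skip text nodes from index `start` on; return the next tag name or None."""
    for kind, value in nodes[start:]:
        if kind == "tag":
            return value
    return None


nodes = list_nodes(SNIPPET)
print(nodes[0])                   # ('tag', 'a')   the qt anchor
print(nodes[1])                   # ('text', '\n') the newline right after it
print(first_tag_after(nodes, 1))  # b
```

In BeautifulSoup terms, that newline would be the NavigableString that q.nextSibling returns, and a NavigableString has no .name attribute. Skipping text nodes until a tag turns up, as first_tag_after does here, is one way around it; BeautifulSoup's own findNextSibling method is probably the closer fit for the original script.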