[EMAIL PROTECTED] wrote:
> i have some html which looks like this where i want to scrape out the
> href stuff (the www.cnn.com part)
> 
> <div class="noFood">Cheese</div>
> <div class="food">Blue</div>
> <a class="btn" href = "http://www.cnn.com">
> 
> 
> so i wrote this code which scrapes it perfectly:
> 
> for incident in row('div', {'class':'noFood'}):
>     b = incident.findNextSibling('div', {'class': 'food'})
>     print b
>     n = b.findNextSibling('a', {'class': 'btn'})
>     print n
>     link = n['href'] + "','"
> 
> the problem is that the 2nd tag, the <div class="food"> tag, is
> sometimes called food and sometimes called drink.

Apparently you are using Beautiful Soup. The value in the attribute 
dictionary can be a callable; try this:

def isFoodOrDrink(attr):
   return attr in ['food', 'drink']

b = incident.findNextSibling('div', {'class': isFoodOrDrink})
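
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the callable approach. It assumes the current bs4 package (where the method is spelled find_next_sibling) and the built-in "html.parser"; the sample HTML, the wrapping row divs, the second URL, and the scrape_links helper are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample: the second <div> is "food" in one row and "drink" in the other.
HTML = """
<div class="row">
<div class="noFood">Cheese</div>
<div class="food">Blue</div>
<a class="btn" href="http://www.cnn.com">cnn</a>
</div>
<div class="row">
<div class="noFood">Water</div>
<div class="drink">Still</div>
<a class="btn" href="http://www.example.com">example</a>
</div>
"""


def is_food_or_drink(css_class):
    # Beautiful Soup passes in the class value; it is None when the
    # tag has no class attribute, which simply fails the membership test.
    return css_class in ("food", "drink")


def scrape_links(html):
    # Hypothetical helper: collect the href that follows each noFood/food-or-drink pair.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for incident in soup.find_all("div", {"class": "noFood"}):
        b = incident.find_next_sibling("div", {"class": is_food_or_drink})
        if b is None:
            continue  # no matching food/drink sibling after this tag
        n = b.find_next_sibling("a", {"class": "btn"})
        if n is not None:
            links.append(n["href"])
    return links


links = scrape_links(HTML)
```

Both rows should yield their link here, whichever of the two class names the middle div carries.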

Alternately you could omit the class spec and check for it in code.
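
A rough sketch of that second option, again assuming bs4 with "html.parser" (where the class attribute comes back as a list of names by default); the helper name next_food_or_drink and the sample HTML are invented for the example.

```python
from bs4 import BeautifulSoup


def next_food_or_drink(incident):
    # Take the next sibling <div> whatever its class is, then check the class in code.
    b = incident.find_next_sibling("div")
    if b is None:
        return None
    classes = b.get("class", [])  # a list of class names with the default parser settings
    if "food" in classes or "drink" in classes:
        return b
    return None


# Made-up sample in which the second tag is called "drink".
HTML = """
<div>
<div class="noFood">Cheese</div>
<div class="drink">Blue</div>
<a class="btn" href="http://www.cnn.com">cnn</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
incident = soup.find("div", {"class": "noFood"})
b = next_food_or_drink(incident)
href = b.find_next_sibling("a", {"class": "btn"})["href"] if b is not None else None
```

This version is a little longer, but it leaves room for other checks on the tag (its text, other attributes) in the same place.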

Kent
-- 
http://mail.python.org/mailman/listinfo/python-list
