On Sunday, November 6, 2016 at 1:27:48 AM UTC-4, rosef...@gmail.com wrote:
> Considering the following html:
>
> cool stuff hiid="cool"> zz
>
> and the following list:
>
> ignore_list = ['example','lalala']
>
> My goal is that, while going through the HTML using BeautifulSoup, if I find an h2
> that has an ID that is in my list (ignore_list) I should delete all the ul
> and lis under it until I find another h2. I would then check if the next h2
> was in my ignore list, if it is, delete all the ul and lis until I reach the
> next h2 (or if there are no h2s left, delete the ul and lis under the current
> one and stop).
>
> How I see the process going: you read all the h2s from top to bottom in the DOM.
> If the id for any of those is in the ignore_list, then delete all the ul and
> li under the h2 until you reach the NEXT h2. If there is no h2, then delete
> the ul and LI then stop.
>
> Here is the full HTML I am trying to work with: http://pastebin.com/Z3ev9c8N
>
> I am trying to delete all the uls and lis after "See_also". How would I
> accomplish this in Python?
I got it working with the following solution:
# Remove content I don't want
for element in body.find_all('h2'):
    try:
        # Heading text without the "[edit]" link label
        current_h2 = element.get_text().replace('[edit]', '')
        if current_h2 in ignore_list:
            # Drop the first <div> and the first <ul> sibling after this heading
            div = element.find_next_sibling('div')
            if div is not None:
                div.decompose()
            ul = element.find_next_sibling('ul')
            if ul is not None:
                ul.decompose()
    except (AttributeError, TypeError):
        # Skip headings that cannot be processed
        continue
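
Note that the code above matches on the heading text rather than the id, and removes only the first div and the first ul that follow a matching h2, even if they come after the next h2. A minimal sketch that is closer to the original goal (drop every ul between an ignored h2 and the next h2) might look like the following. It assumes the id sits on the h2 itself or on an element inside it; strip_ignored_sections is a hypothetical helper name, and the sample markup is made up for illustration, not taken from the pastebin:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def strip_ignored_sections(soup, ignore_ids):
    """Remove every <ul> between an ignored <h2> and the next <h2>."""
    for h2 in soup.find_all('h2'):
        # Assumption: the id is on the <h2> or on a tag nested in it.
        ids = [tag.get('id') for tag in [h2, *h2.find_all(id=True)]]
        if not any(i in ignore_ids for i in ids):
            continue
        # Copy the siblings first, since decompose() changes the tree.
        for sibling in list(h2.next_siblings):
            if not isinstance(sibling, Tag):
                continue  # skip text nodes such as whitespace
            if sibling.name == 'h2':
                break  # reached the next section, stop deleting
            if sibling.name == 'ul':
                sibling.decompose()  # removes the <ul> and its <li>s
    return soup


# Made-up sample markup for illustration
html = (
    '<h2 id="keep">A</h2><ul><li>1</li></ul>'
    '<h2><span id="example">B</span></h2><p>text</p>'
    '<ul><li>2</li></ul><ul><li>3</li></ul>'
    '<h2 id="other">C</h2><ul><li>4</li></ul>'
)
soup = BeautifulSoup(html, 'html.parser')
strip_ignored_sections(soup, ['example', 'lalala'])
print(soup)
```

Only ul siblings of the h2 are removed here; other elements in the section (such as the p above) are left alone, so the tag check would need extending if divs should go as well.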
--
https://mail.python.org/mailman/listinfo/python-list