how to remove BR using replace function?
I have some HTML that looks like this:

    <address style="color:#34">main,<br> Boston, MA</address>

and I am trying to use the replace function to get rid of the <br> that I scrape out using this code:

    for oText in incident.fetchText(oRE):
        strTitle += oText.strip()
    strTitle = string.replace(strTitle, '<br>', '')

but it doesn't seem to remove the <br>. Any ideas?
-- http://mail.python.org/mailman/listinfo/python-list
Re: how to remove BR using replace function?
tried that, didn't work for me
Re: how to remove BR using replace function?
nope didn't work
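
For the <br> question at the top of this thread, one possible cause is that a plain string replace only matches the exact spelling given, so '<BR>', '<br/>' and '<br />' would all survive. Below is a minimal sketch using the standard re module, assuming the tag really is present in the string being cleaned (if fetchText only returns text nodes, the tag may never reach strTitle in the first place). The names BR_TAG and strip_br are hypothetical, not from the original post.

```python
import re

# Matches <br>, <BR>, <br/> and <br /> (case-insensitive, optional slash).
BR_TAG = re.compile(r'<br\s*/?>', re.IGNORECASE)

def strip_br(text, replacement=''):
    """Return text with every <br> variant replaced by `replacement`."""
    return BR_TAG.sub(replacement, text)

sample = 'main,<BR> Boston, MA'
print(strip_br(sample))
```

Passing a space as the replacement keeps the words on either side of the tag from running together.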
problems writing tuple to log file
I am having a problem writing a tuple to a text file. My code is below. What I end up getting is a text file that looks like this:

    burger, 7up
    burger, 7up
    burger, 7up

instead of a list that should look like this:

    burger, 7up
    fries, coke
    cake, milk

Note that I have print statements that print out the results of the scraping, and they are fine: they print burger, fries, cake and then 7up, coke, milk. However, there is something faulty in how I write the tuple to the text file. Perhaps it is related to the indentation, which causes it to write the same stuff over and over?

    for row in bs('div'):
        data = []
        for incident in bs('span'):
            foodlist = []
            b = incident.findPrevious('b')
            for oText in b.fetchText(oRE):
                #foodlist.append(oText.strip() + ',')
                foodlist += oText.strip() + ','
            food = ''.join(foodlist)
            print food
        for incident in bs('span2'):
            drinklist = []
            for oText in incident.fetchText(oRE):
                drinklist += oText.strip() + ','
            drink = ''.join(drinklist)
            print drink
        tuple = (food + drink + "\n")
        data.append(tuple)
        f = open("data.txt", 'a')
        f.write(''.join(tuple))
Re: indentation messing up my tuple?
The issue is that the last loop adds the last value of everything to the data array.
Re: problems writing tuple to log file
The issue is that the last loop adds the last value of everything to the data array.
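
The effect described here can be reproduced without any scraping. The sketch below is an illustration only: plain lists stand in for the scraped values, and the function name is hypothetical. It assumes the two inner loops each run to completion before anything is appended, which is one plausible reading of the posted code, since its indentation did not survive.

```python
foods = ['burger', 'fries', 'cake']
drinks = ['7up', 'coke', 'milk']

def rows_with_stale_values(foods, drinks):
    """Reproduce the bug: each inner loop finishes before anything is stored."""
    rows = []
    for _ in foods:                  # stands in for the outer 'for row in ...' loop
        for food in foods:
            pass                     # when this loop ends, food holds the last item
        for drink in drinks:
            pass                     # when this loop ends, drink holds the last item
        rows.append(food + ', ' + drink)   # runs once per outer pass, same values
    return rows

print(rows_with_stale_values(foods, drinks))
# ['cake, milk', 'cake, milk', 'cake, milk']
```

Every stored row is identical because the append only ever sees whatever the loop variables held when the inner loops finished.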
Re: indentation messing up my tuple?
I am using a tuple because I am building lists. If I just use (food + drink), then while drink is unique, food remains the same, so I get this:

    (burger, coke)
    (burger, 7up)
    (burger, sprite)

infidel wrote:
> tuple is the name of the built-in type, so it's not a very good idea to
> reassign it to something else.
>
> (food + drink + '\n') is not a tuple, (food + drink + '\n',) is
>
> There's no reason to use tuples here, just do this:
>
>     data.append(food + drink)
>     f.write('\n'.join(data))
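
infidel's point about the parentheses can be checked directly. A minimal sketch with made-up values; format_rows is a hypothetical helper showing the '\n'.join approach from the quoted reply:

```python
food, drink = 'burger,', '7up'

not_a_tuple = (food + drink + '\n')    # parentheses only group: this is a str
one_tuple = (food + drink + '\n',)     # the trailing comma makes a 1-tuple

print(type(not_a_tuple).__name__)      # str
print(type(one_tuple).__name__)        # tuple

def format_rows(pairs):
    """Join (food, drink) pairs into text, one pair per line."""
    return '\n'.join(f + ', ' + d for f, d in pairs)

print(format_rows([('burger', '7up'), ('fries', 'coke')]))
```

Because the first form is just a string, ''.join() over it iterates character by character and returns the same string unchanged, which is why the original code runs without error.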
Re: indentation messing up my tuple?
Sorry, I forgot to add in the code for my tuple, which is at the very end:

    tuple = (food + drink + "\n")
    data.append(tuple)
    f = open("froogle.sql", 'a')
    f.write(''.join(tuple))
Re: indentation messing up my tuple?
Sorry, I left out my tuple, which is at the end of my code:

    tuple = (food + drink + "\n")
    data.append(tuple)
    f = open("froogle.sql", 'a')
    f.write(''.join(tuple))
indentation messing up my tuple?
I have the following code, which is used to create a tuple of food and drink. If the page I am trying to scrape has a total of 10 food/drink items, I end up getting a nice list of 10 food/drink items in my text file, BUT they are all a repeat of the first item, so I end up getting a text file that looks like this:

    shrimp, coke
    shrimp, coke
    shrimp, coke

instead of:

    shrimp, coke
    hamburger, oj

Here is my code:

    for row in bs('div', {'style' : 'both'}):
        data = []
        for incident in bs('h3', {'class' : 'name'}):
            foodlist = []
            for oText in incident.fetchText(oRE):
                foodlist.append(oText.strip() + ',')
            food = ''.join(foodlist)
        for incident in bs('span', {'class' : 'drink'}):
            drink = incident.findNextSibling('a', {'class': 'nojs'})
            drinklist = []
            for oText in drink.fetchText(oRE):
                drinklist.append(oText.strip() + ',')
            drink = ''.join(drinklist)
        tuple = (food + drink + "\n")
        data.append(tuple)
        f = open("test.txt", 'a')
        f.write(''.join(tuple))
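
One way to get one line per food/drink pair is to collect the foods and the drinks into two lists first and then walk them together with zip. This is a sketch under assumptions, not the poster's method: the lists are filled in by hand here instead of by scraping, write_pairs is a hypothetical name, and io.StringIO stands in for the real output file.

```python
import io

def write_pairs(foods, drinks, out):
    """Write one 'food, drink' line per pair to the file-like object `out`."""
    for food, drink in zip(foods, drinks):    # pairs items by position
        out.write(food + ', ' + drink + '\n')

buffer = io.StringIO()    # stands in for open("test.txt", 'a')
write_pairs(['shrimp', 'hamburger'], ['coke', 'oj'], buffer)
print(buffer.getvalue())
```

Note that zip stops at the shorter list, so a food without a matching drink would be dropped silently.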
am i using findNextSibling wrong?
I have this HTML:

    <td class="price">
    <a class="nojs" name="D1L4" href="/xPopups/nojs" target="_blank">3.99</a>
    <div class="food">1.05</div>
    <div class="drink">3</div>
    <a class="btn" name="D1" href="http://www.cnn.com" target="_blank" onclick="reload()">

I tried to use this Python to scrape out the href (cnn.com) and failed:

    for incident in row('td', {'class':'price'}):
        n = incident.findNextSibling('a')
        b = n.findNextSibling('div')
        c = b.findNextSibling('div')
        d = c.findNextSibling('a', {'class': 'btn'})
        info = d['href'] + ','
        print info

and I tried this:

    for incident in row('td', {'class':'price'}):
        n = incident.findNextSibling('a', {'class': 'btn'})
        info = n['href'] + ','
        print info
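
If the links sit inside the td (an assumption, since the archive stripped the tags from the post), they are children of the cell rather than its siblings, so a sibling search starting from the td would not reach them. With Beautiful Soup the analogous move would be to search within the cell, for example incident.find('a', {'class': 'btn'}). The sketch below illustrates the same nesting idea with the standard library's html.parser instead of Beautiful Soup; PriceLinkParser and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class PriceLinkParser(HTMLParser):
    """Collect href values of <a class="btn"> tags nested in <td class="price">."""

    def __init__(self):
        super().__init__()
        self.td_depth = 0     # greater than 0 while inside a price cell
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td':
            # Start counting at a price cell; keep counting nested cells.
            if self.td_depth or attrs.get('class') == 'price':
                self.td_depth += 1
        elif tag == 'a' and self.td_depth and attrs.get('class') == 'btn':
            href = attrs.get('href')
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'td' and self.td_depth:
            self.td_depth -= 1

html = ('<table><tr><td class="price">'
        '<a class="nojs" name="D1L4" href="/xPopups/nojs" target="_blank">3.99</a>'
        '<div class="food">1.05</div><div class="drink">3</div>'
        '<a class="btn" name="D1" href="http://www.cnn.com" target="_blank">go</a>'
        '</td></tr></table>')

parser = PriceLinkParser()
parser.feed(html)
print(parser.links)
```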
Re: how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b
Well, actually all I want it to do is find the first thing that shows up, whether it's class: food or class: drink, so that works for me. The only thing is that after it finds class: food, I think it runs through the HTML again and finds the following class: drink, and since there is no class tag after that class: drink tag, it fails.

Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>
> > ok i found something that works. instead of using the def i did this:
> >
> >     for incident in row('div', {'class': 'food' or 'drink' }):
> >
> > and it worked!
>
> 'food' or 'drink' doesn't do what you think it does:
>
>     >>> 'food' or 'drink'
>     'food'
>     >>> {'class': 'food' or 'drink'}
>     {'class': 'food'}
>
> /F
Re: how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b
I actually realized there are 3 potential class names: food, drink, or dessert. So my question is whether or not I can alter your function to look like this:

    def isFoodOrDrinkOrDesert(attr):
        return attr in ['food', 'drink', 'desert']

Thanks in advance for the help.

Kent Johnson wrote:
> [EMAIL PROTECTED] wrote:
>
> > i have some html which looks like this where i want to scrape out the
> > href stuff (the www.cnn.com part)
> >
> >     <div class="noFood">Cheese</div>
> >     <div class="food">Blue</div>
> >     <a class="btn" href="http://www.cnn.com">
> >
> > so i wrote this code which scrapes it perfectly:
> >
> >     for incident in row('div', {'class':'noFood'}):
> >         b = incident.findNextSibling('div', {'class': 'food'})
> >         print b
> >         n = b.findNextSibling('a', {'class': 'btn'})
> >         print n
> >         link = n['href'] + ','
> >
> > problem is that sometimes the 2nd tag, the div class=food tag, is
> > sometimes called food, sometimes called drink.
>
> Apparently you are using Beautiful Soup. The value in the attribute
> dictionary can be a callable; try this:
>
>     def isFoodOrDrink(attr):
>         return attr in ['food', 'drink']
>
>     b = incident.findNextSibling('div', {'class': isFoodOrDrink})
>
> Alternately you could omit the class spec and check for it in code.
>
> Kent
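
Extending the list is enough for the membership test to cover a third class name. A small sketch of a generalized version follows; make_class_matcher is a hypothetical helper, and only the callable-as-attribute-value behaviour comes from Kent's reply. The strings have to match the page's class attribute exactly, so 'desert' in the function would not match a class spelled 'dessert'.

```python
def make_class_matcher(*names):
    """Return a predicate that is True for any of the given class names."""
    allowed = frozenset(names)

    def matches(attr):
        # attr may be None when a tag has no class attribute at all.
        return attr in allowed

    return matches

is_course = make_class_matcher('food', 'drink', 'dessert')
print(is_course('drink'), is_course('noFood'), is_course(None))
```

The returned function could then be used wherever isFoodOrDrink appears in the quoted reply.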
Re: how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b
Hey Fredrik, I don't understand what you are saying.

Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>
> > ok i found something that works. instead of using the def i did this:
> >
> >     for incident in row('div', {'class': 'food' or 'drink' }):
> >
> > and it worked!
>
> 'food' or 'drink' doesn't do what you think it does:
>
>     >>> 'food' or 'drink'
>     'food'
>     >>> {'class': 'food' or 'drink'}
>     {'class': 'food'}
>
> /F
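
Fredrik's point is that Python evaluates 'food' or 'drink' before the dictionary is built, and or returns its first truthy operand. The dictionary therefore only ever contains 'food', and 'drink' is never looked for. A short sketch (the function name wanted is hypothetical):

```python
# 'or' returns its first truthy operand, so the second string is never used.
selector = {'class': 'food' or 'drink'}
print(selector)                       # {'class': 'food'}

def wanted(css_class):
    """A membership test actually checks both names."""
    return css_class in ('food', 'drink')

print([c for c in ['noFood', 'food', 'drink'] if wanted(c)])   # ['food', 'drink']
```

So the loop "worked" only on rows where the class happened to be food.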
how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b
I have some HTML which looks like this, where I want to scrape out the href (the www.cnn.com part):

    <div class="noFood">Cheese</div>
    <div class="food">Blue</div>
    <a class="btn" href="http://www.cnn.com">

So I wrote this code, which scrapes it perfectly:

    for incident in row('div', {'class':'noFood'}):
        b = incident.findNextSibling('div', {'class': 'food'})
        print b
        n = b.findNextSibling('a', {'class': 'btn'})
        print n
        link = n['href'] + ','

The problem is that the 2nd tag, the div class="food" tag, is sometimes called food and sometimes called drink, so sometimes it looks like this:

    <div class="noFood">Cheese</div>
    <div class="drink">Pepsi</div>
    <a class="btn" href="http://www.cnn.com">

How do I alter my script to take into account the fact that I will sometimes have food and sometimes have drink as the class name? Is there a way to say "look for food or drink", or a way to say "look for this incident and then find not the next sibling but the 2nd next sibling", if that makes any sense?

Thanks
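
The "2nd next sibling" idea can also be expressed by position. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for Beautiful Soup, so it assumes well-formed markup (closed tags and a wrapping element, here a made-up row element) that real pages may not provide. The function name and the sample snippet are hypothetical.

```python
import xml.etree.ElementTree as ET

ALLOWED = {'food', 'drink'}

def links_after_nofood(parent):
    """Yield the href found two siblings after each <div class="noFood">."""
    children = list(parent)             # child elements in document order
    for i, node in enumerate(children):
        if node.tag == 'div' and node.get('class') == 'noFood':
            following = children[i + 1:i + 3]   # next sibling and the one after
            if len(following) == 2:
                middle, link = following
                if (middle.get('class') in ALLOWED
                        and link.tag == 'a' and link.get('class') == 'btn'):
                    yield link.get('href')

snippet = """<row>
<div class="noFood">Cheese</div><div class="food">Blue</div>
<a class="btn" href="http://www.cnn.com"/>
<div class="noFood">Cheese</div><div class="drink">Pepsi</div>
<a class="btn" href="http://www.cnn.com"/>
</row>"""

root = ET.fromstring(snippet)
print(list(links_after_nofood(root)))
```

Checking the middle element against a set of allowed names handles the food-or-drink case and the positional lookup in one pass.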