Re: [Tutor] BeautifulSoup - getting cells without new line characters
Danny Yoo writes:

>> And the solution to get the state and capital columns (where there
>> are anchors):
>>
>>     for row in table('tr'):
>>         for cell in row.fetch('a')[0:2]:
>>             print cell.string
>
> Hi Jonas,
>
> That's good to hear! So does everything work for you then?

Yes, now this problem has been resolved. Thanks!

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] BeautifulSoup - getting cells without new line characters
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> Kent Johnson writes:
>>> [EMAIL PROTECTED] wrote:
>>>> List of states:
>>>> http://en.wikipedia.org/wiki/U.S._state
>>>>
>>>>     soup = BeautifulSoup(html)
>>>>     # Get the second table (list of states).
>>>>     table = soup.first('table').findNext('table')
>>>>     print table
>>>>
>>>>     ...
>>>>     <tr>
>>>>     <td>WY</td>
>>>>     <td>Wyo.</td>
>>>>     <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>>>>     <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
>>>>     <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
>>>>     <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>>>>     </tr>
>>>>     </table>
>>>>
>>>> From each row (tr), I want to get cells (td) 1, 3 and 4 (postal,
>>>> state, capital). But cells 3 and 4 have anchors.
>>>
>>> So dig into the cells and get the data from the anchor.
>>>
>>>     cells = row('td')
>>>     cells[0].string
>>>     cells[2]('a').string
>>>     cells[3]('a').string
>>>
>>> Kent
>>
>>     for row in table('tr'):
>>         cells = row('td')
>>         print cells[0]
>>
>> IndexError: list index out of range
>
> It works for me:
>
> In [1]: from BeautifulSoup import BeautifulSoup as bs
>
> In [2]: soup=bs('''<tr>
>    ...: <td>WY</td>
>    ...: <td>Wyo.</td>
>    ...: <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>    ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>    ...: Wyoming">Cheyenne</a></td>
>    ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>    ...: Wyoming">Cheyenne</a></td>
>    ...: <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img
>    ...: src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
>    ...: g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
>    ...: longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>    ...: </tr>
>    ...: </table> '''
>    ...:
>    ...:
>    ...:
>    ...: )
>
> In [18]: rows=soup('tr')
>
> In [19]: rows
> Out[19]:
> [<tr>
> <td>WY</td>
> <td>Wyo.</td>
> <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
> <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src=http://upload.
> g.svg/45px-Flag_of_Wyoming.svg.png width=45 alt= height=30 longdesc=/wiki/Image:Flag_
> </tr>]
>
> In [21]: cells=rows[0]('td')
>
> In [22]: cells
> Out[22]:
> [<td>WY</td>,
>  <td>Wyo.</td>,
>  <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>,
>  <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>,
>  <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>,
>  <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src=http://upload n
> g.svg/45px-Flag_of_Wyoming.svg.png width=45 alt= height=30 longdesc=/wiki/Image:Flag_
>
> In [23]: cells[0].string
> Out[23]: 'WY'
>
> In [24]: cells[2].a.string
> Out[24]: 'Wyoming'
>
> In [25]: cells[3].a.string
>
> Kent

Yes, OK. But that way it is only possible to get data from one row
(rows[0]):

    cells=rows[0]('td')

And I want to get data from all rows. I have tried several 'for'
statements but I cannot get it to work.
Re: [Tutor] BeautifulSoup - getting cells without new line characters
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> Yes, OK. But that way it is only possible to get data from one row
>> (rows[0]):
>>
>>     cells=rows[0]('td')
>>
>> And I want to get data from all rows. I have tried several 'for'
>> statements but I cannot get it to work.
>
> Can you show us what you tried?
>
> Have you read a Python tutorial? It seems like some of the things you
> are struggling with might be addressed in general Python material.
>
> Kent

You are making an assumption about me. If I ask something, it is because
I cannot find the solution. I do not do it on a whim.

Yes, I have read tutorials about Python, I have searched for this problem
in this mailing list using a web search engine such as alltheweb, and I
have even searched the Google groups about Python.

*   for rows in table('tr'):
        print rows('td')

  It fails when I try to get the data of each cell using:

    for rows in table('tr'):
        print rows('td')[0]

*   for rows in table('tr'):
        for cell in rows('td'):
            print cell

  The same happens when using print cell[0].
Re: [Tutor] BeautifulSoup - getting cells without new line characters
Danny Yoo writes:

>>> Have you read a Python tutorial? It seems like some of the things
>>> you are struggling with might be addressed in general Python
>>> material.
>>
>> You are making an assumption about me. If I ask something, it is
>> because I cannot find the solution. I do not do it on a whim.
>
> Hello Jonas,
>
> Yes, but don't take Kent's question as a personal insult: he's asking
> because it looks like you're having trouble interpreting error messages
> or considering border cases.

Sorry, Kent, if my answer was very rude. I was very tired of trying many
things without any good result.

> Anyway, the program snippet above makes assumptions, so let's get those
> out of the way. Concretely:
>
>     for rows in table('tr'):
>         print rows('td')[0]
>
> makes an assumption that is not necessarily true:
>
>     * It assumes that each row has a td element.
>
> Do you understand the border case here? In particular:
>
>     * What if you hit a TR table row that does not have any TD columns?

Danny, you gave me the idea. The problem is that the first row has TH
columns (not TD). So:

    for row in table('tr'):
        if row('td'):
            print row('td')[0].string
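The border case discussed above (a header row that has TH cells but no TD cells) can be reproduced without BeautifulSoup. The following is only a minimal sketch in Python 3 that swaps in the standard library's html.parser for the BeautifulSoup calls used in the thread; the class name TableCells and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self._in_td = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the open <td>.
        if self._in_td:
            self.rows[-1][-1] += data


# Hypothetical sample: the first row only has <th> cells.
SAMPLE = """
<table>
<tr><th>Postal</th><th>State</th></tr>
<tr><td>WY</td><td><a href="/wiki/Wyoming">Wyoming</a></td></tr>
</table>
"""

parser = TableCells()
parser.feed(SAMPLE)

# The header row yields an empty list of <td> cells, so indexing it with [0]
# would raise IndexError. Skipping empty rows avoids that.
first_cells = [cells[0] for cells in parser.rows if cells]
print(first_cells)
```

The `if cells` guard plays the same role as `if row('td'):` in the message above: rows without any TD cell are skipped before indexing.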
Re: [Tutor] BeautifulSoup - getting cells without new line characters
And the solution to get the state and capital columns (where there are
anchors):

    for row in table('tr'):
        for cell in row.fetch('a')[0:2]:
            print cell.string
[Tutor] BeautifulSoup - getting cells without new line characters
From a table, I want to get the cells and then choose only some of them.

    <table>
    <tr>
    <td>WY</td>
    <td>Wyo.</td>
    </tr>
    ...
    </table>

Using:

    for row in table('tr'):
        print row.contents

    ['\n', <td>WY</td>, '\n', <td>Wyo.</td>, '\n']
    [...]

I get a newline character between each cell.

Is it possible to get them without those '\n'?

Thanks in advance!
Re: [Tutor] BeautifulSoup - getting cells without new line characters
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> From a table, I want to get the cells and then choose only some of
>> them.
>>
>>     <table>
>>     <tr>
>>     <td>WY</td>
>>     <td>Wyo.</td>
>>     </tr>
>>     ...
>>     </table>
>>
>> Using:
>>
>>     for row in table('tr'):
>>         print row.contents
>>
>>     ['\n', <td>WY</td>, '\n', <td>Wyo.</td>, '\n']
>>     [...]
>>
>> I get a newline character between each cell.
>>
>> Is it possible to get them without those '\n'?
>
> Well, the newlines are in your data, so you need to strip them or
> ignore them somewhere.
>
> You don't say what you are actually trying to do, maybe this is close:
>
>     for row in table('tr'):
>         cellText = [cell.string for cell in row('td')]
>         print ' '.join(cellText)
>
> Kent

I only want to get some positions from each row (i.e. row.contents[0],
row.contents[2]).
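To see why the '\n' strings show up at all, it may help to look at what a parser reports for markup that has line breaks between the cells. This is a rough sketch in Python 3 using the standard library's html.parser instead of BeautifulSoup (an assumption made so the example is self-contained); the class name Pieces is hypothetical.

```python
from html.parser import HTMLParser


class Pieces(HTMLParser):
    """Record every start tag and every run of text, in document order."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        self.pieces.append("<%s>" % tag)

    def handle_data(self, data):
        self.pieces.append(data)


p = Pieces()
p.feed("<tr>\n<td>WY</td>\n<td>Wyo.</td>\n</tr>")

# The line breaks between the cells are reported as text, just like the
# cell contents themselves.
print(p.pieces)

# Keeping only text that is not pure whitespace drops those line breaks.
text_only = [s for s in p.pieces if not s.startswith("<") and s.strip()]
print(text_only)
```

Asking for the TD cells directly, as in `row('td')`, sidesteps the whitespace in the same way: the newline strings are siblings of the cells, not part of them.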
Re: [Tutor] BeautifulSoup - getting cells without new line characters
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> You are right, but the problem is that some cells have anchors.
>> Sorry, I forgot to mention it.
>>
>> And using:
>>
>>     for row in table('tr'):
>>         cellText = [cell.string for cell in row('td')]
>>         print cellText
>>
>> I get null values in cells with anchors.
>
> Can you give an example of your actual data and the result you want to
> generate from it? I can't give you a correct answer if you don't tell
> me the real question.
>
> Kent

List of states:
http://en.wikipedia.org/wiki/U.S._state

    soup = BeautifulSoup(html)
    # Get the second table (list of states).
    table = soup.first('table').findNext('table')
    print table

    ...
    <tr>
    <td>WY</td>
    <td>Wyo.</td>
    <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
    <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
    <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
    <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
    </tr>
    </table>

From each row (tr), I want to get cells (td) 1, 3 and 4 (postal, state,
capital). But cells 3 and 4 have anchors.

Thanks Kent.
Re: [Tutor] BeautifulSoup - getting cells without new line characters
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> List of states:
>> http://en.wikipedia.org/wiki/U.S._state
>>
>>     soup = BeautifulSoup(html)
>>     # Get the second table (list of states).
>>     table = soup.first('table').findNext('table')
>>     print table
>>
>>     ...
>>     <tr>
>>     <td>WY</td>
>>     <td>Wyo.</td>
>>     <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>>     <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
>>     <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
>>     <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>>     </tr>
>>     </table>
>>
>> From each row (tr), I want to get cells (td) 1, 3 and 4 (postal,
>> state, capital). But cells 3 and 4 have anchors.
>
> So dig into the cells and get the data from the anchor.
>
>     cells = row('td')
>     cells[0].string
>     cells[2]('a').string
>     cells[3]('a').string
>
> Kent

    for row in table('tr'):
        cells = row('td')
        print cells[0]

IndexError: list index out of range
Re: [Tutor] using BeautifulSoup
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> anchor.findNext('code') fails:
>>
>>     anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
>>     print anchor
>>
>>     [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]
>>
>>     anchor.findNext('code')
>>
>>     []
>
> Are you sure that's what you got? Looks like an AttributeError to me -
> anchor is a *list* of anchors. Try
>
>     anchor[0].findNext('code')
>
> Kent

With 'fetch' you get a list of Tag objects, so you have to use:

    anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
    anchor[0].findNext('code')

But with 'findChild' or 'first' you get only the first Tag that matches,
so:

    anchor = soup.findChild('a', {'href': '/wiki/List_of_country_calling_codes'})
    anchor.findNext('code')

Thanks for your help, Kent
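The difference between a call that returns a list of matches and one that returns a single match can be shown with plain Python objects. The sketch below does not use BeautifulSoup at all: Tag, fetch, first and find_next are hypothetical stand-ins written only to mirror the shape of the calls discussed in this message.

```python
class Tag:
    """Hypothetical stand-in for a parsed tag (not BeautifulSoup's Tag)."""

    def __init__(self, name):
        self.name = name

    def find_next(self, name):
        # A real parser would walk the document; here we only report the call.
        return "searching for <%s> after <%s>" % (name, self.name)


def fetch(tags, name):
    """Return a list of every matching tag (possibly empty)."""
    return [tag for tag in tags if tag.name == name]


def first(tags, name):
    """Return only the first matching tag, or None if nothing matches."""
    matches = fetch(tags, name)
    return matches[0] if matches else None


document = [Tag("a"), Tag("code"), Tag("a")]

anchors = fetch(document, "a")
try:
    anchors.find_next("code")       # a list has no find_next method
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError"

print(outcome)
print(anchors[0].find_next("code"))             # index into the list first
print(first(document, "a").find_next("code"))   # or ask for a single tag
```

Either fix works; asking for a single tag reads more naturally when only one match is expected, while indexing the list makes it explicit that other matches were ignored.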
[Tutor] BeautifulSoup - deleting tags
Is it possible to delete all tags from a text, and how?

i.e.:

    qwe='<td><a href="..." title="...">foo bar</a>;<br /> <a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>'

so that I would get only: foo bar, foo2, bar2

Thanks in advance!
Re: [Tutor] BeautifulSoup - deleting tags
Kent Johnson writes:

> [EMAIL PROTECTED] wrote:
>> Is it possible to delete all tags from a text, and how?
>>
>> i.e.:
>>
>>     s='<td><a href="..." title="...">foo bar</a>;<br /> <a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>'
>>
>> so that I would get only: foo bar, foo2, bar2
>
> How about this?
>
> In [1]: import BeautifulSoup
>
> In [2]: s=BeautifulSoup.BeautifulSoup('''<td><a href="..." title="...">foo bar</a>;<br />
>    ...: <a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>''')
>
> In [4]: ' '.join(i.string for i in s.fetch() if i.string)
> Out[4]: 'foo bar foo2 bar2'
>
> Here are a couple of tag strippers that don't use BS:
> http://www.aminus.org/rbre/python/cleanhtml.py
> http://www.oluyede.org/blog/2006/02/13/html-stripper/
>
> Kent

Another way (valid only for this case):

    for i in s.fetch('a'):
        print i.string
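For readers who want a tag stripper that does not depend on BeautifulSoup, one possible approach is to collect only the text that a parser reports. This is a minimal sketch in Python 3 built on the standard library's html.parser; it is not the code behind either link above, and TextExtractor, strip_tags and the sample string are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:                     # skip whitespace-only runs between tags
            self.chunks.append(text)


def strip_tags(html):
    """Return the non-empty text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample shaped like the one in the question.
sample = (
    '<td><a href="#" title="x">foo bar</a><br /> '
    '<a href="#" title="y">foo2</a> <a href="#" title="z">bar2</a></td>'
)

print(", ".join(strip_tags(sample)))
```

Because every text run is kept regardless of which tag encloses it, this works for any markup, unlike looping over the anchors only.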
[Tutor] using BeautifulSoup
Hi!

I'm trying to use BeautifulSoup to get data from a table (on the right)
at:
http://en.wikipedia.org/wiki/United_states

i.e. I would like to get the data for 'Calling code', which would be '+1'.

    import urllib2
    from BeautifulSoup import BeautifulSoup

    url = "http://en.wikipedia.org/wiki/United_states"
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup()
    soup.feed(html)

    mainTable = soup.first('table')
    rows = mainTable('tr')

Any help here?

Thanks in advance
Re: [Tutor] compilte python to an executable file.
> jonasmg at softhome.net wrote:
>> Hi!
>>
>> I'm trying to use BeautifulSoup to get data from a table (on the
>> right) at:
>> http://en.wikipedia.org/wiki/United_states
>>
>> i.e. I would like to get the data for 'Calling code', which would be
>> '+1'.
>>
>>     import urllib2
>>     from BeautifulSoup import BeautifulSoup
>>
>>     url = "http://en.wikipedia.org/wiki/United_states"
>>     html = urllib2.urlopen(url).read()
>>     soup = BeautifulSoup()
>>     soup.feed(html)
>
> You just have to find some kind of ad hoc search that gets you to where
> you want to be. I would try something like this:
>
>     anchor = soup.fetch('a', dict(href="/wiki/List_of_country_calling_codes"))
>
>     code = anchor.findNext('code')
>     print code.string
>
> Presumably you want this to work for other country pages as well; you
> will have to look at the source, see what they have in common and
> search on that.
>
> Kent

anchor.findNext('code') fails:

    anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
    print anchor

    [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]

    anchor.findNext('code')

    []
Re: [Tutor] using BeautifulSoup
> jonasmg at softhome.net wrote:
>> Hi!
>>
>> I'm trying to use BeautifulSoup to get data from a table (on the
>> right) at:
>> http://en.wikipedia.org/wiki/United_states
>>
>> i.e. I would like to get the data for 'Calling code', which would be
>> '+1'.
>>
>>     import urllib2
>>     from BeautifulSoup import BeautifulSoup
>>
>>     url = "http://en.wikipedia.org/wiki/United_states"
>>     html = urllib2.urlopen(url).read()
>>     soup = BeautifulSoup()
>>     soup.feed(html)
>
> You just have to find some kind of ad hoc search that gets you to where
> you want to be. I would try something like this:
>
>     anchor = soup.fetch('a', dict(href="/wiki/List_of_country_calling_codes"))
>
>     code = anchor.findNext('code')
>     print code.string
>
> Presumably you want this to work for other country pages as well; you
> will have to look at the source, see what they have in common and
> search on that.
>
> Kent

anchor.findNext('code') fails:

    anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
    print anchor

    [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]

    anchor.findNext('code')

    []

P.S.: Sorry for my last email, I got the subject wrong.
Re: [Tutor] using BeautifulSoup
[EMAIL PROTECTED] writes:

>> jonasmg at softhome.net wrote:
>>> Hi!
>>>
>>> I'm trying to use BeautifulSoup to get data from a table (on the
>>> right) at:
>>> http://en.wikipedia.org/wiki/United_states
>>>
>>> i.e. I would like to get the data for 'Calling code', which would be
>>> '+1'.
>>>
>>>     import urllib2
>>>     from BeautifulSoup import BeautifulSoup
>>>
>>>     url = "http://en.wikipedia.org/wiki/United_states"
>>>     html = urllib2.urlopen(url).read()
>>>     soup = BeautifulSoup()
>>>     soup.feed(html)
>>
>> You just have to find some kind of ad hoc search that gets you to
>> where you want to be. I would try something like this:
>>
>>     anchor = soup.fetch('a', dict(href="/wiki/List_of_country_calling_codes"))
>>
>>     code = anchor.findNext('code')
>>     print code.string
>>
>> Presumably you want this to work for other country pages as well; you
>> will have to look at the source, see what they have in common and
>> search on that.
>>
>> Kent
>
> anchor.findNext('code') fails:
>
>     anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
>     print anchor
>
>     [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]
>
>     anchor.findNext('code')
>
>     []
>
> P.S.: Sorry for my last email, I got the subject wrong.

Solution: use findChild instead of fetch:

    anchor = soup.findChild('a', dict(href="/wiki/List_of_country_calling_codes"))

    print anchor.findNext('code')
[Tutor] matching a file
Hi!

I would like to work with some files, but I have to use a regular
expression for one of them:

    for file in [glob.glob('/etc/env.d/[0-9]*foo'), '/etc/bar']:

glob returns a list, so I suppose I would have to convert it into a
string. Is that correct?

Thanks for your help
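One way to handle this, sketched here in Python 3 under the assumption that the goal is simply to loop over every matching path plus one fixed path, is to concatenate the list that glob.glob returns with a list holding the extra path rather than nesting one inside the other. The temporary directory and the file names 10foo, 20foo and bar are hypothetical stand-ins for the /etc paths in the question.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a few empty files to match against.
    for name in ("10foo", "20foo", "bar"):
        open(os.path.join(root, name), "w").close()

    # glob patterns are shell-style wildcards, not regular expressions.
    pattern = os.path.join(glob.escape(root), "[0-9]*foo")
    extra = os.path.join(root, "bar")

    # Nesting puts a list inside the list, so the loop variable would be
    # a list on the first pass and a string on the second.
    nested = [glob.glob(pattern), extra]

    # Concatenating gives one flat list of path strings.
    flat = sorted(glob.glob(pattern)) + [extra]

    names = [os.path.basename(path) for path in flat]
    for name in names:
        print(name)
```

No conversion to a string is needed: each element of the flat list is already a path string, so the loop body can treat every item the same way.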