Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-04-02 Thread jonasmg
Danny Yoo writes: 

  
 
 And the solution to get the state and capital columns (where there are
 anchors): 

 for row in table('tr'):
     for cell in row.fetch('a')[0:2]:
         print cell.string
 
 Hi Jonas, 
 
 That's good to hear!  So does everything work for you then? 
 

Yes, the problem is resolved now. Thanks!
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-04-01 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote:
 Kent Johnson writes:  
 
 
[EMAIL PROTECTED] wrote:  


List of states:
http://en.wikipedia.org/wiki/U.S._state   

: soup = BeautifulSoup(html)
: # Get the second table (list of states).
: table = soup.first('table').findNext('table')
: print table   

...
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
</tr>
</table>

From each row (tr), I want to get cells (td) 1, 3 and 4
(postal code, state, capital). But cells 3 and 4 contain anchors.

So dig into the cells and get the data from the anchor.  

cells = row('td')
cells[0].string
cells[2].a.string
cells[3].a.string

Kent  

  
 
 for row in table('tr'):
     cells = row('td')
     print cells[0]
 
 IndexError: list index out of range 
 
 It works for me: 
 
 
 In [1]: from BeautifulSoup import BeautifulSoup as bs 
 
 In [2]: soup=bs('''<tr>
 ...: <td>WY</td>
 ...: <td>Wyo.</td>
 ...: <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
 ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
 ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
 ...: <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
 ...: </tr>
 ...: </table>''')
 
 In [18]: rows=soup('tr') 
 
 In [19]: rows
 Out[19]:
 [<tr>
 <td>WY</td>
 <td>Wyo.</td>
 <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
 <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
 <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
 <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.
 g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_
 </tr>]
 
 In [21]: cells=rows[0]('td') 
 
 In [22]: cells
 Out[22]:
 [<td>WY</td>,
   <td>Wyo.</td>,
   <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>,
   <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>,
   <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>,
   <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload
 g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_
 
 In [23]: cells[0].string
 Out[23]: 'WY' 
 
 In [24]: cells[2].a.string
 Out[24]: 'Wyoming' 
 
 In [25]: cells[3].a.string 
 
 
 Kent 
 

Yes, OK. But that way it is only possible to get data from one row (rows[0]):

cells=rows[0]('td')

I want to get data from all rows. I have tried several 'for' statements,
but I cannot get it to work.


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-04-01 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote:
 Yes, OK. But that way it is only possible to get data from one row (rows[0]):
 
 cells=rows[0]('td')
 
 I want to get data from all rows. I have tried several 'for' statements,
 but I cannot get it to work.
 
 Can you show us what you tried? 
 
 Have you read a Python tutorial? It seems like some of the things you 
 are struggling with might be addressed in general Python material. 
 
 Kent 
 
 

You are making an assumption about me.
If I ask something, it is because I cannot find the solution; I do not ask
on a whim.

Yes, I have read Python tutorials, I have searched this mailing list for this
problem with a web search engine (alltheweb), and I have even searched the
Python groups on Google Groups.

* for rows in table('tr'): print rows('td')

It fails when I try to get the data of each cell using:
for rows in table('tr'): print rows('td')[0]

* for rows in table('tr'):
      for cell in rows('td'):
          print cell

The same happens when using print cell[0].


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-04-01 Thread jonasmg
Danny Yoo writes: 

  
 
  Have you read a Python tutorial? It seems like some of the things you
  are struggling with might be addressed in general Python material. 


 You are making an assumption about me. If I ask something, it is because I
 cannot find the solution; I do not ask on a whim.
 
 Hello Jonas, 
 
 Yes, but don't take Kent's question as a personal insult --- he's asking
 because it looks like you're having trouble interpreting error messages or
 considering border cases. 
 

Sorry, Kent, if my answer was rude. I was very tired of trying many things
without any good result.

 Anyway, the program snippet above makes assumptions, so let's get those
 out of the way.  Concretely: 
 
 for rows in table('tr'):
     print rows('td')[0]
 
 makes an assumption that is not necessarily true:
 
 * It assumes that each row has a td element. 
 
 Do you understand the border case here?  In particular: 
 
 * What if you hit a TR table row that does not have any TD columns? 
 

Danny, you gave me the idea. The problem is that the first row has TH
cells (not TD). So:

for row in table('tr'):
    if row('td'):
        print row('td')[0].string
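
The same guard can be sketched without BeautifulSoup. What follows is only an illustration under stated assumptions: it uses Python 3's standard-library html.parser as a stand-in (not the API used in this thread), and the TableRows class, the sample markup, and the header labels are all made up for the example. Whitespace between cells is dropped, text inside anchors is kept, and rows that hold only TH cells are skipped.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per row that has <td> cells
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text between cells (the '\n' strings) arrives with no open cell
        # and is dropped; text inside an <a> within a cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # a header row holds only <th>, so it stays empty
                self.rows.append(self._row)
            self._row = None


# Hypothetical sample modeled on the table discussed in this thread.
html = """<table>
<tr><th>Postal</th><th>Abbr.</th><th>State</th><th>Capital</th></tr>
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
</tr>
</table>"""

parser = TableRows()
parser.feed(html)
parser.close()

# Keep cells 1, 3 and 4 (postal code, state, capital) of each data row.
selected = [(row[0], row[2], row[3]) for row in parser.rows]
print(selected)
```

A row with fewer than four TD cells would still raise IndexError in the last step, so real data may need a length check as well.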


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-04-01 Thread jonasmg
And the solution to get the state and capital columns (where there are 
anchors): 

for row in table('tr'):
    for cell in row.fetch('a')[0:2]:
        print cell.string


[Tutor] BeautifulSoup - getting cells without new line characters

2006-03-31 Thread jonasmg
From a table, I want to get the cells and then choose only some of them.

<table>
<tr>
<td>WY</td>
<td>Wyo.</td>
</tr>
...
</table>

Using:

for row in table('tr'): print row.contents

   ['\n', <td>WY</td>, '\n', <td>Wyo.</td>, '\n']
   [...]

I get a newline character between the cells.

Is it possible to get them without those '\n'?

Thanks in advance! 


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-03-31 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote:
 From a table, I want to get the cells and then choose only some of them.
 
 <table>
 <tr>
 <td>WY</td>
 <td>Wyo.</td>
 </tr>
 ...
 </table>
 
 Using:
 
 for row in table('tr'): print row.contents
 
    ['\n', <td>WY</td>, '\n', <td>Wyo.</td>, '\n']
    [...]
 
 I get a newline character between the cells.
 
 Is it possible to get them without those '\n'?
 
 Well, the newlines are in your data, so you need to strip them or ignore 
 them somewhere. 
 
 You don't say what you are actually trying to do, maybe this is close:
 for row in table('tr'):
     cellText = [cell.string for cell in row('td')]
     print ' '.join(cellText)
 
 Kent 
 

For each row, I only want to get certain positions (e.g.
row.contents[0], row.contents[2]).


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-03-31 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote:
 You are right, but the problem is that some cells have anchors.
 Sorry, I forgot to mention it.
 
 Using:
 
 for row in table('tr'):
     cellText = [cell.string for cell in row('td')]
     print cellText
 
 I get null values for the cells with anchors.
 
 Can you give an example of your actual data and the result you want to 
 generate from it? I can't give you a correct answer if you don't tell me 
 the real question. 
 
 Kent 
 

List of states:
http://en.wikipedia.org/wiki/U.S._state 

: soup = BeautifulSoup(html)
: # Get the second table (list of states).
: table = soup.first('table').findNext('table')
: print table 

...
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
</tr>
</table>

From each row (tr), I want to get cells (td) 1, 3 and 4
(postal code, state, capital). But cells 3 and 4 contain anchors.

Thanks Kent. 


Re: [Tutor] BeautifulSoup - getting cells without new line characters

2006-03-31 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote: 
 
 List of states:
 http://en.wikipedia.org/wiki/U.S._state  
 
 : soup = BeautifulSoup(html)
 : # Get the second table (list of states).
 : table = soup.first('table').findNext('table')
 : print table  
 
 ...
 <tr>
 <td>WY</td>
 <td>Wyo.</td>
 <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
 <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
 <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
 <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
 </tr>
 </table>
 
 From each row (tr), I want to get cells (td) 1, 3 and 4
 (postal code, state, capital). But cells 3 and 4 contain anchors.
 
 So dig into the cells and get the data from the anchor. 
 
 cells = row('td')
 cells[0].string
 cells[2].a.string
 cells[3].a.string
 
 Kent 
 

for row in table('tr'):
   cells = row('td')
   print cells[0] 

IndexError: list index out of range 


Re: [Tutor] using BeautifulSoup

2006-03-28 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote:
 anchor.findNext('code') fails:  
 
 anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
 print anchor  
 
  [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]
 
 anchor.findNext('code')
 [] 
 
 are you sure that's what you got? Looks like an AttributeError to me - 
 anchor is a *list* of anchors. Try
 anchor[0].findNext('code') 
 
 Kent 
 

'fetch' returns a list of Tag objects, so the result has to be indexed:

anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
anchor[0].findNext('code')

But 'findChild' or 'first' returns only the first Tag that matches, so:

anchor = soup.findChild('a', {'href': '/wiki/List_of_country_calling_codes'})
anchor.findNext('code')

Thanks for your help, Kent
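
The list-versus-single-match distinction is not specific to BeautifulSoup. As a hedged illustration only, the sketch below uses Python 3's standard-library xml.etree.ElementTree as a stand-in, where findall() plays the role of 'fetch' and find() the role of 'findChild'. The markup is a hypothetical, well-formed fragment modeled on the row discussed above; ElementTree would not parse the real page, which is not valid XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modeled on the infobox row.
markup = (
    '<table><tr>'
    '<th><a href="/wiki/List_of_country_calling_codes" '
    'title="List of country calling codes">Calling code</a></th>'
    '<td><code>+1</code></td>'
    '</tr></table>'
)
root = ET.fromstring(markup)
query = ".//a[@href='/wiki/List_of_country_calling_codes']"

anchors = root.findall(query)  # always a list, even with a single match
anchor = root.find(query)      # the first matching element, or None

# The list has no element attributes of its own, so it must be indexed first.
print(hasattr(anchors, "text"))
print(anchors[0].text)
print(anchor.text)

# The value sits in the <code> element of the same row.
code = root.find(".//code")
print(code.text)
```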


[Tutor] BeautifulSoup - deleting tags

2006-03-28 Thread jonasmg
Is it possible to delete all the tags from a text, and how?

For example:

qwe='''<td><a href="..." title="...">foo bar</a>;<br />
<a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>'''

so that I get only: foo bar, foo2, bar2

Thanks in advance! 


Re: [Tutor] BeautifulSoup - deleting tags

2006-03-28 Thread jonasmg
Kent Johnson writes: 

 [EMAIL PROTECTED] wrote:
 Is it possible to delete all the tags from a text, and how?
 
 For example:
 
 s='''<td><a href="..." title="...">foo bar</a>;<br />
 <a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>'''
 
 so that I get only: foo bar, foo2, bar2
 
 How about this? 
 
 In [1]: import BeautifulSoup 
 
 In [2]: s=BeautifulSoup.BeautifulSoup('''<td><a href="..." title="...">foo bar</a>;<br />
    ...: <a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>''')
 
 In [4]: ' '.join(i.string for i in s.fetch() if i.string)
 Out[4]: 'foo bar foo2 bar2' 
 
 
 Here are a couple of tag strippers that don't use BS:
 http://www.aminus.org/rbre/python/cleanhtml.py
 http://www.oluyede.org/blog/2006/02/13/html-stripper/ 
 
 Kent 
 

Another way (valid only for this case): 

: for i in s.fetch('a'): print i.string 
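
For a tag stripper that does not depend on BeautifulSoup, a minimal sketch with Python 3's standard-library html.parser could look like the following. This is an assumption-laden illustration, not the code behind either link above: the TextOnly class and the strip_tags helper are made-up names, and the sample string is adapted from the one in this thread (without the stray ';').

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep the text between tags and discard the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.chunks.append(text)


def strip_tags(markup):
    """Return the non-empty text fragments of `markup`, in document order."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return parser.chunks


s = ('<td><a href="..." title="...">foo bar</a><br />'
     '<a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>')
fragments = strip_tags(s)
print(fragments)
print(' '.join(fragments))
```

Unlike the 'a'-only loop, this keeps text from any element, so it also works when the text is not wrapped in anchors.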


[Tutor] using BeautifulSoup

2006-03-27 Thread jonasmg
Hi! 

I'm trying to use BeautifulSoup to get data from a table (the one on the right) at:
http://en.wikipedia.org/wiki/United_states

For example, I would like to get the value of 'Calling code', which would be '+1'.

 --

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://en.wikipedia.org/wiki/United_states"
html = urllib2.urlopen(url).read()
soup = BeautifulSoup()
soup.feed(html)

mainTable = soup.first('table')
rows = mainTable('tr')


Any help here?

Thanks in advance 


Re: [Tutor] using BeautifulSoup

2006-03-27 Thread jonasmg
jonasmg at softhome.net wrote:
 Hi!   
 
 I'm trying to use BeautifulSoup to get data from a table (the one on the right) at:
 http://en.wikipedia.org/wiki/United_states
 
 For example, I would like to get the value of 'Calling code', which would be '+1'.
 
  --
 
 import urllib2
 from BeautifulSoup import BeautifulSoup
 
 url = "http://en.wikipedia.org/wiki/United_states"
 html = urllib2.urlopen(url).read()
 soup = BeautifulSoup()
 soup.feed(html)

 You just have to find some kind of ad hoc search that gets you to where 
 you want to be. I would try something like this:

 anchor = soup.fetch('a', dict(href="/wiki/List_of_country_calling_codes"))

 code = anchor.findNext('code')
 print code.string

 Presumably you want this to work for other country pages as well; you 
 will have to look at the source, see what they have in common and search 
 on that.

 Kent

anchor.findNext('code') fails: 

anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
print anchor 

 [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]

anchor.findNext('code')
[] 

P.S.: Sorry for my last email; I used the wrong subject.


Re: [Tutor] using BeautifulSoup

2006-03-27 Thread jonasmg
[EMAIL PROTECTED] writes: 

jonasmg at softhome.net wrote:
 Hi!
 
 I'm trying to use BeautifulSoup to get data from a table (the one on the right) at:
 http://en.wikipedia.org/wiki/United_states
 
 For example, I would like to get the value of 'Calling code', which would be '+1'.
 
  --
 
 import urllib2
 from BeautifulSoup import BeautifulSoup
 
 url = "http://en.wikipedia.org/wiki/United_states"
 html = urllib2.urlopen(url).read()
 soup = BeautifulSoup()
 soup.feed(html)
 
 You just have to find some kind of ad hoc search that gets you to where 
 you want to be. I would try something like this:
 
 anchor = soup.fetch('a', dict(href="/wiki/List_of_country_calling_codes"))
 
 code = anchor.findNext('code')
 print code.string
 
 Presumably you want this to work for other country pages as well; you 
 will have to look at the source, see what they have in common and search 
 on that.
 
 Kent
 
 anchor.findNext('code') fails:  
 
 anchor = soup.fetch('a', {'href': '/wiki/List_of_country_calling_codes'})
 print anchor  
 
  [<a href="/wiki/List_of_country_calling_codes" title="List of country calling codes">Calling code</a>]
 
 anchor.findNext('code')
 []  
 
 P.S.: Sorry for my last email; I used the wrong subject.

Solution: use findChild instead of fetch:

anchor = soup.findChild('a', dict(href="/wiki/List_of_country_calling_codes"))

print anchor.findNext('code')


[Tutor] matching a file

2005-12-28 Thread jonasmg
Hi! 

I would like to work with some files, but I have to use a wildcard pattern
to match one of them:

for file in [glob.glob('/etc/env.d/[0-9]*foo'), '/etc/bar']:

glob returns a list, so I suppose I would have to convert it into a
string.

Is that correct?

Thanks for your help
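
One possible approach, sketched here for Python 3 under the assumption that every matching file should be processed, is to leave the result as a list and join it to the list of fixed paths instead of nesting it or converting it to a string. The helper name files_to_process is made up for the example.

```python
import glob


def files_to_process(pattern, extra):
    """Return the paths matching `pattern`, followed by the fixed paths in `extra`."""
    # glob.glob() returns a list of matching paths (possibly empty), so it
    # is concatenated with the other list rather than nested inside it.
    return sorted(glob.glob(pattern)) + list(extra)


for path in files_to_process('/etc/env.d/[0-9]*foo', ['/etc/bar']):
    print(path)
```

With the nested form from the question, the loop variable would be the whole list on the first pass and the string '/etc/bar' on the second, which is rarely what is wanted.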