how to remove BR using replace function?

2006-02-08 Thread localpricemaps
i have some html that looks like this


<address style="color:#34"> main,<br> Boston, MA</address>

and i am trying to use the replace function to get rid of the <br> that
i scrape out using this code:

for oText in incident.fetchText(oRE):
    strTitle += oText.strip()
strTitle = string.replace(strTitle, '<br>', '')

but it doesn't seem to remove the br

any ideas?
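
A minimal sketch of removing the tag from a string, assuming the <br> is still present in the text being processed (if the scraper only returns text nodes, the tag may never reach the string at all). The sample markup and the names BR_TAG and strip_br are made up for illustration.

```python
import re

# Hypothetical sample; the real markup in the post may differ.
html = '<address style="color:#34"> main,<br> Boston, MA</address>'

# str.replace removes only the exact substring it is given.
exact = html.replace('<br>', '')

# A regular expression also catches variants such as <BR>, <br/> and <br />.
BR_TAG = re.compile(r'<br\s*/?>', re.IGNORECASE)

def strip_br(text):
    """Return text with every <br> variant removed."""
    return BR_TAG.sub('', text)

print(exact)
print(strip_br('main,<BR /> Boston, MA'))
```

The plain replace is enough when the markup is uniform; the regular expression is the safer choice when the case or the closing slash varies.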

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to remove BR using replace function?

2006-02-08 Thread localpricemaps
tried that, didn't work for me



Re: how to remove BR using replace function?

2006-02-08 Thread localpricemaps
nope didn't work



problems writing tuple to log file

2006-02-03 Thread localpricemaps
i am having a problem writing a tuple to a text file.  my code is
below.

what i end up getting is a text file that looks like this

burger, 7up
burger, 7up
burger, 7up

and this is instead of getting a list that should look like this

burger, 7up
fries, coke
cake, milk

note that i have print statements that print out the results of the
scraping and they are fine.  they print out burger, fries, cake and
then 7up, coke, milk

however there is something faulty in my writing of the tuple to the
text file.  perhaps related to the indentation that causes it to write
the same stuff over and over?



for row in bs('div'):
    data = []

    for incident in bs('span'):
        foodlist = []
        b = incident.findPrevious('b')
        for oText in b.fetchText(oRE):
            #foodlist.append(oText.strip() + ',')
            foodlist += oText.strip() + ','
        food = ''.join(foodlist)
        print food

    for incident in bs('span2'):
        drinklist = []
        for oText in incident.fetchText(oRE):
            drinklist += oText.strip() + ','
        drink = ''.join(drinklist)
        print drink

    tuple = (food + drink + "\n")
    data.append(tuple)
    f = open("data.txt", 'a')
    f.write(''.join(tuple))
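
One possible way to get one line per food/drink pair, assuming the scraped strings have first been collected into two parallel lists. The scraping step is left out, and the names foods, drinks and write_pairs are hypothetical.

```python
import os
import tempfile

def write_pairs(foods, drinks, path):
    """Append one 'food, drink' line per pair to the file at path."""
    with open(path, 'a') as f:
        # zip walks both lists in step, so each food meets its own drink.
        for food, drink in zip(foods, drinks):
            f.write('%s, %s\n' % (food, drink))

# Stand-ins for the values the print statements in the post show.
foods = ['burger', 'fries', 'cake']
drinks = ['7up', 'coke', 'milk']

# Write to a throwaway directory so the sketch leaves no file behind.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
write_pairs(foods, drinks, path)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

Pairing inside a single loop avoids the situation where two separate loops each finish first and leave only one food and one drink value behind.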



Re: indentation messing up my tuple?

2006-02-03 Thread localpricemaps
the issue is that the last loop adds the last value of everything
to the data array



Re: problems writing tuple to log file

2006-02-03 Thread localpricemaps
the issue is that the last loop adds the last value of everything
to the data array



Re: indentation messing up my tuple?

2006-02-01 Thread localpricemaps
i am using a tuple because i am building lists.  if i just use (food +
drink) then while drink is unique, food remains the same, so i get this:

(burger, coke)
(burger, 7up)
(burger, sprite)

infidel wrote:
> tuple is the name of the built-in type, so it's not a very good idea to
> reassign it to something else.
>
> (food + drink + '\n') is not a tuple, (food + drink + '\n',) is
>
> There's no reason to use tuples here, just do this:
>
> data.append(food + drink)
> f.write('\n'.join(data))
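
A short illustration of the point about parentheses and tuples. The food and drink values are placeholders, not output from the original script.

```python
food, drink = 'burger,', '7up,'

# Parentheses alone only group an expression; the result is still a str.
not_a_tuple = (food + drink + '\n')

# The trailing comma is what makes a one-element tuple.
one_tuple = (food + drink + '\n',)

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)

# ''.join() accepts both, but it walks a str one character at a time.
same_text = ''.join(not_a_tuple) == ''.join(one_tuple)

# Collecting plain strings and joining once at the end, as suggested above.
data = []
data.append(food + drink)
joined = '\n'.join(data)
```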



Re: indentation messing up my tuple?

2006-01-28 Thread localpricemaps
sorry i forgot to add in the code for my tuple which is at the very end

tuple = (food + drink + "\n")
data.append(tuple)
f = open("froogle.sql", 'a')
f.write(''.join(tuple))



Re: indentation messing up my tuple?

2006-01-28 Thread localpricemaps
sorry i left out my tuple which is at the end of my code

tuple = (food + drink + "\n")
data.append(tuple)
f = open("froogle.sql", 'a')
f.write(''.join(tuple))



indentation messing up my tuple?

2006-01-27 Thread localpricemaps
i have the following code which is used to create a tuple of food and
drink.  if the page i am trying to scrape has a total of 10 food/drink
items then i end up getting a list of 10 food/drink items in my
text file, BUT they are all a repeat of the first item, so the text
file looks like this:

shrimp, coke
shrimp, coke
shrimp, coke

instead of being

shrimp, coke
hamburger, oj

here is my code:

for row in bs('div', {'style': 'both'}):
    data = []

    for incident in bs('h3', {'class': 'name'}):
        foodlist = []
        for oText in incident.fetchText(oRE):
            foodlist.append(oText.strip() + ',')
        food = ''.join(foodlist)

    for incident in bs('span', {'class': 'drink'}):
        drink = incident.findNextSibling('a', {'class': 'nojs'})
        drinklist = []
        for oText in drink.fetchText(oRE):
            drinklist.append(oText.strip() + ',')
        drink = ''.join(drinklist)

    tuple = (food + drink + "\n")
    data.append(tuple)
    f = open("test.txt", 'a')
    f.write(''.join(tuple))



am i using findNextSibling wrong?

2006-01-24 Thread localpricemaps
i have this html:

<td class="price">

<a class="nojs" name="D1L4" href="/xPopups/nojs"
target="_blank">3.99</a>

<div class="food">1.05</div>

<div class="drink">3</div>

<a class="btn" name="D1" href="http://www.cnn.com"
target="_blank" onclick="reload()">

i tried to use this python to scrape out the href (cnn.com) and
failed.

for incident in row('td', {'class':'price'}):
  n = incident.findNextSibling('a')
  b = n.findNextSibling('div')
  c = b.findNextSibling('div')
  d = c.findNextSibling('a', {'class': 'btn'})
  info = d['href'] + ','
  print info

and i tried this
for incident in row('td', {'class':'price'}):
  n = incident.findNextSibling('a', {'class': 'btn'})
  info = n['href'] + ','
  print info
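
A possible reason both attempts fail, assuming the <a> tags sit inside the <td> rather than after it: a next-sibling search only looks at nodes on the same level as the starting tag, so children of the td are never visited. The sketch below is not Beautiful Soup. It uses the standard library's html.parser on hypothetical markup modelled on the snippet above to pull the href out of any <a class="btn">, whatever its nesting. The class name BtnLinkCollector and the link text are made up.

```python
from html.parser import HTMLParser

class BtnLinkCollector(HTMLParser):
    """Collect href values of <a class="btn"> tags at any nesting depth."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('class') == 'btn' and 'href' in attrs:
            self.links.append(attrs['href'])

# Hypothetical markup; the closing </td> and link text are assumptions.
html = '''
<td class="price">
  <a class="nojs" name="D1L4" href="/xPopups/nojs" target="_blank">3.99</a>
  <div class="food">1.05</div>
  <div class="drink">3</div>
  <a class="btn" name="D1" href="http://www.cnn.com" target="_blank">go</a>
</td>
'''

collector = BtnLinkCollector()
collector.feed(html)
print(collector.links)
```

With a tree-based parser the equivalent idea is to search within the td for the link instead of walking sideways from the td.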



Re: how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b

2006-01-23 Thread localpricemaps
well actually all i want it to do is find the first thing that shows up,
whether it's class: food or class: drink, so that works for me.  the only
problem is that after it finds class: food i think it runs through the
html again and finds the following class: drink, and since there is
no class tag after that class: drink tag, it fails.

Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>
> > ok i found something that works.  instead of using the def i did this:
> >
> > for incident in row('div', {'class': 'food' or 'drink' }):
> >
> > and it worked!
>
> 'food' or 'drink' doesn't do what you think it does:
>
> >>> 'food' or 'drink'
> 'food'
>
> >>> {'class': 'food' or 'drink'}
> {'class': 'food'}
>
> </F>



Re: how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b

2006-01-19 Thread localpricemaps
i actually realized there are 3 possible class names: either
food or drink or dessert.  so my question is whether or not i can alter
your function to look like this?

def isFoodOrDrinkOrDessert(attr):
    return attr in ['food', 'drink', 'dessert']


thanks in advance for the help

Kent Johnson wrote:
> [EMAIL PROTECTED] wrote:
> > i have some html which looks like this where i want to scrape out the
> > href stuff (the www.cnn.com part)
> >
> > <div class="noFood">Cheese</div>
> > <div class="food">Blue</div>
> > <a class="btn" href = "http://www.cnn.com">
> >
> >
> > so i wrote this code which scrapes it perfectly:
> >
> > for incident in row('div', {'class':'noFood'}):
> >     b = incident.findNextSibling('div', {'class': 'food'})
> >     print b
> >     n = b.findNextSibling('a', {'class': 'btn'})
> >     print n
> >     link = n['href'] + ','
> >
> > problem is that sometimes the 2nd tag , the div class=food tag , is
> > sometimes called food, sometimes called drink.
>
> Apparently you are using Beautiful Soup. The value in the attribute
> dictionary can be a callable; try this:
>
> def isFoodOrDrink(attr):
>     return attr in ['food', 'drink']
>
> b = incident.findNextSibling('div', {'class': isFoodOrDrink})
>
> Alternately you could omit the class spec and check for it in code.
>
> Kent
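
A stdlib-only sketch of the callable-matcher idea, extended to three class names. It does not use Beautiful Soup. The helpers matches and first_match and the tuple-based siblings list are hypothetical stand-ins for a parsed document, and the spelling of each name in the set has to match the class attribute exactly.

```python
ALLOWED_CLASSES = {'food', 'drink', 'dessert'}

def is_allowed_class(attr):
    """Return True when attr is one of the accepted class names."""
    return attr in ALLOWED_CLASSES

def matches(value, spec):
    """Match value against spec, where spec is a string or a callable."""
    return spec(value) if callable(spec) else value == spec

def first_match(siblings, tag, class_spec):
    """Return the first (tag, class, text) entry that fits, else None."""
    for entry in siblings:
        if entry[0] == tag and matches(entry[1], class_spec):
            return entry
    return None

# Hypothetical siblings standing in for parsed tags: (tag, class, text).
siblings = [('div', 'noFood', 'Cheese'),
            ('div', 'drink', 'Pepsi'),
            ('a', 'btn', 'http://www.cnn.com')]

hit = first_match(siblings, 'div', is_allowed_class)
print(hit)
```

Passing a function instead of a fixed string keeps the search call the same while the list of accepted names grows.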



Re: how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b

2006-01-19 Thread localpricemaps
hey fredrik,

i don't understand what you are saying

Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>
> > ok i found something that works.  instead of using the def i did this:
> >
> > for incident in row('div', {'class': 'food' or 'drink' }):
> >
> > and it worked!
>
> 'food' or 'drink' doesn't do what you think it does:
>
> >>> 'food' or 'drink'
> 'food'
>
> >>> {'class': 'food' or 'drink'}
> {'class': 'food'}
>
> </F>
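
The interpreter session above can be spelled out as a small script. The helper is_food_or_drink is an illustrative name, not part of any library.

```python
# `or` evaluates to its first truthy operand. A non-empty string is truthy,
# so 'food' or 'drink' is simply 'food' and 'drink' is never used.
first = 'food' or 'drink'
spec = {'class': 'food' or 'drink'}
print(first)   # food
print(spec)    # {'class': 'food'}

# Accepting either name needs an explicit membership test instead.
def is_food_or_drink(attr):
    return attr in ('food', 'drink')

print(is_food_or_drink('drink'))   # True
```

So the loop in the quoted post only ever searched for class food; it appeared to work because the food tag happened to be present on that page.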



how to find not the next sibling but the 2nd sibling or find sibling a OR sibling b

2006-01-18 Thread localpricemaps
i have some html which looks like this where i want to scrape out the
href stuff (the www.cnn.com part)

<div class="noFood">Cheese</div>
<div class="food">Blue</div>
<a class="btn" href = "http://www.cnn.com">


so i wrote this code which scrapes it perfectly:

for incident in row('div', {'class':'noFood'}):
    b = incident.findNextSibling('div', {'class': 'food'})
    print b
    n = b.findNextSibling('a', {'class': 'btn'})
    print n
    link = n['href'] + ','

problem is that the 2nd tag, the div class=food tag, is
sometimes called food and sometimes called drink.  so sometimes it looks
like this:

<div class="noFood">Cheese</div>
<div class="drink">Pepsi</div>
<a class="btn" href = "http://www.cnn.com">

how do i alter my script to take into account the fact that i will
sometimes have food and sometimes have drink as the class name?  is
there a way to say "look for food or drink", or a way to say "look for
this incident and then find not the next sibling but the 2nd next
sibling", if that makes any sense?

thanks
