Re: how to scrape url out of href

2006-01-01 Thread homepricemaps
sorry paul - i'm a complete beginner at programming, if that! ;-) can you
give me an example?

thanks in advance

Paul Rubin wrote:
> [EMAIL PROTECTED] writes:
> > does anyone have sample code for scraping the actual url out of an href
> > like this one
> >
> > <a href="http://www.cnn.com" target="_blank">
>
> If you've got the tag by itself like that, just use a regexp to get
> the href out.
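
A minimal sketch of the regexp approach, assuming the tag looks something like `<a href="http://www.cnn.com" target="_blank">` (the archive mangled the original tag, so that shape is a guess) and that the href value is quoted. `HREF_RE` and `extract_hrefs` are names made up for this example.

```python
import re

# Matches href="..." or href='...'; assumes the value is quoted.
HREF_RE = re.compile(r'href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)

def extract_hrefs(text):
    """Return every quoted href value found in text, in order."""
    return [match.group(2) for match in HREF_RE.finditer(text)]

tag = '<a href="http://www.cnn.com" target="_blank">'
print(extract_hrefs(tag))  # prints ['http://www.cnn.com']
```

A regexp like this is fine for a single tag held in a string; for whole pages with messy markup a real HTML parser is usually less fragile.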

-- 
http://mail.python.org/mailman/listinfo/python-list


how to scrape url out of href

2006-01-01 Thread homepricemaps
i need to scrape a url out of an href.  it seems that people recommend
that i use beautiful soup, but i had some problems with it.

does anyone have sample code for scraping the actual url out of an href
like this one

<a href="http://www.cnn.com" target="_blank">



Re: scrape url out of brackets?

2005-12-31 Thread homepricemaps
so you recommend using some sort of for statement with the html parser
where i tell it to only parse stuff found in the  tag for instance?

Ravi Teja wrote:
> Regular Expressions are the most common way.
> http://docs.python.org/lib/module-re.html
>
> HTML parser is another
> http://docs.python.org/lib/module-htmllib.html
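
A sketch of the HTML parser route, restricted to one kind of tag. It uses `html.parser` from the Python 3 standard library (htmllib itself no longer exists in Python 3). `LinkCollector` is a made-up name, and the sample markup is an assumption.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees and ignores other tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<p><a href="http://www.cnn.com" target="_blank">cnn</a></p>')
print(parser.links)  # prints ['http://www.cnn.com']
```

No explicit for statement over the document is needed here: the parser calls `handle_starttag` once per opening tag, and the `if tag == "a"` check is what limits it to the tag of interest.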



Re: why writing list to file puts each item from list on separate line?

2005-12-30 Thread homepricemaps
never mind, i figured out what you were saying.  worked like a
charm!

thanks for your help.

yaffa



Re: why writing list to file puts each item from list on separate line?

2005-12-30 Thread homepricemaps
i want them to be on the same line when they are written to the file.
right now they are written like this:

food
price
store

i want them to be written like this

food price store

how do i do that?



why writing list to file puts each item from list on separate line?

2005-12-30 Thread homepricemaps
if i use the code below to write a list to a file

list = (food, price, store)
data.append(list)
f = open(r"test.txt", 'a')
f.write ( os.linesep.join( list ) )


it outputs to a file like this

apple
.49
star market

and i want it to do

apple, .49, star market

any ideas
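
A minimal sketch of one way to get everything on one line. `os.linesep.join(...)` puts a line separator between the items, which is why each lands on its own line; joining with `", "` (or `" "`) and adding a single newline at the end keeps a row together. `format_row` is a made-up helper, and the temp-file path is only there so the demo does not touch a real test.txt.

```python
import os
import tempfile

def format_row(items, sep=", "):
    """Join the items of one row into a single line ending in a newline."""
    return sep.join(str(item) for item in items) + "\n"

row = ("apple", ".49", "star market")
path = os.path.join(tempfile.mkdtemp(), "test.txt")  # throwaway demo location
with open(path, "a") as f:
    f.write(format_row(row))  # writes: apple, .49, star market
```

Calling `format_row(row, sep=" ")` gives the space-separated form instead. It is also worth picking a name other than `list` for the tuple, since that name hides the built-in `list` type.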



Re: help with lists and writing to file in correct order

2005-12-29 Thread homepricemaps
hey mike - the sample code was very useful.  have 2 questions

when i use what you wrote, which is listed below, i get told
UnboundLocalError: local variable 'product' referenced before
assignment.  if i however change row to incident in "for incident in
bs('tr'):" i then get my tuples printed out nicely but once again get a
long list of

[('pizza;','pizza hut;', '3.94;')]
[('pizza;','pizza hut;', '3.94;')]


for row in bs('tr'):
    data=[]
    for incident in row('h2',  {'id' : 'dealName'}):
        productlist = []
        for oText in incident.fetchText( oRE):
            productlist.append(oText.strip() + ';')
        product = ''.join(productlist)

    for incident in row('a',  {'name' : 'D0L3'}):
        storelist = []
        for oText in incident.fetchText( oRE):
            storelist.append(oText.strip() + ';')
        store = ''.join(storelist)

    tuple = (product, store, price)
    data.append(tuple)
    print data
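
One likely cause of the UnboundLocalError (a guess, since the page itself is not shown): some `<tr>` has no matching `<h2>`, so the inner loop body never runs and `product` has no value when the tuple is built. The snippet as posted also never assigns `price`. A minimal sketch of the usual fix, resetting the fields for every row and skipping incomplete rows. The dicts below are hypothetical stand-ins for parsed rows, and `collect_rows` is a made-up name.

```python
def collect_rows(rows):
    """Build (product, store, price) tuples from a list of row dicts.

    Each dict stands in for one parsed <tr>; a missing key means the
    row had no matching tag. This shape is hypothetical.
    """
    data = []  # created once, outside the loop, so results accumulate
    for row in rows:
        # Reset for every row so a miss cannot raise UnboundLocalError
        # or silently reuse the previous row's value.
        product = store = price = None
        for text in row.get("product", []):
            product = text.strip()
        for text in row.get("store", []):
            store = text.strip()
        for text in row.get("price", []):
            price = text.strip()
        if None in (product, store, price):
            continue  # skip rows (e.g. headers) that lack a field
        data.append((product, store, price))
    return data

rows = [
    {},  # a header row with nothing to find
    {"product": ["pizza "], "store": ["pizza hut"], "price": ["3.94"]},
    {"product": ["burgers"], "store": ["burger king"], "price": ["4.00"]},
]
print(collect_rows(rows))
```

Keeping `data = []` outside the loop matters too: with it inside, the list is emptied on every row, so each print shows a one-element list.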




> [EMAIL PROTECTED] writes:
> > hey kent thanks for your help.
> >
> > so i ended up using a loop but find that i end up getting the same set
> > of results every time.  the code is here:
> >
> > for incident in bs('tr'):
> >     data2 = []
> >     for incident in bs('h2',  {'id' : 'dealName'}):
> >         product2 = ""
> >         for oText in incident.fetchText( oRE):
> >             product2 += oText.strip() + ';'
> >
> >     for incident in bs('a',  {'name' : 'D0L3'}):
> >         store2 = ""
> >         for oText in incident.fetchText( oRE):
> >             store2 += oText.strip() + ';'
> >
> >     for incident in bs('a',  {'class' : 'nojs'}):
> >         price2 = ""
> >         for oText in incident.fetchText( oRE):
> >             price2 += oText.strip() + ';'
> >
> >     tuple2 = (product2, store2, price2)
> >     data2.append(tuple2)
> >     print data2
>
> Two things here that are bad in general:
> 1) Doing string catenations to build strings. This is slow in
>    Python. Build lists of strings and join them, as below.
>
> 2) Using incident as the index variable for all four loops. This is
>    very confusing, and certainly part of your problem.
>
> > and i end up getting the following instead of unique results
> >
> > pizza, pizzahut, 3.94
> > pizza, pizzahut, 3.94
> > pizza, pizzahut, 3.94
> > pizza, pizzahut, 3.94
>
> Right. The outer loop doesn't do anything to change what the inner
> loops search, so they do the same thing every time through the outer
> loop. You want them to search the row returned by the outer loop each
> time.
>
> for row in bs('tr'):
>     data2 = []
>     for incident in row('h2', {'id' :'dealName'}):
>         product2list = []
>         for oText in incident.fetchText(oRE):
>             product2list.append(oText.strip() + ';')
>         product2 = ''.join(product2list)
>     # etc.
>
>  --
> Mike Meyer <[EMAIL PROTECTED]>
> http://www.mired.org/home/mwm/
> Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
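
The point about searching the row rather than the whole document can be sketched with the standard library alone. This is not the BeautifulSoup code from the thread: it uses `xml.etree.ElementTree`, which only works because the sample markup below is a made-up, well-formed stand-in for the real page. `PAGE` and `scrape_rows` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page being scraped.
PAGE = """
<table>
  <tr><td><h2 id="dealName">pizza</h2></td>
      <td><a name="D0L3">pizza hut</a></td>
      <td><a class="nojs">3.94</a></td></tr>
  <tr><td><h2 id="dealName">burgers</h2></td>
      <td><a name="D0L3">burger king</a></td>
      <td><a class="nojs">4.00</a></td></tr>
</table>
"""

def scrape_rows(markup):
    """Return one (product, store, price) tuple per <tr> in markup."""
    root = ET.fromstring(markup)
    data = []
    for row in root.iter("tr"):
        # Each search starts at this row, not at the whole document,
        # so every pass through the loop sees different cells.
        product = row.findtext(".//h2[@id='dealName']", default="")
        store = row.findtext(".//a[@name='D0L3']", default="")
        price = row.findtext(".//a[@class='nojs']", default="")
        data.append((product.strip(), store.strip(), price.strip()))
    return data

for product, store, price in scrape_rows(PAGE):
    print(", ".join((product, store, price)))
```

Searching from `root` instead of `row` inside the loop would reproduce the symptom in the thread: the same first match on every pass.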



Re: help with lists and writing to file in correct order

2005-12-29 Thread homepricemaps
hey kent thanks for your help.

so i ended up using a loop but find that i end up getting the same set
of results every time.  the code is here:

for incident in bs('tr'):
    data2 = []
    for incident in bs('h2',  {'id' : 'dealName'}):
        product2 = ""
        for oText in incident.fetchText( oRE):
            product2 += oText.strip() + ';'

    for incident in bs('a',  {'name' : 'D0L3'}):
        store2 = ""
        for oText in incident.fetchText( oRE):
            store2 += oText.strip() + ';'

    for incident in bs('a',  {'class' : 'nojs'}):
        price2 = ""
        for oText in incident.fetchText( oRE):
            price2 += oText.strip() + ';'

    tuple2 = (product2, store2, price2)
    data2.append(tuple2)
    print data2

and i end up getting the following instead of unique results

pizza, pizzahut, 3.94
pizza, pizzahut, 3.94
pizza, pizzahut, 3.94
pizza, pizzahut, 3.94
>
> I would use a loop that finds the row for a single item with something like
> for item in bs('tr',  {'class' : 'base'}):
>
> then inside the loop fetch the values for store, food and price for that
> item and write them to your output file.
> 
> Kent



Re: help with lists and writing to file in correct order

2005-12-27 Thread homepricemaps
hey steven - your example was very helpful. is there a parenthesis
missing in

fp.write("Food = %s, store = %s, price = %s\n" % triplet


Steven D'Aprano wrote:
> On Mon, 26 Dec 2005 20:56:17 -0800, homepricemaps wrote:
>
> > sorry for asking such beginner questions but i tried this and nothing
> > wrote to my text file
> >
> > for food, price, store in bs(food, price, store):
> >     out = open("test.txt", 'a')
> >     out.write (food + price + store)
> >     out.close()
>
> What are the contents of food, price and store? If "nothing wrote to my
> text file", chances are all three of them are the empty string.
>
>
> > while if i write the following without the for i at least get
> > something?
> > out = open("test.txt", 'a')
> > out.write (food + price + store)
> > out.close()
>
> You get "something". That's not much help. But I predict that what you are
> getting is the contents of food price and store, at least one of which are
> not empty.
>
> You need to encapsulate your code by separating the part of the code that
> reads the html file from the part that writes the text file. I suggest
> something like this:
>
>
> def read_html_data(name_of_file):
>     # I don't know BeautifulSoup, so you will have to fix this...
>     datafile = BeautifulSoup(name_of_file)
>     # somehow read in the foods, prices and stores
>     # for each set of three, store them in a tuple (food, store, price)
>     # then store the tuples in a list
>     # something vaguely like this:
>     data = []
>     while 1:
>         food = datafile.get("food")  # or whatever
>         store = datafile.get("store")
>         price = datafile.get("price")
>         data.append( (food,store,price) )
>     datafile.close()
>     return data
>
> def write_data_to_text(datalist, name_of_file):
>     # Expects a list of tuples (food,store,price). Writes that list
>     # to name_of_file separated by newlines.
>     fp = file(name_of_file, "w")
>     for triplet in datalist:
>         fp.write("Food = %s, store = %s, price = %s\n" % triplet
>     fp.close()
> 
> 
> Hope this helps.
> 
> 
> 
> -- 
> Steven.



Re: help with lists and writing to file in correct order

2005-12-26 Thread homepricemaps
sorry for asking such beginner questions but i tried this and nothing
wrote to my text file

for food, price, store in bs(food, price, store):
    out = open("test.txt", 'a')
    out.write (food + price + store)
    out.close()


while if i write the following without the for i at least get
something?
out = open("test.txt", 'a')
out.write (food + price + store)
out.close()


Scott David Daniels wrote:
> [EMAIL PROTECTED] wrote:
> > the problem with writing to the file immediately is that it ends up
> > writing all food items together, and then all store items and then all
> > prices
> >
> > i want
> >
> > food, store, price
> > food, store, price
> >
> Well, if it all fits in memory, append each to its own list, and then
> either finally if you can or periodically if you must:
>
>  for food, store, price in zip(foods, stores, prices):
>  
> 
> -- 
> -Scott David Daniels
> [EMAIL PROTECTED]
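
A minimal sketch of the zip idea with the loop body filled in, assuming the three lists were collected in the same order. `write_rows` is a made-up name, and `io.StringIO` stands in for the real output file so the demo leaves no file behind.

```python
import io

def write_rows(out, foods, stores, prices):
    """Write one 'food, store, price' line per item to the open file out."""
    # zip pairs up the first items, then the second items, and so on.
    for food, store, price in zip(foods, stores, prices):
        out.write("%s, %s, %s\n" % (food, store, price))

foods = ["pizza", "burgers", "noodles"]
stores = ["pizza hut", "burger king", "panda inn"]
prices = ["3.00", "4.00", "2.00"]

out = io.StringIO()  # stands in for open("test.txt", "a")
write_rows(out, foods, stores, prices)
print(out.getvalue(), end="")
```

Note that `zip` stops at the shortest list, so if one list is missing an entry the later rows are silently dropped or misaligned. Collecting one tuple per item as it is scraped avoids that.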



Re: help with lists and writing to file in correct order

2005-12-26 Thread homepricemaps
the problem with writing to the file immediately is that it ends up
writing all food items together, and then all store items and then all
prices

i want

food, store, price
food, store, price



Re: help with lists and writing to file in correct order

2005-12-26 Thread homepricemaps
here is the write part:

out = open("test.txt", 'a')
out.write (store+ food+ price + "\n")
out.close()


Steven D'Aprano wrote:
> On Mon, 26 Dec 2005 17:44:43 -0800, homepricemaps wrote:
>
> > sorry guys, here is the code
> >
> > for incident in bs('a',  {'class' : 'price'}):
> >     price = ""
> >     for oText in incident.fetchText( oRE):
> >         price += oText.strip() + "','"
> >
> > for incident in bs('div',  {'class' : 'store'}):
> >     store = ""
> >     for oText in incident.fetchText( oRE):
> >         store += oText.strip() + "','"
> >
> > for incident in bs('h2',  {'id' : 'food'}):
> >     food = ""
> >     for oText in incident.fetchText( oRE):
> >         food += oText.strip() + "','"
>
>
> This is hardly all your code -- where is the part where you actually
> *write* something to the file? The problem is you are writing the same
> store and food to the file over and over again. After you have collected
> one line of store/food, you must write it to the file immediately, or at
> least save it in a list so you can write the lot at the end.
> 
> 
> -- 
> Steven.



Re: help with lists and writing to file in correct order

2005-12-26 Thread homepricemaps
sorry guys, here is the code

for incident in bs('a',  {'class' : 'price'}):
    price = ""
    for oText in incident.fetchText( oRE):
        price += oText.strip() + "','"

for incident in bs('div',  {'class' : 'store'}):
    store = ""
    for oText in incident.fetchText( oRE):
        store += oText.strip() + "','"

for incident in bs('h2',  {'id' : 'food'}):
    food = ""
    for oText in incident.fetchText( oRE):
        food += oText.strip() + "','"



help with lists and writing to file in correct order

2005-12-26 Thread homepricemaps
hey folks,

have a logic question for you.  appreciate the help in advance.

i am scraping 3 pieces of information from the html, namely the food
name, store name and price.  and i am doing this for many different
food items found in the html including pizza, burgers, fries etc.  what
i want is to write out to a text file in the following order:

pizza, pizza hut, 3.00
burgers, burger king, 4.00
noodles, panda inn, 2.00

html is below.  does anyone have a good recommendation for how to set up
the code so that it writes to the text file in the order
listed previously?  any attempt i have made seems to write to the file
like this

noodles, panda inn, 3
noodles, panda inn, 4
noodles, panda inn, 2


HTML


pizza

pizza hutt

3.00




nonetype error is not callable

2005-12-25 Thread homepricemaps
if i do the following i get the url of an image i am looking for

image = ""
image = bs.img
print image

however if i do this
out.write (image )

i get an error that says "nonetype error is not callable"
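
A guess at the cause: `bs.img` is a tag object rather than a string, while `write()` wants a string, so pulling out the `src` attribute (or converting the tag with `str()`) before writing should help. The sketch below shows the idea with `html.parser` from the standard library instead of BeautifulSoup. `FirstImage`, the sample markup and the example.com URL are all made up for illustration.

```python
from html.parser import HTMLParser
import io

class FirstImage(HTMLParser):
    """Remembers the src attribute of the first <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")

parser = FirstImage()
parser.feed('<p><img src="http://example.com/pic.jpg" alt="pic"></p>')

out = io.StringIO()  # stands in for the open output file
if parser.src is not None:
    out.write(parser.src + "\n")  # a plain string, so write() accepts it
```

The `is not None` check covers pages with no image at all, where there would be nothing sensible to write.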

any ideas



need help with python syntax

2005-12-24 Thread homepricemaps
if i have a piece of html that looks like this


cnn.com

and i want to scrape out cnn.com , what syntax would i use?  i have
tried this and it doesn't work

for incident in bs('td', {'class' : 'rulesbody'}, {'class' :
'rulesbody'} ):
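
The call above passes the attribute dict twice; the other BeautifulSoup calls in this thread pass it once, which is probably what is wanted here too. The archive dropped the original html, so the sketch below assumes a cell shaped like `<td class="rulesbody">` with the text inside a link. It uses `html.parser` from the standard library, and `CellText` is a made-up name.

```python
from html.parser import HTMLParser

class CellText(HTMLParser):
    """Collects the text inside every <td class="rulesbody"> cell."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("class") == "rulesbody":
            self.inside = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.cells[-1] += data  # text of nested tags is included

# Assumed markup; the real page may differ.
SAMPLE = ('<table><tr><td class="rulesbody">'
          '<a href="http://www.cnn.com">cnn.com</a></td></tr></table>')

parser = CellText()
parser.feed(SAMPLE)
print([cell.strip() for cell in parser.cells])  # prints ['cnn.com']
```

This simple version does not handle a table nested inside the cell; it is only meant to show the match-on-class idea.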



scrape url out of brackets?

2005-12-24 Thread homepricemaps
any idea how to scrape a url out of a file?  for instance if i want to
scrape out the href at the end which is "www.cnn.com" is there a way to
do it?


