Re: how to scrape url out of href
sorry paul, i'm an extremely beginner programmer, if that! ;-) can you give me an example? thanks in advance

Paul Rubin wrote:
> [EMAIL PROTECTED] writes:
> > does anyone have sample code for scraping the actual url out of an href
> > like this one
> >
> > http://www.cnn.com"; target="_blank">
>
> If you've got the tag by itself like that, just use a regexp to get
> the href out.
--
http://mail.python.org/mailman/listinfo/python-list
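A minimal sketch of the regexp approach Paul describes, written for present-day Python 3. The tag string is an assumption (the markup in the post was partly stripped by the archive), and HREF_RE and extract_href are names made up for this illustration. This is only reasonable when you already hold a single tag as a string; it is not a general HTML parser.

```python
import re

# Assumed example tag; the original post's markup did not survive the archive.
tag = '<a href="http://www.cnn.com" target="_blank">'

# Capture whatever sits between the quotes that follow href=.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_href(tag_text):
    """Return the href value found in tag_text, or None if there is none."""
    match = HREF_RE.search(tag_text)
    return match.group(1) if match else None


print(extract_href(tag))
```

The function returns None rather than raising when no href is present, so the caller can skip tags without links.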
how to scrape url out of href
i need to scrape a url out of an href. it seems that people recommend that i use beautiful soup, but i had some problems with it. does anyone have sample code for scraping the actual url out of an href like this one

http://www.cnn.com"; target="_blank">
Re: scrape url out of brackets?
so you recommend using some sort of for statement with the html parser where i tell it to only parse stuff found in the tag, for instance?

Ravi Teja wrote:
> Regular Expressions are the most common way.
> http://docs.python.org/lib/module-re.html
>
> HTML parser is another
> http://docs.python.org/lib/module-htmllib.html
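For the HTML parser route, here is a rough sketch. It uses html.parser from the Python 3 standard library instead of the htmllib module linked above, since htmllib no longer exists in Python 3. LinkCollector and collect_links are hypothetical names, and the sample markup in the print call is an assumption. No explicit for statement over the tags is needed: the parser calls handle_starttag for every opening tag, and the method keeps only the href of each a tag.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html_text):
    """Return all href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


print(collect_links('<a href="http://www.cnn.com" target="_blank">CNN</a>'))
```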
Re: why writing list to file puts each item from list on separate line?
never mind, i figured out what you were saying. worked like a charm! thanks for your help.

yaffa
Re: why writing list to file puts each item from list on separate line?
i want them to be on the same line when they are written to the file. right now they are written like this:

food
price
store

i want them to be written like this

food price store

how do i do that?
why writing list to file puts each item from list on separate line?
if i use the code below to write a list to a file

list = (food, price, store)
data.append(list)
f = open(r"test.txt", 'a')
f.write ( os.linesep.join( list ) )

it outputs to a file like this

apple
.49
star market

and i want it to do

apple, .49. star market

any ideas
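The items land on separate lines because os.linesep.join() puts a line separator between them. A small sketch of the alternative, joining with a comma and adding one newline per record: format_record and append_record are hypothetical helper names, and the separator is an assumption about the wanted output.

```python
def format_record(fields, sep=", "):
    """Join the fields of one record into a single line of text."""
    return sep.join(str(field) for field in fields)


def append_record(path, fields):
    """Append one record to the file at path, one record per line."""
    with open(path, "a") as f:
        # In text mode "\n" is translated to the platform line ending.
        f.write(format_record(fields) + "\n")


print(format_record(("apple", ".49", "star market")))
```

Calling append_record("test.txt", (food, price, store)) once per item would then give one comma-separated line per item.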
Re: help with lists and writing to file in correct order
hey mike, the sample code was very useful. have 2 questions.

when i use what you wrote, which is listed below, i get told "unboundlocalerror: local variable 'product' referenced before assignment". if i however change row to incident in "for incident in bs('tr'):" i then get my tuples printed out nicely, but once again get a long list of

[('pizza;','pizza hut;', '3.94;')]
[('pizza;','pizza hut;', '3.94;')]

for row in bs('tr'):
    data=[]
    for incident in row('h2', {'id' : 'dealName'}):
        productlist = []
        for oText in incident.fetchText( oRE):
            productlist.append(oText.strip() + ';')
        product = ''.join(productlist)
    for incident in row('a', {'name' : 'D0L3'}):
        storelist = []
        for oText in incident.fetchText( oRE):
            storelist.append(oText.strip() + ';')
        store = ''.join(storelist)
    tuple = (product, store, price)
    data.append(tuple)
    print data

> [EMAIL PROTECTED] writes:
> > hey kent thanks for your help.
> >
> > so i ended up using a loop but find that i end up getting the same set
> > of results every time. the code is here:
> >
> > for incident in bs('tr'):
> >     data2 = []
> >     for incident in bs('h2', {'id' : 'dealName'}):
> >         product2 = ""
> >         for oText in incident.fetchText( oRE):
> >             product2 += oText.strip() + ';'
> >
> >     for incident in bs('a', {'name' : 'D0L3'}):
> >         store2 = ""
> >         for oText in incident.fetchText( oRE):
> >             store2 += oText.strip() + ';'
> >
> >     for incident in bs('a', {'class' : 'nojs'}):
> >         price2 = ""
> >         for oText in incident.fetchText( oRE):
> >             price2 += oText.strip() + ';'
> >
> >     tuple2 = (product2, store2, price2)
> >     data2.append(tuple2)
> >     print data2
>
> Two things here that are bad in general:
> 1) Doing string catenations to build strings. This is slow in
>    Python. Build lists of strings and join them, as below.
>
> 2) Using incident as the index variable for all four loops. This is
>    very confusing, and certainly part of your problem.
>
> > and i end up getting the following instead of unique results
> >
> > pizza, pizzahut, 3.94
> > pizza, pizzahut, 3.94
> > pizza, pizzahut, 3.94
> > pizza, pizzahut, 3.94
>
> Right. The outer loop doesn't do anything to change what the inner
> loops search, so they do the same thing every time through the outer
> loop. You want them to search the row returned by the outer loop each
> time.
>
> for row in bs('tr'):
>     data2 = []
>     for incident in row('h2', {'id' :'dealName'}):
>         product2list = []
>         for oText in incident.fetchText(oRE):
>             product2list.append(oText.strip() + ';')
>         product2 = ''.join(product2list)
>     # etc.
>
> --
> Mike Meyer <[EMAIL PROTECTED]>
> http://www.mired.org/home/mwm/
> Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
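To make Mike's point concrete without depending on BeautifulSoup, here is a sketch that searches inside each row rather than inside the whole document. It swaps in xml.etree.ElementTree from the standard library, which only works because the sample markup is well-formed. PAGE, text_of and extract_rows are hypothetical, and the second row of data is invented for the illustration; only the tag and attribute names come from the thread.

```python
import xml.etree.ElementTree as ET

# Assumed, well-formed stand-in for the real page.
PAGE = """
<table>
  <tr>
    <td><h2 id="dealName">pizza</h2></td>
    <td><a name="D0L3">pizza hut</a></td>
    <td><a class="nojs">3.94</a></td>
  </tr>
  <tr>
    <td><h2 id="dealName">burgers</h2></td>
    <td><a name="D0L3">burger king</a></td>
    <td><a class="nojs">4.00</a></td>
  </tr>
</table>
"""


def text_of(row, path):
    """Return the joined, stripped text of the first match of path inside row."""
    node = row.find(path)
    if node is None:
        return ""
    # Build a list of pieces and join once, instead of += on a string.
    parts = [piece.strip() for piece in node.itertext()]
    return "".join(parts)


def extract_rows(markup):
    """Return one (product, store, price) tuple per <tr> in markup."""
    root = ET.fromstring(markup.strip())
    data = []
    for row in root.iter("tr"):
        # Every lookup starts from row, so each pass sees different cells.
        product = text_of(row, ".//h2[@id='dealName']")
        store = text_of(row, ".//a[@name='D0L3']")
        price = text_of(row, ".//a[@class='nojs']")
        data.append((product, store, price))
    return data


print(extract_rows(PAGE))
```

Because text_of returns an empty string when a cell is missing, a row without a matching tag gives an empty field instead of an unassigned variable.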
Re: help with lists and writing to file in correct order
hey kent thanks for your help.

so i ended up using a loop but find that i end up getting the same set of results every time. the code is here:

for incident in bs('tr'):
    data2 = []
    for incident in bs('h2', {'id' : 'dealName'}):
        product2 = ""
        for oText in incident.fetchText( oRE):
            product2 += oText.strip() + ';'

    for incident in bs('a', {'name' : 'D0L3'}):
        store2 = ""
        for oText in incident.fetchText( oRE):
            store2 += oText.strip() + ';'

    for incident in bs('a', {'class' : 'nojs'}):
        price2 = ""
        for oText in incident.fetchText( oRE):
            price2 += oText.strip() + ';'

    tuple2 = (product2, store2, price2)
    data2.append(tuple2)
    print data2

and i end up getting the following instead of unique results

pizza, pizzahut, 3.94
pizza, pizzahut, 3.94
pizza, pizzahut, 3.94
pizza, pizzahut, 3.94

> I would use a loop that finds the row for a single item with something like
> for item in bs('tr', {'class' : 'base'}):
>
> then inside the loop fetch the values for store, food and price for that
> item and write them to your output file.
>
> Kent
Re: help with lists and writing to file in correct order
hey steven, your example was very helpful. is there a parenthesis symbol missing in

fp.write("Food = %s, store = %s, price = %s\n" % triplet

Steven D'Aprano wrote:
> On Mon, 26 Dec 2005 20:56:17 -0800, homepricemaps wrote:
>
> > sorry for asking such beginner questions but i tried this and nothing
> > wrote to my text file
> >
> > for food, price, store in bs(food, price, store):
> >     out = open("test.txt", 'a')
> >     out.write (food + price + store)
> >     out.close()
>
> What are the contents of food, price and store? If "nothing wrote to my
> text file", chances are all three of them are the empty string.
>
> > while if i write the following without the for i at least get
> > something?
> > out = open("test.txt", 'a')
> > out.write (food + price + store)
> > out.close()
>
> You get "something". That's not much help. But I predict that what you are
> getting is the contents of food price and store, at least one of which are
> not empty.
>
> You need to encapsulate your code by separating the part of the code that
> reads the html file from the part that writes the text file. I suggest
> something like this:
>
> def read_html_data(name_of_file):
>     # I don't know BeautifulSoup, so you will have to fix this...
>     datafile = BeautifulSoup(name_of_file)
>     # somehow read in the foods, prices and stores
>     # for each set of three, store them in a tuple (food, store, price)
>     # then store the tuples in a list
>     # something vaguely like this:
>     data = []
>     while 1:
>         food = datafile.get("food") # or whatever
>         store = datafile.get("store")
>         price = datafile.get("price")
>         data.append( (food,store,price) )
>     datafile.close()
>     return data
>
> def write_data_to_text(datalist, name_of_file):
>     # Expects a list of tuples (food,store,price). Writes that list
>     # to name_of_file separated by newlines.
>     fp = file(name_of_file, "w")
>     for triplet in datalist:
>         fp.write("Food = %s, store = %s, price = %s\n" % triplet
>     fp.close()
>
> Hope this helps.
>
> --
> Steven.
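The fp.write line asked about above is indeed missing its closing parenthesis. A sketch of the writing half with that fixed, adapted to Python 3 (open() in a with block instead of file() and an explicit close()). format_triplet is a helper name added for this illustration; the function name and message format follow Steven's post.

```python
def format_triplet(triplet):
    """Format one (food, store, price) tuple as a line of text."""
    # The % operator needs a tuple with exactly three items here.
    return "Food = %s, store = %s, price = %s\n" % triplet


def write_data_to_text(datalist, name_of_file):
    """Write a list of (food, store, price) tuples, one per line."""
    with open(name_of_file, "w") as fp:
        for triplet in datalist:
            fp.write(format_triplet(triplet))  # note the closing parenthesis


print(format_triplet(("pizza", "pizza hut", "3.00")), end="")
```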
Re: help with lists and writing to file in correct order
sorry for asking such beginner questions but i tried this and nothing wrote to my text file

for food, price, store in bs(food, price, store):
    out = open("test.txt", 'a')
    out.write (food + price + store)
    out.close()

while if i write the following without the for i at least get something?

out = open("test.txt", 'a')
out.write (food + price + store)
out.close()

Scott David Daniels wrote:
> [EMAIL PROTECTED] wrote:
> > the problem with writing to the file immediately is that it ends up
> > writing all food items together, and then all store items and then all
> > prices
> >
> > i want
> >
> > food, store, price
> > food, store, price
> >
> Well, if it all fits in memory, append each to its own list, and then
> either finally if you can or periodically if you must:
>
>     for food, store, price in zip(foods, stores, prices):
>
>
> --
> -Scott David Daniels
> [EMAIL PROTECTED]
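A short sketch of the zip() idea from Scott's reply: three separate lists, one per field, are walked in step so that each output line holds one food, one store and one price. rows_from_columns is a hypothetical name, and the sample values are the ones from the first post in this thread.

```python
def rows_from_columns(foods, stores, prices):
    """Turn three parallel lists into 'food, store, price' lines."""
    # zip() pairs up the first items, then the second items, and so on;
    # it stops at the end of the shortest list.
    return ["%s, %s, %s" % row for row in zip(foods, stores, prices)]


foods = ["pizza", "burgers", "noodles"]
stores = ["pizza hut", "burger king", "panda inn"]
prices = ["3.00", "4.00", "2.00"]

for line in rows_from_columns(foods, stores, prices):
    print(line)
```

Note that this is a loop over lists that were already collected, not a call like bs(food, price, store), which asks the soup object for tags.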
Re: help with lists and writing to file in correct order
the problem with writing to the file immediately is that it ends up writing all food items together, and then all store items and then all prices

i want

food, store, price
food, store, price
Re: help with lists and writing to file in correct order
here is the write part:

out = open("test.txt", 'a')
out.write (store+ food+ price + "\n")
out.close()

Steven D'Aprano wrote:
> On Mon, 26 Dec 2005 17:44:43 -0800, homepricemaps wrote:
>
> > sorry guys, here is the code
> >
> > for incident in bs('a', {'class' : 'price'}):
> >     price = ""
> >     for oText in incident.fetchText( oRE):
> >         price += oText.strip() + "','"
> >
> > for incident in bs('div', {'class' : 'store'}):
> >     store = ""
> >     for oText in incident.fetchText( oRE):
> >         store += oText.strip() + "','"
> >
> > for incident in bs('h2', {'id' : 'food'}):
> >     food = ""
> >     for oText in incident.fetchText( oRE):
> >         food += oText.strip() + "','"
>
> This is hardly all your code -- where is the part where you actually
> *write* something to the file? The problem is you are writing the same
> store and food to the file over and over again. After you have collected
> one line of store/food, you must write it to the file immediately, or at
> least save it in a list so you can write the lot at the end.
>
> --
> Steven.
Re: help with lists and writing to file in correct order
sorry guys, here is the code

for incident in bs('a', {'class' : 'price'}):
    price = ""
    for oText in incident.fetchText( oRE):
        price += oText.strip() + "','"

for incident in bs('div', {'class' : 'store'}):
    store = ""
    for oText in incident.fetchText( oRE):
        store += oText.strip() + "','"

for incident in bs('h2', {'id' : 'food'}):
    food = ""
    for oText in incident.fetchText( oRE):
        food += oText.strip() + "','"
help with lists and writing to file in correct order
hey folks, have a logic question for you. appreciate the help in advance. i am scraping 3 pieces of information from the html, namely the food name, store name and price. and i am doing this for many different food items found in the html including pizza, burgers, fries etc. what i want is to write out to a text file in the following order:

pizza, pizza hut, 3.00
burgers, burger king, 4.00
noodles, panda inn, 2.00

html is below. does anyone have a good recommendation for how to set up the code in such a manner that it writes to the text file in the order listed previously? any attempt i have made seems to write to the file like this

noodles, panda inn, 3
noodles, panda inn, 4
noodles, panda inn, 2

HTML

pizza pizza hutt 3.00
nonetype error is not callable
if i do the following i get the url of an image i am looking for

image = ""
image = bs.img
print image

however if i do this

out.write (image )

i get an error that says "nonetype error is not callable"

any ideas
need help with python syntax
if i have a piece of html that looks like this

cnn.com

and i want to scrape out cnn.com, what syntax would i use? i have tried this and it doesn't work

for incident in bs('td', {'class' : 'rulesbody'}, {'class' : 'rulesbody'} ):
scrape url out of brackets?
any idea how to scrape a url out of a file? for instance if i want to scrape out the href at the end which is "www.cnn.com" is there a way to do it?