Hello, I have a similar question for this website: http://familyquick.fr/store-locator
The idea is also to retrieve all the restaurants (Quick) in France. Could you please help? Thank you.
On Friday, September 27, 2013 at 06:53:18 UTC+2, Rolando Espinoza La fuente wrote:
>
> Generally, websites that use a third-party service to render some data
> visualization (map, table, etc.) have to send the data somehow, and in most
> cases this data is accessible from the browser.
>
> For this case, an inspection (i.e. exploring the requests made by the
> browser) shows that the data is loaded from a POST request to
> https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php
> So, basically, you have there all the data you want in a nice JSON format,
> ready for consuming.
>
> Scrapy provides the "shell" command, which is very convenient for tinkering
> with the website before writing the spider:
>
> $ scrapy shell https://www.mcdonalds.com.sg/locate-us/
> 2013-09-27 00:44:14-0400 [scrapy] INFO: Scrapy 0.16.5 started (bot: scrapybot)
> ...
>
> In [1]: from scrapy.http import FormRequest
>
> In [2]: req = FormRequest('https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php',
>             formdata={'action': 'ws_search_store_location', 'store_name': '0',
>                       'store_area': '0', 'store_type': '0'})
>
> In [3]: fetch(req)
> 2013-09-27 00:45:13-0400 [default] DEBUG: Crawled (200) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)
> ...
>
> In [4]: import json
>
> In [5]: data = json.loads(response.body)
>
> In [6]: len(data['stores']['listing'])
> Out[6]: 127
>
> In [7]: data['stores']['listing'][0]
> Out[7]:
> {u'address': u'678A Woodlands Avenue 6<br/>#01-05<br/>Singapore 731678',
>  u'city': u'Singapore',
>  u'id': 78,
>  u'lat': u'1.440409',
>  u'lon': u'103.801489',
>  u'name': u"McDonald's Admiralty",
>  u'op_hours': u'24 hours<br>\r\nDessert Kiosk: 0900-0100',
>  u'phone': u'68940513',
>  u'region': u'north',
>  u'type': [u'24hrs', u'dessert_kiosk'],
>  u'zip': u'731678'}
>
> In short: in your spider you have to return the FormRequest(...) above,
> then in the callback load the JSON object from the response's body, and
> finally, for each store in the listing, create an item with the values.
>
> Hope that helps.
>
> Regards,
> Rolando
>
>
> On Thu, Sep 26, 2013 at 9:56 PM, Royce <[email protected]> wrote:
>
>> Hi guys, I'm having a real headache over how to scrape data from a Google
>> map. Sorry if it's confusing; I'll make it more clear.
>>
>> This is the page I'm trying to scrape:
>> https://www.mcdonalds.com.sg/locate-us/
>>
>> When the page loads, you see lots of McDonald's icons all over the map. If
>> you click on one of them, it shows you the address, contact and operating
>> hours of that store.
>> So, my question is, how do I scrape all of this info (address, contact,
>> hours) for every store location? These values are not located inside the
>> HTML file of the page, and I'm really lost.
>>
>> To the experienced Scrapy users out there, please help this greenhorn out
>> and, if possible, do an example code for me. I'm really bad at
>> understanding theories and stuff...
>>
>> P.S. I'm using the Scrapy framework and run my script on the command line.
>> I save the data into a JSON file using "scrapy crawl smth.. -o smth.json -t json".
>>
>> Thanks in advance
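For reference, here is a minimal spider sketch of the approach Rolando outlines above, wired up against the McDonald's endpoint, form fields and JSON layout from his shell session. It is written for a current Scrapy release (the thread used 0.16, where some details differ), and the spider name, callback name and item fields are illustrative choices, not anything mandated by Scrapy:

# Minimal sketch of the spider described above: POST the same request the
# map page makes in the browser, then turn the JSON listing into items.
import json

import scrapy
from scrapy.http import FormRequest


class McdStoresSpider(scrapy.Spider):
    name = "mcd_stores"

    def start_requests(self):
        # The same POST request the store-locator page issues in the browser.
        yield FormRequest(
            "https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php",
            formdata={
                "action": "ws_search_store_location",
                "store_name": "0",
                "store_area": "0",
                "store_type": "0",
            },
            callback=self.parse_stores,
        )

    def parse_stores(self, response):
        # The response body is JSON; each entry under stores/listing is one store.
        data = json.loads(response.text)
        for store in data["stores"]["listing"]:
            yield {
                "name": store["name"],
                "address": store["address"],
                "phone": store["phone"],
                "op_hours": store["op_hours"],
                "lat": store["lat"],
                "lon": store["lon"],
            }

You can run it with "scrapy crawl mcd_stores -o stores.json"; recent Scrapy versions infer the output format from the file extension, so the "-t json" flag from the original post is not needed there. The same pattern should carry over to other store locators (like the Quick one above) once you have identified the XHR endpoint and its parameters in your browser's network tab.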
