Hi,

Complex task. Look at the GitHub repository: I pushed the script for the top 500 games; see also output.csv.
There are some minor bugs to fix.

Regards, and thanks.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan
Blog: http://blog.jbinfo.io/

2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:

> Hi,
>
> The game links I want to scrape are the top 500 games listed at
> http://steamcharts.com/top.
>
> Some example URLs are http://store.steampowered.com/app/570/,
> http://store.steampowered.com/app/730/ and
> http://store.steampowered.com/app/440/. These are the top 3 games on
> steamcharts.com.
>
> These top 500 game URLs don't have any unique structure; they look like
> all the other game URLs. So how do I scrape only these URLs?
>
> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>
>> Hi,
>>
>> I need the structure of the links that you want to scrape. If you look
>> at my code, you can see that I limit the links with this:
>>
>> rules = (
>>     Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>     Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item')
>> )
>>
>> Only links matching app/\d+ and genre/ are followed; everything else is
>> rejected. So, give me an example of these specific links.
>>
>> Regards,
>> Lhassan
>>
>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>
>>> Hi Lhassan,
>>>
>>> Thanks for your response.
>>> Your code was amazing and I got what I was looking for.
>>> But I want to crawl only a specific set of URLs, i.e. I don't want to
>>> crawl all games, so I specified those URLs in start_urls. But I came
>>> to know that we can't use both "def start_requests(self)" and
>>> "start_urls".
>>>
>>> Do you have any idea about this? I just want to scrape a specific set
>>> of URLs (some 500 of them), not all of them.
>>>
>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>
>>>> Hi,
>>>>
>>>> I created a GitHub project containing a Scrapy project that scrapes
>>>> data from this website; see the repository:
>>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>>> Clone the project locally and correct the bugs.
>>>>
>>>> If you like it, you can make a donation; see my email signature.
>>>>
>>>> Regards,
>>>> Lhassan
>>>>
>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I am new to Scrapy and I have an issue I don't know how to solve.
>>>>>
>>>>> I need to scrape game info from the URL
>>>>> http://store.steampowered.com/agecheck/app/252490/, but it requires
>>>>> an age check before it shows the game data, so I need to fill that
>>>>> in once. The website stores the info in cookies (I guess), since it
>>>>> doesn't ask for the age check on subsequent games; i.e. we only need
>>>>> to enter the age for the first game, then it is stored automatically.
>>>>>
>>>>> So my problem is: how do I automatically send the drop-down values
>>>>> in Scrapy, store them as cookies, and use those cookies for the
>>>>> subsequent start URLs?
>>>>>
>>>>> Please help me, friends. Thanks in advance.

--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
