Hi,

The game links I want to scrape are the top 500 games listed at http://steamcharts.com/top.

Some example URLs are "http://store.steampowered.com/app/570/", "http://store.steampowered.com/app/730/", and "http://store.steampowered.com/app/440/". These are the top 3 games on steamcharts.com. The top 500 game URLs have no distinguishing structure; they look like every other game URL on the store, so a URL pattern cannot single them out. How can I scrape only these URLs?

On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>
> Hi,
>
> I need the structure of the links that you want to scrape. If you look at
> my code, you can see that I limit the links with these rules:
>
> rules = (
>     Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>     Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item')
> )
>
> Only links that match app/\d+ or genre/ are followed; anything else is
> rejected.
> So, give me an example of these specific links.
>
> Regards.
>
> Thanks.
> ---------
> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
> Email/Gtalk: [email protected] - Skype: baazzilhassan
> Blog: http://blog.jbinfo.io/
> Donate - PayPal:
> <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>
>
> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>
>> Hi Lhassan,
>>
>> Thanks for your response. Your code was amazing and I got what I was
>> looking for.
>> But I want to crawl only a specific set of URLs, i.e. I don't want to
>> crawl all games. So I listed those URLs in start_urls[]. But I found out
>> that we can't use both "def start_requests(self)" and "start_urls[]".
>>
>> Do you have any idea about this? I just want to scrape a specific set
>> of URLs (some 500 URLs), not all of them.
>>
>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>
>>> Hi,
>>>
>>> I created a GitHub repository containing a Scrapy project that scrapes
>>> data from this website:
>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>> Have a look at it, clone the project locally, and correct any bugs.
>>>
>>> If you like it, you can make a donation; see my email signature.
>>>
>>> Regards.
>>> ---------
>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>>> Blog: http://blog.jbinfo.io/
>>> Donate - PayPal:
>>> <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>
>>>
>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>
>>>> Hi folks,
>>>>
>>>> I am new to Scrapy and have an issue that I don't know how to solve.
>>>>
>>>> I need to scrape game info from the URL
>>>> "http://store.steampowered.com/agecheck/app/252490/", but it requires
>>>> an age check before I can go further and scrape the game data. The
>>>> age only has to be entered once: the website seems to store it in
>>>> cookies (I guess), because it does not ask for an age check on
>>>> subsequent games.
>>>>
>>>> So my problem is how to submit the drop-down values automatically in
>>>> Scrapy, store the result as cookies, and use those cookies for the
>>>> subsequent start URLs.
>>>>
>>>> Please help me, friends. Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
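
Since the top-500 store URLs cannot be told apart by pattern, one possible approach is to build the list from the chart page itself instead of filtering links on the store. The sketch below is only an illustration under assumptions: it supposes that each game on the chart page is linked as href="/app/<id>" and that the same numeric id is used in the store URL (as in the three example URLs above). The function name, the regular expression, and the sample HTML are all hypothetical and not taken from the repository mentioned in this thread.

```python
import re

# Store URL shape taken from the example URLs quoted in the thread.
STORE_APP_URL = "http://store.steampowered.com/app/{app_id}/"

# Assumption: each game on the chart page is linked as href="/app/<numeric id>".
APP_LINK_RE = re.compile(r'href="/app/(\d+)"')


def store_urls_from_chart_html(html):
    """Return store URLs for the app ids linked in a chart page, in page order."""
    seen = set()
    urls = []
    for app_id in APP_LINK_RE.findall(html):
        if app_id not in seen:  # skip ids that are linked more than once
            seen.add(app_id)
            urls.append(STORE_APP_URL.format(app_id=app_id))
    return urls


# Hypothetical fragment standing in for a downloaded chart page.
sample_html = '''
<td class="game-name"><a href="/app/570">Dota 2</a></td>
<td class="game-name"><a href="/app/730">Counter-Strike: Global Offensive</a></td>
<td class="game-name"><a href="/app/440">Team Fortress 2</a></td>
'''

print(store_urls_from_chart_html(sample_html))
```

The resulting list could be assigned to start_urls on a plain spider, or turned into requests inside start_requests. Scrapy's default start_requests is what reads start_urls, so once start_requests is overridden the list has to be iterated there explicitly. With an explicit URL list, the genre/ and app/\d+ crawl rules are no longer needed. If the chart is split across several pages, each page would be passed through the same function; the page URLs are not shown in this thread.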
