Hi, the code works well, but only 450 of the 500 games are being retrieved. I think it is skipping two of the 20 pages on steamcharts.com. One page it ignores is "http://steamcharts.com/top/p.1": none of the games from that URL appear in the output CSV.
I tried editing the regex to allow=r'(top\/p\.)([1-9]|1[0-9]|20)$' but could not get it working. Could you please help me?

On Tuesday, June 24, 2014 4:35:10 AM UTC-7, Lhassan Baazzi wrote:
> Hi,
>
> Complex task; look at the GitHub repository. I pushed the script for the
> top 500 games; see also the output.csv.
>
> There are some minor bugs to fix.
>
> Regards.
>
> Thanks.
> ---------
> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
> Email/Gtalk: [email protected] - Skype: baazzilhassan
> Blog: http://blog.jbinfo.io/
> Donate - PayPal: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN
>
>
> 2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:
>
>> Hi,
>>
>> The game links I want to scrape are the top 500 games listed at
>> http://steamcharts.com/top.
>>
>> Some example URLs are "http://store.steampowered.com/app/570/",
>> "http://store.steampowered.com/app/730/", and
>> "http://store.steampowered.com/app/440/". These are the top three
>> games on steamcharts.com.
>>
>> These top 500 game URLs have no unique structure; they look like every
>> other game URL. So how do I scrape only these URLs?
>>
>> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>>
>>> Hi,
>>>
>>> I need the structure of the links you want to scrape. If you look at
>>> my code, you can see that I limit the links with this:
>>>
>>> rules = (
>>>     Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>>     Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item')
>>> )
>>>
>>> Only links matching app/\d+ (scraped via parse_item) and genre/
>>> (followed) are kept; everything else is rejected.
>>> So, give me an example of these specific links.
>>>
>>> Regards.
>>>
>>> Thanks.
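A quick way to check the pagination question above is to test the patterns directly. The alternation ([1-9]|1[0-9]|20)$ does match "p.1" through "p.20", so the expression itself may not be what skips a page. One possible cause (an assumption, not verified against the live site) is that steamcharts links the first page as plain "/top", with "/top/p.N" used only from page 2 on, so any pattern requiring "p.<n>" never matches page 1. Allowing both forms covers all 20 pages:

```python
import re

# PAGE_PATTERNS is a sketch: the first pattern keeps the numbered pages
# p.1 .. p.20, and the second also accepts the bare /top page, which
# (by assumption) is how the site links page 1.
PAGE_PATTERNS = (r'top/p\.([1-9]|1[0-9]|20)$', r'top/?$')

def is_chart_page(url):
    """True if url looks like one of the 20 steamcharts top pages."""
    return any(re.search(p, url) for p in PAGE_PATTERNS)

print(is_chart_page('http://steamcharts.com/top'))       # True (page 1)
print(is_chart_page('http://steamcharts.com/top/p.20'))  # True (last page)
print(is_chart_page('http://steamcharts.com/top/p.21'))  # False
```

The two patterns could then replace the single allow value, e.g. Rule(SgmlLinkExtractor(allow=PAGE_PATTERNS), follow=True). Note also that the $ anchor rejects any trailing slash or query string, so it is worth checking in the crawl log exactly how the ignored page's links are written.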
>>> ---------
>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>>
>>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>
>>>> Hi Lhassan,
>>>>
>>>> Thanks for your response. Your code was great and I got what I was
>>>> looking for.
>>>> But I want to crawl only a specific set of URLs, i.e. I don't want to
>>>> crawl every game. So I listed those URLs in start_urls[], but then I
>>>> learned that we can't use both "def start_requests(self)" and
>>>> "start_urls[]" together.
>>>>
>>>> Do you have any idea about this? I just want to scrape a specific set
>>>> of about 500 URLs, not all of them.
>>>>
>>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I created a GitHub project containing a Scrapy project that scrapes
>>>>> data from this website; see the repository:
>>>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>>>> Look at it, clone it locally, and fix the bugs.
>>>>>
>>>>> If you like it, you can make a donation; see my email signature.
>>>>>
>>>>> Regards.
>>>>> ---------
>>>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>>>>
>>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I am new to Scrapy.
>>>>>> I have an issue I don't know how to solve.
>>>>>>
>>>>>> I need to scrape game info from the URL
>>>>>> "http://store.steampowered.com/agecheck/app/252490/", but it
>>>>>> requires an age check before it will show the game data, so I need
>>>>>> to fill this in once. The website seems to store the answer in a
>>>>>> cookie, since it does not ask again for subsequent games; i.e. we
>>>>>> only need to enter an age for the first game.
>>>>>>
>>>>>> So my problem is how to automatically submit the drop-down values
>>>>>> in Scrapy, store the resulting cookies, and reuse those cookies for
>>>>>> the subsequent start URLs.
>>>>>>
>>>>>> Please help me, friends. Thanks in advance.
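The age-check question above could be handled with a sketch like the following. It assumes the gate is an ordinary HTML form whose submission sets a cookie that Scrapy's default cookie middleware then sends with every later request (which would match the "only asked once" behaviour seen in the browser). The drop-down names ageDay/ageMonth/ageYear are illustrative guesses, not verified against the live form.

```python
def agecheck_formdata(day='1', month='January', year='1980'):
    """Drop-down values to submit on the age-check page (assumed names)."""
    return {'ageDay': day, 'ageMonth': month, 'ageYear': year}

# In the spider, the parse callback could detect the gate and submit it:
#
#     from scrapy.http import FormRequest
#
#     def parse_item(self, response):
#         if '/agecheck/' in response.url:
#             # Submit the form; the cookie it sets is kept by the
#             # default cookie middleware and replayed on later requests.
#             yield FormRequest.from_response(
#                 response, formdata=agecheck_formdata(),
#                 callback=self.parse_item)
#             return
#         # ...otherwise extract the game fields as before...
```

Because cookies are enabled by default, one successful submission should cover the rest of the crawl; the actual field names have to be read off the age-check page's HTML.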
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
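On the earlier question about crawling only a fixed set of ~500 URLs: start_urls and start_requests() feed the same mechanism, since the default start_requests() simply iterates start_urls. Overriding start_requests() therefore replaces start_urls rather than combining with it, and the fixed list belongs in whichever one is kept. A sketch, assuming the earlier script's output.csv has an appid column (an assumption about that file):

```python
import csv
import io

def load_start_urls(csv_text):
    """Build the store page URL for each appid row in a CSV (assumed column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ['http://store.steampowered.com/app/%s/' % row['appid']
            for row in reader]

# In the spider, this list seeds the crawl instead of start_urls:
#
#     def start_requests(self):
#         with open('output.csv') as f:
#             for url in load_start_urls(f.read()):
#                 yield Request(url, callback=self.parse_item)

print(load_start_urls('appid\n570\n730\n440\n')[0])
# -> http://store.steampowered.com/app/570/
```

With the crawl seeded this way, the CrawlSpider rules that follow every genre/ and app/ link are no longer needed; only the listed pages are requested.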
