Hi,

You can use this approach: parse the CSV and generate a Request to crawl each game. In the response of each request, check whether the agecheck form is present:
=> If it IS present, return a FormRequest with all the form fields filled in.
=> If it is NOT present, parse the response and extract the game information.

Regards.

Merci.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan - Twitter: @baazzilhassan <http://twitter.com/baazzilhassan>
Blog: http://blog.jbinfo.io/
Donate - PayPal - <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>

2014-07-16 0:14 GMT+00:00 Chetan Motamarri <[email protected]>:

> Hi Lhassan,
>
> Is it possible to take the input games from a CSV instead of crawling the top 500 games from steamcharts.com?
>
> I have a CSV file (attached) with the list of games whose data is to be scraped. Some of the games in the CSV also require the agecheck. Could you please help with this?
>
> I just want to scrape game data from Steam URLs like http://store.steampowered.com/app/570/ for all the games in the attached CSV.
>
> On Wednesday, June 25, 2014 2:09:55 AM UTC-7, Lhassan Baazzi wrote:
>
>> Yes, you are right. I pushed a fix for this to the repository, along with the CSV output: https://github.com/jbinfo/scrapy_store.steampowered.com/blob/master/output.csv
>> Only 476 games are exported; the missing games are due to the website itself. For example, if you open the link for the game ranked 257, http://store.steampowered.com/app/202480/, you are redirected to the homepage!
>>
>> Regards.
>>
>> Merci.
>>
>> 2014-06-25 2:16 GMT+01:00 Chetan Motamarri <[email protected]>:
>>
>>> Hi dude,
>>>
>>> The code works very well, but only 450 games out of 500 are retrieved. I think it is not scraping two of the 20 pages on steamcharts.com. One page it is ignoring is http://steamcharts.com/top/p.1; none of the games from that page appear in the output CSV.
>>>
>>> I tried editing the regex allow=r'(top\/p\.)([1-9]|1[0-9]|20)$' but could not resolve it. Could you please help me?
>>>
>>> On Tuesday, June 24, 2014 4:35:10 AM UTC-7, Lhassan Baazzi wrote:
>>>
>>>> Hi,
>>>>
>>>> Complex task. Look at the GitHub repository: I pushed the script for the top 500 games; see also output.csv.
>>>>
>>>> There are some minor bugs to fix.
>>>>
>>>> Regards.
>>>>
>>>> Merci.
>>>>
>>>> 2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> The game links I want to scrape are the top 500 games extracted from http://steamcharts.com/top.
>>>>>
>>>>> Some example URLs are http://store.steampowered.com/app/570/, http://store.steampowered.com/app/730/, and http://store.steampowered.com/app/440/. These are the top 3 games on steamcharts.com.
>>>>>
>>>>> These top-500 game URLs don't have any unique structure; they look like all the other game URLs. So how do I scrape only these URLs?
>>>>>
>>>>> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I need the structure of the links you want to scrape. If you look at my code, you can see that I limit the links with this:
>>>>>>
>>>>>>     rules = (
>>>>>>         Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>>>>>         Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),
>>>>>>     )
>>>>>>
>>>>>> Only links matching app/\d+ and genre/ are followed; anything else is rejected. So, give me an example of these specific links.
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> Merci.
>>>>>>
>>>>>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>>
>>>>>>> Hi Lhassan,
>>>>>>>
>>>>>>> Thanks for your response. Your code was amazing and I got what I was looking for. But I want to crawl only a specific set of URLs, i.e. I don't want to crawl all games. So I specified those URLs in start_urls, but I learned that we can't use both def start_requests(self) and start_urls.
>>>>>>>
>>>>>>> So do you have any idea about this?
>>>>>>> I just want to scrape a specific set of URLs (some 500 URLs), not all of them.
>>>>>>>
>>>>>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I created a GitHub project containing a Scrapy project that scrapes data from this website; see the repository: https://github.com/jbinfo/scrapy_store.steampowered.com
>>>>>>>> Look at it, clone the project locally, and fix any bugs.
>>>>>>>>
>>>>>>>> If you like it, you can make a donation; see my email signature.
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I am new to Scrapy and have an issue I don't know how to solve.
>>>>>>>>>
>>>>>>>>> I need to scrape game info from http://store.steampowered.com/agecheck/app/252490/, but it requires an agecheck before the game data can be scraped. The site stores this info in cookies (I guess), as it does not ask for the agecheck again for subsequent games, i.e. we only need to enter the age for the first game.
>>>>>>>>>
>>>>>>>>> So my problem is how to automatically submit the drop-down values in Scrapy, store them as cookies, and use those cookies for the subsequent start URLs.
>>>>>>>>>
>>>>>>>>> Please help me, friends. Thanks in advance.
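The agecheck handling the thread converges on (detect the age-gate page, submit the form, otherwise parse the game page) can be sketched in plain Python. This is a minimal sketch, not the repository's actual code: the `/agecheck/` URL marker, the `agecheck_form` id, and the `ageDay`/`ageMonth`/`ageYear` field names are assumptions about Steam's age gate, so verify them against a live response before relying on them.

```python
def is_agecheck(url, body):
    """Heuristic: does this response look like Steam's age-gate page?

    Both checks are assumptions about the age gate's markup/URL.
    """
    return "/agecheck/" in url or 'id="agecheck_form"' in body


def age_form_data():
    """Form fields to submit to pass the age gate (assumed field names)."""
    return {"ageDay": "1", "ageMonth": "January", "ageYear": "1980"}


# In a Scrapy spider, the callback would then look roughly like:
#
#   def parse(self, response):
#       if is_agecheck(response.url, response.text):
#           # Re-enter parse() once the age gate is passed; Scrapy's
#           # cookie middleware keeps the resulting cookie for later URLs.
#           yield FormRequest.from_response(
#               response, formdata=age_form_data(), callback=self.parse)
#       else:
#           yield self.parse_game(response)
```

Because Scrapy persists cookies per session by default, passing the gate once is usually enough for subsequent game pages, which matches the behaviour described in the original question.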
--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
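The CSV-driven crawl suggested in the final reply (read the game URLs from the attached CSV, generate one request per game) can be sketched as follows. The column name "url" is an assumption; the attached CSV's actual layout may differ.

```python
import csv
import io


def urls_from_csv(csv_text):
    """Extract the game URLs from CSV text, one per row.

    In a Scrapy spider this loop would sit inside start_requests(),
    yielding scrapy.Request(url, callback=self.parse) for each row,
    which avoids the start_urls/start_requests conflict raised earlier
    in the thread.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"].strip() for row in reader if row.get("url")]


# Hypothetical two-row CSV matching the example URLs from the thread:
sample = (
    "name,url\n"
    "Dota 2,http://store.steampowered.com/app/570/\n"
    "CS:GO,http://store.steampowered.com/app/730/\n"
)
print(urls_from_csv(sample))
```

Each URL returned here would then go through the same agecheck-aware callback discussed earlier in the thread.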
