Hi dude, 

The code works well, but only 450 of the 500 games are being retrieved. I 
think it is skipping two of the twenty pages on steamcharts.com. One page 
it is ignoring is "http://steamcharts.com/top/p.1": none of the games from 
that URL appear in the output CSV. 

I tried editing the regex "allow=r'(top\/p\.)([1-9]|1[0-9]|20)$'" but was 
not able to resolve it. Could you please help me? 
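For what it's worth, that pattern itself seems to match all twenty page URLs when checked with plain `re` (Scrapy's link extractor applies `allow=` patterns with `re.search`), which suggests the pages are being dropped somewhere else. One common cause is that page 1 is linked as the bare `/top` page, and a CrawlSpider's start URL goes through the default `parse` rather than the item callback. A quick check, plus a widened pattern that also accepts bare `/top` (a sketch, not a confirmed fix):

```python
import re

# The pattern from the question.
allow = re.compile(r'(top\/p\.)([1-9]|1[0-9]|20)$')
pages = ["http://steamcharts.com/top/p.%d" % n for n in range(1, 21)]

# All twenty paginated URLs match, including p.1, so the regex is not
# the part rejecting that page.
assert all(allow.search(u) for u in pages)

# Widened pattern that additionally matches the bare /top page,
# with or without a trailing slash.
widened = re.compile(r'top(\/p\.([1-9]|1[0-9]|20))?\/?$')
assert widened.search("http://steamcharts.com/top")
assert all(widened.search(u) for u in pages)
```

Since `allow=` is matched with `re.search`, the `$` anchor is what keeps these patterns from matching longer URLs such as `/top/p.21`.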

On Tuesday, June 24, 2014 4:35:10 AM UTC-7, Lhassan Baazzi wrote:
>
> Hi 
>
> Complex task. Look at the GitHub repository; I pushed the script for the 
> top 500 games. See also the output.csv.
>
> There are some minor bugs to fix.
>
> Regards.
>
> Thanks.
> ---------
> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
> Email/Gtalk: [email protected] - Skype: baazzilhassan
> Blog: http://blog.jbinfo.io/
> Donate via PayPal: 
> <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>
>  
>
> 2014-06-24 8:10 GMT+01:00 Chetan Motamarri <[email protected]>:
>
>> Hi,
>>
>> The game links I want to scrape are the top 500 games extracted from 
>> http://steamcharts.com/top.
>>
>> Some example URLs are "http://store.steampowered.com/app/570/", 
>> "http://store.steampowered.com/app/730/", and 
>> "http://store.steampowered.com/app/440/". These are the top 3 games on 
>> steamcharts.com. 
>>
>> These top 500 game URLs don't have any unique structure; they look like 
>> all the other game URLs. So how can I scrape only these URLs?
>>
>> On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>>
>>> Hi,
>>>
>>> I need the structure of the links you want to scrape. If you look at my 
>>> code, you can see that I limit the links with this:
>>>
>>> rules = (
>>>         Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>>>         Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item')
>>>     )
>>>
>>> Only links matching app/\d+ (scraped) and genre/ (followed) are 
>>> accepted; anything else is rejected.
>>> So, give me an example of these specific links.
>>>
>>> Regards.
>>>
>>>
>>> Merci.
>>> ---------
>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>>> Blog: http://blog.jbinfo.io/
>>>  
>>>
>>> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>
>>> Hi Lhassan,
>>>>
>>>> Thanks for your response. Your code was great and I got what I was 
>>>> looking for. 
>>>> But I want to crawl only a specific set of URLs, i.e. I don't want to 
>>>> crawl all games, so I specified those URLs in start_urls. But I came 
>>>> to learn that we can't use both "def start_requests(self)" and 
>>>> "start_urls".
>>>>
>>>> Do you have any idea about this? I just want to scrape a specific set 
>>>> of URLs (some 500 of them), not all URLs.
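One way to sketch this, assuming the 500 URLs can be built from their app ids (the three ids below are just the examples from earlier in the thread): put exactly the pages you want in `start_urls` and drop the crawling rules, so nothing outside the list is ever requested. Note that `start_urls` and `start_requests` are not really in conflict; the default `start_requests` simply yields one request per entry in `start_urls`, and overriding it replaces that behaviour.

```python
# Build start_urls for a fixed set of games instead of crawling everything.
# These app ids are the three examples from the thread; the real list
# would hold all 500.
TOP_APP_IDS = [570, 730, 440]

start_urls = ["http://store.steampowered.com/app/%d/" % app_id
              for app_id in TOP_APP_IDS]

# Inside a spider, pointing the parse callback at these URLs (or yielding
# them from an overridden start_requests) scrapes only this set.
```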
>>>>
>>>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I created a GitHub project containing a Scrapy project that scrapes 
>>>>> data from this website; see the repository: 
>>>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>>>> Clone the project locally and fix the bugs.
>>>>>
>>>>> If you like it, you can make a donation; see my email signature.
>>>>>
>>>>> Regards.
>>>>> ---------
>>>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>>>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>>>>> Blog: http://blog.jbinfo.io/
>>>>>  
>>>>>
>>>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>>>
>>>>>>  Hi folks,
>>>>>>
>>>>>> I am new to Scrapy and have an issue I don't know how to solve.
>>>>>>
>>>>>> I need to scrape game info from the URL 
>>>>>> "http://store.steampowered.com/agecheck/app/252490/", but it 
>>>>>> requires an age check before it will serve the game data, so I need 
>>>>>> to fill this in once. The website stores the info in cookies (I 
>>>>>> guess), since it does not ask for the age check again for subsequent 
>>>>>> games; i.e. we only need to enter the age for the first game, and 
>>>>>> then it is stored automatically.
>>>>>>
>>>>>> So my problem is how to automatically submit the drop-down values in 
>>>>>> Scrapy, store the resulting cookies, and reuse them for subsequent 
>>>>>> start URLs.
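A rough sketch of the form side of this, with heavy caveats: the field names below (ageDay, ageMonth, ageYear) are assumptions and should be read off the agecheck page's actual HTML form before use. In Scrapy the dict would typically be submitted with `FormRequest.from_response` on the agecheck response; with cookies enabled (the default), the session cookie set by that response is reused on subsequent requests automatically.

```python
# Hypothetical age-gate form data; the field names are assumptions and
# must be checked against the real form on the agecheck page.
AGECHECK_URL = "http://store.steampowered.com/agecheck/app/252490/"

formdata = {
    "ageDay": "1",
    "ageMonth": "January",
    "ageYear": "1985",
}

# Sketch of the Scrapy side (not executed here):
#   yield FormRequest.from_response(response, formdata=formdata,
#                                   callback=self.parse_game)
```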
>>>>>>
>>>>>> Please help me, friends. Thanks in advance.
>>>>>>  
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "scrapy-users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>>
>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>
>
>
