Hi,

The game links I want to scrape are those of the top 500 games listed 
at http://steamcharts.com/top.

Some example URLs are "http://store.steampowered.com/app/570/", 
"http://store.steampowered.com/app/730/", and 
"http://store.steampowered.com/app/440/". These are the URLs of the top 3 
games on steamcharts.com. 

These top 500 game URLs don't have any distinguishing structure; they look 
like all the other game URLs. So how can I scrape only these URLs?
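
One possible direction, as a rough sketch only: since the store URLs 
differ only by app id, the ids could be read from the steamcharts listing 
pages and turned into store URLs. The sketch assumes the listing links 
each game as href="/app/<id>" (not verified here); the function name 
store_urls_from_top_page and the sample HTML are made up for illustration.

```python
import re

# Assumption: the steamcharts listing links each game as href="/app/<id>".
APP_LINK = re.compile(r'href="/app/(\d+)')

STORE_URL = "http://store.steampowered.com/app/%s/"


def store_urls_from_top_page(html):
    """Return store URLs for the app ids found in one listing page.

    Order of first appearance is kept and duplicate ids are skipped.
    (Hypothetical helper, not part of Scrapy.)
    """
    seen = set()
    urls = []
    for app_id in APP_LINK.findall(html):
        if app_id not in seen:
            seen.add(app_id)
            urls.append(STORE_URL % app_id)
    return urls


# Made-up sample markup standing in for a downloaded listing page.
sample_html = '''
<a href="/app/570">Game A</a>
<a href="/app/730">Game B</a>
<a href="/app/440">Game C</a>
<a href="/app/570">Game A again</a>
'''

top_urls = store_urls_from_top_page(sample_html)
```

The resulting list could then be used as start_urls, or each URL could be 
yielded as a Request from start_requests, once the helper has been run 
over every page of the listing.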

On Monday, June 23, 2014 11:48:17 PM UTC-7, Lhassan Baazzi wrote:
>
> Hi,
>
> I need to know the structure of the links you want to scrape. If you look 
> at my code, you can see that I restrict the links like this:
>
> rules = (
>     Rule(SgmlLinkExtractor(allow=r'genre/'), follow=True),
>     Rule(SgmlLinkExtractor(allow=r'app/\d+'), callback='parse_item'),
> )
>
> Only links matching genre/ are followed, and only links matching app/\d+ 
> are passed to parse_item; everything else is rejected.
> So, give me examples of these specific links.
>
> Regards.
>
>
> Thanks.
> ---------
> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
> Email/Gtalk: [email protected] - Skype: baazzilhassan
> Blog: http://blog.jbinfo.io/
> Donate - PayPal: 
> <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>
>
> 2014-06-24 7:40 GMT+01:00 Chetan Motamarri <[email protected]>:
>
>> Hi Lhassan,
>>
>> Thanks for your response. Your code was amazing and I got what I was 
>> looking for. 
>> But I want to crawl only a specific set of URLs, i.e. I don't want to 
>> crawl all games. So I specified those URLs in start_urls[]. But I learned 
>> that we can't use both "def start_requests(self)" and "start_urls[]".
>>
>> So do you have any idea about this? I just want to scrape a specific set 
>> of URLs (some 500 URLs), not all URLs.
>>
>> On Friday, June 20, 2014 4:38:15 AM UTC-7, Lhassan Baazzi wrote:
>>
>>> Hi,
>>>
>>> I created a GitHub project containing a Scrapy project that scrapes 
>>> data from this website; see the GitHub repository: 
>>> https://github.com/jbinfo/scrapy_store.steampowered.com
>>> Look at it, clone the project locally, and fix any bugs.
>>>
>>> If you like it, you can make a donation; see my email signature.
>>>
>>> Regards.
>>> ---------
>>> Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
>>> Email/Gtalk: [email protected] - Skype: baazzilhassan
>>> Blog: http://blog.jbinfo.io/
>>> Donate - PayPal: 
>>> <https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=BR744DG33RAGN>
>>>
>>> 2014-06-19 8:23 GMT+01:00 Chetan Motamarri <[email protected]>:
>>>
>>>>  Hi folks,
>>>>
>>>> I am new to Scrapy and have an issue that I don't know how to solve.
>>>>
>>>> I need to scrape game info from the URL 
>>>> "http://store.steampowered.com/agecheck/app/252490/", but it requires 
>>>> an age check before I can proceed and scrape the game data. So I need 
>>>> to fill this in once, for the first game. The website stores the info 
>>>> as cookies (I guess), as it does not ask for the age check for 
>>>> subsequent games; i.e. we need to enter the age only for the first 
>>>> game, and then it is stored automatically.
>>>>
>>>> So my problem is how to automatically submit the drop-down values in 
>>>> Scrapy, store them as cookies, and use those cookies for subsequent 
>>>> start URLs.
>>>>
>>>> Please help me, friends. Thanks in advance.
>>>>  
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "scrapy-users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>>
>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>
>
>

